  • Interaction xtset and sort

    Hi guys,

    I guess this is a problem of general interest that many Stata beginners face.

    Let's assume I have panel data.

    1. At the beginning, I xtset the data (e.g. xtset company_id month_id)
    2. Later on, I have to re-sort the data (e.g. sort month_id company_id)
    3. Re-sort the data again, back to the initial sorting (e.g. sort company_id month_id)


    The issue is that at step 3, Stata may no longer recognize the data as panel data identified by company_id and month_id, as it did in step 1. Or am I wrong?
    This would mean that lag operators (e.g. l1.month_id / lag1.month_id) might ignore the panel structure and refer to the previous month of a different company_id.

    So essentially, do I have to use xtset before each lag command?



    Maybe a better example:
    Code:
    xtset company_id month_id
    sort company_id month_id
    generate var2 = l1.var1
    Would my lag-operator now refer to a previous month of a different company,
    or would it understand not to go beyond company borders?




    Best regards,
    Carlos
    Last edited by Carlos Teigimiz; 31 May 2016, 16:12.

  • #2
    Have you actually encountered this difficulty? My understanding is that it would not happen. Here's an example using the on-line Grunfeld data set:

    Code:
    . webuse grunfeld, clear
    
    . xtset company year
           panel variable:  company (strongly balanced)
            time variable:  year, 1935 to 1954
                    delta:  1 year
    
    . gen lag_after_xtset = l1.invest
    (10 missing values generated)
    
    . sort company year
    
    . gen lag_after_sort = l1.invest
    (10 missing values generated)
    
    . assert lag_after_xtset == lag_after_sort
    What is true is that if you have a command that sorts the data differently from the natural -xtset- sort, the lag operators will not work. Thus:

    Code:
    . by company (invest), sort: gen lag_mvalue = sum(l1.mvalue)
    not sorted
    r(5);
    Here the somewhat cryptic "not sorted" error message means that the sort order imposed, invest within company, is not the correct sort order for calculating lags.
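
    One possible way around this error is sketched here, assuming the goal is a running sum of lagged mvalue in year order within each company. The lag is generated first, and the accumulation happens in a separate by-group step that uses no time-series operator. The variable name cum_lag_mvalue is illustrative and not from the original post.

    ```stata
    webuse grunfeld, clear
    xtset company year

    * Compute the lag while the data are in the xtset order
    generate lag_mvalue = L1.mvalue

    * Accumulate within company in year order; sum() treats missing as zero
    by company (year), sort: generate cum_lag_mvalue = sum(lag_mvalue)
    ```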

    If you have actually encountered a situation where you needed to re-run -xtset- when the data were already sorted by timevar within panelvar in order to get lags to work correctly, I would be interested to see it. Would you post a usable example?

    Added: it is hard to imagine how -sort panelvar timevar- could leave you in a situation where lags would not work correctly. Think of this: in order for you to even run -xtset panelvar timevar- without an error message, the variables panelvar and timevar must uniquely identify the observations in the data. If you had two instances of panelvar with the same value of timevar, -xtset- would complain and interrupt execution. So panelvar timevar defines a unique sort order of the data. Consequently, anytime you go back to that order with -sort panelvar timevar- you inevitably have the same sort order that corresponded to that used by -xtset-.
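
    The three steps from the original question can be checked with a minimal sketch like this one, again assuming the Grunfeld data. The names lag_before and lag_after are illustrative.

    ```stata
    webuse grunfeld, clear
    xtset company year

    * Confirm that company and year uniquely identify the observations
    isid company year

    * Step 1: lag computed right after xtset
    generate lag_before = L1.invest

    * Step 2: impose a different sort order
    sort year company

    * Step 3: return to the panel order and compute the lag again
    sort company year
    generate lag_after = L1.invest

    * The two versions should agree; missing == missing is true in Stata
    assert lag_before == lag_after
    ```

    If the assert passes silently, re-sorting did not change what the lag operator returns, and no second -xtset- was needed.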
    Last edited by Clyde Schechter; 31 May 2016, 18:10.

    • #3
      L. is a time-series operator and therefore only assigns lags within each id. You can verify this yourself by browsing the data and comparing the values of -var2- with those of -var1-. By the way, you don't have to generate the variable -var2-, as you can use the operators directly in your regressions. See -help tsvarlist- for more. Finally, you only need to -xtset- your data once.
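
      A short sketch of both suggestions, assuming the Grunfeld data rather than the poster's own variables. The regression specification is only an illustration, and lag_invest is a made-up name.

      ```stata
      webuse grunfeld, clear
      xtset company year

      * Use the lag operator directly in the variable list
      regress invest L1.mvalue L1.kstock

      * Generate a lag only to inspect it
      generate lag_invest = L1.invest

      * The first year of every company has no previous year, so the lag is missing
      assert missing(lag_invest) if year == 1935

      * Compare the first two years of each company side by side
      list company year invest lag_invest if year <= 1936, sepby(company)
      ```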
