  • Generating lagged terms without using xtset

    Dear all,
    I am using a heavily unbalanced dataset and want to generate the yearly lag term of X1.
    For example:

    ID Year X1 L.X1(should be like this)
    1 2001 1 0
    1 2003 3 2
    1 2005 5 4
    2 2000 0 .
    2 2002 2 1
    2 2004 4 3
    3 2003 3 2
    3 2004 4 3
    3 2005 5 4

    Since xtset does not seem to work in my case, how can I generate L.X1?


    Last edited by Zhaohui Li; 01 Apr 2023, 05:40.

  • #2
    Both -xtset- and -tsset- are possible in this case, as they will enable you to use the lag operator. However, you seem to want something different. Lags (and leads) are for calling up observed values. You seem to want linear interpolation to fill in (or impute) years that are not observed. Please do tell us the rules or logic you are using to fill in the gaps, and also what should happen for the "first" observation for each -id-.

    Are these your real data? If they are, this is a trivial example that will give you a not very useful solution. If they are not, you should post back realistic looking data.

    It would appear that your data can only begin in the year 2000 (because the wanted output for this year leads to a missing value).

    Code:
    gen x = year - 2001
    replace x = . if x < 0
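
    To make the distinction concrete, here is a minimal sketch using the example data from #1. The names lag_x1 and prev_x1 are made up for illustration. With gaps in year, L.x1 is missing whenever the previous calendar year is not observed, while x1[_n-1] within id returns the previous observed row. Neither reproduces the wanted column, which is why the request looks like interpolation rather than a lag.

    ```stata
    clear
    input byte id int year byte x1
    1 2001 1
    1 2003 3
    1 2005 5
    2 2000 0
    2 2002 2
    2 2004 4
    3 2003 3
    3 2004 4
    3 2005 5
    end

    // xtset works on unbalanced panels with gaps
    xtset id year

    // lag operator: value of x1 in the previous calendar year,
    // missing when that year is not observed for the id
    gen lag_x1 = L.x1

    // previous observed row within id, regardless of the gap length
    bysort id (year): gen prev_x1 = x1[_n-1]

    list, sepby(id)
    ```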



    • #3
      Here's a more general technique for linear interpolation. There may be better ways.

      Code:
      clear *
      cls
      
      input byte id int year byte x1 byte want
      1 2001 1 0
      1 2003 3 2
      1 2005 5 4
      2 2000 0 .
      2 2002 2 1
      2 2004 4 3
      3 2003 3 2
      3 2004 4 3
      3 2005 5 4
      end
      drop want
      
      // Start here
      
      gen byte obs = 1 // flag true observations
      
      // use a reshape trick to have all years for each id
      reshape wide obs x1, i(id) j(year)
      reshape long
      replace obs = 0 if mi(obs) // fill in flag for non-observed data
      
      // fill in your own interpolation / regression equation
      gen interp_x1 = year - 2000
      
      // grab lagged values for the observed values, then keep the original set of observed years
      xtset id year
      gen want = L.interp_x1 if obs
      keep if obs
      drop obs interp_x1
      
      list
      Result

      Code:
      . list
      
           +-----------------------+
           | id   year   x1   want |
           |-----------------------|
        1. |  1   2001    1      0 |
        2. |  1   2003    3      2 |
        3. |  1   2005    5      4 |
        4. |  2   2000    0      . |
        5. |  2   2002    2      1 |
           |-----------------------|
        6. |  2   2004    4      3 |
        7. |  3   2003    3      2 |
        8. |  3   2004    4      3 |
        9. |  3   2005    5      4 |
           +-----------------------+
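
      A possible alternative to the reshape trick, as a sketch only: after xtset, tsfill with the full option adds the missing id-year rows, and ipolate with the epolate option interpolates and extrapolates linearly within each id. This assumes a straight line through the observed points is really the rule that is wanted. On the toy data it should reproduce the want column shown above.

      ```stata
      clear
      input byte id int year byte x1
      1 2001 1
      1 2003 3
      1 2005 5
      2 2000 0
      2 2002 2
      2 2004 4
      3 2003 3
      3 2004 4
      3 2005 5
      end

      gen byte obs = 1                   // flag true observations
      xtset id year

      // add a row for every year in the overall range, for every id
      tsfill, full
      replace obs = 0 if missing(obs)    // filled-in rows are not observed

      // linear interpolation (and extrapolation) of x1 over year, by id
      bysort id (year): ipolate x1 year, generate(interp_x1) epolate

      // lag of the interpolated series, kept only for the observed years
      gen want = L.interp_x1 if obs
      keep if obs
      drop obs interp_x1

      list, sepby(id)
      ```

      Note that tsfill, full creates one row per id for every year in the overall range, so the dataset temporarily grows to (number of ids) x (number of years) before keep if obs trims it back.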



      • #4
        Originally posted by Leonardo Guizzetti
        Here's a more general technique for linear interpolation. There may be better ways.

        Dear Leonardo,
        Thanks for your quick reply. I really appreciate your help.

        The data I use include nearly 200k observations. It is housing market data: the id is the house id, and the variable X is actually the housing price volatility.
        I am not sure whether filling in the missing observations may cause computational issues, as I need to run a bootstrap on it.

        What I actually need is the lag of the market price volatility in my regression.


        Here is my code:

        /* generate market volatility for the current quarter */
        bysort Close_Qtime: egen volasample = mean(stdsample)

        /* number the observations within each quarter */
        bysort Close_Qtime: gen dup = _n

        /* generate lagged market volatility for one time series only
           (the first observation of each quarter) */
        bysort dup (Close_Qtime): gen volalagsample = volasample[_n-1] if dup == 1

        /* fill in the others */
        bysort Close_Qtime: egen vollag2sample = total(volalagsample)
        replace vollag2sample = . if vollag2sample == 0


        The variable vollag2sample is my final result.
        With ordinary OLS the results look reasonable. However, the bootstrap results look bad, so I guess there must be something wrong here.
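
        A possible simplification of the steps above, as a sketch only: collapse to one row per quarter, take the lag there, and merge it back onto the house-level data. The toy data are made up, and the sketch assumes Close_Qtime is a numeric quarterly date and stdsample is the house-level measure. The names houseid, vollag, and the tempfile market are hypothetical. It is not a diagnosis of the bootstrap problem.

        ```stata
        clear
        // made-up toy data: quarter 202 is deliberately absent
        input int houseid int Close_Qtime float stdsample
        1 200 0.10
        2 200 0.30
        3 201 0.20
        4 201 0.40
        5 203 0.50
        end

        preserve
            // one row per quarter holding the market-level mean
            collapse (mean) volasample = stdsample, by(Close_Qtime)

            // declare a quarterly time series and take the one-quarter lag;
            // the lag is missing when the previous quarter is not observed
            tsset Close_Qtime, quarterly
            gen vollag = L.volasample

            tempfile market
            save `market'
        restore

        // attach the quarter-level values to every house in that quarter
        merge m:1 Close_Qtime using `market', nogenerate

        sort Close_Qtime houseid
        list, sepby(Close_Qtime)
        ```

        If the value from the previous observed quarter is wanted instead (which is what the volasample[_n-1] step above computes), the L. line could be replaced by sort Close_Qtime followed by gen vollag = volasample[_n-1]. Either way, a true lagged value of 0 stays 0 rather than being recoded to missing.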



        • #5
          Sorry, but I don’t really understand as this has evolved considerably from the original question and this is now beyond my scope of interest or time. Maybe someone else can help you further.
