Generating lagged terms without using xtset

Zhaohui Li

Join Date: Oct 2016

Posts: 15
#1

01 Apr 2023, 05:35

Dear all,
I am using a heavily unbalanced dataset and want to generate the yearly lag term of X1.
For example:

ID Year X1 L.X1(should be like this)
1 2001 1 0
1 2003 3 2
1 2005 5 4
2 2000 0 .
2 2002 2 1
2 2004 4 3
3 2003 3 2
3 2004 4 3
3 2005 5 4

As xtset does not seem possible in my case, how do I generate L.X1?

Last edited by Zhaohui Li; 01 Apr 2023, 05:40.
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2400
#2

01 Apr 2023, 07:20

Both -xtset- and -tsset- are possible in this case, and either will let you use the lag operator. However, you seem to want something different. Lags (and leads) are for calling up observed values. You seem to want linear interpolation to fill in (or impute) years that are not observed. Please tell us the rules or logic you are using to fill in the gaps, and also what should happen for the "first" observation for each -id-.

Are these your real data? If they are, this is a trivial example that will give you a not very useful solution. If they are not, you should post back realistic looking data.

It would appear that your data can only begin in the year 2000 (because the wanted output for this year leads to a missing value).

Code:

gen x = year - 2001
replace x = . if x < 0
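
To make the distinction between lags and interpolation concrete, here is a minimal sketch on the toy data from #1. The variable names lag1 and prev are made up for illustration. It compares the lag operator, which looks up the value exactly one year earlier, with a plain subscript, which looks up the previous observed row. Neither is expected to match the wanted column, which is why some interpolation rule seems to be needed.

Code:

clear
input byte id int year byte x1
1 2001 1
1 2003 3
1 2005 5
2 2000 0
2 2002 2
2 2004 4
3 2003 3
3 2004 4
3 2005 5
end

// declaring the panel works even though years are missing within id
xtset id year

// lag operator: value of x1 exactly one year earlier, missing if that year is not observed
gen lag1 = L.x1

// subscript: value of x1 in the previous observed row of the same id, whatever its year
bysort id (year): gen prev = x1[_n-1]

list, sepby(id)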

Leonardo Guizzetti

Join Date: Jul 2016
Posts: 2400

01 Apr 2023, 07:39

Here's a more general technique for linear interpolation. There may be better ways.

Code:

clear *
cls

input byte id int year byte x1 byte want
1 2001 1 0
1 2003 3 2
1 2005 5 4
2 2000 0 .
2 2002 2 1
2 2004 4 3
3 2003 3 2
3 2004 4 3
3 2005 5 4
end
drop want

// Start here

gen byte obs = 1 // flag true observations

// use a reshape trick to have all years for each id
reshape wide obs x1, i(id) j(year)
reshape long
replace obs = 0 if mi(obs) // fill in flag for non-observed data

// fill in your own interpolation / regression equation
gen interp_x1 = year - 2000

// grab lagged values for the observed values, then keep the original set of observed years
xtset id year
gen want = L.interp_x1 if obs
keep if obs
drop obs interp_x1

list

Result

Code:

. list

     +-----------------------+
     | id   year   x1   want |
     |-----------------------|
  1. |  1   2001    1      0 |
  2. |  1   2003    3      2 |
  3. |  1   2005    5      4 |
  4. |  2   2000    0      . |
  5. |  2   2002    2      1 |
     |-----------------------|
  6. |  2   2004    4      3 |
  7. |  3   2003    3      2 |
  8. |  3   2004    4      3 |
  9. |  3   2005    5      4 |
     +-----------------------+
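
As one possible alternative to the reshape trick, the gaps could be filled with -tsfill- and the interpolation done with -ipolate-. This is only a sketch: it assumes that linear interpolation (with linear extrapolation at the ends) is the rule wanted, which the thread has not confirmed, and it reuses the toy data and the variable names from the code above.

Code:

clear
input byte id int year byte x1
1 2001 1
1 2003 3
1 2005 5
2 2000 0
2 2002 2
2 2004 4
3 2003 3
3 2004 4
3 2005 5
end

gen byte obs = 1 // flag true observations

// add rows so that every id covers the full range of years in the data
xtset id year
tsfill, full

// assumption: linear interpolation within id, extrapolating beyond the observed years
bysort id: ipolate x1 year, gen(interp_x1) epolate

// lag the filled-in series, then keep only the originally observed rows
xtset id year
gen want = L.interp_x1 if obs == 1
keep if obs == 1
drop obs interp_x1

list, sepby(id)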


Zhaohui Li

Join Date: Oct 2016

Posts: 15
#4

01 Apr 2023, 08:15

Dear Leonardo,
Thanks for your quick reply. I really appreciate your help.

The data I use include nearly 200k observations. It is housing market data: the id is the house id, and variable X is actually the housing price volatility.
I am worried that filling in the missing years may cause computational issues, as I need to run a bootstrap on it.

Actually, I need to use the lag of the market price volatility in my regression.

Here is my code:

/* generate market volatility for the current quarter */
bys Close_Qtime: egen volasample = mean(stdsample)

/* number the observations within each quarter */
bys Close_Qtime: gen dup = _n

/* generate lagged market volatility for one time series only (dup == 1) */
bys dup (Close_Qtime): gen volalagsample = volasample[_n-1] if dup == 1

/* fill in the others */
bys Close_Qtime: egen vollag2sample = total(volalagsample)
replace vollag2sample = . if vollag2sample == 0

The variable vollag2sample is my final result.
With ordinary OLS the results look fine. However, the bootstrap results look ugly. I guess there must be something wrong here.
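
For a market-level lag like this, one option that may be easier to check is to collapse to one row per quarter, take the lag there, and merge it back. The sketch below uses a small made-up dataset and assumes Close_Qtime is a numeric quarterly date; the names mkt_vol, mkt_vol_lag and mkt are hypothetical. Note that L. returns missing when the preceding quarter has no sales, whereas the [_n-1] subscript used above takes the previous observed quarter, so the two can differ.

Code:

clear
input int Close_Qtime double stdsample
200 0.10
200 0.30
201 0.20
201 0.40
203 0.50
end
format Close_Qtime %tq

preserve
// one row per quarter holding the market-wide mean volatility
collapse (mean) mkt_vol = stdsample, by(Close_Qtime)

// lag along the quarterly series; quarter 203 gets missing because 202 is absent
tsset Close_Qtime
gen mkt_vol_lag = L.mkt_vol

tempfile mkt
save `mkt'
restore

// attach the quarter-level values to every house-level observation
merge m:1 Close_Qtime using `mkt', nogenerate

list, sepby(Close_Qtime)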
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2400
#5

01 Apr 2023, 09:59

Sorry, but I don’t really understand as this has evolved considerably from the original question and this is now beyond my scope of interest or time. Maybe someone else can help you further.