lagged variables

Hale Isaac

Join Date: Jan 2019
Posts: 60

06 Mar 2019, 10:34

Hi,
I have a variable for gdp and one for survey wave. There are three survey waves, and each wave has a corresponding gdp value: a 3-year average of the preceding years (e.g. if the survey wave is 1999/2001, then gdp for that wave is the average of gdp in 1996, 1997 and 1998). The assumption is that the effect of x (gdp) on y (a yes/no outcome variable) depends on gdp in both the current and previous year(s). I want to lag the gdp variable.

I have tried the code below but I am not sure if it achieves what I need.

Code:

sort SW
by SW: gen lag1 = gdp[_n-1]
by SW: gen lag2 = gdp[_n-2]
by SW: gen lag3 = gdp[_n-3]

Here is an example of my data after running the command:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte SW float(gdp lag1 lag2 lag3)
0 5.2   .   .   .
0 5.2 5.2   .   .
0 5.2 5.2 5.2   .
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
0 5.2 5.2 5.2 5.2
end
label values SW per
label def per 0 "1999/2001", modify

(I have three survey waves but dataex doesn’t show enough of the data: 0 "1999/2001" 1 "2002/2005" 2 "2006/2009")
My data is a repeated cross section.

Last edited by Hale Isaac; 06 Mar 2019, 11:11.

Tags: data, lagged, repeated cross section

Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#2

06 Mar 2019, 11:42

Well, it's hard to say but I would wager that what you have done is not correct. Your example data, frankly, aren't very helpful. Not only is SW always 0, but the value of gdp is the same in every observation.

It is not clear from your description, nor from your example, why you have more than one observation per SW, especially if the value of gdp is the same in all of them. I imagine that within each survey wave you have observations on a bunch of countries, and that each country has its own gdp value. And while your data may be completely balanced, in real world data sets, it is common for there to be gaps in the data--which your code would handle incorrectly. So, assuming that there is some other variable, call it country, that identifies a country in each observation, and that each country has, at most, one observation in each wave, I would do this as follows:

Code:

xtset country SW
forvalues i = 1/3 {
    gen lag`i'_gdp = L`i'.gdp
}

The use of the lag operator (L) assures that gaps in the data will be handled correctly.
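
To illustrate the difference, here is a minimal sketch on made-up data. The countries, waves and gdp values are hypothetical, and country 1 is deliberately missing wave 2. It compares a positional lag built with [_n-1] against the L. operator.

```stata
* Toy panel (hypothetical values): country 1 has no observation in wave 2
clear
input byte country byte SW float gdp
1 0 5.2
1 1 3.1
1 3 6.3
2 0 4.0
2 1 4.5
2 2 4.8
2 3 5.0
end

* Positional lag: takes the previous row within country, whatever its wave
bysort country (SW): gen lag1_pos = gdp[_n-1]

* Time-series lag: takes the value from wave SW-1, missing if that wave is absent
xtset country SW
gen lag1_ts = L1.gdp

list, sepby(country)
```

For country 1 in wave 3, lag1_pos picks up the wave 1 value (3.1) because that is simply the previous row, whereas lag1_ts is missing because wave 2 is not in the data.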

Now, actually, I probably wouldn't really do that. In fact, I probably wouldn't create these variables at all unless you need them for some purpose other than just using them as predictors in a regression model. If the reason you want these variables is just to include them as predictors in a regression analysis, you can just refer to the lagged values directly in your regression, e.g.:

Code:

regression_command outcome_variable L(0/3).gdp other_variables

will give you a regression analysis that includes the current and first three lagged values of gdp all as predictors.
Hale Isaac

Join Date: Jan 2019

Posts: 60
#3

06 Mar 2019, 11:57

Thank you for your response, Clyde.

Apologies for the unclear example (I could not get dataex to show enough of my data). I have a repeated cross-sectional data set with observations on different individuals at three different survey waves, and gdp takes on three values (5.2, which corresponds to 1999/2001; 3.1, which corresponds to 2002/2005; and 6.3 for 2006/2009). That is, the variables lag1, lag2 and lag3 do take on the values 3.1, 5.2 and 6.3.

I would like to include the lagged variables as predictors in a regression model. Would using the L(0/3).gdp notation suffice in this case?
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#4

06 Mar 2019, 12:13

So, it seems what you have is data on individuals, and they all live in the same country, and you want to use gdp as an ecological covariate. OK. Then you just change the -xtset- command to use the individual ID variable instead of the country.

But here's another problem you face. You only have three survey waves. In the most recent one, only the current gdp and the first two lags of that are defined. The third lag would be prior to the start of the first wave. In the middle wave, only the current gdp and first lag are defined. And in the first wave, only the current gdp is defined: you have no data for any lags. So in every observation, some or all of the lagged GDPs will have only missing values--which means that every observation will be omitted from the estimation sample, and you will not be able to do any regression.
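
A quick way to see this, as a sketch: reduce the data to one row per wave, using the three gdp values mentioned in #3 and the wave codes 0, 1, 2, and count the waves in which all three lags are nonmissing.

```stata
* One row per survey wave, with the gdp values reported in #3
clear
input byte SW float gdp
0 5.2
1 3.1
2 6.3
end

* Declare SW as the time variable so the lag operator can be used
tsset SW

forvalues i = 1/3 {
    gen lag`i'_gdp = L`i'.gdp
}

list

* missing() is true if any argument is missing, so this counts
* the waves where all three lags are available
count if !missing(lag1_gdp, lag2_gdp, lag3_gdp)
```

The count is 0: lag3_gdp is missing in every wave, so a model with L(0/3).gdp has no usable observations.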

So I think you have two choices:

1. Forget about lagged GDP and just use current GDP in your model, or,

2. Get three additional waves of the survey going back earlier and append those to your existing data. Then you will be able to build a model that includes current and lagged GDP for the years you currently have (though not for the additional years).

The curse of lagged variables is that each lag you include chops off an early period's data.
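
As a rough sketch of how lagged values could be attached once earlier periods are available: because gdp here is a wave-level covariate, the lags can be built in a small wave-level table and merged onto the individual-level data. This is a variation on option 2, not the same thing. The wave codes -3, -2, -1, their gdp values, and the variable name outcome are all hypothetical placeholders, and the individual-level data are a tiny stand-in for the real survey file.

```stata
* Wave-level lookup table. SW = -3, -2, -1 stand for three hypothetical
* earlier waves; their gdp values are placeholders, not real figures.
clear
input int SW float gdp
-3 4.0
-2 4.4
-1 4.9
 0 5.2
 1 3.1
 2 6.3
end

tsset SW
forvalues i = 1/3 {
    gen lag`i'_gdp = L`i'.gdp
}

* Keep only the waves present in the survey, and only the lag variables
keep if SW >= 0
drop gdp
tempfile wavegdp
save `wavegdp'

* Stand-in for the individual-level repeated cross section
clear
input int SW byte outcome float gdp
0 1 5.2
0 0 5.2
1 1 3.1
2 0 6.3
end

* Attach the wave-level lags to every individual in that wave
merge m:1 SW using `wavegdp', nogenerate
list
```

Run this as a do-file so that the tempfile macro stays defined. Note that with only three waves in the estimation sample, wave-level predictors carry at most two independent pieces of information beyond the constant, so a model containing gdp and all three lags would have some of them dropped as collinear.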
Hale Isaac

Join Date: Jan 2019

Posts: 60
#5

06 Mar 2019, 12:29

Thank you for the explanation, Clyde. I have opted to exclude the lagged gdp from the analysis.