Panel data with lagged variable "outside" of panel timeframe

Fynn Froboese

Join Date: Jun 2021

Posts: 5
#1

16 Jun 2021, 11:43

Dears,

How would you include an "outbound" lagged variable in a panel, and is that desirable, or am I on the wrong track?
I have 4 years of data for all the variables I want to use in my model, but some control variables should have a lagged (causal) effect. I would therefore like to use a lag for some variables without including the earlier year in general (when -xtset-ing). In my example, I would like to include 4 observations (years) per group in my regression model, but refer to a 5th year (t-1) for the lagged values.

Thank you in advance.
Clyde Schechter

Join Date: Apr 2014

Posts: 30152
#2

16 Jun 2021, 12:33

I don't understand. If you only have four years of data, then where is the "reference to a 5th year" supposed to come from? If you have four years of data, any analysis using a lagged variable necessarily uses only the last three years and excludes the observation from the first year. (You could partly overcome this by including the first year's outcome value as a covariate in the analysis, providing that isn't problematic for your analysis in other respects.) If data from the year preceding the first year in your study is available from some other source, then, go ahead and retrieve it and bring it into your data set. Otherwise,...
Fynn Froboese

Join Date: Jun 2021

Posts: 5
#3

16 Jun 2021, 13:44

Thank you for taking the time to help! It is hard to explain, but I will try again.

I have survey data for ~15 years. In 4 of those 15 years, the survey included some additional questions that are of interest for my analysis. My current approach uses those 4 years for the panel data analysis, as most of the variables, and especially those of interest, are only available for those 4 years. One of my control variables should have a lagged effect. As I understand it, if that variable were only available for the 4 years of data, the first year would have no lagged value, because t-1 would not exist. In my case the data do exist, as the control variable is part of the data collected in all 15 years.

Does this make sense?
Clyde Schechter

Join Date: Apr 2014

Posts: 30152
#4

16 Jun 2021, 13:57

Yes, now it makes perfect sense. It's a perfectly reasonable thing to do, and it is not difficult. You -xtset- your panel data, and then you can use the lag operator to include the lagged control in your model. So, not having any example data to work with, the following is pseudo-code close to Stata syntax for how you would do it:

Code:

use all_my_data
xtset panel_id year
panel_regression_command dep_var ind_vars L1.control

Since the "control" variable has values in the year before the first year where your other variables have values, you will be able to include all four years of your data in the regression, and Stata will look back to the year before for the values of control. And you don't have to put an -if- condition on the regression command to limit it to the four years where your main variables are defined--that happens automatically because any observation with missing data on a variable mentioned in the regression command is automatically excluded.

If you are not familiar with Stata's lag operator, read -help tsvarlist- for more information.
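
To see the mechanics on something runnable, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the variable names, the 20 panels, the year range 9 to 13, and the choice of -xtreg, fe- as a stand-in for whatever panel regression command you actually use.

```stata
* Hypothetical toy data: 20 panels observed in years 9-13 (all names illustrative)
clear
set seed 12345
set obs 100
generate panel_id = ceil(_n/5)
bysort panel_id: generate year = 8 + _n

* The control variable is observed in every year, including year 9
generate control = rnormal()

* The main variables exist only in years 10-13 and are missing in year 9
generate ind_var = rnormal() if year >= 10
generate dep_var = 1 + 0.5*ind_var + rnormal() if year >= 10

* Declare the panel structure; -xtset- reports the full span, 9 to 13
xtset panel_id year

* For year 10, L1.control is the value of control from year 9
xtreg dep_var ind_var L1.control, fe

* Check which years actually entered the estimation sample
tabulate year if e(sample)
```

The final -tabulate- should list only years 10 to 13: year 9 drops out because dep_var and ind_var are missing there, while year 10 stays in because its lagged control value comes from year 9.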
Fynn Froboese

Join Date: Jun 2021

Posts: 5
#5

16 Jun 2021, 14:07

Thanks a lot! One last thing to clarify: I would -xtset- my data with some variables having one more year of data (t-1), so Stata would identify 5 years of panel data, correct?

Code:

panel variable:  ID (strongly balanced)
 time variable:  Years, 9 to 13
         delta:  1 unit

... where year 9 would be t-1 in my example.
Clyde Schechter

Join Date: Apr 2014

Posts: 30152
#6

16 Jun 2021, 14:17

Correct.