Lag independent variable in fixed effects regression using repeated cross section data

Stephanie Hong

Join Date: Jul 2019

Posts: 17
#1

27 Jul 2020, 22:42

Hello Statalist,

I'm currently analyzing repeated cross-section data (different individuals were surveyed in each year). These individuals are nested within "counties." Thus, counties are repeated over time, but different individuals appear across years. Using this data set, I would like to use county fixed effects to analyze the effect of a policy change, lagging the policy by 1 year. However, I have been struggling to find the right code to "xtset" the repeated cross-section data. When I used the following code,

Code:

xtset county year, yearly

an error occurred, saying "repeated time values within panel." This makes sense because the combination of county and year is not uniquely identified. Given this problem, how can I xtset this data set?

In addition, can you give me some insight into how to lag the independent variable (the policy variable) by one year in a fixed-effects regression on repeated cross-section data? I could include L1.policy in the FE regression, but to use the lead/lag operators I need to specify the year (time) variable when I xtset the data set, and I am not sure whether I can xtset repeated cross-section data using the year variable.

Would there be a way to xtset the repeated cross-section data and then lag the independent variable by a year?

Any help would be very much appreciated!
Tags: fixed effects, panel data, repeated cross section
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#2

28 Jul 2020, 01:36

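To lag the policy variable, let's say it is called pvar, try

Code:

* first observation of each county-year block takes the previous block's value
bysort county (year): gen laggedpvar = pvar[_n-1] if year != year[_n-1]
* copy that value to all other observations in the same county-year
by county year: replace laggedpvar = laggedpvar[1]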

To estimate the model, use -areg, absorb(county)-.
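
On the -xtset- part of the question: -xtset- needs each county-year pair to appear only once, so it cannot be applied to the respondent-level file directly. One possible workaround, sketched here on made-up toy data (the values and the tempfile name policy are my assumptions, and pvar is assumed constant within a county-year), is to collapse the policy variable to one row per county-year, -xtset- that small table, take L.pvar there, and merge the result back:

```stata
* Toy repeated cross-section: 2 counties, 3 years, 2 respondents per cell
clear
input county year pvar
1 2001 0
1 2001 0
1 2002 1
1 2002 1
1 2003 1
1 2003 1
2 2001 0
2 2001 0
2 2002 0
2 2002 0
2 2003 1
2 2003 1
end

* Build a one-row-per-county-year policy table and lag it there
preserve
collapse (first) pvar, by(county year)
xtset county year, yearly
gen laggedpvar = L.pvar
keep county year laggedpvar
tempfile policy
save `policy'
restore

* Attach the lagged policy to every respondent in the county-year
merge m:1 county year using `policy', nogenerate
```

Unlike the [_n-1] approach, L.pvar is missing when the previous calendar year is absent for a county, so gaps in the years are not silently bridged.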
Stephanie Hong

Join Date: Jul 2019

Posts: 17
#3

28 Jul 2020, 20:38

Thank you Dr. Kolev for your suggestions! I forgot to mention that I have to use the "svy" prefix since I'm using a survey dataset. I use the jackknife method of variance estimation for the survey data. But it looks like the areg command is not compatible with svy. Could you suggest other ways that I can estimate the FE model including the lagged independent variable in a repeated cross-section data set?
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#4

28 Jul 2020, 21:45

Hi Stephanie, on my Stata 15.1 it seems that -svy: areg y x i.a, absorb(cvar)- is allowed. You cannot see it in the syntax diagram of the help file for areg, but when you click on (View complete PDF manual entry), it says that it is now allowed. I think StataCorp just forgot to update the help file, because areg indeed used not to work with svy.

How about you first try -svy: areg y x i.a, absorb(cvar)-? I think it should work. If it does not, we will think of something. E.g., we can bypass the svy prefix and just use the survey weights.

Stephanie Hong

Join Date: Jul 2019

Posts: 17
#5

29 Jul 2020, 02:26

Hello Dr. Kolev,

Thanks for your response. I tried the command, but I keep getting this error: "areg is not supported by svy with vce(jackknife)." I'm using Stata 16. Do you think areg does not work with svy and vce(jackknife) in particular? The command I use to svyset the data is:

Code:

svyset [pw=pweightvar], jkrw(var1-var400, multiplier(1)) vce(jack) mse

If we just use the survey weights, what would be the correct code to account for the survey design?

Also, do you think the following code can be an alternative way to estimate the fixed effects model?

Code:

svy: reg outcome lag_iv i.county i.year

My last question is, since I'm adding county fixed-effects, I need to cluster the standard errors at the county level. Given that I'm using survey design, how should I incorporate the cluster-robust standard errors?

Thank you for your help!

Joro Kolev

Join Date: Aug 2018

Posts: 3050
#6

29 Jul 2020, 05:50

Hi Stephanie,

Yes, the error message says that -areg- cannot be combined with the -svy- prefix when the variance estimator is the jackknife.

It seems to me that you are not properly survey setting your data. In particular, I cannot see what your sampling unit is in your -svyset-. Here is an example from the -svy- help file:

Code:

. webuse nhanes2f, clear

. svyset psuid [pweight=finalwgt], strata(stratid) vce(jack) mse

      pweight: finalwgt
          VCE: jackknife
          MSE: on
  Single unit: missing
     Strata 1: stratid
         SU 1: psuid
        FPC 1: <zero>

. svy: reg bpsystol i.sex
(running regress on estimation sample)

Jackknife replications (62)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
..................................................    50
............

Survey: Linear regression

Number of strata   =        31                Number of obs     =       10,337
Number of PSUs     =        62                Population size   =  117,023,659
                                              Replications      =           62
                                              Design df         =           31
                                              F(   1,     31)   =        90.14
                                              Prob > F          =       0.0000
                                              R-squared         =       0.0178

------------------------------------------------------------------------------
             |              Jknife *
    bpsystol |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |
     Female  |   -5.71288   .6017183    -9.49   0.000    -6.940093   -4.485668
       _cons |   129.9207    .644102   201.71   0.000     128.6071    131.2344
------------------------------------------------------------------------------

.

In particular, note that the above code tells Stata what the sampling unit is through -svyset psuid-.

Yes, you can simply use regress to account for the fixed effect, e.g., you just do i.county instead of i.sex in the above regression and you replace all the variables with your variables. However this assumes that you have correctly -svyset- your data. Stata will automatically calculate correct standard errors for the clustered error structure if you have correctly -svyset- your data, which I do not think you have.
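
As a quick check of the claim that indicator variables can stand in for -areg-, here is a small sketch on the auto data (no weights or survey design, just to compare point estimates; the scalar names are mine). The slope coefficients from -areg, absorb()- and from -regress- with i.rep78 should coincide:

```stata
sysuse auto, clear

* Fixed effects absorbed by areg
areg price weight length, absorb(rep78)
scalar b_areg = _b[weight]

* Same fixed effects entered as explicit indicator variables
regress price weight length i.rep78
scalar b_reg = _b[weight]

* The two slope estimates agree up to numerical precision
display b_areg "  " b_reg
assert reldif(b_areg, b_reg) < 1e-8
```

Only the point estimates are compared here; the standard errors under -svy- depend on how the data are -svyset-.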

Finally, if you just want to use sampling weights, you can do it without -svyset-ing your data and -areg- works then, see below

Code:

. sysuse auto, clear
(1978 Automobile Data)

. areg price weight length [pw= headroom ], absorb(rep78) vce(jackknife, nodots cluster(rep78))

Linear regression, absorbing indicators         Number of obs     =         69
Absorbed variable: rep78                        No. of categories =          5
                                                Replications      =          5
                                                F(   2,      4)   =      45.18
                                                Prob > F          =     0.0018
                                                R-squared         =     0.4573
                                                Adj R-squared     =     0.4048
                                                Root MSE          =  2289.7401

                                   (Replications based on 5 clusters in rep78)
------------------------------------------------------------------------------
             |              Jackknife
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |    6.23817   .7339756     8.50   0.001     4.200328    8.276013
      length |  -134.9517   29.43262    -4.59   0.010    -216.6697   -53.23363
       _cons |   12518.73   6178.177     2.03   0.113    -4634.633     29672.1
------------------------------------------------------------------------------


Stephanie Hong

Join Date: Jul 2019

Posts: 17
#7

29 Jul 2020, 08:33

Hello Dr. Kolev,

Thank you for your helpful responses! I think my svyset code follows Example 2 on pg. 6 of the Stata manual. It says that jackknife replicate weights (which are var1 to var400) replace strata and psu. As the survey documentation recommends that researchers use the jackknife replicate weights, I'm not sure if I can use psu and strata. So I was wondering whether the svyset code I wrote (shown below), assuming it is correct, would produce properly clustered standard errors. Since I don't see anywhere in the code that I am clustering my standard errors (e.g. vce(cluster county)), I wanted to make sure that this error structure is taken into account.

Code:

svyset [pw=pweight], jkrw(var1-var400, multiplier(1)) vce(jack) mse

And to clarify, could you confirm if the following statement was what you meant? "If this svyset code is correct,

Code:

svy: reg outcome lag_iv i.county i.year

will correctly estimate the fixed-effects regression."

Thank you so much for your insights on this!
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#8

30 Jul 2020, 01:51

Hi Stephanie,

I read the manual for -svyset- and you are right: the jackknife replicate weights that you supplied with jkrweight() (var1 to var400) replace strata and PSU. You do not need to specify PSU and strata, because jkrweight() already gives Stata the necessary information.

What is 400? Is this the number of counties you have?

I think that once you have correctly -svyset- your data, you do not need to worry anymore about clustering, because Stata will appropriately take care of the survey design in calculating the variance of the estimators. So you should just type one of the two (just try them and see what works, I think that both should work and give the same results)

Code:

svy jackknife: reg outcome lag_iv i.county i.year
svy: reg outcome lag_iv i.county i.year, vce(jackknife)

Stephanie Hong

Join Date: Jul 2019

Posts: 17
#9

14 Aug 2020, 15:38

Hello Dr. Kolev,

I apologize for my late reply. The first code worked for me. After survey setting, I used:

Code:

svy: reg outcome lag_iv i.county i.year

I really appreciate your advice on this problem!
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#10

15 Aug 2020, 00:05

I am glad that it worked, Stephanie!
