  • Lag independent variable in fixed effects regression using repeated cross section data

    Hello Statalist,

    I'm currently analyzing repeated cross-section data (different individuals were surveyed in each year). These individuals are nested within counties, so the same counties recur over time while the individuals differ from year to year. Using this data set, I would like to use county fixed effects to analyze the effect of a policy change, lagging the policy by one year. However, I have been struggling to find the right code to "xtset" repeated cross-section data. When I used the following code,

    Code:
    xtset county year, yearly
    an error occurred, saying "repeated time values within panel." This makes sense because the combination of county and year is not uniquely identified. Given this problem, how can I xtset this data set?

    In addition, can you give me insight into how I can lag the independent variable (the policy variable) by one year in a fixed-effects regression that uses repeated cross-section data? I could include l1.policy in the FE regression, but the lead/lag operators require a time variable in xtset, and I am not sure whether I can xtset repeated cross-section data using the year variable.

    Would there be a way to xtset the repeated cross-section data and then lag the independent variable by a year?

    Any help would be very much appreciated!

  • #2
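    To lag the policy variable, let's say it is called pvar, try
    Code:
    * fill the first observation of each county-year with the value from the county's previous observed year
    bysort county (year): gen laggedpvar = pvar[_n-1] if year!=year[_n-1]
    * copy that value to every observation in the same county-year
    by county year: replace laggedpvar = laggedpvar[1]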
    To estimate the model, use -areg, absorb(county)-.
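
    If some counties skip a year, the previous observed year is not the same as a one-year lag. A possible alternative, sketched below on a made-up toy data set, is to build the lag on a county-year file with -xtset- and merge it back. It assumes that county is numeric and that pvar is constant within each county-year; the variable names follow the ones used above, and the toy values are purely illustrative.

```stata
* Toy repeated cross-section: several individuals per county-year (illustrative values)
clear
input county year pvar
1 2000 0
1 2000 0
1 2001 1
1 2001 1
1 2002 1
2 2000 0
2 2001 0
2 2001 0
2 2003 1
end

* Build the lag on a file with one row per county-year
preserve
keep county year pvar
duplicates drop              // collapse identical rows
isid county year             // stops with an error if pvar varies within a county-year
xtset county year
gen laggedpvar = L.pvar      // missing when the county has no observation for year-1
keep county year laggedpvar
tempfile lags
save `lags'
restore

* Attach the county-year lag to every individual observation
merge m:1 county year using `lags', nogenerate
```

    In this sketch, county 2 has no 2002 observations, so laggedpvar is missing for its 2003 rows rather than silently taking the 2001 value.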


    • #3
      Thank you Dr. Kolev for your suggestions! I forgot to mention that I have to use the "svy" prefix since I'm using a survey dataset. I use the jackknife method of estimating survey data, but it looks like the areg command is not compatible with svy. Could you suggest other ways to estimate the FE model, including the lagged independent variable, in a cross-sectional data set?


      • #4
        Hi Stephanie, on my Stata 15.1 it seems that -svy: areg y x i.a, absorb(cvar)- is allowed. You cannot see it in the syntax diagram of the help for areg, but when you click on (View complete PDF manual entry) it says that it is now allowed. I think StataCorp just forgot to update the help file, because in the past areg indeed did not work with svy.

        How about you first try -svy: areg y x i.a, absorb(cvar)-? I think it should work. If it does not, we will think of something. E.g., we can bypass the svy prefix and just use the survey weights.


        • #5
          Hello Dr. Kolev,

          Thanks for your response. I tried the command, but I keep getting this error: "areg is not supported by svy with vce(jackknife)." I'm using Stata 16. Do you think areg does not work with vce(jackknife) survey weights in particular? The command I use to survey set the data is:

          Code:
          svyset [pw=pweightvar], jkrw(var1-var400, multiplier(1)) vce(jack) mse
          If we just use the survey weights, what would be the correct code to account for the survey design?

          Also, do you think the following code can be an alternative way to estimate the fixed effects model?

          Code:
          svy: reg outcome lag_iv i.county i.year
          My last question is, since I'm adding county fixed effects, I need to cluster the standard errors at the county level. Given that I'm using a survey design, how should I incorporate the cluster-robust standard errors?

          Thank you for your help!


          • #6
            Hi Stephanie,

            Yes, from the error message it seems that Stata complains because you are trying to use both survey design and jackknife variance.

            It seems to me that you are not properly survey setting your data. In particular, I cannot see what your sampling unit is in your -svyset-. Here is an example from the -svy- help file:

            Code:
            . webuse nhanes2f, clear
            
            . svyset psuid [pweight=finalwgt], strata(stratid) vce(jack) mse
            
                  pweight: finalwgt
                      VCE: jackknife
                      MSE: on
              Single unit: missing
                 Strata 1: stratid
                     SU 1: psuid
                    FPC 1: <zero>
            
            . svy: reg bpsystol i.sex
            (running regress on estimation sample)
            
            Jackknife replications (62)
            ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
            ..................................................    50
            ............
            
            Survey: Linear regression
            
            Number of strata   =        31                Number of obs     =       10,337
            Number of PSUs     =        62                Population size   =  117,023,659
                                                          Replications      =           62
                                                          Design df         =           31
                                                          F(   1,     31)   =        90.14
                                                          Prob > F          =       0.0000
                                                          R-squared         =       0.0178
            
            ------------------------------------------------------------------------------
                         |              Jknife *
                bpsystol |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                     sex |
                 Female  |   -5.71288   .6017183    -9.49   0.000    -6.940093   -4.485668
                   _cons |   129.9207    .644102   201.71   0.000     128.6071    131.2344
            ------------------------------------------------------------------------------
            
            .
            In particular, note that the above code tells Stata what the sampling unit is via -svyset psuid-.

            Yes, you can simply use regress to account for the fixed effects, e.g., you just use i.county instead of i.sex in the above regression and replace all the variables with your own. However, this assumes that you have correctly -svyset- your data. Stata will automatically calculate correct standard errors for the clustered error structure if you have correctly -svyset- your data, which I do not think you have.

            Finally, if you just want to use sampling weights, you can do it without -svyset-ing your data and -areg- works then, see below

            Code:
            . sysuse auto, clear
            (1978 Automobile Data)
            
            . areg price weight length [pw= headroom ], absorb(rep78) vce(jackknife, nodots cluster(rep78))
            
            Linear regression, absorbing indicators         Number of obs     =         69
            Absorbed variable: rep78                        No. of categories =          5
                                                            Replications      =          5
                                                            F(   2,      4)   =      45.18
                                                            Prob > F          =     0.0018
                                                            R-squared         =     0.4573
                                                            Adj R-squared     =     0.4048
                                                            Root MSE          =  2289.7401
            
                                               (Replications based on 5 clusters in rep78)
            ------------------------------------------------------------------------------
                         |              Jackknife
                   price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                  weight |    6.23817   .7339756     8.50   0.001     4.200328    8.276013
                  length |  -134.9517   29.43262    -4.59   0.010    -216.6697   -53.23363
                   _cons |   12518.73   6178.177     2.03   0.113    -4634.633     29672.1
            ------------------------------------------------------------------------------

            • #7
              Hello Dr. Kolev,

              Thank you for your helpful responses! I think my svyset code follows example 2 on pg. 6 of the Stata manual. It says that the jackknife replicate weights (which are var1 to var400) replace strata and psu. As the survey documentation recommends that researchers use the jackknife replicate weights, I'm not sure whether I can use psu and strata. So I was wondering whether the svyset code I wrote (shown below) would correctly estimate the clustered standard errors. Since I don't see anywhere in the code that I am clustering my standard errors (e.g. vce(cluster county)), I wanted to check whether this error structure is taken into account.

              Code:
              svyset [pw=pweight], jkrw(var1-var400, multiplier(1)) vce(jack) mse
              And to clarify, could you confirm whether the following statement is what you meant? "If this svyset code is correct,
              Code:
              svy: reg outcome lag_iv i.county i.year
              will correctly estimate the fixed-effects regression."

              Thank you so much for your insights on this!


              • #8
                Hi Stephanie,

                I read the manual for -svyset- and you are right: the jackknife replicate weights that you have supplied with jkrweight() (var1 to var400) replace strata and psu. You do not need to use psu and strata because you have already given Stata the necessary information through jkrweight().

                What is 400? Is this the number of counties you have?

                I think that once you have correctly -svyset- your data, you do not need to worry about clustering anymore, because Stata will take the survey design into account when calculating the variance of the estimators. So you should just type one of the following two commands (try them and see what works; I think both should work and give the same results)

                Code:
                svy jackknife: reg outcome lag_iv i.county i.year
                
                svy, vce(jackknife): reg outcome lag_iv i.county i.year



                • #9
                  Hello Dr. Kolev,


                  I apologize for my late reply. The first code worked for me. After survey setting, I used:


                  Code:
                   svy: reg outcome lag_iv i.county i.year

                  I really appreciate your advice on this problem!


                  • #10
                    I am glad that it worked, Stephanie!

