
  • newey2 with force

    Hello,
    I am working with a very unbalanced panel data set, where my id variable is an individual and the time variable is quarters. I am trying to run a separate regression for each individual in the panel, using Newey-West standard errors to account for the serial correlation that I know is present in my data. After reading in the data, I --xtset-- it and run --tsfill, full--. My code looks like the following:
    Code:
     levelsof person, local(levels) 
    
     foreach l of local levels {
    
     newey2   depvar   mean_depvar  if person == `l', force lag(4) 
    
     }
    where --mean_depvar-- is just the mean, by quarter, of the dependent variable of my regression. I am using force because I get the following error:
    Code:
    yq is not regularly spaced -- use the force option to override
    I understand that this error arises because some individuals in the panel have missing observations. However, I want to make sure that --force-- is not misinterpreting the correlation structure of the residuals. If I have a residual for an individual at time t and at time t+4, but nothing for t+2 and t+3, I do not want Stata to treat the residuals at t and t+4 as only one period apart. The documentation for --newey2-- says the following:

    newey2 handles missing observations differently for time series and panel data sets. Consider the example of a time series data set containing gaps, which is then recast using tsset as a panel data set with one group. newey and newey2 will not run on the time series version without force; with force they treat available observations as equally spaced. After the set is cast as a panel, newey2 will run without , force, and will assume zero serial correlation with missing observations.

    Is the reason I cannot run --newey2-- without --force-- that I am now running separate regressions, each of which is effectively an unbalanced time series for one individual? Any thoughts or guidance on this would be much appreciated!


  • #2
    newey2 is from SSC, authored by David Roodman. I think that there is an error in the documentation. In both cases, you must specify the -force- option. However, as the documentation states, assumptions differ depending on whether the data is declared as a time series or a panel (compare the standard errors in the example below).

    Code:
    webuse grunfeld, clear
    keep if company==1
    drop if inrange(year, 1939, 1941)
    tsset year
    *DATA DECLARED AS A TIME SERIES
    newey2 invest mvalue kstock, lag(3) force
    xtset company year
    *DATA DECLARED AS A PANEL
    newey2 invest mvalue kstock, lag(3) force

    Code:
    .
    . tsset year
            time variable:  year, 1935 to 1954, but with a gap
                    delta:  1 year
    
    .
    . *DATA DECLARED AS A TIME SERIES
    
    .
    . newey2 invest mvalue kstock, lag(3) force
    No panel variable found. Time series assumed.
    
    
    Regression with Newey-West standard errors          Number of obs  =        17
    maximum lag : 3                                     F(  2,    14)  =     67.24
                                                        Prob > F       =    0.0000
    
    ------------------------------------------------------------------------------
                 |             Newey-West
          invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          mvalue |   .1228159   .0374329     3.28   0.005     .0425303    .2031015
          kstock |   .3608092   .0589307     6.12   0.000     .2344155     .487203
           _cons |  -151.1098   136.4013    -1.11   0.287    -443.6614    141.4418
    ------------------------------------------------------------------------------
    
    .
    . xtset company year
           panel variable:  company (strongly balanced)
            time variable:  year, 1935 to 1954, but with a gap
                    delta:  1 year
    
    .
    . *DATA DECLARED AS A PANEL
    
    .
    . newey2 invest mvalue kstock, lag(3) force
    
    
    Regression with Newey-West standard errors          Number of obs  =        17
    maximum lag : 3                                     F(  2,    14)  =     66.46
                                                        Prob > F       =    0.0000
    
    ------------------------------------------------------------------------------
                 |             Newey-West
          invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          mvalue |   .1228159   .0355181     3.46   0.004     .0466372    .1989945
          kstock |   .3608092   .0599266     6.02   0.000     .2322794    .4893391
           _cons |  -151.1098   128.2852    -1.18   0.258    -426.2541    124.0345
    ------------------------------------------------------------------------------
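
    One way to see why the two sets of standard errors can differ is to look at the lag weights. The following is only a rough sketch: it assumes that newey2 uses the same Bartlett weights as official -newey-, 1 - j/(m+1), which is not stated in this thread. The local macro names are made up for illustration.

    ```stata
    * Bartlett weights for maximum lag m = 3 (assumed weighting scheme)
    local m = 3
    forvalues j = 1/`m' {
        local w`j' = 1 - `j'/(`m' + 1)
        display "lag `j': weight = " %4.2f `w`j''
    }
    ```

    If the documentation quoted in #1 is right, then under -tsset- with -force- the observations for 1938 and 1942 are treated as adjacent, so their cross product receives the lag-1 weight. Under -xtset- they remain four years apart, which is beyond lag(3), so that pair contributes nothing.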



    • #3
      Andrew, thank you for the example. So in the second case, when the data is declared as a panel, newey2 still needs force to run and assumes zero serial correlation with missing observations; however, it does not assume that the observations are equally spaced, as it does when the data is declared as a time series. Is that correct?
      Thanks again for this response! This has been helpful.



      • #4
        I agree with your assessment. Assuming that observations are equally spaced, it does not matter whether the data is declared as a panel or a time series. Here, we can replicate the -tsset force- result by defining a new time variable.

        Code:
        webuse grunfeld, clear
        keep if company==1
        drop if inrange(year, 1939, 1941)
        gen time2= _n
        tsset time2
        *TIME SERIES
        newey2 invest mvalue kstock, lag(3)
        xtset company time2
        *PANEL
        newey2 invest mvalue kstock, lag(3)
        Code:
        . tsset time2
                time variable:  time2, 1 to 17
                        delta:  1 unit
        
        .
        . *TIME SERIES
        
        .
        . newey2 invest mvalue kstock, lag(3)
        No panel variable found. Time series assumed.
        
        
        Regression with Newey-West standard errors          Number of obs  =        17
        maximum lag : 3                                     F(  2,    14)  =     67.24
                                                            Prob > F       =    0.0000
        
        ------------------------------------------------------------------------------
                     |             Newey-West
              invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              mvalue |   .1228159   .0374329     3.28   0.005     .0425303    .2031015
              kstock |   .3608092   .0589307     6.12   0.000     .2344155     .487203
               _cons |  -151.1098   136.4013    -1.11   0.287    -443.6614    141.4418
        ------------------------------------------------------------------------------
        
        .
        . xtset company time2
               panel variable:  company (strongly balanced)
                time variable:  time2, 1 to 17
                        delta:  1 unit
        
        .
        . *PANEL
        
        .
        . newey2 invest mvalue kstock, lag(3)
        
        
        Regression with Newey-West standard errors          Number of obs  =        17
        maximum lag : 3                                     F(  2,    14)  =     67.24
                                                            Prob > F       =    0.0000
        
        ------------------------------------------------------------------------------
                     |             Newey-West
              invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              mvalue |   .1228159   .0374329     3.28   0.005     .0425303    .2031015
              kstock |   .3608092   .0589307     6.12   0.000     .2344155     .487203
               _cons |  -151.1098   136.4013    -1.11   0.287    -443.6614    141.4418
        ------------------------------------------------------------------------------
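
        The same idea can be sketched for a panel like the one in #1. The toy data and the variable names person, yq, gap and time2 below are made up for illustration. The sketch only shows where the gaps are and how a consecutive index within each person would make the available observations look equally spaced, which is what the time-series -force- result assumes.

        ```stata
        * Toy unbalanced panel (hypothetical data): person 1 is missing quarters 3-5
        clear
        input person yq
        1 1
        1 2
        1 6
        2 1
        2 2
        2 3
        end

        * Number of missing periods immediately before each observation
        bysort person (yq): gen gap = yq - yq[_n-1] - 1

        * Consecutive index within person: available observations become equally spaced
        bysort person (yq): gen time2 = _n

        * Show the observations that follow a gap
        list person yq gap time2 if gap > 0 & !missing(gap)
        ```

        Declaring the data with -xtset person time2- would then mimic the equal-spacing assumption, whereas -xtset person yq- keeps the true spacing between observations.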



        • #5
          Hi Andrew, I had a question regarding Newey estimation. How do you decide the lag length? Do you take it to be the integer part of T^(1/4)?

          Edit: I actually looked at your time-length after posting. There are 17 periods, so are you using a different way to determine the lag length?
          Last edited by Saunok Chakrabarty; 23 Oct 2020, 02:27.



          • #6
            There are basically three approaches to choosing lag length.

            1. Some automatic data-driven procedure. I think neither newey nor newey2 has such an option; -ivregress- does.

            2. Starting from a model with more lags, and testing down to your preferred specification, either by Wald tests, or by using information criteria such as BIC and AIC.

            3. Using a formula of the type that you are showing. In my own research, when I use a formula, I use the one in the Stock and Watson textbook. But yours (where did you take it from?) looks reasonable enough. E.g.,

            Code:
            . dis 100^(1/4)
            3.1622777
            So your formula gives 3 lags for 100 data points, which seems reasonable enough.
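
            As a quick check of the rule-of-thumb approach for a short series such as the 17 observations used in #2, here is a small sketch. The 0.75*T^(1/3) form is the one usually attributed to Stock and Watson; it is not spelled out in this thread, so treat it as an assumption and check the textbook. The macro names are hypothetical.

            ```stata
            * Rule-of-thumb truncation lags for a series with T observations
            local T = 17

            * Integer part of T^(1/4)
            local lag_t4 = floor(`T'^(1/4))

            * Assumed Stock-Watson form, rounded to the nearest integer
            local lag_sw = round(0.75 * `T'^(1/3))

            display "T^(1/4) rule: `lag_t4'   0.75*T^(1/3) rule: `lag_sw'"
            ```

            For T = 17 both rules give 2 lags.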




            • #7
              Disregard the lag length in #2; I just chose 3 to illustrate the technique. Joro Kolev gives good advice on the possible approaches. Your rule of thumb is from Greene's textbook. I also provide the formula from Stock and Watson and full references in #4 of the following link:

              https://www.statalist.org/forums/for...lation-command



              • #8
                Hi Joro Kolev, I actually took the lag length from some of the answers in this link: https://stats.stackexchange.com/ques...-west-or-hac-s

                Apparently the rule is the integer part of T^(0.25), which seems to come from Greene. I haven't gone through the book myself. Thank you for all your answers.

                Andrew Musau, I went through the link you gave. That is very helpful. Thanks a lot.

                I also had a different question. (I may have posted this question separately just a few minutes ago; please ignore it. My internet was not working properly and I wasn't seeing all of your replies here, so I thought my original post here got deleted somehow.) Is it possible to combine Newey-West standard errors with time fixed effects? Or with both time and cross-section fixed effects?



                • #9
                  If you have standard panel data with \(\text{N}>\text{T}\) and want to estimate a fixed effects model, then just use cluster robust standard errors which are robust to heteroscedasticity and arbitrary forms of serial correlation. Alternatively, you can look at xtregar which fits fixed and random-effects linear models with an AR(1) disturbance. newey and newey2 (SSC) produce Newey-West standard errors for coefficients estimated by OLS or 2SLS.
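
                  A minimal sketch of both suggestions, using the Grunfeld data from earlier in the thread purely to show the syntax. Note that Grunfeld has N = 10 and T = 20, so it is not the N > T case described above, and the i.year dummies are just one assumed way of adding time fixed effects.

                  ```stata
                  webuse grunfeld, clear
                  xtset company year

                  * Firm fixed effects plus year dummies, standard errors clustered by firm
                  xtreg invest mvalue kstock i.year, fe vce(cluster company)

                  * Firm fixed effects with an AR(1) disturbance
                  xtregar invest mvalue kstock, fe
                  ```

                  With so few clusters the clustered standard errors here should not be taken seriously; the example is only meant to show where the options go.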



                  • #10
                    Saunok, if you want to get more useful advice, you should be more explicit about what your data looks like and what you are trying to achieve.

                    E.g., do you assume that your N goes to infinity? Or that your T goes to infinity? And what kind of (in)dependence in your errors are you assuming?

                    What are you regressing on what?
