  • xtewreg and demeaning variables on two dimensions

    I have two issues related to running xtewreg (Erickson, Jiang, Whited, 2014, errors-in-variables remedy) that I cannot seem to resolve:

    1. At the bottom of the xtewreg help file (http://www.haghish.com/statistics/st...d/xtewreg.html) is the following disclaimer:
    "As XTEWreg requires de-meaned data (and does not compute fixed effects internally), the researcher must de-mean the data appropriately before using XTEWreg."
    I have no problem demeaning the data on one dimension (i.e., id), but I also need to demean the data by another dimension (i.e., time). I've read this post (https://www.statalist.org/forums/for...s-do-not-match) on demeaning variables, and I can obtain the correct coefficients if I follow #3. I also need to obtain an accurate within r-squared (rho-squared in the case of xtewreg), but adding demeaned year dummies biases the rho-squared upwards. Essentially I need to demean my data on two dimensions without adding dummy variables in my regression that will bias the within r-squared upwards.

    2. The second issue is with respect to clustering standard errors. xtewreg seems to only allow clustering errors on one dimension, depending on how the panel is defined (either by id and time or just by time), but I would really prefer to cluster on both dimensions. Does anyone have any suggestions on how to accomplish this? Does xtewreg even have that functionality?

    Thanks,
    Mike

  • #2

    "I have no problem demeaning the data on one dimension (i.e., id), but I also need to demean the data by another dimension (i.e., time)"
    Here is how you demean by firm and year:

    Code:
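    webuse grunfeld
    * Generate mean-deviated variables (by firm and year)

    local vars "invest mvalue kstock"
    * Firm means (suffix fm) and year means (suffix ym)
    foreach x of local vars {
        bys company: egen `x'fm = mean(`x')
        bys year: egen `x'ym = mean(`x')
    }

    * Subtract both means from each variable (suffix d)
    foreach x of local vars {
        gen `x'd = `x' - `x'fm - `x'ym
    }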
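    Running OLS on the demeaned variables reproduces the slope coefficients of the two-way fixed effects estimator (the standard errors differ, because regress does not adjust the degrees of freedom for the absorbed effects). Compare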

    Code:
    regress investd mvalued kstockd
    *ssc install reghdfe
    reghdfe invest mvalue kstock, a(company year)
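
    One caveat on the transformation above: the textbook two-way within transformation for a balanced panel also adds back the grand mean, i.e. x_it - (firm mean) - (year mean) + (grand mean). Leaving it out only shifts the constant in regress, but a command that expects mean-zero inputs would see the difference. A minimal sketch of that variant follows; the suffixes _fm, _ym, _gm and _w are illustrative names, and whether xtewreg needs this exact form is an assumption to check against its help file.

    ```stata
    webuse grunfeld, clear

    foreach x in invest mvalue kstock {
        * firm mean, year mean, and grand mean of each variable
        bysort company: egen double `x'_fm = mean(`x')
        bysort year: egen double `x'_ym = mean(`x')
        egen double `x'_gm = mean(`x')
        * two-way within transformation (valid for a balanced panel)
        generate double `x'_w = `x' - `x'_fm - `x'_ym + `x'_gm
    }

    * the transformed variables should have mean zero (up to rounding)
    summarize invest_w mvalue_w kstock_w

    * no constant is needed once the data are centered
    regress invest_w mvalue_w kstock_w, noconstant
    ```

    In a balanced panel the slopes from this regression should match those from reghdfe with a(company year).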

    "The second issue is with respect to clustering standard errors. xtewreg seems to only allow clustering errors on one dimension, depending on how the panel is defined (either by id and time or just by time), but I would really prefer to cluster on both dimensions. Does anyone have any suggestions on how to accomplish this? Does xtewreg even have that functionality?"
    Think about it. With panel data, you have one observation per firm-year combination. Therefore, clustering by firm-year is the same as clustering by observation. In this case, the implication would be that observations are independent, which is (arguably) incorrect given that you have repeated observations of the same firm across time. If you refer to the xtreg manual, there is a reference to a paper by Stock and Watson which argues for clustering by firm in panel data.
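
    To see the point about firm-year clusters concretely, here is a small sketch on the grunfeld data. The variable name firmyear is made up for illustration. Because every firm-year cluster contains exactly one observation, the first two sets of standard errors should coincide, while clustering by firm allows for dependence within a firm over time.

    ```stata
    webuse grunfeld, clear

    * confirm there is exactly one observation per firm-year pair
    isid company year

    * one cluster per observation
    egen long firmyear = group(company year)

    * clustering on firm-year versus heteroskedasticity-robust errors
    regress invest mvalue kstock, vce(cluster firmyear)
    regress invest mvalue kstock, vce(robust)

    * clustering by firm
    regress invest mvalue kstock, vce(cluster company)
    ```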


    EDIT: Upon re-reading your post, it appears that your question is whether you can cluster on two dimensions and the answer is no. The available commands only allow clustering on one dimension.
    Last edited by Andrew Musau; 28 Aug 2018, 00:09.



    • #3
      Thank you for your response, Andrew.

      With respect to your code to demean by firm and year, I had already found the same code on a Stata blog post somewhere and tried that methodology. Unfortunately, it did not work in my dataset, and I think I know why. In the 'grunfeld' dataset, each firm has the same number of annual observations (20). In my dataset, not every firm has the same number of annual observations. If you run the code you provided but remove one observation from one firm, the results are no longer identical between reg and reghdfe.

      Code:
      webuse grunfeld, clear

      * Make the panel unbalanced by dropping one firm-year
      drop if company==1 & year==1935

      local vars "invest mvalue kstock"
      foreach x of local vars {
          bys company: egen `x'fm = mean(`x')
          bys year: egen `x'ym = mean(`x')
      }

      foreach x of local vars {
          gen `x'd = `x' - `x'fm - `x'ym
      }

      regress investd mvalued kstockd
      
           investd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
           mvalued |   .1186382   .0125303     9.47   0.000     .0939267    .1433497
           kstockd |   .3608052   .0212538    16.98   0.000     .3188898    .4027207
             _cons |   82.12631   13.63245     6.02   0.000      55.2412    109.0114
      
      
      reghdfe invest mvalue kstock, a(company year)
      
            invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
            mvalue |   .1202035   .0140381     8.56   0.000     .0924897    .1479173
            kstock |   .3607151   .0229472    15.72   0.000     .3154131    .4060171
      If you know of another way to demean the variables without adding any dummy variables in the specification, that would be great. If not, understandable.
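
      For anyone wanting to check whether their own panel has this problem, a quick sketch (n_years is just an illustrative name): xtset reports whether the panel is balanced, and the tabulation shows how many observations each firm contributes.

      ```stata
      webuse grunfeld, clear
      drop if company == 1 & year == 1935

      * xtset reports the panel as balanced or unbalanced
      xtset company year

      * number of yearly observations available for each firm
      bysort company: generate n_years = _N
      tabulate n_years
      ```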

      As for the second issue, there is also a paper by Petersen (2009) that focuses on estimating standard errors in finance panel data sets, which is exactly what I am doing. That paper points out that in the presence of a time effect, White standard errors (i.e., clustering by firm) can still be biased when the residuals are not independent. Furthermore, if I cluster SEs by year rather than firm, the standard error goes up, so I know that there is in fact cross-sectional correlation in the errors. To be clear, none of my main results change when clustering errors by year, but I just want to be as conservative, and as correct, as possible when presenting my results.
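
      Outside of xtewreg, the comparison described here can be sketched with reghdfe, which accepts more than one cluster variable in vce(cluster ...). This is only an illustration on the grunfeld data, which has very few firms and years, so the clustered standard errors themselves should not be taken too seriously. It does not add two-way clustering to xtewreg.

      ```stata
      webuse grunfeld, clear
      * ssc install reghdfe

      * cluster by firm
      reghdfe invest mvalue kstock, absorb(company year) vce(cluster company)

      * cluster by year
      reghdfe invest mvalue kstock, absorb(company year) vce(cluster year)

      * cluster by firm and year (two-way)
      reghdfe invest mvalue kstock, absorb(company year) vce(cluster company year)
      ```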



      • #4
        I should have added that the within transformation is equivalent to the dummy variable approach in the case of a balanced panel (both for one-way FE and two-way FE). If you are interested in the details of how to compute the unbalanced two-way fixed effects estimator, see Baltagi's book, which has a full chapter on that. Here is an old Statalist post illustrating how you can implement the estimator using Mata, Stata's matrix language. I have not looked carefully at it, but if it does not give you what you want, you can report back with a specific issue.

        Reference

        Baltagi, Badi H. Econometric Analysis of Panel Data. 5th Edition. 2008.
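
        As a rough alternative that avoids dummy variables, one can demean by firm and by year repeatedly until the variables stop changing (alternating projections). This is not the estimator from Baltagi's chapter or the Mata code mentioned above, just a sketch under stated assumptions: the fixed count of 100 passes is arbitrary, and the names tmp_fm, tmp_ym and the _w suffix are illustrative.

        ```stata
        webuse grunfeld, clear
        drop if company == 1 & year == 1935

        foreach x in invest mvalue kstock {
            generate double `x'_w = `x'
            * alternate firm and year demeaning; 100 passes is an assumed cap
            forvalues iter = 1/100 {
                bysort company: egen double tmp_fm = mean(`x'_w)
                quietly replace `x'_w = `x'_w - tmp_fm
                bysort year: egen double tmp_ym = mean(`x'_w)
                quietly replace `x'_w = `x'_w - tmp_ym
                drop tmp_fm tmp_ym
            }
        }

        * slopes should be close to the reghdfe slopes once the loop has converged
        regress invest_w mvalue_w kstock_w, noconstant
        reghdfe invest mvalue kstock, absorb(company year)
        ```

        The standard errors from regress will still differ from reghdfe because the absorbed fixed effects are not counted in the degrees of freedom. If your version of reghdfe supports the residuals() option, regressing each variable on the fixed effects alone and saving the residuals may give the same transformed variables, but check the reghdfe help file before relying on that.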
