  • Fixed-effects, group-mean-centering and interaction terms

    Hi,
    I have a country-year panel dataset with T=5 and N=130. I want to estimate a lagged dependent variable (LDV) model and compare it to an autoregressive distributed lag (ARDL) model and to an ARDL model with a second lag of the dependent variable (ARDL_LDV2). Following Beck/Katz (2011), I want to mean-center all explanatory variables by year and country (in order to allow for year- and country-specific intercepts) while applying panel-corrected standard errors.
    To apply this, I first mean-center all variables (dependent and independent variables) by country and include year dummies in the OLS-regression on the deviations without an intercept.
    According to William Gould's post on "Interpreting the intercept in the fixed-effects model":
    "(…) removing within-group means and estimating a regression on the deviations without an intercept (as given in equation 3) produces the same coefficients but different standard errors." [compared to xtreg, fe]
    Code:
    * group means by country
    egen double ybar  = mean(y),  by(ccode)
    egen double x1bar = mean(x1), by(ccode)
    egen double x2bar = mean(x2), by(ccode)
    * deviations from the country means
    gen double yd  = y  - ybar
    gen double x1d = x1 - x1bar
    gen double x2d = x2 - x2bar

    xtreg y  x1  x2, fe
    reg    yd  x1d, noconstant
    reg    yd  x1d  x2d, noconstant
    reg    yd  x1d  i.year, noconstant
    Comparing these group-mean-centered OLS results (without a constant) with the results of Stata's official xtreg, fe command should, according to William Gould's post, yield the same estimates but different standard errors, because of the difference between equation 3 (group-mean centering) and equation 5 (which Stata's xtreg, fe command applies); see Gould's post.
    Including one (exogenous) predictor indeed leads to the same coefficient but different standard errors. However by including another country-mean-centered predictor the coefficients of both variables do not equal the results from fixed effects estimation anymore. Similarly, replacing x2 with i.year also leads to these differences in coefficients and standard errors between the results of the fe-command and the country-mean-centered results.

    Why is that the case?

    My second question relates to the application of group-mean versus grand-mean-centering to estimate interaction effects:
    Following Aiken/West (1991), I grand-mean-center the variables that enter my interaction terms before including them in the regression model, in order to reduce multicollinearity and make interpretation easier. How can I combine removing unit heterogeneity by group-mean centering (as suggested by Beck/Katz) with reducing multicollinearity by grand-mean centering?

    Any comments or suggestions are welcome!
    Thanks a lot in advance!
    Last edited by Steve Johnson; 19 Aug 2015, 10:07.

  • #2
    Dear Steve,
    I do not quite understand your results. You should post the output you are obtaining, to make it easier to see the problem, particularly the one regarding "However by including another country-mean-centered predictor the coefficients of both variables do not equal the results from fixed effects estimation anymore."

    In other words.
    xtreg y x1 x2, fe
    should be equal to
    reg dy dx1 dx2

    as long as you do not have any missing information in your variables.
    Now including the year fixed effect will give you different results because you would also need to demean all the year dummies.

    xtreg y x1 i.year, fe
    should be equal to
    reg dy dx1 dyear1 dyear2 dyear3....etc
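
    A minimal sketch of how the demeaned year dummies could be built, assuming Stata's Grunfeld example panel (loaded with webuse, so not the data from this thread) and the hypothetical prefixes yr, m_ and d_. Each year dummy is created with tabulate and then centered within company exactly like the other variables. On this balanced panel without missing values, the coefficients from the two estimation lines should agree.

    ```stata
    * Illustration data only (assumption): Grunfeld panel, 10 companies, 1935-1954
    webuse grunfeld, clear
    xtset company year

    * one dummy per year: yr1, yr2, ..., yr20
    tabulate year, generate(yr)

    * center the outcome, the regressor and every year dummy within company
    foreach v of varlist invest mvalue yr* {
        egen double m_`v' = mean(`v'), by(company)
        generate double d_`v' = `v' - m_`v'
        drop m_`v'
    }

    * fixed-effects reference model
    xtreg invest mvalue i.year, fe

    * OLS on deviations; d_yr1 is left out so the first year is the base, as in i.year
    regress d_invest d_mvalue d_yr2-d_yr20, noconstant
    ```

    The range d_yr2-d_yr20 relies on the variables having been created in that order by the loop; with a different number of years the range has to be adjusted.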

    Now regarding your arguments for multicollinearity. I would suggest treat both steps independent. Meaning
    1. estimate your variables and interactions that you wish to
    2. Estimate them using the grand means as you describe
    3. Demean all variables respect to the fixed effects groups.
    4. Estimate the model
    This should work for what you have in mind.
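
    One possible reading of the four steps above, sketched on Stata's Grunfeld example panel (an assumption, not the data from this thread): grand-mean center the two regressors, build their product, then demean everything within the fixed-effects group and run OLS on the deviations. The prefixes c_, m_ and d_ and the name c_inter are hypothetical.

    ```stata
    * Illustration data only (assumption): Grunfeld panel
    webuse grunfeld, clear
    xtset company year

    * steps 1-2: grand-mean center both regressors, then form the product
    foreach v of varlist mvalue kstock {
        summarize `v', meanonly
        generate double c_`v' = `v' - r(mean)
    }
    generate double c_inter = c_mvalue * c_kstock

    * step 3: demean outcome, centered regressors and product within company
    foreach v of varlist invest c_mvalue c_kstock c_inter {
        egen double m_`v' = mean(`v'), by(company)
        generate double d_`v' = `v' - m_`v'
        drop m_`v'
    }

    * step 4: OLS on the deviations, next to the equivalent xtreg specification
    regress d_invest d_c_mvalue d_c_kstock d_c_inter, noconstant
    xtreg invest c.c_mvalue##c.c_kstock, fe
    ```

    Note that the product is formed before the within-group demeaning, so it is the demeaned product that enters the regression, not the product of two demeaned variables.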
    HTH
    Fernando

    • #3
      Dear Fernando,

      thanks a lot for your quick reply and your advice. I really appreciate your help!

      I tried to replicate the problem (i.e. that the coefficients differ) using an example data set, however, there the problem did not occur, i.e. obviously you are right:
      xtreg y x1 x2, fe should be equal to reg dy dx1 dx2
      So I guess I made a mistake generating the group-mean variables, but I cannot find it. Below are the respective commands:

      Code:
      * group-mean center each model variable by country
      foreach var of varlist log_wdi_mort log_health_aidpc GOVERNANCE3 {
                  egen double `var'_gmean = mean(`var'), by(ccode)
                  gen double `var'_w = `var' - `var'_gmean
                  label var `var'_w "`var' group-mean centered"
                  drop `var'_gmean
      }

      * the regressions run once, after the loop has created all centered variables
      xtreg log_wdi_mort log_health_aidpc GOVERNANCE3, fe
      reg    log_wdi_mort_w log_health_aidpc_w GOVERNANCE3_w, noconstant
                              (1)             (2)
                              FE              OLS_group_mean_centered noconstant
      VARIABLES               log_wdi_mort    log_wdi_mort_w

      log_health_aidpc        0.0203*
                              (0.0115)
      GOVERNANCE3             -0.0720
                              (0.0555)
      log_health_aidpc_w                      0.0139
                                              (0.0101)
      GOVERNANCE3_w                           -0.0819
                                              (0.0516)
      Constant                3.481***
                              (0.0289)

      Observations            506             506
      R-squared               0.012           0.008
      Number of ccode         131

      The number of observations is the same, but the coefficients differ: log_health_aidpc in the FE model does not equal log_health_aidpc_w in the group-mean model, and the same holds for GOVERNANCE3.

      as long as you do not have any missing information in your variables.
      I do not see any difference regarding missing information in the summary statistics.

          Variable |  Obs        Mean   Std. Dev.         Min        Max
      -------------+------------------------------------------------------
      log_wdi_mort |  655    3.544712    .8371916    1.098612   5.078294
      log_wdi_mo~w |  655   -4.07e-18    .2544643   -.8283019    .771646   /// group mean centered variable
      log_healt~pc |  629    .3327553    1.678597   -6.277211   5.008636
      log_health~w |  629    1.21e-17    1.020603   -3.552298   4.214557   /// group mean centered variable
       GOVERNANCE3 |  524   -.4451592    .6630121   -2.319006   1.155119
       GOVERNANC~w |  524    7.44e-18    .1847843   -.7830873   .7734927   /// group mean centered variable
       GOVERNANC~c |  524    2.21e-17    .6630121   -1.873847   1.600278   /// grand mean centered variable
      log_healt~_c |  629   -2.40e-17    1.678597   -6.609966   4.675881   /// grand mean centered variable



      Now regarding your arguments for multicollinearity. I would suggest treat both steps independent. Meaning
      1. estimate your variables and interactions that you wish to
      2. Estimate them using the grand means as you describe
      3. Demean all variables respect to the fixed effects groups.
      4. Estimate the model
      As far as I understand, you suggest estimating the model three times with different specifications: 1) without any centering, 2) with grand-mean centering of the two explanatory variables, 3) with group-mean centering. However, one might expect the coefficients of the interaction term and the main effects to differ substantially between these specifications, especially in the third model, which controls for unit heterogeneity. Based on what criteria can I choose among those results?
      And are the interaction and the main effects interpretable in the same way as in a grand-mean-centered interaction model? At least the grand-mean-centered and the group-mean-centered variables both have means close to zero.

      Now including the year fixed effect will give you different results because you would also need to demean all the year dummies.
      Code:
      xtreg y x1 i.year, fe
      should be equal to
      reg dy dx1 dyear1 dyear2 dyear3....etc
      How is dyear1, dyear2 etc calculated?
      Isn't it also possible to just include i.year in the group-mean-centered model in order to control for country and year fixed effects (which would still allow for panel corrected standard errors)?
      Code:
      reg dy dx1 i.year
      Once again, thank you very much for your support!!!

      • #4
        Hi Steve,

        To follow up on Fernando's explanation:

        Isn't it also possible to just include i.year in the group-mean-centered model
        No. Each year is a dummy variable that needs to be recentered; treat it as you would any other variable.


        I do not see any difference regarding missing information in the summary statistics.
        From your table, there *are* missing values. You are calculating the mortality averages on 655 observations but the governance averages on 524 observations. Thus, the group means are not computed on the same sample that the regression uses.

        All in all, if you first drop the observations with missing values, and add the year dummies as extra regressors, you would get what you want. That said, I'm not sure why would you go through the effort to do it when -xtreg- (or equivalent commands) should work fine (unless it's just to understand what's going on).
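
        A minimal sketch of the "drop the missing values first" step, assuming Stata's nlswork example panel (chosen only because some of its variables contain missing values; it is not the data from this thread). The names nmiss, m_ and d_ are hypothetical. The group means are computed only after the sample has been restricted to observations that are complete on every model variable.

        ```stata
        * Illustration data only (assumption): nlswork panel with missing values
        webuse nlswork, clear
        xtset idcode year

        * keep only observations that are complete on all model variables
        egen byte nmiss = rowmiss(ln_wage age tenure)
        keep if nmiss == 0

        * group means are now computed on the estimation sample
        foreach v of varlist ln_wage age tenure {
            egen double m_`v' = mean(`v'), by(idcode)
            generate double d_`v' = `v' - m_`v'
            drop m_`v'
        }

        * the coefficients of the two models should now agree
        xtreg ln_wage age tenure, fe
        regress d_ln_wage d_age d_tenure, noconstant
        ```

        Computing the means before the keep line would reproduce the kind of mismatch reported above, because each variable would be centered on its own set of non-missing observations.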

        Best,
        Sergio

        • #5
          Sergio!
          Of course! Thank you very much! Now I got it! ;-)

          why would you go through the effort to do it when -xtreg- (or equivalent commands) should work fine
          I want to estimate an autoregressive distributed lag (ARDL) model and an ARDL model with a second lag of the dependent variable (ARDL_LDV2) with panel-corrected standard errors, while accounting for country- and year-specific effects, in order to test the dynamics of the model (following Beck/Katz 2011) and compare it to FE/RE and GMM estimations (xtabond2).

          So I would run something like the following, including all exogenous explanatory variables at current and lagged levels as well as the LDV and LDV2, while controlling for AR(1) and heteroscedasticity within panels:
          Code:
          * c.dx1#c.dx2 is the product term; in a varlist, dx1*dx2 would be read as a wildcard
          xtpcse dy   l.dy l2.dy   dx1 dx2 c.dx1#c.dx2    l.dx1 l.dx2 cL.dx1#cL.dx2  dyear1 dyear2 dyear3, noconstant pairwise hetonly corr(ar1)
          If there is any shortcut to this "long route of estimation" I would be very happy to learn more about it!?
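
          On the shortcut question, one candidate is a dummy-variable specification: to my knowledge xtpcse accepts factor variables and time-series operators, so country and year dummies, lags and the interaction could be written directly instead of being built by hand. The sketch below uses Stata's Grunfeld example panel (an assumption, not the data from this thread). Whether it reproduces the demeaned route exactly under corr(ar1) is not something this thread establishes.

          ```stata
          * Illustration data only (assumption): Grunfeld panel
          webuse grunfeld, clear
          xtset company year

          * unit and year dummies enter as factor variables;
          * the lag and the product term are written with operators
          xtpcse invest L.invest c.mvalue##c.kstock i.company i.year, hetonly corr(ar1)
          ```

          With N=130 countries this adds 129 country dummies, which is feasible but makes the output long.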
          Moreover, I'm still unsure whether it is reasonable to use the cross-product of the two group-mean-centered variables (dx1 times dx2) instead of the grand-mean-centered interaction.
          Last edited by Steve Johnson; 20 Aug 2015, 04:57.
