Fixed-effects, group-mean-centering and interaction terms

Steve Johnson

Join Date: Mar 2015

Posts: 43
#1


19 Aug 2015, 10:02

Hi,
I have a country-year panel dataset with T=5 and N=130, and I want to estimate a lagged dependent variable (LDV) model and compare it to an autoregressive distributed lag (ARDL) model and an ARDL model with a second lag of the dependent variable (ARDL_LDV2). Following Beck/Katz (2011), I want to mean-center all explanatory variables by year and country (in order to allow for year- and country-specific intercepts) while applying panel-corrected standard errors.
To do this, I first mean-center all variables (dependent and independent) by country and then include year dummies in the OLS regression on the deviations, without an intercept.
According to William Gould's post on "Interpreting the intercept in the fixed-effects model":

"(…) removing within-group means and estimating a regression on the deviations without an intercept (as given in equation 3) produces the same coefficients but different standard errors." [compared to xtreg, fe]

Code:

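* country means of the dependent and the two explanatory variables
egen double ybar = mean(y), by(ccode)
egen double x1bar = mean(x1), by(ccode)
egen double x2bar = mean(x2), by(ccode)
* deviations from the country means
gen double yd = y - ybar
gen double x1d = x1 - x1bar
gen double x2d = x2 - x2bar
* fixed-effects benchmark, then OLS on the deviations without a constant
xtreg y x1 x2, fe
reg yd x1d, noconstant
reg yd x1d x2d, noconstant
reg yd x1d i.year, noconstant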

According to William Gould's post, comparing these group-mean-centered OLS results (without constant) with the results of Stata's official xtreg, fe command should yield the same estimates but different standard errors, because of the difference between equation 3 (group-mean centering) and equation 5 (which is what Stata's xtreg, fe applies) (see Gould's post).
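A small simulated check of the standard-error part of that statement may be useful here. This is only a sketch on made-up data: the variables u, x1, y and the seed are assumptions and are not the data discussed in this thread. It also assumes that the two sets of standard errors differ only by the degrees-of-freedom correction for the absorbed country means.

```stata
* Simulated balanced panel: 130 countries, 5 years (illustrative only)
clear
set seed 12345
set obs 130
gen ccode = _n
* u is a hypothetical country effect, constant within country
gen double u = rnormal()
expand 5
bysort ccode: gen year = 2000 + _n
gen double x1 = rnormal() + u
gen double y  = 0.5*x1 + u + rnormal()
xtset ccode year

* Country-mean centering
egen double ybar  = mean(y),  by(ccode)
egen double x1bar = mean(x1), by(ccode)
gen double yd  = y  - ybar
gen double x1d = x1 - x1bar

* Fixed-effects benchmark
xtreg y x1, fe
scalar b_fe  = _b[x1]
scalar se_fe = _se[x1]
scalar n_g   = e(N_g)

* OLS on the deviations, no constant
reg yd x1d, noconstant
scalar b_ols  = _b[x1d]
* Rescale the OLS standard error for the n_g absorbed country means
scalar se_adj = _se[x1d]*sqrt(e(df_r)/(e(df_r) - n_g))

display "FE:  b = " b_fe  "  se = " se_fe
display "OLS: b = " b_ols "  se (df-adjusted) = " se_adj
```

Under these assumptions the coefficients agree, and the OLS standard error matches the xtreg, fe one once it is rescaled by the square root of (n - k)/(n - N - k), where N is the number of countries.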
Including one (exogenous) predictor indeed leads to the same coefficient but different standard errors. However, when I include a second country-mean-centered predictor, the coefficients of both variables no longer equal the results from the fixed-effects estimation. Similarly, replacing x2 with i.year also produces differences in both coefficients and standard errors between the xtreg, fe results and the country-mean-centered results.
Why is that the case?

My second question relates to the application of group-mean versus grand-mean-centering to estimate interaction effects:
Following Aiken/West (1991), I test interaction effects whose components I grand-mean-center before entering them into my regression model, in order to reduce multicollinearity and make interpretation easier. How can I combine removing unit heterogeneity by group-mean centering (as suggested by Beck/Katz) with reducing multicollinearity by grand-mean centering?
Any comments or suggestions are welcome!
Thanks a lot in advance!

Last edited by Steve Johnson; 19 Aug 2015, 10:07.
Tags: fixed effects, panel data, regression
FernandoRios

Join Date: Apr 2014

Posts: 2495
#2

19 Aug 2015, 11:48

Dear Steve,
I do not quite understand your results. You should post the output you are obtaining, to make it easier to see the problem, particularly the one regarding "However by including another country-mean-centered predictor the coefficients of both variables do not equal the results from fixed effects estimation anymore."

In other words.
xtreg y x1 x2, fe
should be equal to
reg dy dx1 dx2

as long as you do not have any missing information in your variables.
Now including the year fixed effect will give you different results because you would also need to demean all the year dummies.

xtreg y x1 i.year, fe
should be equal to
reg dy dx1 dyear1 dyear2 dyear3....etc
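As a minimal sketch of what demeaning the year dummies could look like (simulated, deliberately unbalanced data; the names u, yr1 to yr5, dy, dx1 and dyr2 to dyr5 are assumptions chosen for illustration):

```stata
* Simulated unbalanced panel (illustrative only)
clear
set seed 12345
set obs 130
gen ccode = _n
* u is a hypothetical country effect, constant within country
gen double u = rnormal()
expand 5
bysort ccode: gen year = 2000 + _n
gen double x1 = rnormal() + u
gen double y  = 0.5*x1 + 0.1*(year - 2000) + u + rnormal()
drop if runiform() < 0.15
xtset ccode year

* One 0/1 dummy per year: yr1 ... yr5
tabulate year, generate(yr)

* Demean y, x1 and the year dummies within country (yr1 is the base year)
foreach v of varlist y x1 yr2-yr5 {
    egen double m_`v' = mean(`v'), by(ccode)
    gen double d`v' = `v' - m_`v'
    drop m_`v'
}

* Fixed-effects benchmark with year dummies
xtreg y x1 i.year, fe
scalar b_fe = _b[x1]

* OLS on the deviations, including the demeaned year dummies
reg dy dx1 dyr2 dyr3 dyr4 dyr5, noconstant
scalar b_ols = _b[dx1]
display "x1: FE = " b_fe "   demeaned OLS = " b_ols
```

One year dummy is left out because the demeaned dummies sum to zero within each observation, so including all of them would be perfectly collinear.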

Now, regarding your argument about multicollinearity: I would suggest treating the two steps independently. Meaning:
1. estimate your variables and interactions that you wish to
2. Estimate them using the grand means as you describe
3. Demean all variables with respect to the fixed-effects groups.
4. Estimate the model
This should work for what you have in mind.
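One possible reading of steps 1 to 4 can be sketched as follows. This is a sketch under stated assumptions, not a definitive recipe: the data are simulated, and the names u, x1_c, x2_c, x1x2_c and the d_ prefix are hypothetical.

```stata
* Simulated balanced panel (illustrative only)
clear
set seed 2015
set obs 130
gen ccode = _n
* u is a hypothetical country effect, constant within country
gen double u = rnormal()
expand 5
bysort ccode: gen year = 2000 + _n
gen double x1 = rnormal() + u
gen double x2 = rnormal() + 0.5*u
gen double y  = 0.3*x1 - 0.2*x2 + 0.4*x1*x2 + u + rnormal()
xtset ccode year

* Steps 1-2: grand-mean center the components, then build the interaction
foreach v of varlist x1 x2 {
    summarize `v', meanonly
    gen double `v'_c = `v' - r(mean)
}
gen double x1x2_c = x1_c*x2_c

* Step 3: demean everything, including the product, within country
foreach v of varlist y x1_c x2_c x1x2_c {
    egen double m_`v' = mean(`v'), by(ccode)
    gen double d_`v' = `v' - m_`v'
    drop m_`v'
}

* Step 4: estimate; the FE model on the same regressors is the benchmark
xtreg y x1_c x2_c x1x2_c, fe
scalar b_fe = _b[x1x2_c]
reg d_y d_x1_c d_x2_c d_x1x2_c, noconstant
scalar b_ols = _b[d_x1x2_c]
display "interaction: FE = " b_fe "   demeaned OLS = " b_ols
```

Note that d_x1x2_c is the demeaned product of the grand-mean-centered variables, which is in general not the same variable as the product of the two demeaned variables.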
HTH
Fernando

Steve Johnson

Join Date: Mar 2015
Posts: 43

19 Aug 2015, 16:04

Dear Fernando,

thanks a lot for your quick reply and your advice. I really appreciate your help!

I tried to replicate the problem (i.e. that the coefficients differ) using an example data set, however, there the problem did not occur, i.e. obviously you are right:

xtreg y x1 x2, fe should be equal to reg dy dx1 dx2

So I guess I made a mistake when generating the group-mean variables, but I cannot find it. Below are the respective commands:

Code:

* country-mean center each variable
foreach var of varlist log_wdi_mort log_health_aidpc GOVERNANCE3 {
            egen double `var'_gmean = mean(`var'), by(ccode)
            gen `var'_w = (`var'-`var'_gmean)
            label var `var'_w "`var' group-mean centered"
            drop `var'_gmean
}

* fixed-effects benchmark vs. OLS on the centered variables
xtreg log_wdi_mort log_health_aidpc GOVERNANCE3, fe
reg    log_wdi_mort_w log_health_aidpc_w GOVERNANCE3_w, noconstant

                      (1)            (2)
                      FE             OLS_group_mean_centered noconstant
VARIABLES             log_wdi_mort   log_wdi_mort_w

log_health_aidpc      0.0203*
                      (0.0115)
GOVERNANCE3           -0.0720
                      (0.0555)
log_health_aidpc_w                   0.0139
                                     (0.0101)
GOVERNANCE3_w                        -0.0819
                                     (0.0516)
Constant              3.481***
                      (0.0289)

Observations          506            506
R-squared             0.012          0.008
Number of ccode       131

The number of observations is the same, but the coefficients differ: neither log_health_aidpc (FE model) and log_health_aidpc_w (group-mean model) nor GOVERNANCE3 and GOVERNANCE3_w match.

as long as you do not have any missing information in your variables.

I do not see any difference regarding missing information in the summary statistics.

    Variable |   Obs        Mean   Std. Dev.         Min        Max
-------------+-------------------------------------------------------
log_wdi_mort |   655    3.544712    .8371916    1.098612   5.078294
log_wdi_mo~w |   655   -4.07e-18    .2544643   -.8283019    .771646   // group-mean centered
log_healt~pc |   629    .3327553    1.678597   -6.277211   5.008636
log_health~w |   629    1.21e-17    1.020603   -3.552298   4.214557   // group-mean centered
 GOVERNANCE3 |   524   -.4451592    .6630121   -2.319006   1.155119
 GOVERNANC~w |   524    7.44e-18    .1847843   -.7830873   .7734927   // group-mean centered
 GOVERNANC~c |   524    2.21e-17    .6630121   -1.873847   1.600278   // grand-mean centered
log_healt~_c |   629   -2.40e-17    1.678597   -6.609966   4.675881   // grand-mean centered

Now regarding your arguments for multicollinearity. I would suggest treat both steps independent. Meaning
1. estimate your variables and interactions that you wish to
2. Estimate them using the grand means as you describe
3. Demean all variables respect to the fixed effects groups.
4. Estimate the model

As far as I understand, you suggest estimating the model three times with different specifications: 1) without centering at all, 2) with grand-mean centering of the two explanatory variables, 3) with group-mean centering. However, one might expect the coefficients of the interaction term and the main effects to differ considerably between these specifications, especially in the third model, which controls for unit heterogeneity. On what criteria can I choose among those results?
And are the interaction and the main effects interpretable in the same way as in a grand-mean-centered interaction model? At least the grand-mean-centered and the group-mean-centered variables both have means close to zero.

Now including the year fixed effect will give you different results because you would also need to demean all the year dummies.

Code:

xtreg y x1 i.year, fe
should be equal to
reg dy dx1 dyear1 dyear2 dyear3....etc

How is dyear1, dyear2 etc calculated?
Isn't it also possible to just include i.year in the group-mean-centered model in order to control for country and year fixed effects (which would still allow for panel corrected standard errors)?

Code:

reg dy dx1 i.year

Once again, thank you very much for your support!!!


Sergio Correia

Join Date: Apr 2014

Posts: 420
#4

19 Aug 2015, 18:31

Hi Steve,

To follow up on Fernando's explanation:

Isn't it also possible to just include i.year in the group-mean-centered model

No. Each year is a dummy variable that needs to be recentered; treat it as you would any other variable.

I do not see any difference regarding missing information in the summary statistics.

From your table, there *are* missing values. You are calculating the mortality averages over 655 observations but the governance averages over 524 observations. Thus, the group means are not computed on the same sample.

All in all, if you first drop the observations with missing values and then add the (demeaned) year dummies as extra regressors, you should get what you want. That said, I'm not sure why you would go through the effort of doing this by hand when -xtreg- (or equivalent commands) should work fine (unless it's just to understand what's going on).
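A minimal sketch of the common-sample point, on simulated data with some values of x2 set to missing (the names u, touse and the _own/_w suffixes are assumptions for illustration only):

```stata
* Simulated panel with missing values in x2 (illustrative only)
clear
set seed 2015
set obs 130
gen ccode = _n
* u is a hypothetical country effect, constant within country
gen double u = rnormal()
expand 5
bysort ccode: gen year = 2000 + _n
gen double x1 = rnormal() + u
gen double x2 = rnormal() + u
gen double y  = 0.3*x1 - 0.2*x2 + u + rnormal()
replace x2 = . if runiform() < 0.2
xtset ccode year

* Flag the complete cases, i.e. the sample that xtreg will actually use
gen byte touse = !missing(y, x1, x2)

foreach v of varlist y x1 x2 {
    * problematic: country mean over each variable's own nonmissing sample
    egen double m_own = mean(`v'), by(ccode)
    gen double `v'_own = `v' - m_own
    * consistent: country mean over the common estimation sample only
    egen double m_w = mean(`v') if touse, by(ccode)
    gen double `v'_w = `v' - m_w
    drop m_own m_w
}

* Fixed-effects benchmark
xtreg y x1 x2, fe
scalar b_fe = _b[x1]

* Means taken over different samples: generally does not match xtreg
reg y_own x1_own x2_own, noconstant
scalar b_own = _b[x1_own]

* Means taken over the common sample: matches xtreg
reg y_w x1_w x2_w, noconstant
scalar b_w = _b[x1_w]

display "x1: FE = " b_fe "   own-sample = " b_own "   common-sample = " b_w
```

The same touse flag could also be used when demeaning the year dummies, so that every mean is taken over the observations that enter the regression.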

Best,
Sergio
Steve Johnson

Join Date: Mar 2015

Posts: 43
#5

20 Aug 2015, 04:52

Sergio!
Of course! Thank you very much! Now I got it! ;-)

why would you go through the effort to do it when -xtreg- (or equivalent commands) should work fine

I want to estimate an autoregressive distributed lag (ARDL) model and an ARDL model with a second lag of the dependent variable (ARDL_LDV2) with panel-corrected standard errors, while accounting for country- and year-specific effects, in order to test the dynamics of the model (following Beck/Katz 2011) and compare it to FE/RE and GMM estimations (xtabond2).

So I would run something like the following, including all exogenous explanatory variables at current and lagged levels as well as the LDV and LDV2, while controlling for AR(1) and heteroscedasticity within panels:

Code:

xtpcse dy l.dy l2.dy dx1 dx2 c.dx1#c.dx2 l.dx1 l.dx2 cL.dx1#cL.dx2 dyear1 dyear2 dyear3, noconstant pairwise hetonly corr(ar1)

If there is any shortcut to this "long route of estimation", I would be very happy to learn more about it!
Moreover, I'm still unsure whether it is reasonable to use the cross-product of the two group-mean-centered variables, dx1*dx2, instead of the grand-mean-centered interaction.

Last edited by Steve Johnson; 20 Aug 2015, 04:57.