Time-invariant variables in Fixed-effects model

Iván Higuera Mendieta

Join Date: Oct 2014

Posts: 28
#1

26 Nov 2014, 13:52

Hi to all,

I have panel data for municipalities and I am trying to estimate the following model:

Code:

xtset id year
xtreg depvar x1 x2 x3 x4 i.year, fe

Where x1, x2 and x3 are time-variant variables, while x4 is not. I know that FE models don't allow time-invariant variables because you use FE precisely to make those constant and "control" for individual characteristics (Stata will drop these due to collinearity with the id). Nonetheless, one of those time-invariant variables is important to my estimation. Is there a way to estimate the same fixed effects model with the time-invariant variable (x4)?

I have seen in some papers that they multiply the time-invariant variable by the time variable (something like x4X2000 or any year), but I don't know what the theory behind this is.

Any thoughts?

Thanks in advance,

Iván
Tags: None
daniel klein

Join Date: Mar 2014

Posts: 3846
#2

26 Nov 2014, 14:02

You might want to have a look into so-called "hybrid models" or "between-within" models. Paul Allison (2009) gives a good introduction, including Stata code. Also see Schunck (2013).

Best
Daniel

Allison, Paul D. (2009). Fixed Effects Regression Models. Thousand Oaks: SAGE.

Schunck, Reinhard. (2013). Within and between estimates in random-effects models: Advantages and drawbacks of correlated random effects and hybrid models. The Stata Journal, 13(1):65-76.
Comment
Iván Higuera Mendieta

Join Date: Oct 2014

Posts: 28
#3

26 Nov 2014, 14:45

Hello Daniel,

I appreciate the references, but right now it is difficult for me to access them (those green books of SAGE are great!). I will review them, but in the meantime, is there a way to make these interactions of time-invariant variables in the FE xtreg model?

If you multiply a time-invariant variable ("x4") by a time variable, say "year", you get a variable that now varies over time, but my question is: is it the full product of both vectors?

Code:

gen x4Xyear = x4 * year
* Regression
xtreg depvar x1 x2 x3 x4Xyear, fe

or do you have to do it year by year?

Code:

* Suppose you have year indicators y1990, y1991, ...
foreach var of varlist y1* {
    gen x4X`var' = (x4 * `var')
}
* Regression
xtreg depvar x1 x2 x3 x4X*, fe

Thanks,

Iván
Comment
daniel klein

Join Date: Mar 2014

Posts: 3846
#4

26 Nov 2014, 15:08

Technically you should in each case use factor variable notation (see help fvvarlist).

From a substantive perspective, do not use interactions as a way of including time-invariant predictors in the model. By interacting such a predictor with time, your model answers the theoretical question of how the effect of that predictor varies over time. It by no means estimates a main effect of this predictor. If you are not interested in testing interaction effects then you should not use interactions.

The "hybrid model" is actually a rather simple thing that can be explained in three steps:

1. Calculate the panel-unit-specific mean for all time-varying predictors (but not the response/outcome). This is something along the lines by <id> ,sort : egen x1_between = mean(x1)

2. Subtract the panel-unit-specific mean from the original values, i.e. perform the fixed-effects/within-transformation. This is as simple as generate x1_within = x1 - x1_between

3. Run a random-effects/mixed model where you include the time-varying predictors in their de-meaned form (those from step 2) and their mean (those calculated in step 1) along with the time-invariant predictors. This is, in the simplest form, xtreg depvar x1_within x1_between x2_within x2_between x3_within x3_between x4

You are done. The coefficients for the *_within variables resemble the fixed-effects estimates, while the *_between variables can be interpreted as a between estimator. The coefficients for time-invariant predictors are those from a random-effects model.
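A minimal sketch of the three steps above, written with this thread's variable names (id, year, depvar, x1 to x3 time-varying, x4 time-invariant). The loop and the explicit re option are just one way to write it; xtreg fits the random-effects model by default. Note that the means here are taken over all non-missing values of each variable, which may differ from the estimation sample if other variables have missing values.

```stata
* Sketch only: variable names follow the original question
xtset id year

foreach var of varlist x1 x2 x3 {
    * Step 1: panel-unit-specific mean of each time-varying predictor
    bysort id: egen `var'_between = mean(`var')
    * Step 2: within transformation (deviation from the unit mean)
    generate `var'_within = `var' - `var'_between
}

* Step 3: random-effects model with de-meaned predictors, their means,
* and the time-invariant predictor x4
xtreg depvar x1_within x1_between x2_within x2_between ///
    x3_within x3_between x4, re
```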

Be warned that interactions are not as straightforward to implement in such models as one might think. But see Schunck (2013) for more on this point.

Best
Daniel
4 likes
Comment
Richard Williams

Join Date: Apr 2014

Posts: 4984
#5

26 Nov 2014, 15:17

Don't generate interactions yourself. Use factor variable notation.

Code:

xtreg depvar x1 x2 x3 i.year c.x4#i.year, fe

I am assuming the x's are all continuous. If not, refer to them as i.x1, i.x2, etc.

Also see this thread:

http://www.statalist.org/forums/foru...fe-and-margins

As for rationale, you are saying that, while the value of x4 is invariant, its effect is not, i.e. it has different effects in different years. If that isn't what your theory says, then you shouldn't do this.
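One possible way to see whether the data support year-specific effects of x4 is a joint test of the interaction terms after fitting the model. This is only a sketch with the variable names from this thread; one of the interaction terms will be omitted because the main effect of x4 is absorbed by the fixed effects.

```stata
* Sketch only: FE model with year-specific slopes for the time-invariant x4
xtset id year
xtreg depvar x1 x2 x3 i.year c.x4#i.year, fe

* Joint test that the estimable interaction coefficients are all zero,
* i.e. that the effect of x4 does not differ across years
testparm c.x4#i.year
```

A small test statistic here would be in line with the remark above that the interaction should not be used unless theory calls for it.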

As for the hybrid model, see

http://www.statisticalhorizons.com/p...-hybrid-method

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2591
#6

26 Nov 2014, 15:32

Daniel already gave some good advice. Let me add my few cents to it.

Daniel's point 2 actually can be skipped. You will get exactly the same results in the third step by using the original variables x1 instead of x1_within ..., which you can easily verify. In the literature, this approach is also known as "correlated random effects".
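A sketch of how this could be verified, assuming the variables x1_between ... and x1_within ... have already been created as in Daniel's steps 1 and 2. The names cre and hybrid are just labels chosen here for the stored results.

```stata
* Correlated random effects: original variables plus unit means
xtreg depvar x1 x2 x3 x1_between x2_between x3_between x4, re
estimates store cre

* Hybrid model: de-meaned variables plus unit means
xtreg depvar x1_within x2_within x3_within ///
    x1_between x2_between x3_between x4, re
estimates store hybrid

* Side-by-side comparison: the coefficients on x1 ... (cre) should match
* those on x1_within ... (hybrid), as should the coefficient on x4
estimates table cre hybrid, b(%9.4f) se
```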

With regard to the interactions of time-invariant variables with time-varying variables, I agree with Daniel. You have to be careful here with the interpretation of the coefficients because this changes your underlying model. In particular, when your initial model is true but you interact the time-invariant variable with year dummies, the respective coefficients of the interactions are just zero (in the population).

Starting from your initial model, you always need to impose additional exogeneity assumptions of one sort or another to identify the coefficients of time-invariant variables. The correlated random effects model imposes a Mundlak-type assumption of the form
\[
E [\alpha_i | \mathbf{X}_i, \mathbf{Z}_i] = c + \bar{\mathbf{x}}_i' \boldsymbol{\pi}
\]
where \( \bar{\mathbf{x}}_i \) is the unit-specific mean of the time-varying regressors (x1_between ... in Daniel's notation). It restricts the time-invariant regressors to be correlated with the "fixed effects" \( \alpha_i \) only indirectly via their correlation with the average of the x-regressors.
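To make the role of this assumption more explicit, suppose (the notation is assumed here, it is not spelled out earlier in the thread) that the initial model is \( y_{it} = \mathbf{x}_{it}' \boldsymbol{\beta} + \mathbf{z}_i' \boldsymbol{\gamma} + \alpha_i + \epsilon_{it} \), with \( \mathbf{z}_i \) the time-invariant regressors. Writing \( a_i = \alpha_i - E[\alpha_i | \mathbf{X}_i, \mathbf{Z}_i] \) and substituting the assumption gives
\[
y_{it} = c + \mathbf{x}_{it}' \boldsymbol{\beta} + \bar{\mathbf{x}}_i' \boldsymbol{\pi} + \mathbf{z}_i' \boldsymbol{\gamma} + a_i + \epsilon_{it}, \qquad E[a_i | \mathbf{X}_i, \mathbf{Z}_i] = 0
\]
so that the remaining unit-specific term \( a_i \) is mean-independent of all regressors, and \( \boldsymbol{\gamma} \) can be estimated by random effects on this augmented equation.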

In addition to Daniel's suggestions, you might also want to have a look at the Hausman-Taylor model that assumes some of the variables to be uncorrelated with the unobserved effects, as described for example in
Wooldridge (2010): Econometric Analysis of Cross Section and Panel Data, MIT Press; Chapter 11.3,
and in the Stata manual for the command xthtaylor (although this Stata command is not flexible enough and will not work in your case because you only have one time-invariant regressor).

There are pros and cons with respect to the different methods, and it depends on your particular application which is most suitable.

Edit: There has been a similar discussion on Statalist before: http://www.statalist.org/forums/foru...riant-variable

Last edited by Sebastian Kripfganz; 26 Nov 2014, 15:45.

https://www.kripfganz.de/stata/
2 likes
Comment
daniel klein

Join Date: Mar 2014

Posts: 3846
#7

26 Nov 2014, 15:40

Skipping Step 2 changes the interpretation of the coefficients for the *_between variables, though: in the correlated random effects model they represent the difference between the within and between estimators. Since you are probably not interested in interpreting these coefficients anyway, Sebastian's "short-cut" of preferring the correlated random effects model over the hybrid model is good advice.
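For a single time-varying predictor this can be seen by rearranging the hybrid specification, where \( \beta_W \) and \( \beta_B \) are used here (as assumed labels) for the coefficients on x1_within and x1_between:
\[
\beta_W (x_{it} - \bar{x}_i) + \beta_B \bar{x}_i = \beta_W x_{it} + (\beta_B - \beta_W) \bar{x}_i
\]
The coefficient on the original variable is still \( \beta_W \), while the coefficient on the unit mean becomes \( \beta_B - \beta_W \).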

Best
Daniel
1 like
Comment
Iván Higuera Mendieta

Join Date: Oct 2014

Posts: 28
#8

26 Nov 2014, 15:47

Hi Richard and Daniel,

As for rationale, you are saying that, while the value of x4 is invariant, its effect is not, i.e. it has different effects in different years. If that isn't what your theory says, then you shouldn't do this.

Actually not, I expect that the time-invariant variable has an effect in all years of the panel. I will try the hybrid model explained by Daniel and, fortunately, I will not use a logit model; thanks for the warning. I have a question about the hybrid model: when you demean the independent (time-varying) variables, don't you have to do this by year and id?

Code:

foreach var of varlist x1 x2 x3 {
    by id (year), sort: egen `var'_between = mean(`var')
}

In Daniel's example it seems the demeaning is done only by panel unit (id).

Thanks again,

Iván
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2591
#9

26 Nov 2014, 15:47

Thanks Daniel. That's true. I only focused on the *_within variables.

https://www.kripfganz.de/stata/
Comment
Iván Higuera Mendieta

Join Date: Oct 2014

Posts: 28
#10

26 Nov 2014, 15:53

Sorry for the reply #8. I misunderstood the "demeaning" of the variables.

Thanks Sebastian, Richard and Daniel for all the advice.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30087
#11

26 Nov 2014, 16:02

Nonetheless, one of those time-invariant variables is important to my estimation.

No, it's not!! In fact, precisely because it is time-invariant and you are incorporating fixed effects, the effect of that time-invariant variable is unidentifiable in that model. The use of an interaction between that variable and time will give you an interaction term that is estimable, but that is a different model altogether and what it estimates is, most emphatically, not the effect of that time-invariant variable (as others have already pointed out).

I'm not sure in what sense you think that variable is important. If your concern is that it is a potential confounder that needs to be adjusted for, the fixed-effect already does that (and more) for you and you needn't worry further about it. If you mean that it is important to your research program to identify the cross-panel association between this variable and your outcome, that can be done, but in the between-effects model, not the fixed-effects model.
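As a rough sketch of the second option, using the variable names from the original question: the between-effects estimator is available through the be option of xtreg. It regresses the panel means of the outcome on the panel means of the predictors, so the coefficient on x4 is estimable, but it describes a cross-panel association only.

```stata
* Sketch only: between-effects model, where the time-invariant x4 is estimable
xtset id year
xtreg depvar x1 x2 x3 x4, be
```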
3 likes
Comment
Iván Higuera Mendieta

Join Date: Oct 2014

Posts: 28
#12

26 Nov 2014, 16:09

Thanks for the explanation, Clyde. That is precisely what I realized over the course of this thread.
Comment
Sean O'Connor

Join Date: Jun 2014

Posts: 119
#13

16 Feb 2016, 03:29

Hi all,

I'm wondering if I can run something by people in regards to utilising invariant variables (regional dummies) in a fixed effects estimation?

I'm looking to examine the effects industrial diversity has on regional employment growth, with my model as follows.

For my control variables I want to include regional dummies; although they are time-invariant, I believe their effects are not. Let me expand. Two regions in my estimation house the Department of Education and the Department of Social Protection. Therefore, the majority of employees who work in these sectors throughout the entire country are recorded as being employed in these regions.
The time period under examination is 2006-2012. I argue that when the economic downturn occurs, from 2008 onwards, the effects might not seem as severe in these two regions given the type of employment which is present in them.

Therefore, as Richard states above, while these regional dummies are invariant, I would argue their effects are not, particularly from 2008 onwards.

When I include such measures, a lot of the variables in my estimation (namely diversity) line up with theory and hypothesis also.
Comment
Philip Gigliotti

Join Date: Nov 2016

Posts: 118
#14

17 Dec 2016, 15:04

Originally posted by Clyde Schechter

No, it's not!! In fact, precisely because it is time-invariant and you are incorporating fixed effects, the effect of that time-invariant variable is unidentifiable in that model. The use of an interaction between that variable and time will give you an interaction term that is estimable, but that is a different model altogether and what it estimates is, most emphatically, not the effect of that time-invariant variable (as others have already pointed out).

I'm not sure in what sense you think that variable is important. If your concern is that it is a potential confounder that needs to be adjusted for, the fixed-effect already does that (and more) for you and you needn't worry further about it. If you mean that it is important to your research program to identify the cross-panel association between this variable and your outcome, that can be done, but in the between-effects model, not the fixed-effects model.

Hey Clyde, can you elaborate a bit more on this statement? If you have a variable in a fixed effects model that is an interaction between a time-variant and a time-invariant variable, how do you interpret this coefficient? Does it matter that you are no longer interpreting a simple slope that is the sum of your constant and the coefficient on the time-invariant variable, as you would in an OLS model without fixed effects? How does this change your model?

To add more context I have a model where I interact a measure of spending with the within unit mean of poverty, and another with the within unit mean of minority composition. What is very interesting is that they have opposite signed effects. If I include those in my model along with the raw value of expenditures, does my specification suffer from the lack of inclusion of the within unit means? Is what I have a valid specification?

Last edited by Philip Gigliotti; 17 Dec 2016, 15:50.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30087
#15

17 Dec 2016, 16:21

If you have a variable in a fixed effects model that is an interaction between a time variant and time invariant variable, how do you interpret this coefficient?

You interpret it the same way you would any other interaction coefficient.

does it matter that you are no longer interpreting a simple slope that is the sum of your constant and the coefficient on the time invariant variable, as you would in an ols model without fixed effects?

So let's say we have a time-invariant predictor, TI, and a time-varying predictor, TV, and we do an OLS regression like -regress outcome TI##TV-. The underlying model is

Code:

outcome = constant + b_TI*TI + b_TV*TV + b_interaction*TI*TV + error,

which we can rewrite as two equations:

outcome = constant + b_TI*TI + b'_TV*TV + error, and
b'_TV = b_TV + b_interaction*TI

Now, in a fixed effects model, the error term is split into two parts: u + epsilon, where u is a time-invariant fixed effect for each panel, and epsilon is time varying. So the equation looks like:

Code:

outcome = constant + b_TI*TI + b'_TV*TV + u + epsilon, and
b'_TV = b_TV + b_interaction*TI

Now, this model is unidentified because TI and u are collinear. You can add or subtract arbitrary quantities from u and get an equally valid equation by a compensatory change in b_TI, because TI itself doesn't vary within panel. Or vice versa. That is why it is in principle impossible to get a separate estimate for b_TI in a fixed-effects model. Any estimate will do: just make a compensatory change in u and the model predictions are exactly the same. So, the model can only be estimated if some identifying constraint is imposed. The conventional constraint is to set b_TI = 0, which is formally the equivalent of omitting the b_TI*TI term from the model. But notice that this in no way modifies anything about the other terms in the model. Notice also that b_TI does not appear at all in the second equation in either the OLS or FE model. So everything works exactly as in OLS, except that no estimate is obtained for b_TI.
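The compensatory change can be written out explicitly. With subscript \( i \) for the panel (added here for clarity) and any constant \( c \):
\[
b_{TI} \, TI_i + u_i = (b_{TI} + c) \, TI_i + (u_i - c \, TI_i)
\]
Because \( TI_i \) does not vary within a panel, \( u_i - c \, TI_i \) is again a time-invariant panel effect, so both sides give the same predictions for every observation and the data cannot distinguish \( b_{TI} \) from \( b_{TI} + c \).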

Now, that does not mean that the coefficient estimates for b_TV and b_interaction turn out the same in OLS and FE. That is (usually) not the case because the FE model is estimating within-panel relationships between variables, whereas OLS (mis)applied to panel data estimates an ill-defined mixture of between- and within-panel effects. So in this sense the models are quite different. But as you can see from the equations, the algebraic relationships between TV and the TV#TI interaction are the same, and the interaction coefficient has the same interpretation as a difference in the effect of TV conditional on the value of TI (though the effects themselves differ between the two models).
Comment
