Multicollinearity in a fixed effects panel regression - difference in differences

Simon Sampson

Join Date: Aug 2016

Posts: 12
#1

13 Sep 2016, 09:29

So I am having a multicollinearity issue with my current regression. I have panel data on GP practice QOF scores from 2012-2015 and am using , fe to estimate practice fixed effects. I have included a dummy (HIGH_2013) to identify which practices scored highly in one category of the QOF scores in 2013. That is, if a practice scored highly in the category in 2013, HIGH_2013 is equal to 1 in every period for that practice.

I include that variable in my regression, along with a dummy equal to 1 if the year is 2014 and an interaction between the two (it's a difference-in-differences). However, when I run the regression, HIGH_2013 is omitted. I believe this occurs because it simply acts as an identifier for a practice, which is essentially what fe is already doing. Is this correct? Are diff-in-diff and fe fundamentally mutually exclusive, or is there some way I can get around this problem?
Simon Sampson

Join Date: Aug 2016

Posts: 12
#2

13 Sep 2016, 09:35

For some more info - I am doing this because I think aggregate QOF scores may have two effects on my dependent variable, one in either direction. This is simply an attempt to separate one of them out. If I don't include HIGH_2013 on its own, I imagine the interaction between HIGH_2013 & the 2014 dummy will be hard to interpret, but would this change the coefficient of aggregate QOF scores?
Clyde Schechter

Join Date: Apr 2014

Posts: 30114
#3

13 Sep 2016, 10:09

You are correct in observing that because HIGH_2013 is constant within practices, it is collinear with the fixed effects and is automatically omitted from the regression.

You are incorrect in thinking of this as a problem. It isn't. For the moment, set aside fixed effects and just look at how difference-in-differences (DID) analyses work. You regress an outcome on group, time, and group#time. What does the coefficient of group represent in this model? Because of the interaction term, the coefficient of group represents only the difference between groups conditional on time = 0. That is, assuming time is coded 0 as baseline and 1 as follow-up, it is the baseline difference between groups. In the DID model, the baseline difference between groups is just a nuisance parameter. The important assumption for DID is that whatever that baseline difference is, it would otherwise have continued unchanged through the follow-up period. But the actual value of the baseline difference is irrelevant to inference about the effect of group membership on outcome at follow-up. The parameter of greatest interest in the output of a DID regression is the coefficient of the interaction term. It represents the separate effect of group membership in the follow-up time period. That is the "pay dirt" when DID models are used to estimate causal effects.
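The roles of the three coefficients can be written out explicitly. This is only a sketch: the symbols G (group indicator), T (follow-up indicator), and the betas are notation assumed here for illustration, and the conditional means assume the error term has mean zero given G and T.

```latex
% Assumed notation: G_i = 1 for the group of interest, T_t = 1 at follow-up
y_{it} = \beta_0 + \beta_1 G_i + \beta_2 T_t + \beta_3 (G_i \times T_t) + \varepsilon_{it}

% Difference between groups at baseline (T = 0): the nuisance parameter
E[y \mid G=1, T=0] - E[y \mid G=0, T=0] = \beta_1

% Difference between groups at follow-up (T = 1)
E[y \mid G=1, T=1] - E[y \mid G=0, T=1] = \beta_1 + \beta_3

% Difference in differences: the parameter of interest
\bigl(E[y \mid G=1, T=1] - E[y \mid G=0, T=1]\bigr)
  - \bigl(E[y \mid G=1, T=0] - E[y \mid G=0, T=0]\bigr) = \beta_3
```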

Now, when the data is panel data and you choose to use a fixed-effects regression, you find that this irrelevant nuisance parameter is not estimable and is omitted from the model. No big deal! The parameter that matters is the interaction coefficient, and that is not collinear with the fixed effects and is not omitted from the analysis. So you're in perfectly good shape here.
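To see why one term drops out and the other survives, it can help to apply the within (practice-demeaning) transformation by hand. The following is a minimal Python sketch on a made-up panel of four practices, not the poster's data; the helper demean_within and the choice of practices 1 and 2 as the HIGH_2013 group are hypothetical.

```python
def demean_within(values, ids):
    """Subtract each unit's own mean: the 'within' transformation behind fixed effects."""
    totals, counts = {}, {}
    for v, i in zip(values, ids):
        totals[i] = totals.get(i, 0.0) + v
        counts[i] = counts.get(i, 0) + 1
    return [v - totals[i] / counts[i] for v, i in zip(values, ids)]

years = [2012, 2013, 2014, 2015]
high_practices = {1, 2}  # hypothetical: practices 1 and 2 scored highly in 2013

ids, high, interaction = [], [], []
for practice in range(1, 5):
    for year in years:
        h = 1.0 if practice in high_practices else 0.0
        t = 1.0 if year == 2014 else 0.0
        ids.append(practice)
        high.append(h)             # constant within each practice
        interaction.append(h * t)  # varies over time for HIGH_2013 practices

high_within = demean_within(high, ids)
interaction_within = demean_within(interaction, ids)

# HIGH_2013 has no within-practice variation left, so it cannot be estimated.
print(high_within)             # sixteen zeros
# The interaction still varies within the HIGH_2013 practices.
print(interaction_within[:4])  # prints [-0.25, -0.25, 0.75, -0.25]
```

After demeaning, HIGH_2013 is identically zero, which is the collinearity that leads to its omission, while the interaction term keeps its variation.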

If, however, your research goals go beyond trying to estimate a causal effect of HIGH_2013 and, for other reasons, you actually need to explicitly estimate the baseline difference in outcome between HIGH_2013 practices and the others, then you cannot do it in a fixed effects model. A separate between-effects regression for just that purpose could be appropriate. Or you could consider using a random effects model for your DID analysis, which will give you a simultaneous estimate of the baseline group difference.
Simon Sampson

Join Date: Aug 2016

Posts: 12
#4

13 Sep 2016, 10:18

Thanks for the thorough response! You are correct that I am not actually interested in HIGH_2013 itself; the interaction term is the only coefficient of interest. I was just worried that without being able to include HIGH_2013 the interaction term would be inaccurate for some reason - I've just been tying my brain in knots thinking about it.
Clyde Schechter

Join Date: Apr 2014

Posts: 30114
#5

13 Sep 2016, 10:36

You are in general right to be concerned that a model that includes an interaction term but not the associated "main effects" is mis-specified. This fixed-effects collinearity is the only major exception to that rule. The resulting model is not mis-specified because even though the HIGH_2013 effect isn't explicitly in the model, its "effects" are represented, indirectly, by the practice fixed effects instead. So it's all copacetic.
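One way to convince yourself of this is a small simulation. The sketch below is in Python and uses invented numbers: 40 practices, a balanced 2012-2015 panel, and a true interaction effect of 1.5 are all assumptions made for illustration. It compares the fixed-effects estimate of the interaction, which omits the HIGH_2013 main effect, with the difference in differences of the four cell means, which is what a pooled regression on HIGH_2013, the 2014 dummy, and their interaction would report. In a balanced panel the two agree up to rounding error.

```python
import random

def demean_within(values, ids):
    """Subtract each unit's own mean (the within transformation)."""
    totals, counts = {}, {}
    for v, i in zip(values, ids):
        totals[i] = totals.get(i, 0.0) + v
        counts[i] = counts.get(i, 0) + 1
    return [v - totals[i] / counts[i] for v, i in zip(values, ids)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

random.seed(0)
ids, high, d2014, y = [], [], [], []
for practice in range(40):                # hypothetical balanced panel
    h = 1.0 if practice < 20 else 0.0     # half the practices are HIGH_2013
    practice_effect = random.gauss(0.0, 1.0)
    for year in (2012, 2013, 2014, 2015):
        t = 1.0 if year == 2014 else 0.0
        ids.append(practice)
        high.append(h)
        d2014.append(t)
        # Assumed data-generating process; the true interaction effect is 1.5.
        y.append(practice_effect + 2.0 * h + 0.5 * t + 1.5 * h * t
                 + random.gauss(0.0, 0.1))

# Fixed effects: demean by practice, then regress y on the 2014 dummy and the
# interaction only (HIGH_2013 itself would be all zeros after demeaning).
yw = demean_within(y, ids)
tw = demean_within(d2014, ids)
xw = demean_within([h * t for h, t in zip(high, d2014)], ids)
det = dot(tw, tw) * dot(xw, xw) - dot(tw, xw) ** 2
fe_interaction = (dot(tw, tw) * dot(xw, yw) - dot(tw, xw) * dot(tw, yw)) / det

# Pooled model with both main effects: saturated in the 2x2 cells, so its
# interaction coefficient is the difference in differences of cell means.
def cell_mean(h, t):
    vals = [yy for yy, hh, tt in zip(y, high, d2014) if hh == h and tt == t]
    return sum(vals) / len(vals)

pooled_interaction = ((cell_mean(1, 1) - cell_mean(1, 0))
                      - (cell_mean(0, 1) - cell_mean(0, 0)))

print(fe_interaction, pooled_interaction)
```

The exact equality relies on the panel being balanced; with an unbalanced panel the two estimates can differ somewhat, although the interaction remains estimable under fixed effects.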
Simon Sampson

Join Date: Aug 2016

Posts: 12
#6

13 Sep 2016, 10:40

I thought that might be the case, but I haven't seen an example of this before so was unsure, and stats / econometrics is not my strong point. I suppose I am performing a somewhat unusual regression.