Fixed Effects and Clustering in Panel Data

Karan Patel

Join Date: Nov 2023

Posts: 2
#1

29 Nov 2023, 14:44

Hi, I am a university student and I am struggling to understand if my regression is correct.

I have a panel dataset covering 150 countries from 1995 to 2021, with 6 variables: gini, readiness, vulnerability, gdppc, pop, and pop density. I want to run a regression to test the effect of readiness on gini, with the other variables as controls and with some lagged effects. Readiness and vulnerability come from the ND-GAIN index for climate shocks, and my research is on the effect of climate readiness on income inequality.

So far, I have run xtset ID year, and then xtreg gini_disp readiness L10.readiness L20.readiness vuln gdppc pop popden, fe vce(cluster ID), where ID is the variable I created to identify each country for xtset.

However, I do not fully understand the meanings of fixed effects and clustering and so cannot tell if I am doing this correctly or not. My understanding is that to control for error terms being correlated with the independent variables (which is heterogeneous by country) due to omitted variable bias we must include fe and cluster for the country to get statistically correct results.

On a similar note, I have seen papers use ηi + μt to control for fixed effects by country and time (which is what I want to do!). Have I captured this in my regression above, or do I need to alter it?

Thank you in advance
Clyde Schechter

Join Date: Apr 2014

Posts: 30151
#2

29 Nov 2023, 15:06

Code:

xtreg gini_disp readiness L10.readiness L20.readiness vuln gdppc pop popden, fe vce(cluster ID)

is syntax for an appropriate one-way fixed effects regression. The -fe- option causes Stata to (silently) incorporate your "ηi". (Well, that isn't actually what it does internally, but the results are the same as if it were.)
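
To make the "same as if it were" remark concrete, here is a minimal sketch, assuming the data from #1 are in memory and have already been xtset. It fits the within estimator and then the same model with explicit country indicators. The stored-estimate names within and lsdv are just illustrative labels.

```stata
* Assumes the data from #1 are in memory and -xtset ID year- has been run.

* Within (fixed-effects) estimator: country effects are absorbed, not reported
xtreg gini_disp readiness L10.readiness L20.readiness vuln gdppc pop popden, fe vce(cluster ID)
estimates store within

* Same model with an explicit indicator for each country
regress gini_disp readiness L10.readiness L20.readiness vuln gdppc pop popden i.ID, vce(cluster ID)
estimates store lsdv

* Compare the slope coefficients and standard errors side by side
estimates table within lsdv, keep(readiness L10.readiness L20.readiness vuln gdppc pop popden) b se
```

The slope coefficients should match. The clustered standard errors can differ slightly, because the two commands apply different degrees-of-freedom adjustments.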

But that analysis lacks any μt effect. To get that, you have to expand your command slightly:

Code:

xtreg gini_disp readiness L10.readiness L20.readiness vuln gdppc pop popden i.year, fe vce(cluster ID)
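
Written out, the model this second command fits can be sketched as below. The ηi and μt terms are the ones from #1; the β and γ coefficient names and the vector X (standing for vuln, gdppc, pop and popden) are notation assumed here for illustration.

```latex
\mathit{gini}_{it} = \beta_1\,\mathit{readiness}_{it}
                   + \beta_2\,\mathit{readiness}_{i,t-10}
                   + \beta_3\,\mathit{readiness}_{i,t-20}
                   + \gamma' X_{it} + \eta_i + \mu_t + \varepsilon_{it}
```

Note that with annual data from 1995 to 2021, the 20-year lag is only defined from 2015 onward, so at most 7 years per country can enter the estimation sample.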

"My understanding is that to control for error terms being correlated with the independent variables (which is heterogeneous by country) due to omitted variable bias we must include fe and cluster for the country to get statistically correct results."

This statement is overly broad. It is true that in the vast majority of panel data analyses one should include -fe- and cluster the vce on the panel variable. But it is not always true. This is not the time or place to go into detail, because it would be too lengthy, but just know that there are circumstances where -re- can, and some where it should, be used instead of -fe-, and still others where ordinary least squares regression does the trick. Clustering on the panel variable is also sometimes unnecessary, and there are even situations (a small number of clusters) where it is wrong to use.

Added: I'd also like to point out that not every panel data regression needs to include time fixed effects (μt). In many circumstances, omitting those is perfectly reasonable, and in some circumstances it is obligatory. Not to mention that there are also circumstances where the effects of time are better modeled with a (or sometimes several) continuous variable(s).
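
As an illustration of the last point only, here is a sketch of the same model with a single linear time trend in place of separate year indicators. It assumes the data from #1 are in memory and xtset, and whether a linear trend is adequate for this research question is an assumption that would need checking.

```stata
* Assumes the data from #1 are in memory and -xtset ID year- has been run.

* c.year enters year as one continuous regressor (a linear trend)
* rather than as a set of year indicators (i.year)
xtreg gini_disp readiness L10.readiness L20.readiness vuln gdppc pop popden c.year, fe vce(cluster ID)
```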

Last edited by Clyde Schechter; 29 Nov 2023, 15:10.
Karan Patel

Join Date: Nov 2023

Posts: 2
#3

29 Nov 2023, 15:26

Hi Clyde, thank you for the rapid response.

In this context, I agree with your added point: I do not think I need to add time fixed effects, as I am controlling for that with the additional variables like gdppc. So, I will go with the first line of code in your answer. Also, I just want to get the explanation for why I am including fe correct - is it to control for common shocks which may have affected all countries? (so as to isolate the effect of readiness on how a climate shock impacts gini)

Thanks again
Clyde Schechter

Join Date: Apr 2014

Posts: 30151
#4

29 Nov 2023, 16:27

"I just want to get the explanation for why I am including fe correct - is it to control for common shocks which may have affected all countries? (so as to isolate the effect of readiness on how a climate shock impacts gini)"

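Not quite. In the first command, -fe- adjusts for time-invariant differences between countries (your ηi). Shocks common to all countries in a given year are what the year indicators (i.year, the μt term) adjust for, so the first command does not handle them. Separately, I don't like using the phrase "control for" here, because this is observational data and, consequently, nothing is actually "controlled." I prefer to say that we are "adjusting for" these effects. Of course, the use of "control" has, for better or for worse, penetrated common usage. But I think it is best to be conscious of the fact that the expression "control for," taken at face value, is misleading in non-experimental contexts.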