Pooled OLS with clustered standard errors or Random Effects (Panel Data)

Hovhannes Nahapetyan

Join Date: Jul 2017

Posts: 44
#1

16 Nov 2018, 21:49

Hi all,

I would appreciate it very much if someone could clarify how to choose between pooled OLS with clustered standard errors and random effects. Since the composite error term in panel data consists of two terms (u(i) + e(i,t)), the need to cluster with fixed effects regression is clear: although u(i) is removed, we still have e(i,t) to worry about, and thus we cluster. However, when comparing random effects (xtreg, re cluster()) and pooled OLS with clustered standard errors (reg, cluster()), I have a hard time understanding how one should choose between the two. Random effects does not get rid of u(i), and therefore clustering addresses heteroskedasticity and autocorrelation for both terms, i.e. u(i) and e(i,t), but so should pooled OLS with clustered standard errors. Is there a difference, and what should be the guiding principle for choosing one over the other?

Thanks
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#2

17 Nov 2018, 00:37

Hovhannes:
the clearest approach is to check whether a random effect exists after -xtreg, re-.
If the -xttest0- outcome lacks statistical significance, you should go with pooled OLS.
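
For illustration, that check could look like the minimal sketch below. The variable names (id, year, y, x1, x2) are hypothetical placeholders and do not come from this thread.

```stata
* Hypothetical variable names: id (panel), year (time), y, x1, x2
xtset id year

* Random-effects GLS fit with default standard errors
xtreg y x1 x2, re

* Breusch-Pagan Lagrange multiplier test of H0: Var(u_i) = 0
xttest0
```

Failing to reject H0 is the situation in which pooled OLS would be the suggested choice.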

Kind regards,
Carlo
(Stata 19.0)
Hovhannes Nahapetyan

Join Date: Jul 2017

Posts: 44
#3

17 Nov 2018, 17:32

Hi Carlo,

Thank you so much for your prompt reply! I think I understand the difference between fixed effects, random effects and pooled OLS to some extent: random effects lies between fixed effects and pooled OLS, depending on the value of theta. If the variance of u(i) is zero, then theta is 0, and thus we use pooled OLS (and cluster standard errors in case e(it) is heteroskedastic and autocorrelated). If, on the other hand, theta is 1 and all variation is between, then we use fixed effects to get rid of u(i). xttest0 will tell us whether theta is 0, i.e. whether the variance of u(i) is zero, and thus we can decide whether we need to run random effects or pooled OLS.
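
For reference, the theta discussed here is the quasi-demeaning weight of the random-effects GLS estimator. A sketch in common textbook notation, assuming a balanced panel with T periods per unit (the symbols \sigma_u^2, \sigma_e^2, T, \alpha and \beta are notation introduced here, not taken from the thread):

```latex
\theta = 1 - \sqrt{\frac{\sigma_e^2}{T\sigma_u^2 + \sigma_e^2}},
\qquad
y_{it} - \theta\,\bar{y}_i
  = (1-\theta)\,\alpha
  + (x_{it} - \theta\,\bar{x}_i)'\beta
  + \bigl[(1-\theta)\,u_i + (e_{it} - \theta\,\bar{e}_i)\bigr]
```

With \theta = 0 the transformed regression is pooled OLS; with \theta = 1 it is the within (fixed effects) regression. For values in between, random effects weights within and between variation differently from pooled OLS, so the two sets of point estimates need not coincide in a finite sample.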

I think I am OK so far. But what I am confused about is this: in both random effects and OLS we assume that u(i) is not correlated with x(i), so as long as we cluster both OLS and random effects to take into account the clustered nature of panel data, why should we care whether theta is 0 or different from zero? No correlation with x(i) will ensure that the coefficient on x(i) is not biased, and clustering will take care of heteroskedasticity and autocorrelation, so it should not matter whether u(i) is in the composite error (the assumption under random effects) or not (the assumption under OLS). In other words, what is random effects doing that pooled OLS with clustered standard errors is not? I ran both random effects and OLS, and the coefficients for some of my variables are different but for some others are not, which adds to the confusion.

Thank you again for your response.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#4

18 Nov 2018, 05:01

Hovhannes,
thanks for the further clarifications, which allow me to expand a bit on my previous reply.
- clustering standard errors (SEs) in pooled OLS is due to the panel structure of your dataset. By ignoring it (that is, using default SEs) you do not take the panel structure of your data into account and pretend that the observations of your pooled OLS are independent (which is not the case, as we know). In pooled OLS you cluster because observations belonging to the same panel are assumed to be more similar than observations belonging to different panels (and this usually has little to do with heteroskedasticity). Conversely, clustering SEs under -xtreg- is not mandatory and makes sense only if you suspect/have detected heteroskedasticity and/or autocorrelation;
- as far as the similarity of pooled OLS coefficients with those of -xtreg, re- is concerned, due to the precondition of no correlation between the residual and the vector of regressors, pooled OLS is consistent when -xtreg, re- is the way to go (as the random effects specification assumes that neither term of the composite error is correlated with the vector of regressors).

Kind regards,
Carlo
(Stata 19.0)
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#5

18 Nov 2018, 05:23

There is no difference between fixed and random effects regression with respect to whether you cluster when estimating it or you don't.

1. If you estimate a fixed effects regression, and you believe that after you have accounted for the fixed effects your remaining error is homoskedastic and iid both across and within panels, you do not need to cluster. If you think that some correlation/heteroskedasticity patterns remain in the data, you should cluster after the fixed effects.

2. If you estimate random effects regression, you are already assuming that the u(i) effect is uncorrelated with the remaining error. However even assuming this, the pattern of correlation has a particular structure (also called the equi-correlated structure in the GEE literature). If you believe that this equi-correlated structure is the correct one, you do not need to cluster. If you believe that after taking into account the equi-correlated structure still other correlation/heteroskedasticity patterns remain, you need to cluster.
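
To make the equi-correlated structure concrete, here is a sketch in textbook notation. It assumes e_{it} is iid with variance \sigma_e^2 and uncorrelated with u_i, which has variance \sigma_u^2; the symbols v_{it} and \rho are introduced here for illustration only.

```latex
v_{it} = u_i + e_{it}, \qquad
\operatorname{Cov}(v_{it}, v_{is}) =
\begin{cases}
\sigma_u^2 + \sigma_e^2, & t = s \\
\sigma_u^2, & t \neq s
\end{cases}
\qquad\Longrightarrow\qquad
\operatorname{Corr}(v_{it}, v_{is}) = \rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
\quad (t \neq s)
```

Every pair of observations within a panel has the same correlation \rho, whatever the time gap between them. Serial correlation in e_{it} that decays with the gap, or error variances that differ across panels, are examples of patterns this structure does not capture.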
Hovhannes Nahapetyan

Join Date: Jul 2017

Posts: 44
#6

18 Nov 2018, 10:37

Hi Carlo,

Thank you again for your thorough response. Let me ask for further clarification. When we cluster pooled OLS, we are saying that the rationale is the panel structure of our data (i.e. we cannot consider our observations for the same (i) as i.i.d.), but this structure arises only due to TIME-VARIANT unobservables that are contained in our time-variant error term e(it). We have justified moving to pooled OLS by determining that u(i) does not exist, i.e. there are no TIME-INVARIANT unobservables that explain part of the variation from (i) to (i). Thus we use xttest0 to see if that's the case.

But where I am confused is this: let's assume we incorrectly assume there is no u(i) and all variation is due to e(it). Clustering should still take care of this because we are still clustering on both u(i) + e(it). If this is the case, then there should not be any difference between pooled OLS and random effects, because both acknowledge that the data are not i.i.d. and take that into account by clustering. So what is the added value of random effects (compared to pooled OLS) other than acknowledging that the error term contains both u(i) and e(it)? Even if we incorrectly assume that the error term does not contain u(i), clustering is done for the whole error term.

This should also lead to no coefficient differences between estimating the model via pooled OLS and via random effects, because in both cases we assume no correlation between u(i) and x(i). But when running the models I find my coefficients to be different, which seems to indicate that random effects, as compared to pooled OLS, does more than just acknowledge the panel structure of the data. I want to understand what that additional thing is, and that would probably be the answer to the question: what is the difference between clustered pooled OLS and clustered random effects models?

Thanks
Hovhannes Nahapetyan

Join Date: Jul 2017

Posts: 44
#7

18 Nov 2018, 23:50

OK, I think this is the summary of my question, unless I am misinterpreting Cameron and Trivedi's Microeconometrics (chapter 21): there is absolutely no difference between pooled OLS with clustered standard errors and the random effects model. Both are consistent and produce correct standard errors, and it does not matter whether the pooled OLS or the random effects model is correct (and both are inconsistent if the fixed effects model is correct, but this was an obvious one).

Below is the relevant part from the book:

"In summary, pooled OLS is appropriate if the constant-coefficient or random effects models are appropriate, but panel-corrected standard errors and t-statistics must be used for statistical inference. Pooled OLS is inconsistent if the fixed effects model is appropriate."
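
One way to see how the two approaches compare in practice is to fit both on the same data and put the coefficients and cluster-robust standard errors side by side. The sketch below uses the nlswork example dataset that ships with Stata's -webuse-; the choice of dataset and regressors is an assumption made purely for illustration and is unrelated to the poster's data.

```stata
* Example dataset available via -webuse-; regressors chosen only for illustration
webuse nlswork, clear
xtset idcode year

* Pooled OLS with standard errors clustered on the panel identifier
regress ln_wage age ttl_exp tenure, vce(cluster idcode)
estimates store pooled

* Random-effects GLS with the same clustering; -theta- reports the quasi-demeaning weight
xtreg ln_wage age ttl_exp tenure, re vce(cluster idcode) theta
estimates store re

* Coefficients and standard errors side by side
estimates table pooled re, b se
```

Both estimators rely on the same assumption that the panel effect is uncorrelated with the regressors, but they weight the data differently, so the point estimates will generally differ in a given sample.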
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#8

19 Nov 2018, 06:38

Standard panel data model:

Yit = b*Xit + Ui + Eit.

1. If Ui is correlated with Eit, and Eit is iid both in the i and the t, you estimate a fixed effects regression, no clustering is necessary.

2. If Ui is uncorrelated with Eit, and Eit is iid both in the i and the t, you estimate a random effects regression, and no clustering is needed.

3. If Ui is correlated with Eit, and Eit is not iid, then you estimate fixed effects regression and you cluster your standard errors.

4. If Ui is uncorrelated with Eit, but Eit is not iid, then you estimate random effects regression and you cluster your standard errors.

Your last statement above is incorrect. Random effects panel regression is consistent and the standard errors are correct if and only if 2. is the correct model.

5. If Ui is uncorrelated with Eit, and Eit is not iid, then you have to either:
a) Do OLS with panel level clustering, or
b) Do random effect estimation with panel level clustering.
Hovhannes Nahapetyan

Join Date: Jul 2017

Posts: 44
#9

20 Nov 2018, 22:48

Hi Joro,

Thank you for your reply. I think the whole reason one would move from random to fixed effects is that there is correlation between Ui and Xit, and thus the coefficient on Xit estimated via random effects or OLS would be biased. Fixed effects would subtract out Ui and thus remove the bias due to time-invariant unobservables. Additionally, you'd want to cluster standard errors in a fixed effects regression if you suspect that Eit might be correlated. Correlation between Eit and Ui is a minor issue in this judgment; the key is whether Ui is correlated with Xit.

Since random effects assumes no correlation between Ui and Xit (and yes, it also assumes no correlation between Ui and Eit), the estimates are unbiased, and one would cluster to address within-panel correlation, i.e. if one suspects there is correlation among the Eit, since there is already a correlation in the composite error due to Ui.

Now the question was this: pooled OLS also assumes no correlation between Ui and Xit. Additionally, it treats all observations as independent, which is not the case since the data are a panel. However, that is acknowledged as soon as one clusters at the entity level (of course, in the first two cases you want to address possible heteroskedasticity as well) to make the inference robust to both autocorrelation and heteroskedasticity. So it seems that once that's done there should not be any difference between pooled OLS and random effects in terms of consistency. Random effects would be more efficient than pooled OLS, but in a lot of cases the efficiency gain is small. I don't think that correlation between Eit and Ui (although an assumption under random effects) features prominently in choosing between random and fixed effects or between random effects and OLS.
Hollis Ding

Join Date: Apr 2018

Posts: 3
#10

09 May 2020, 16:55

Hi Hovhannes,

I am also confused: if one can cluster the errors to take serial correlation and heteroskedasticity into account, why do we need to consider an FE or RE model? I have read the RFS paper by Petersen (2009), and it seems most papers just use the OLS estimator with clustered errors in empirical panel studies.
Sally Ahmed

Join Date: Dec 2020

Posts: 59
#11

02 Sep 2021, 08:39

Hello, I am trying to run a pooled regression clustered by year and industry, but I get an error from the command and I don't know what the right command is:

Code:

reg AQ ESGCombinedScore SIZE_W GROWTH_W RETURNONASSETS_W MTB_W RDINT_W LEV_W CYCLE_W WGI if year>2001 & year<2019, cluster (year IndustryX)
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#12

02 Sep 2021, 11:30

-regress- cannot do two way clustering.

Check the user-written -reghdfe- by Correia.
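
A minimal sketch of how that could look, assuming -reghdfe- is installed from SSC. The variable names (y, x1, x2, firm, year) are hypothetical placeholders.

```stata
* One-time installation from SSC (reghdfe depends on ftools)
ssc install ftools
ssc install reghdfe

* Hypothetical variable names: y, x1, x2, firm, year
* noabsorb: no fixed effects are absorbed, so the point estimates are pooled OLS
* vce(cluster firm year): standard errors clustered in both dimensions
reghdfe y x1 x2, noabsorb vce(cluster firm year)
```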

Originally posted by Sally Ahmed:

Hello I am trying to run a clustered pooled regression by year and industry but I have an error in the command I don't know what is the right command:

Code:

reg AQ ESGCombinedScore SIZE_W GROWTH_W RETURNONASSETS_W MTB_W RDINT_W LEV_W CYCLE_W WGI if year>2001 & year<2019, cluster (year IndustryX)
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#13

02 Sep 2021, 11:49

I totally bombed this thread in #8, what I had written in #8 is all wrong. Seems like I have been carrying forward a typo, but there are no excuses for not noticing this nonsense.

What I meant is:

Standard panel data model:

Yit = b*Xit + Ui + Eit.

1. If Ui is correlated with Xit, and Eit is iid both in the i and the t, you estimate a fixed effects regression, no clustering is necessary.

2. If Ui is uncorrelated with Xit, and Eit is iid both in the i and the t, you estimate a random effects regression, and no clustering is needed.

3. If Ui is correlated with Xit, and Eit is not iid, then you estimate fixed effects regression and you cluster your standard errors.

4. If Ui is uncorrelated with Xit, but Eit is not iid, then you estimate random effects regression and you cluster your standard errors.

Your last statement above is incorrect. Random effects panel regression is consistent and the standard errors are correct if and only if 2. is the correct model.

5. If Ui is uncorrelated with Xit, and Eit is not iid, then you have to either:
a) Do OLS with panel level clustering, or
b) Do random effect estimation with panel level clustering.

Everything that is written in #9 is correct.

Of course the correlation between Ui and Eit is irrelevant.
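
The cases listed above map onto Stata commands roughly as in the following sketch. The variable names (id, year, y, x1, x2) are hypothetical placeholders, and which case applies is a modelling judgment the commands cannot make for you.

```stata
* Hypothetical variable names: id (panel), year (time), y, x1, x2
xtset id year

* Case 1: Ui correlated with Xit, Eit iid
xtreg y x1 x2, fe

* Case 2: Ui uncorrelated with Xit, Eit iid
xtreg y x1 x2, re

* Case 3: Ui correlated with Xit, Eit not iid
xtreg y x1 x2, fe vce(cluster id)

* Cases 4 and 5b: Ui uncorrelated with Xit, Eit not iid
xtreg y x1 x2, re vce(cluster id)

* Case 5a: same assumptions as 5b, estimated by pooled OLS
regress y x1 x2, vce(cluster id)
```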
Sally Ahmed

Join Date: Dec 2020

Posts: 59
#14

02 Sep 2021, 19:26

OK, I have used this code and it now works. Thank you for letting me know about reghdfe.

Code:

reghdfe AQ ESGCombinedScore SIZE_W GROWTH_W RETURNONASSETS_W MTB_W RDINT_W LEV_W CYCLE_W WGI if year>2001 & year<2019, noabsorb vce (cluster year IndustryX)