OLS - Testing for heteroskedasticity and Wooldridge Serial Correlation Test

Martin Johanneson

Join Date: May 2025

Posts: 19
#1


22 May 2025, 02:25

Hello Statalist forum

I have some questions regarding OLS regression (panel data) with fixed effects.

1) In an OLS regression with industry, country, and year fixed effects, how should one test for heteroskedasticity most efficiently?

In Stata, I ran the OLS regression.

2) Which command should I then use: xttest0 or hettest?

2.1) Additionally, should I also run the White test?

3) After discovering that heteroskedasticity is present, which vce option should one use: vce(robust) or vce(cluster firmid)? I am investigating the relationship between X and Y across firms over the years, not within firms.

4) Does an OLS regression with fixed-effect dummies, such as reg Y X controls i.industry i.country i.year, vce(robust) or vce(cluster firmid), control for time-invariant factors? If so, how and why?

5) I am considering running the Wooldridge serial correlation test after my OLS regression. Is it an appropriate tool for OLS regression, or is it uncommon?

Thank you in advance. Your inputs will be very helpful

Best
Martin
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17704
#2

22 May 2025, 02:40

Martin:
1) the community-contributed modules -xttest3- (modified Wald test for groupwise heteroskedasticity) and -xttest2- (Breusch-Pagan LM test for cross-sectional independence), both run after -xtreg, fe-, are probably what you're looking for.
2) if you detect heteroskedasticity and/or autocorrelation of the epsilon, go -robust- or -vce(cluster panelid)-. Under -xtreg-, unlike under -regress-, the two options do the very same job. If you have at least 30 panels, go -robust- or -vce(cluster panelid)- by default.
3) if you have panel data, -xtreg, fe- gives you back more information than OLS.
2) the -xttest0- test can be used after -xtreg, re- only.
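
For readers who want to try these tests, here is a minimal sketch. It assumes the Grunfeld firm panel that ships with Stata as a stand-in for the thesis data (company, invest, mvalue, and kstock are that dataset's variables, not Martin's) and that -xttest3- has already been installed from SSC.

Code:

```stata
* Sketch only: Grunfeld data stand in for the real firm panel (an assumption)
* One-off install of the community-contributed test: ssc install xttest3
webuse grunfeld, clear
xtset company year

* Modified Wald test for groupwise heteroskedasticity, run after -xtreg, fe-
xtreg invest mvalue kstock, fe
xttest3

* Breusch-Pagan LM test for random effects, run after -xtreg, re- only
xtreg invest mvalue kstock, re
xttest0
```

Both tests are postestimation commands for the corresponding -xtreg- fit; neither is meant to follow plain -regress-.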

Kind regards,
Carlo
(Stata 19.0)
Martin Johanneson

Join Date: May 2025

Posts: 19
#3

22 May 2025, 03:04

Hi Carlo, thank you for your response and input. I appreciate it.

1) Thank you. So xttest2 is for heteroskedasticity, and xttest3 is for Wald?
1.1) Isn't the White test relevant in my case?

2) My supervisor recommended OLS with fixed-effect dummies rather than -xtreg, fe-, because in our thesis we want to investigate the relationship between X and Y across firms, not within firms, which is what the fe command estimates (doesn't it?). I have a panel dataset of 86 firms across 9 years.

2) xttest0 is for re, aha. I see, thank you!

3) Actually, when I ran OLS with vce(robust), the result was significant, but with vce(cluster firmid) it was not. In my case, I should use vce(cluster firmid), right? I noticed the standard errors increased with vce(cluster firmid) compared with vce(robust). Doesn't that mean vce(cluster firmid) is best suited?

4) Have you by any chance heard about the Wooldridge Serial Correlation Test? I am not sure if it is an appropriate tool to use for OLS regression.

5) Additional question:
I noticed in this forum that the word "controlling" is uncommon. In a couple of master's theses I have read, the authors say they are "controlling for time-invariant factors" or "controlling for spurious relationships," for example. Is it appropriate to use that word in such cases, or is "adjusting/adjust" more appropriate when referring to control variables and mitigating such effects?

Thank you again in advance

Best,
Martin

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17704
#4

22 May 2025, 06:03

Martin:
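1) Not quite: -xttest3- is the modified Wald test for groupwise heteroskedasticity, while -xttest2- tests for cross-sectional independence. No need for White.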
2) Your statements here are not that clear to me. Maybe your supervisor wants something along the following lines:

Code:

. use "https://www.stata-press.com/data/r19/nlswork.dta"
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

. regress ln_wage c.age##c.age i.year i.idcode if idcode<=3

      Source |       SS           df       MS      Number of obs   =        39
-------------+----------------------------------   F(18, 20)       =      4.86
       Model |  4.21278813        18  .234043785   Prob > F        =    0.0005
    Residual |  .962950828        20  .048147541   R-squared       =    0.8139
-------------+----------------------------------   Adj R-squared   =    0.6465
       Total |  5.17573896        38  .136203657   Root MSE        =    .21943

------------------------------------------------------------------------------
     ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .0773019   .2865219     0.27   0.790    -.5203723    .6749761
             |
 c.age#c.age |  -.0045583   .0012212    -3.73   0.001    -.0071057    -.002011
             |
        year |
         69  |   .3367906   .4335876     0.78   0.446    -.5676572    1.241238
         70  |   .2089384   .6771373     0.31   0.761    -1.203545    1.621422
         71  |   .3144116   .9610926     0.33   0.747    -1.690392    2.319216
         72  |   .5888124   1.253657     0.47   0.644     -2.02627    3.203894
         73  |   .8912873   1.550825     0.57   0.572    -2.343676    4.126251
         75  |   1.246958   2.152898     0.58   0.569    -3.243908    5.737823
         77  |   1.560689   2.761762     0.57   0.578    -4.200247    7.321624
         78  |   1.941522   3.068213     0.63   0.534    -4.458659    8.341703
         80  |    2.34498   3.684737     0.64   0.532    -5.341247    10.03121
         82  |   2.698954   4.315145     0.63   0.539     -6.30228    11.70019
         83  |   2.994437   4.618087     0.65   0.524    -6.638723     12.6276
         85  |   3.538578   5.245889     0.67   0.508    -7.404154    14.48131
         87  |   3.965153   5.878139     0.67   0.508    -8.296429    16.22674
         88  |    4.40786   6.407149     0.69   0.499    -8.957218    17.77294
             |
      idcode |
          2  |  -.4183815   .0918256    -4.56   0.000    -.6099263   -.2268366
          3  |   .6579353   1.834332     0.36   0.724    -3.168414    4.484284
             |
       _cons |   1.341224   4.651269     0.29   0.776    -8.361153     11.0436
------------------------------------------------------------------------------

. xtreg ln_wage c.age##c.age i.year if idcode<=3, fe

Fixed-effects (within) regression               Number of obs     =         39
Group variable: idcode                          Number of groups  =          3

R-squared:                                      Obs per group:
     Within  = 0.7404                                         min =         12
     Between = 0.4068                                         avg =       13.0
     Overall = 0.4014                                         max =         15

                                                F(16, 20)         =       3.57
corr(u_i, Xb) = -0.8560                         Prob > F          =     0.0042

------------------------------------------------------------------------------
     ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .0773019   .2865219     0.27   0.790    -.5203723    .6749761
             |
 c.age#c.age |  -.0045583   .0012212    -3.73   0.001    -.0071057    -.002011
             |
        year |
         69  |   .3367906   .4335876     0.78   0.446    -.5676572    1.241238
         70  |   .2089384   .6771373     0.31   0.761    -1.203545    1.621422
         71  |   .3144116   .9610926     0.33   0.747    -1.690392    2.319216
         72  |   .5888124   1.253657     0.47   0.644     -2.02627    3.203894
         73  |   .8912873   1.550825     0.57   0.572    -2.343676    4.126251
         75  |   1.246958   2.152898     0.58   0.569    -3.243908    5.737823
         77  |   1.560689   2.761762     0.57   0.578    -4.200247    7.321624
         78  |   1.941522   3.068213     0.63   0.534    -4.458659    8.341703
         80  |    2.34498   3.684737     0.64   0.532    -5.341247    10.03121
         82  |   2.698954   4.315145     0.63   0.539     -6.30228    11.70019
         83  |   2.994437   4.618087     0.65   0.524    -6.638723     12.6276
         85  |   3.538578   5.245889     0.67   0.508    -7.404154    14.48131
         87  |   3.965153   5.878139     0.67   0.508    -8.296429    16.22674
         88  |    4.40786   6.407149     0.69   0.499    -8.957218    17.77294
             |
       _cons |   1.465543   5.342682     0.27   0.787    -9.679096    12.61018
-------------+----------------------------------------------------------------
     sigma_u |  .54258328
     sigma_e |  .21942548
         rho |  .85944136   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(2, 20) = 10.43                      Prob > F = 0.0008

.

3) under -regress-, -robust- takes heteroskedasticity only into account.
4) Sure. Just type -search xtserial-. Note that it requires a panel dataset. That said, with 86 panels, -vce(cluster panelid)- is mandatory in both -regress- and -xtreg, fe-.
5) You adjust, not control for.
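
To illustrate points 3) and 4), here is a hedged sketch on the Grunfeld firm panel that ships with Stata (an assumption; substitute your own firm identifier and variables). It fits the same pooled OLS model twice, changing only the VCE, and then runs the community-contributed -xtserial-, which must be installed first.

Code:

```stata
* Sketch only: Grunfeld data stand in for the real firm panel (an assumption)
webuse grunfeld, clear
xtset company year

* Pooled OLS, SEs robust to heteroskedasticity only
regress invest mvalue kstock, vce(robust)
estimates store ols_robust

* Same coefficients; SEs also allow arbitrary correlation within each firm
regress invest mvalue kstock, vce(cluster company)
estimates store ols_cluster

* Coefficients and standard errors side by side
estimates table ols_robust ols_cluster, b(%9.4f) se(%9.4f)

* Wooldridge test for first-order serial correlation in panel data
* (community-contributed; locate and install it via -search xtserial-)
xtserial invest mvalue kstock
```

The point estimates are identical across the two fits; only the standard errors differ. With just 10 firms, this dataset has far too few clusters for cluster-robust SEs to be trusted, so it shows the syntax and nothing more.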

Kind regards,
Carlo
(Stata 19.0)


Martin Johanneson

Join Date: May 2025

Posts: 19
#5

22 May 2025, 07:03

Hi Carlo, thank you again for the response

Yes, that's right. She wanted us to apply OLS with industry, country, and year fixed-effect dummies in the regression.

Question 1: Is that called OLS, or have I actually applied a fixed effect model by incorporating industry, country, and year fixed effects?

In that regard, I used vce(robust) and vce(cluster firmid) to compare the OLS regressions.

With vce(robust), the result was significant, but with vce(cluster firmid), it was not.

Question 2:
1) I assume vce(cluster firmid) is much stricter, but given my case, which should I use: vce(robust) or vce(cluster firmid)?

I am leaning more towards vce(robust) than vce(cluster firmid), but I am not sure. Why does one cluster at the firm level?

Question 3:
I have a panel dataset with 86 firms across 9 years. I only have a handful of industries (8) and 4 countries. I would normally cluster at the country level, but with only 4 countries that is not optimal, correct?

I hope this clarifies a little more, and thank you so much in advance

Best,
Martin
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17704
#6

22 May 2025, 23:28

Martin:
1) With panel-unit dummies, as in my example above, it's actually a different way to obtain the -fe- estimator. With industry, country, and year dummies only, it is pooled OLS with those fixed effects. Either way, you should use -vce(cluster panelid)- for your SEs.
2) Your query is ill-posed. You should go -vce(cluster panelid)- for your SEs because your panel-specific observations are not independent, as they actually belong to the same panel. With 86 panels you're on the safe side clustering your SEs at the firm level. Forget the -robust- option, as, under -regress-, it takes heteroskedasticity only into account. That said, even if you had autocorrelation + heteroskedasticity, -vce(cluster panelid)- is the way to go.
3) Correct. You should cluster at the firm level. In addition, if you go -fe- and both -industry- and -country- are time-invariant predictors (as they usually are), the -fe- estimator will wipe them out, without returning any coefficient.
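
A short sketch of the last point, using the nlswork data from earlier in the thread as a stand-in (an assumption: race plays the role of a time-invariant predictor such as industry or country).

Code:

```stata
* Sketch only: nlswork stands in for the firm panel (an assumption)
use "https://www.stata-press.com/data/r19/nlswork.dta", clear
xtset idcode year

* race never changes within a person, so the within transformation
* removes it and -xtreg, fe- reports its levels as omitted
xtreg ln_wage c.age##c.age i.race i.year, fe vce(cluster idcode)

* Pooled OLS keeps the race coefficients, because it also uses
* the variation between persons
regress ln_wage c.age##c.age i.race i.year, vce(cluster idcode)
```

The same would be expected for firm-level -fe- with industry and country dummies, provided firms never switch industry or country in the sample.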

Kind regards,
Carlo
(Stata 19.0)
Martin Johanneson

Join Date: May 2025

Posts: 19
#7

23 May 2025, 03:06

Perfect, thank you very much for the input, Carlo!

An additional question to 1) and 3)

1) How should one refer to the method? Is it called "OLS with fixed effect variables, controlling for firm time-invariant factors"?

1.1) Is it correct terminology to say "controlling"? Or is "adjusting" more appropriate in the field of economics/econometrics? I have noticed previous studies using "controlling" and "adjusting"

When you cluster at the firm level:
3.1) Are heteroskedasticity and autocorrelation reduced?
3.2) If yes, clustering does not eliminate heteroskedasticity and autocorrelation completely, correct?

Thank you in advance

Best
Martin
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17704
#8

23 May 2025, 04:51

Martin:
1) fixed effect panel regression;
1.1) adjusting
3.1&3.2) They are not reduced. The SEs simply account for them.

Kind regards,
Carlo
(Stata 19.0)
Martin Johanneson

Join Date: May 2025

Posts: 19
#9

23 May 2025, 06:24

Perfect, thank you so much!

Regarding 1): I first assumed that OLS on panel data with fixed-effect dummies, like reg Y X controls i.Industry i.Year i.Country, is called pooled OLS with dummies(?)
Is that true, or have I simply been mistaken?

1.2) If the Hausman test favors FE, does that mean we can call "reg Y X controls i.Industry i.Year i.Country, vce(cluster firmid)" a "fixed effect panel regression"?

Looking forward to hearing from you

Best,
Martin

Last edited by Martin Johanneson; 23 May 2025, 06:31.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17704
#10

23 May 2025, 07:47

Martin:
1) I would call it pooled OLS with fixed effects;
2) Yes.

Kind regards,
Carlo
(Stata 19.0)
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2148
#11

23 May 2025, 08:38

The terminology can be confusing. If you have firm level data, when you use the phrase "fixed effects" it is commonly assumed that you included fixed effects at the firm level. Having said that, one often sees something like "I included industry, country, and year fixed effects." In my view, this language is cheating a bit by trying to piggyback on the popularity of FE at the unit level, but in the end it's just terminology. Plus, I'm in the minority.

So, if you say you used pooled OLS and included industry, country, and time fixed effects, everyone will know what you mean.

Now about standard errors: with panel data, unless your N (# cross-sectional units) is "small," you should at least cluster at the unit level -- firm, in your case. But there are situations where you should cluster at a higher level, such as industry. If a policy, such as a regulation, is applied at the industry level, then you should probably cluster at the industry level -- assuming you have enough industries for the asymptotic approximations to work well enough. If the policy is at the country level, ideally you'd cluster at that level -- but you might not have enough countries to justify that. If the variation in the x variables is at the firm level, I would use:

Code:

reg y x1 ... xK i.industry i.country i.year, vce(cluster firmid)

Standard tests for heteroskedasticity will assume no serial correlation, and so you get into a circularity problem. And you likely have serial correlation, and clustering takes care of both serial correlation and heteroskedasticity. So I would just cluster at the firmid level and not test for those problems unless the standard errors are too large to be useful.
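
As a concrete, hedged instance of the command above, the sketch below uses the nlswork data from earlier in the thread (an assumption: idcode plays the role of firmid and ind_code the role of industry). It also counts how many clusters each level of clustering would rest on.

Code:

```stata
* Sketch only: nlswork stands in for the firm panel (an assumption)
use "https://www.stata-press.com/data/r19/nlswork.dta", clear

* Pooled OLS with industry and year dummies; SEs clustered on the panel unit
regress ln_wage c.age##c.age tenure i.ind_code i.year, vce(cluster idcode)

* Number of clusters actually used for the SEs
display "unit-level clusters: " e(N_clust)

* Number of groups that industry-level clustering would rely on
quietly tabulate ind_code if e(sample)
display "industry-level clusters: " r(r)
```

A small second count is the situation described above, where clustering at the higher level may not have enough groups for the asymptotic approximations to work well.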
Martin Johanneson

Join Date: May 2025

Posts: 19
#12

23 May 2025, 08:45

Thank you once again, Carlo

I am sorry if I am asking too much about the terminology here.

The command in use: reg Y X controls i.Industry i.Year i.Country, vce(cluster firmid)

1) Based on the command above, am I applying pooled ordinary least squares (OLS) regression with fixed-effect dummies for industry, year, and country, or am I applying fixed effect panel regression?

1.1) Or are they all the same, even though I am not using the xtreg, fe command?

Pooled OLS regression with fixed-effect dummies for industry, year, and country = Fixed effect panel regression = Fixed effect model?

Thank you so much for your time and input in advance

Best,
Martin

Last edited by Martin Johanneson; 23 May 2025, 08:49.
Martin Johanneson

Join Date: May 2025

Posts: 19
#13

23 May 2025, 08:59

Hi Jeff! Thank you for your valuable insights

Yes, I got very confused by the terminology, as the methods seem to work through the same mechanisms in a way.

My N= 86*9 years

Question 1) So, do you think I should address the method as Pooled Ordinary Least Squares (OLS) regression with fixed effect dummies for industry, year, and country?

1.1) I have only 8 industries and 4 countries, so in my case, I should cluster at firmid, since I have a limited number of countries and industries, correct?

Question 2)
Based on what you said about clustering at the industry or country level when regulations and the like change at that level, is this something I can address in my thesis? That is, we cluster at firmid, although ideally I would have wanted to cluster at the country or industry level (adding the reasons why). But given the limited number of industries and countries, clustering at firmid seems appropriate in my case, as I expect serial correlation?

2.1) Or would mentioning alternative clustering options perhaps weaken my thesis?

Thank you in advance

Best,
Martin
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2148
#14

23 May 2025, 10:35

Actually, your N = 86 and T = 9. This N is typically large enough to justify clustering at the firmid level. Because clustering allows for serial correlation across time, you don't really get credit for how large T is. In fact, the larger T is, the harder it is to justify clustering for serial correlation.
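
For anyone unsure which dimension is which in their own data, here is a minimal sketch on the Grunfeld firm panel that ships with Stata (an assumption; company stands in for firmid).

Code:

```stata
* Sketch only: Grunfeld data stand in for the real firm panel (an assumption)
webuse grunfeld, clear
xtset company year

* Reports n (number of cross-sectional units) and T (time periods per unit)
xtdescribe
```

The n reported here is the number of clusters that -vce(cluster company)- would use, which is the quantity that matters for the clustering argument.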
Martin Johanneson

Join Date: May 2025

Posts: 19
#15

23 May 2025, 11:53

Ahah, I see. Thank you for the clarification

I am sorry to repeat myself:

Regarding the questions I mentioned above:
Q1: Do you think I should refer to the method as pooled ordinary least squares (OLS) regression with fixed-effect dummies for industry, year, and country?

Q2:
Based on what you said about clustering at the industry or country level when regulations and the like change at that level, is this something I can address in my thesis? That is, we cluster at firmid, although ideally I would have wanted to cluster at the country or industry level (adding the reasons why). But given the limited number of industries and countries, clustering at firmid seems appropriate in my case, as I expect serial correlation?

2.1) Or would mentioning alternative clustering options perhaps weaken my thesis?

Thank you in advance

Best,
Martin
