Problems while running Hausman test: dummy variables omitted because of collinearity in fixed effects

ThanhThao Nguyen

Join Date: Mar 2023
Posts: 5

31 Mar 2023, 09:52

Hi Stata experts, this is my first time posting, so please excuse any mistakes. I have an issue running the Hausman test that I couldn't solve, so I hope someone is kind enough to help me. I'm writing my master's thesis, and one of the regressions I will run examines how board characteristics, such as the CEO and Chair being the same person (CEODUAL), board tier structure (TIER), and board-level employee representation (BLER), influence firms' ESG scores (ESGSCORE). I have attached a sample of my data at the end. I created one-year lags of my independent variables (hence the prefix l_). I have panel data observing the same companies over several years, so from what I learned at university, I should choose between a fixed-effects (FE) and a random-effects (RE) model to run the regression. To figure out whether FE or RE is preferred, I tried running the Hausman test with these commands:

Code:

xtreg ESGSCORE l_CEODUAL l_TIER l_BLER l_SUSCOMMITTEE l_SIZE l_ROA l_BGENDIV l_BIND l_BSIZE, fe
estimates store fe
xtreg ESGSCORE l_CEODUAL l_TIER l_BLER l_SUSCOMMITTEE l_SIZE l_ROA l_BGENDIV l_BIND l_BSIZE, re
estimates store re
hausman fe re

Two issues occurred that I need help with:
1) When I run this code for FE

Code:

xtreg ESGSCORE l_CEODUAL l_TIER l_BLER l_SUSCOMMITTEE l_SIZE l_ROA l_BGENDIV l_BIND l_BSIZE, fe

, it says that my two independent variables of interest, l_TIER and l_BLER, are omitted because of collinearity. After some research online, I suspect this happens because the values of these two variables (both dummies) stay the same for a given firm across all years, i.e. if a firm has a one-tier board structure, it keeps it for the entire sample period. I've read that one solution is to drop the variables that have this issue. However, these two variables are the core of my paper, so I can't just drop them. Therefore, my question to you Stata experts is: is there a way to solve this issue without dropping my variables?

2) When I run this code for the Hausman test

Code:

hausman fe re

, an error appeared saying that "hausman cannot be used with vce(robust), vce(cluster cvar), or p-weighted data". However, I didn't include any such options in the code. Could someone kindly explain why this error happens and what I can do to run the Hausman test?

Thank you so so much for your help!

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float ESGSCORE byte(l_CEODUAL l_TIER l_BLER l_SUSCOMMITTEE) float(l_SIZE l_ROA l_BGENDIV l_BIND) byte l_BSIZE
66.03 . . . .         .      .     .     .  .
61.45 0 0 0 1 15.529489  16.51  37.5    60  8
71.93 0 0 0 1 15.719557   27.3 33.33 66.67  9
63.16 0 0 0 1 15.882247  20.52    25 66.67  8
 73.7 0 0 0 1 15.981244  15.38 33.33 66.67  9
78.65 0 0 0 1 15.963428   2.91    40    70 10
64.51 . . . .         .      .     .     .  .
 48.5 0 0 0 1 16.119694   3.22 17.39    88 23
66.13 0 0 0 1 16.082262   3.75 33.33 92.59 12
80.93 0 0 0 1 16.124971   4.23 33.33    75 12
81.46 0 0 0 1  16.16192   4.48 33.33 76.92 12
74.36 0 0 0 1 16.297161   3.79 41.67 68.42 12
25.61 . . . .         .      .     .     .  .
54.63 0 1 0 0 14.861144   7.02     0   100  4
58.38 0 1 0 0 14.878737   7.59    25   100  4
61.25 0 1 0 0 14.957438   8.35    25   100  4
64.87 0 1 0 0 15.054146    7.4    25   100  4
68.63 0 1 0 0  14.98923   3.92    40   100  5
87.22 . . . .         .      .     .     .  .
62.52 1 1 0 1 17.503046   5.04    25 92.31 12
92.83 1 1 0 1 17.527122   5.96 18.18 92.31 11
93.76 1 1 0 1 17.568613   5.59 18.18   100 11
79.68 1 1 0 1  17.59404   3.43 18.18   100 11
77.32 1 1 0 1 17.388329  12.54 18.18   100 11
 65.1 . . . .         .      .     .     .  .
71.28 0 1 0 1 19.792303    .72 42.86   100  7
 54.8 0 1 0 1  19.78884    .72 42.86   100  7
 40.3 0 1 0 1  19.75818    .99 28.57   100  7
53.02 0 1 0 1 19.741007    .92 28.57   100  7
61.18 0 1 0 1 19.794264     .4  37.5   100  8
77.48 . . . .         .      .     .     .  .
73.64 0 0 0 1 19.036228    .23    25 71.43 12
79.51 0 0 0 0  19.07942    .41    25 63.64 16
79.64 0 0 0 0 16.337713    .81 21.43 72.22 14
84.42 0 0 0 0 16.244553   2.32 41.67 68.75 12
84.38 0 0 0 0 16.152206   7.75 45.45 69.23 11
   83 . . . .         .      .     .     .  .
84.23 1 0 0 1 16.613424   6.55 27.27 54.55 11
90.07 1 0 0 1  16.60925   2.92 27.27 58.33 11
88.17 1 0 0 1  16.46524   3.79 27.27 69.23 11
88.24 1 0 0 1  16.61816   3.56 27.27 66.67 11
52.76 1 0 0 1 16.669603   3.42 27.27 63.64 11
75.08 . . . .         .      .     .     .  .
79.77 1 0 1 1 16.269184   3.27 43.75 47.37 16
78.07 1 0 1 1  16.29641   4.34 43.75    50 16
81.52 1 0 1 1 16.358528  17.86    50 52.94 12
   83 1 0 1 1 16.431885   3.87 54.55 41.67 11
82.83 1 0 1 1 16.156258 -15.89    50    50 12
38.23 . . . .         .      .     .     .  .
40.75 0 0 0 0 16.360321   2.15    30 72.73 10
37.97 0 0 0 0  16.40776   2.61    30    60 10
43.22 0 0 0 0 16.458643   2.34    30 63.64 10
37.15 0 0 0 0 16.536184   2.92    30    50 10
42.08 0 0 0 0 16.593603   1.69    30    50 10
61.73 . . . .         .      .     .     .  .
64.32 1 0 0 0 17.251486   2.98 17.65 27.27 17
65.14 1 0 0 0 17.211285   3.45 16.67 27.78 18
54.18 1 0 0 0 17.305927   3.58 16.67 27.78 18
72.37 1 0 0 0 17.412422   3.62 17.65 27.78 17
44.96 1 0 0 0  17.38349   2.05 18.75 29.41 16
70.07 . . . .         .      .     .     .  .
67.68 0 1 0 1 16.196785   7.89 22.22 88.89  9
62.15 0 1 0 1  16.24426   8.37  37.5   100  8
 69.9 0 1 0 1 16.192635   5.06  37.5   100  8
73.62 0 1 0 1 16.240618   7.67  37.5   100  8
72.39 0 1 0 1  16.15655   -.64 42.86   100  7
77.54 . . . .         .      .     .     .  .
67.83 0 1 1 1  16.48579   7.86 27.78 88.89 18
80.69 0 1 1 1 16.446823   8.04 27.78 88.89 18
87.09 0 1 1 1 16.520958  12.01 27.78 89.47 18
64.19 0 1 1 1 16.790377  12.13 31.25   100 16
49.65 0 1 1 1 16.802202   2.81 31.25 94.12 16
57.02 . . . .         .      .     .     .  .
61.31 0 0 0 1 15.130846   6.57    40 69.23 10
59.43 0 0 0 1  15.25563   8.65  37.5    80  8
   57 0 0 0 1 15.408575   8.85    40 81.82 10
58.45 0 0 0 1 15.522746   8.52    40    80 10
 61.9 0 0 0 1 15.566275   9.59 41.67    75 12
 66.4 . . . .         .      .     .     .  .
77.94 0 1 0 0  19.84292    .19    25   100  8
60.23 0 1 0 0  19.74778     .7 22.22   100  9
 78.4 0 1 0 0 19.735476    .32 28.57   100  7
82.65 0 1 0 1  19.85599    .49 28.57   100  7
86.11 0 1 0 1 19.869614    .13 42.86   100  7
71.98 . . . .         .      .     .     .  .
 67.4 1 0 1 1  16.17561   4.87 40.91  12.5 22
75.74 1 0 1 1  16.47402   5.39    45  8.33 20
70.74 1 0 1 1 16.592278   4.92 47.06  9.09 17
 61.1 1 0 1 1 16.634268   4.44 44.44   8.7 18
72.27 1 0 1 1 16.734577  -5.23 47.06 10.53 17
 37.4 . . . .         .      .     .     .  .
44.47 0 0 0 0 18.454687    .08 23.08 66.67 13
54.28 0 0 0 0 18.444553    .67 30.77 71.43 13
65.09 0 0 0 0 18.429058    .86 35.71 71.43 14
54.33 0 0 0 0 18.502874   1.01 33.33 66.67 15
 68.3 1 0 0 1  18.52115   1.13 33.33  62.5 15
34.46 . . . .         .      .     .     .  .
57.05 0 0 0 1 18.345892   1.57    25    75 12
69.26 0 0 0 1 18.285158   1.26    25    75 12
62.66 0 0 0 1  18.30228   1.29 27.27 81.82 11
end

Tags: panel data

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#2

01 Apr 2023, 09:09

Thanh Thao:
welcome to this forum.
1) the -fe- estimator wipes out all time-invariant variables. Going -re- may be a solution, if the -re- estimator is the way to go. Conversely, if -fe- is the way to go, the -re- estimator is inconsistent (and your coefficients unreliable);
2) as you noted, -hausman- does not allow non-default standard errors, whereas the community-contributed module -xtoverid- does.
That said, please also note that:
a) the null hypothesis of -xtoverid- is that -re- is the way to go (therefore, only the -re- specification should be tested). If the null is rejected (and no message points you to pooled OLS due to the lack of a panel-wise effect), you should stick with -fe-;
b) being a bit old-fashioned, -xtoverid- does not allow -fvvarlist- notation; see -xi- for the usual fix.
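
As a minimal sketch of how these points could be applied to the model in #1 (not output from the actual data): the posted extract does not show the panel or time identifiers, so firm_id and year below are hypothetical names. -xtsum- reports the within-firm standard deviation, which is zero for a regressor that never changes within a firm, and -xtoverid- is called right after the -re- fit with clustered standard errors.

```stata
* firm_id and year are assumed names; replace them with the actual identifiers
xtset firm_id year

* a within standard deviation of 0 means the variable is time-invariant,
* so -xtreg, fe- cannot estimate its coefficient
xtsum l_TIER l_BLER

* one-off installation of the community-contributed command from SSC
* (it may ask for further community-contributed packages such as -ivreg2-)
ssc install xtoverid

* only the -re- specification is estimated before calling -xtoverid-
xtreg ESGSCORE l_CEODUAL l_TIER l_BLER l_SUSCOMMITTEE l_SIZE l_ROA l_BGENDIV l_BIND l_BSIZE, re vce(cluster firm_id)
xtoverid
```

All regressors in this model are numeric variables entered directly, so the -xi:- prefix should not be needed here.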

Kind regards,
Carlo
(Stata 19.0)
ThanhThao Nguyen

Join Date: Mar 2023

Posts: 5
#3

05 Apr 2023, 02:20

Thank you so much Carlo Lazzaro for your response. I have some further questions I hope you could help me with.
1) Regarding your answer to 1), assuming that -fe- is the way to go in my case, is there any fix that would let me use the -fe- estimator without dropping these time-invariant variables?
2) Could you kindly explain what you meant by "-xtoverid- does not allow -fvvarlist- notation"? Is it actually a problem in my case? If so, which code should I use to combine -xtoverid- and -xi- in order to run the Hausman test? I have read about them separately, but I do not know how to run them together. I am very new to Stata and econometrics, so I apologize in advance if my questions are trivial. Thank you so much for your help!

Best,

Thao

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17712

05 Apr 2023, 02:40

Thanh:
1) you may want to consider the Mundlak correction (see https://blog.stata.com/2015/10/29/fi...ak-approach/);
2) the community-contributed module -xtoverid- was created long before factor-variable (-fvvarlist-) notation was included in official Stata.
If you do not have categorical variables among your predictors, you can safely ignore the issue. Otherwise, you may want to consider the following toy example (to be tweaked according to your research needs), keeping in mind that -xtoverid- needs only the -re- specification to be tested:

Code:

. use "https://www.stata-press.com/data/r17/nlswork.dta"
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

. xi: xtreg ln_wage i.race, re vce(cluster idcode)
i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)

Random-effects GLS regression                   Number of obs     =     28,534
Group variable: idcode                          Number of groups  =      4,711

R-squared:                                      Obs per group:
     Within  = 0.0000                                         min =          1
     Between = 0.0198                                         avg =        6.1
     Overall = 0.0186                                         max =         15

                                                Wald chi2(2)      =     102.23
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

                             (Std. err. adjusted for 4,711 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
    _Irace_2 |  -.1300382   .0131411    -9.90   0.000    -.1557943   -.1042821
    _Irace_3 |   .1011474   .0665033     1.52   0.128    -.0291967    .2314915
       _cons |   1.691756   .0069814   242.32   0.000     1.678073    1.705439
-------------+----------------------------------------------------------------
     sigma_u |  .38195681
     sigma_e |  .32028665
         rho |  .58714668   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xtoverid

Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re  robust cluster(idcode)
Sargan-Hansen statistic 102.231  Chi-sq(2)    P-value = 0.0000

.

In this case (setting aside the very likely model misspecification), the -xtoverid- outcome would point to the -fe- specification.
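
The Mundlak approach mentioned in 1) could be sketched along the following lines for the model in #1. This is only an illustration under stated assumptions: firm_id and year are hypothetical identifier names, the mean_ prefix is an arbitrary choice, and every regressor except l_TIER and l_BLER is assumed to vary within at least some firms. The idea is to add firm-level means of the time-varying regressors to the -re- model, so that the time-invariant dummies can stay in the regression.

```stata
* firm_id and year are assumed names; replace them with the actual identifiers
xtset firm_id year

* flag the observations actually used in estimation
quietly xtreg ESGSCORE l_CEODUAL l_TIER l_BLER l_SUSCOMMITTEE l_SIZE l_ROA l_BGENDIV l_BIND l_BSIZE, re
generate byte insample = e(sample)

* firm-level means of the regressors assumed to be time-varying
foreach v of varlist l_CEODUAL l_SUSCOMMITTEE l_SIZE l_ROA l_BGENDIV l_BIND l_BSIZE {
    bysort firm_id: egen double mean_`v' = mean(`v') if insample
}

* -re- model augmented with the firm-level means (Mundlak terms)
xtreg ESGSCORE l_CEODUAL l_TIER l_BLER l_SUSCOMMITTEE l_SIZE l_ROA l_BGENDIV l_BIND l_BSIZE mean_*, re vce(cluster firm_id)

* joint test of the Mundlak terms; rejection speaks against the plain -re- model
testparm mean_*
```

Note that the coefficients on l_TIER and l_BLER from this model are still only as credible as the assumption that, once the firm-level means are included, these two dummies are unrelated to the remaining firm-specific effect.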

Kind regards,
Carlo
(Stata 19.0)


ThanhThao Nguyen

Join Date: Mar 2023

Posts: 5
#5

10 Apr 2023, 00:44

Thank you so much Carlo Lazzaro for your help!