Panel data - all dummies omitted because of collinearity

Tom Phillipson

Join Date: Apr 2019

Posts: 4
#1


15 Apr 2019, 07:15

Hi everybody,

I'm using panel data to examine the effect of immigration on house prices and I am trying to use a fixed effects model. I am including year and local area dummies; however, Stata is omitting all of my local area dummies because of collinearity. I am using the command:

xtreg fdlnHP immipop fdlagunemployment fdlagcrime fdlagbenefits fdlagdwpop i.Year i.LA2, fe robust

where fdlnHP is my house price variable and immipop is my immigration variable.

Is anybody able to please explain why I may be having this problem and offer a solution?

Any help would be hugely appreciated.

Many thanks,

Tom

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17708
#2

15 Apr 2019, 08:02

Tom:
welcome to this forum.
in all likelihood, all those dummies are collinear with the fixed effect.
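A minimal sketch of that collinearity, using the grunfeld example dataset that ships with Stata rather than Tom's data, and assuming that LA2 plays the role of the panel identifier (here, company). With -fe-, each panel unit already gets its own intercept, so unit dummies carry no extra information:

Code:

webuse grunfeld, clear
xtset company year
* company is the panel variable, so -fe- already removes one intercept per company;
* the i.company terms are redundant and Stata reports them as omitted
xtreg invest mvalue kstock i.year i.company, fe vce(cluster company)
* dropping the redundant dummies leaves the slope estimates unchanged
xtreg invest mvalue kstock i.year, fe vce(cluster company)
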
The first fix is to test if -fe- specification serves your data really better than the -re- one.
As you imposed cluster-robust standard errors (probably due to serial correlation and/or heteroskedasticity), -hausman- is out of the question and you have to switch to the community-contributed programme -xtoverid- (type -search xtoverid- from within Stata to spot and install it).
Unfortunately, being a bit old fashioned, -xtoverid- does not support -fvvarlist- notation; the usual fix is to prefix your code with -xi:-:

Code:

xi: xtreg fdlnHP immipop fdlagunemployment fdlagcrime fdlagbenefits fdlagdwpop i.Year i.LA2, re robust
xtoverid

If the -xtoverid- outcome reaches statistical significance, go -fe-; otherwise, stick with the -re- specification.

Kind regards,
Carlo
(Stata 19.0)
Tom Phillipson

Join Date: Apr 2019

Posts: 4
#3

15 Apr 2019, 10:14

Thanks very much for your response, Carlo. I have followed your instructions and tried the random effects model using -xtoverid-. My chi2 statistic is missing (it shows as a .) and my between R-squared is 1.00, which I find rather puzzling.

Many thanks,

Tom
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17708
#4

15 Apr 2019, 10:20

Tom:
what's the outcome of -xttest0- run after -xtreg,re-?

Kind regards,
Carlo
(Stata 19.0)
Tom Phillipson

Join Date: Apr 2019

Posts: 4
#5

16 Apr 2019, 01:41

Hi Carlo,

I've attached a screenshot as I'm not too sure on the interpretation of this test/what it means?

Many thanks.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17708
#6

16 Apr 2019, 02:52

Tom:
as per FAQ, please do not attach screenshots but use CODE delimiters to share what you typed and what Stata gave you back. Thanks.
That said, the test outcome tells you that there's no evidence of individual effect in your dataset; hence, you should analyze your data via a pooled OLS regression.
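For readers unsure how to read -xttest0-: it is the Breusch and Pagan Lagrangian multiplier test, whose null hypothesis is that the variance of the unit-specific effect is zero. A large p-value means no evidence of individual effects, which is what points to pooled OLS. The sketch below uses the grunfeld example dataset as a stand-in (an assumption; it is not Tom's data) only to show the sequence of commands:

Code:

webuse grunfeld, clear
xtset company year
* random-effects fit with default standard errors, then the LM test
xtreg invest mvalue kstock, re
xttest0
* if the test does not reject Var(u) = 0, pooled OLS is the fallback;
* clustering on the panel variable still allows for within-unit correlation
regress invest mvalue kstock, vce(cluster company)
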

Kind regards,
Carlo
(Stata 19.0)
Tom Phillipson

Join Date: Apr 2019

Posts: 4
#7

17 Apr 2019, 03:15

My apologies, Carlo. Thank you very much for your help.
Khair Amal

Join Date: Nov 2019

Posts: 4
#8

19 Nov 2019, 05:46

Hi Carlo,

Since the xtoverid command cannot handle factor variables (year dummies in my case) and I am using cluster-robust standard errors, I prefixed my code with xi:, as suggested by Carlo, so that I could choose between FE and RE using xtoverid. However, with the xi: prefix one of the year dummies is dropped, and xtoverid then fails. If I run FE or RE with the robust option but without xi:, no year dummy is omitted. So my question is: how do I make xtoverid work when the xi: prefix leaves an omitted year dummy?

The background to the above is that, in all my models, the Hausman test favoured FE. However, after testing for group-wise heteroscedasticity, I realized that all my FE estimates suffer from heteroscedasticity and, per Carlo's suggestion, I cannot rely on the Hausman test once I use robust standard errors to correct that; instead, I should use xtoverid to choose between RE and FE again while applying the robust option.

Thanks.

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17708
#9

19 Nov 2019, 07:11

Khair:
welcome to this forum.
The omission of one year dummy is intentional and shelters you from the so-called dummy trap (https://en.wikipedia.org/wiki/Dummy_...le_(statistics)).
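Which year -xi- leaves out can be chosen by setting the omit characteristic on the variable. A hedged sketch with the nlswork data, picking 1988 as the reference year purely for illustration and assuming -xtoverid- is already installed:

Code:

webuse nlswork, clear
xtset idcode year
* ask -xi- to leave out year 88 instead of the lowest year
char year[omit] 88
xi: xtreg ln_wage age i.year, re vce(cluster idcode)
xtoverid
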
Actually, I was not able to reproduce your problem:

Code:

use "http://www.stata-press.com/data/r15/nlswork.dta"
. xi: xtreg ln_wage age i.year, rob
i.year            _Iyear_68-88        (naturally coded; _Iyear_68 omitted)

Random-effects GLS regression                   Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1060                                         min =          1
     between = 0.0918                                         avg =        6.1
     overall = 0.0807                                         max =         15

                                                Wald chi2(15)     =    1244.11
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0137208   .0019471     7.05   0.000     .0099046     .017537
   _Iyear_69 |   .0744312   .0102944     7.23   0.000     .0542545    .0946078
   _Iyear_70 |   .0453659   .0106757     4.25   0.000     .0244419    .0662899
   _Iyear_71 |   .0819949   .0116296     7.05   0.000     .0592013    .1047885
   _Iyear_72 |   .0827461   .0129338     6.40   0.000     .0573963    .1080959
   _Iyear_73 |   .0840751   .0138388     6.08   0.000     .0569516    .1111986
   _Iyear_75 |   .0707387   .0162295     4.36   0.000     .0389295    .1025479
   _Iyear_77 |   .1032639   .0193333     5.34   0.000     .0653713    .1411565
   _Iyear_78 |   .1279039   .0210903     6.06   0.000     .0865676    .1692401
   _Iyear_80 |    .108871   .0247186     4.40   0.000     .0604235    .1573185
   _Iyear_82 |    .098831    .027873     3.55   0.000      .044201    .1534611
   _Iyear_83 |   .1127655   .0301942     3.73   0.000      .053586     .171945
   _Iyear_85 |   .1380611   .0335078     4.12   0.000     .0723871    .2037351
   _Iyear_87 |   .1264818   .0373374     3.39   0.001     .0533019    .1996617
   _Iyear_88 |   .1640382   .0402879     4.07   0.000     .0850755    .2430009
       _cons |   1.162473   .0397287    29.26   0.000     1.084606     1.24034
-------------+----------------------------------------------------------------
     sigma_u |  .36664367
     sigma_e |  .30300411
         rho |  .59418375   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xtoverid

Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re  robust cluster(idcode)
Sargan-Hansen statistic  79.267  Chi-sq(15)   P-value = 0.0000

.

Conversely, if I do not prefix my -xtreg- code with -xi:-, as expected the community-contributed command -xtoverid- throws an error message:

Code:

. xtreg ln_wage age i.year, rob

Random-effects GLS regression                   Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1060                                         min =          1
     between = 0.0918                                         avg =        6.1
     overall = 0.0807                                         max =         15

                                                Wald chi2(15)     =    1244.11
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0137208   .0019471     7.05   0.000     .0099046     .017537
             |
        year |
         69  |   .0744312   .0102944     7.23   0.000     .0542545    .0946078
         70  |   .0453659   .0106757     4.25   0.000     .0244419    .0662899
         71  |   .0819949   .0116296     7.05   0.000     .0592013    .1047885
         72  |   .0827461   .0129338     6.40   0.000     .0573963    .1080959
         73  |   .0840751   .0138388     6.08   0.000     .0569516    .1111986
         75  |   .0707387   .0162295     4.36   0.000     .0389295    .1025479
         77  |   .1032639   .0193333     5.34   0.000     .0653713    .1411565
         78  |   .1279039   .0210903     6.06   0.000     .0865676    .1692401
         80  |    .108871   .0247186     4.40   0.000     .0604235    .1573185
         82  |    .098831    .027873     3.55   0.000      .044201    .1534611
         83  |   .1127655   .0301942     3.73   0.000      .053586     .171945
         85  |   .1380611   .0335078     4.12   0.000     .0723871    .2037351
         87  |   .1264818   .0373374     3.39   0.001     .0533019    .1996617
         88  |   .1640382   .0402879     4.07   0.000     .0850755    .2430009
             |
       _cons |   1.162473   .0397287    29.26   0.000     1.084606     1.24034
-------------+----------------------------------------------------------------
     sigma_u |  .36664367
     sigma_e |  .30300411
         rho |  .59418375   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xtoverid
68b:  operator invalid
r(198);

.

As usual (and per FAQ), posting what you typed and what Stata gave you back is the best way to help interested listers help you. Thanks.

Kind regards,
Carlo
(Stata 19.0)


Khair Amal

Join Date: Nov 2019

Posts: 4
#10

19 Nov 2019, 07:52

Hi Carlo,

Please kindly see the output below:

xi: xtreg LNNeonatalnew l.LNlow_rate_f l.LNDPTnew l.LNMeaslesnew l.LNPopulationnew l.LNSecondary_Grossnew l.LNFertilitynew l.LNGDPnew i.Year, robust

i.Year            _IYear_2005-2010    (naturally coded; _IYear_2005 omitted)
note: _IYear_2010 omitted because of collinearity

Random-effects GLS regression                           Number of obs     =        203
Group variable: Id                                      Number of groups  =         48

R-sq:                                                   Obs per group:
     within  = 0.7313                                                 min =          1
     between = 0.7218                                                 avg =        4.2
     overall = 0.7218                                                 max =          5

                                                        Wald chi2(11)     =     369.26
corr(u_i, X)   = 0 (assumed)                            Prob > chi2       =     0.0000

                                            (Std. Err. adjusted for 48 clusters in Id)
--------------------------------------------------------------------------------------
                     |               Robust
       LNNeonatalnew |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
        LNlow_rate_f |
                 L1. |  -.0233584   .0257418    -0.91   0.364    -.0738115    .0270947
                     |
            LNDPTnew |
                 L1. |  -.0330512   .0227593    -1.45   0.146    -.0776586    .0115562
                     |
        LNMeaslesnew |
                 L1. |  -.0528075   .0241425    -2.19   0.029     -.100126   -.0054891
                     |
     LNPopulationnew |
                 L1. |  -.0444169   .0303463    -1.46   0.143    -.1038945    .0150608
                     |
LNSecondary_Grossnew |
                 L1. |  -.1016777   .0449872    -2.26   0.024     -.189851   -.0135044
                     |
      LNFertilitynew |
                 L1. |     .45537   .1131286     4.03   0.000      .233642     .677098
                     |
            LNGDPnew |
                 L1. |  -.0054544   .0071695    -0.76   0.447    -.0195063    .0085975
                     |
         _IYear_2006 |   .0887855   .0412678     2.15   0.031      .007902    .1696689
         _IYear_2007 |   .1307056   .0302211     4.32   0.000     .0714733    .1899379
         _IYear_2008 |   .1269416   .0213035     5.96   0.000     .0851875    .1686956
         _IYear_2009 |   .0707181   .0128396     5.51   0.000     .0455529    .0958832
         _IYear_2010 |          0  (omitted)
               _cons |   3.222562   .2388929    13.49   0.000      2.75434    3.690783
---------------------+----------------------------------------------------------------
             sigma_u |  .20081473
             sigma_e |  .08415948
                 rho |    .850603   (fraction of variance due to u_i)
--------------------------------------------------------------------------------------

.
. xtoverid
o. operator not allowed
r(101);

Thanks

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17708

#11

19 Nov 2019, 08:22

Khair:
thanks for posting your output (for the future, please paste it within CODE delimiters. Thanks).
You have a second year dummy omitted due to collinearity.
As far as I know, the only fix is to create separate year dummies and omit one (eg, year 2010) by hand.

Code:

use "http://www.stata-press.com/data/r15/nlswork.dta"
. tab year, gen(year_dum)

  interview |
       year |      Freq.     Percent        Cum.
------------+-----------------------------------
         68 |      1,375        4.82        4.82
         69 |      1,232        4.32        9.14
         70 |      1,686        5.91       15.05
         71 |      1,851        6.49       21.53
         72 |      1,693        5.93       27.47
         73 |      1,981        6.94       34.41
         75 |      2,141        7.50       41.91
         77 |      2,171        7.61       49.52
         78 |      1,964        6.88       56.40
         80 |      1,847        6.47       62.88
         82 |      2,085        7.31       70.18
         83 |      1,987        6.96       77.15
         85 |      2,085        7.31       84.45
         87 |      2,164        7.58       92.04
         88 |      2,272        7.96      100.00
------------+-----------------------------------
      Total |     28,534      100.00

. xi: xtreg ln_wage age year_dum2-year_dum15

Random-effects GLS regression                   Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1060                                         min =          1
     between = 0.0918                                         avg =        6.1
     overall = 0.0807                                         max =         15

                                                Wald chi2(15)     =    3253.70
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0137208   .0018898     7.26   0.000     .0100169    .0174247
   year_dum2 |   .0744312    .012506     5.95   0.000     .0499199    .0989425
   year_dum3 |   .0453659   .0120494     3.77   0.000     .0217496    .0689822
   year_dum4 |   .0819949   .0125373     6.54   0.000     .0574222    .1065676
   year_dum5 |   .0827461   .0136074     6.08   0.000      .056076    .1094162
   year_dum6 |   .0840751   .0143598     5.85   0.000     .0559304    .1122198
   year_dum7 |   .0707387   .0167492     4.22   0.000     .0379108    .1035665
   year_dum8 |   .1032639   .0197156     5.24   0.000      .064622    .1419059
   year_dum9 |   .1279039   .0214888     5.95   0.000     .0857866    .1700211
  year_dum10 |    .108871   .0247933     4.39   0.000      .060277     .157465
  year_dum11 |    .098831   .0280824     3.52   0.000     .0437906    .1538714
  year_dum12 |   .1127655   .0298539     3.78   0.000     .0542529    .1712781
  year_dum13 |   .1380611   .0333412     4.14   0.000     .0727135    .2034087
  year_dum14 |   .1264818   .0369222     3.43   0.001     .0541156     .198848
  year_dum15 |   .1640382   .0393563     4.17   0.000     .0869012    .2411752
       _cons |   1.162473     .03784    30.72   0.000     1.088308    1.236638
-------------+----------------------------------------------------------------
     sigma_u |  .36664367
     sigma_e |  .30300411
         rho |  .59418375   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xtoverid

Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re  
Sargan-Hansen statistic  88.037  Chi-sq(15)   P-value = 0.0000

.

Kind regards,
Carlo
(Stata 19.0)


Khair Amal

Join Date: Nov 2019

Posts: 4
#12

19 Nov 2019, 09:34

Hi Carlo,

Thanks so much.
I tried creating the dummies by hand and yet still, one of the years is omitted due to collinearity even after excluding one of the year dummies.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17708
#13

19 Nov 2019, 10:09

Khair:
then I think you have to live with it and manually omit the dummy that contributes to create collinearity in addition to the reference year dummy.
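One possible reading of the output in #10, offered as a guess because the data are not shown: every regressor there is lagged, so the first year (2005) never enters the estimation sample (the maximum is 5 observations per group, not 6). -xi- still builds its dummies from all six years and drops 2005 as the base, which leaves five dummies for 2006-2010 that sum to one within the sample, hence the second omission. The sketch below reproduces that situation with the nlswork data (a stand-in, not Khair's data) and builds the year dummies from the usable observations only. The names lag_tenure, used and yd_ are made up for illustration, and -xtoverid- is assumed to be installed:

Code:

webuse nlswork, clear
xtset idcode year
* hand-made lag, so that no time-series operator is passed on to -xtoverid-
generate lag_tenure = L.tenure
* flag the observations that survive the lag
generate byte used = !missing(ln_wage, lag_tenure)
* year dummies built from the usable observations only
tabulate year if used, generate(yd_)
* leave out the first of them by hand as the reference year
drop yd_1
xtreg ln_wage lag_tenure yd_* if used, re vce(cluster idcode)
xtoverid

Because the reference year is now the first year that is actually in the sample, no further dummy should need to be dropped for collinearity with the constant.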

Kind regards,
Carlo
(Stata 19.0)
Khair Amal

Join Date: Nov 2019

Posts: 4
#14

20 Nov 2019, 00:53

Ok, thanks.
dan billy

Join Date: Jan 2020

Posts: 13
#15

27 Jan 2020, 10:15

Hi everybody,

I'm using panel data to examine the effect of CEO age on the tone and readability of annual reports over 5 years (2013-2017), and I have used fe and re. I am including year dummies; however, Stata is omitting the 2013 year dummy because of collinearity, which makes the total number of observations low. I am using the command:

local RHS1 "ceoage"
* note: controls is defined here but is not included in the model below
local controls "liquidityratio netincome firmsize PriceToBookValue boardsize firmage"

foreach depvar in tone readability {
    forval i = 1/1 {
        xi: xtreg `depvar' `RHS`i'' i.year, re robust
    }
}

Any help would be hugely appreciated.

Many thanks,

dan billy
