  • Panel data - all dummies omitted because of collinearity

    Hi everybody,

    I'm using panel data to examine the effect of immigration on house prices, and I am trying to use a fixed effects model. I am including year and local area dummies; however, Stata is omitting all of my local area dummies because of collinearity. I am using the command:

    xtreg fdlnHP immipop fdlagunemployment fdlagcrime fdlagbenefits fdlagdwpop i.Year i.LA2, fe robust

    where fdlnHP is my house price variable and immipop is my immigration variable.

    Is anybody able to please explain why I may be having this problem and offer a solution?

    Any help would be hugely appreciated.

    Many thanks,

    Tom

  • #2
    Tom:
    welcome to this forum.
    in all likelihood, all those dummies are collinear with the fixed effect.
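    A minimal sketch of why this happens, assuming LA2 is also the panel identifier declared with -xtset- (the thread does not confirm this) and using the Grunfeld data shipped with Stata instead of your own: -xtreg, fe- already absorbs one intercept per panel, so dummies for the same identifier add nothing and are dropped.
    Code:
    webuse grunfeld, clear
    xtset company year
    * the within estimator sweeps out a separate intercept for each company
    xtreg invest mvalue kstock i.year, fe
    * i.company repeats that information: Stata omits every company dummy
    xtreg invest mvalue kstock i.year i.company, fe
    * the same slopes come from explicit company dummies in plain OLS
    regress invest mvalue kstock i.year i.company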
    The first fix is to test if -fe- specification serves your data really better than the -re- one.
    As you imposed cluster-robust standard errors (probably due to serial correlation and/or heteroskedasticity), -hausman- is out of debate and you have to switch to the community-contributed programme -xtoverid- (type -search xtoverid- from within Stata to spot and install it).
    Unfortunately, being a bit old fashioned, -xtoverid- does not support -fvvarlist- notation; the usual fix is to prefix your code with -xi:-:
    Code:
    xi: xtreg fdlnHP immipop fdlagunemployment fdlagcrime fdlagbenefits fdlagdwpop i.Year i.LA2, re robust
    xtoverid
    If -xtoverid- outcome reaches statistical significance, go -fe-; otherwise, stick with -re- specification.
    Kind regards,
    Carlo
    (Stata 18.0 SE)


    • #3
      Thanks very much for your response, Carlo. I have followed your instructions and tried the random effects model with -xtoverid-; my chi2 statistic is missing (it shows as a .) and my between R-squared is 1.00, which I find rather puzzling.

      Many thanks,

      Tom


      • #4
        Tom:
        what's the outcome of -xttest0- run after -xtreg,re-?
        Kind regards,
        Carlo
        (Stata 18.0 SE)


        • #5
          Hi Carlo,

          I've attached a screenshot as I'm not too sure on the interpretation of this test/what it means?

          Many thanks.


          • #6
            Tom:
            as per FAQ, please do not attach screenshots but use CODE delimiters to share what you typed and what Stata gave you back. Thanks.
            That said, the test outcome tells you that there's no evidence of individual effect in your dataset; hence, you should analyze your data via a pooled OLS regression.
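            A minimal sketch of that sequence, using the Grunfeld data shipped with Stata rather than your dataset (the variable names are therefore only illustrative): -xttest0- is the Breusch-Pagan Lagrange multiplier test of Var(u) = 0, and a large p-value means the panel-level variance component cannot be distinguished from zero. In the Grunfeld data the test should reject, so the last line is shown for its syntax only.
            Code:
            webuse grunfeld, clear
            xtset company year
            xtreg invest mvalue kstock, re
            * H0: Var(u) = 0, i.e. no panel-level effect
            xttest0
            * only if H0 is not rejected: pooled OLS, standard errors clustered on the panel id
            regress invest mvalue kstock, vce(cluster company)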
            Kind regards,
            Carlo
            (Stata 18.0 SE)


            • #7
              My apologies, Carlo. Thank you very much for your help.


              • #8
                Hi Carlo,


                Since the xtoverid command is unable to handle factor variables (year dummies in my case), I used cluster-robust standard errors and prefixed my code with xi:, as Carlo suggested, so that I could choose between FE and RE using xtoverid. However, prefixing my code with xi: drops one of the year dummies, which leaves me unable to use xtoverid. Moreover, if I run FE or RE with the robust option without the xi: prefix, no year dummy is omitted. So my question is: how do I make xtoverid work when prefixing my code with xi: leaves an omitted year dummy?

                The background to the above is that, in all my models, the Hausman test chose FE as the best. However, after testing for group-wise heteroscedasticity, I realized that all my FE estimates suffer from heteroscedasticity and, per Carlo's suggestion, I cannot use robust standard errors to correct that, but should rather use xtoverid to choose between RE and FE again while applying the robust option.

                Thanks.



                • #9
                  Khair:
                  welcome to this forum.
                  The omission of one year dummy is intentional and shelters you from the so-called dummy trap (https://en.wikipedia.org/wiki/Dummy_...le_(statistics)).
                  Actually, I was not able to reproduce your problem:
                  Code:
                  use "http://www.stata-press.com/data/r15/nlswork.dta"
                  . xi: xtreg ln_wage age i.year, rob
                  i.year            _Iyear_68-88        (naturally coded; _Iyear_68 omitted)
                  
                  Random-effects GLS regression                   Number of obs     =     28,510
                  Group variable: idcode                          Number of groups  =      4,710
                  
                  R-sq:                                           Obs per group:
                       within  = 0.1060                                         min =          1
                       between = 0.0918                                         avg =        6.1
                       overall = 0.0807                                         max =         15
                  
                                                                  Wald chi2(15)     =    1244.11
                  corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
                  
                                               (Std. Err. adjusted for 4,710 clusters in idcode)
                  ------------------------------------------------------------------------------
                               |               Robust
                       ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                           age |   .0137208   .0019471     7.05   0.000     .0099046     .017537
                     _Iyear_69 |   .0744312   .0102944     7.23   0.000     .0542545    .0946078
                     _Iyear_70 |   .0453659   .0106757     4.25   0.000     .0244419    .0662899
                     _Iyear_71 |   .0819949   .0116296     7.05   0.000     .0592013    .1047885
                     _Iyear_72 |   .0827461   .0129338     6.40   0.000     .0573963    .1080959
                     _Iyear_73 |   .0840751   .0138388     6.08   0.000     .0569516    .1111986
                     _Iyear_75 |   .0707387   .0162295     4.36   0.000     .0389295    .1025479
                     _Iyear_77 |   .1032639   .0193333     5.34   0.000     .0653713    .1411565
                     _Iyear_78 |   .1279039   .0210903     6.06   0.000     .0865676    .1692401
                     _Iyear_80 |    .108871   .0247186     4.40   0.000     .0604235    .1573185
                     _Iyear_82 |    .098831    .027873     3.55   0.000      .044201    .1534611
                     _Iyear_83 |   .1127655   .0301942     3.73   0.000      .053586     .171945
                     _Iyear_85 |   .1380611   .0335078     4.12   0.000     .0723871    .2037351
                     _Iyear_87 |   .1264818   .0373374     3.39   0.001     .0533019    .1996617
                     _Iyear_88 |   .1640382   .0402879     4.07   0.000     .0850755    .2430009
                         _cons |   1.162473   .0397287    29.26   0.000     1.084606     1.24034
                  -------------+----------------------------------------------------------------
                       sigma_u |  .36664367
                       sigma_e |  .30300411
                           rho |  .59418375   (fraction of variance due to u_i)
                  ------------------------------------------------------------------------------
                  
                  . xtoverid
                  
                  Test of overidentifying restrictions: fixed vs random effects
                  Cross-section time-series model: xtreg re  robust cluster(idcode)
                  Sargan-Hansen statistic  79.267  Chi-sq(15)   P-value = 0.0000
                  
                  .
                  Conversely, if I do not prefix my -xtreg- code with -xi:-, as expected the community-contributed command -xtoverid- throws an error message:
                  Code:
                  . xtreg ln_wage age i.year, rob
                  
                  Random-effects GLS regression                   Number of obs     =     28,510
                  Group variable: idcode                          Number of groups  =      4,710
                  
                  R-sq:                                           Obs per group:
                       within  = 0.1060                                         min =          1
                       between = 0.0918                                         avg =        6.1
                       overall = 0.0807                                         max =         15
                  
                                                                  Wald chi2(15)     =    1244.11
                  corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
                  
                                               (Std. Err. adjusted for 4,710 clusters in idcode)
                  ------------------------------------------------------------------------------
                               |               Robust
                       ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                           age |   .0137208   .0019471     7.05   0.000     .0099046     .017537
                               |
                          year |
                           69  |   .0744312   .0102944     7.23   0.000     .0542545    .0946078
                           70  |   .0453659   .0106757     4.25   0.000     .0244419    .0662899
                           71  |   .0819949   .0116296     7.05   0.000     .0592013    .1047885
                           72  |   .0827461   .0129338     6.40   0.000     .0573963    .1080959
                           73  |   .0840751   .0138388     6.08   0.000     .0569516    .1111986
                           75  |   .0707387   .0162295     4.36   0.000     .0389295    .1025479
                           77  |   .1032639   .0193333     5.34   0.000     .0653713    .1411565
                           78  |   .1279039   .0210903     6.06   0.000     .0865676    .1692401
                           80  |    .108871   .0247186     4.40   0.000     .0604235    .1573185
                           82  |    .098831    .027873     3.55   0.000      .044201    .1534611
                           83  |   .1127655   .0301942     3.73   0.000      .053586     .171945
                           85  |   .1380611   .0335078     4.12   0.000     .0723871    .2037351
                           87  |   .1264818   .0373374     3.39   0.001     .0533019    .1996617
                           88  |   .1640382   .0402879     4.07   0.000     .0850755    .2430009
                               |
                         _cons |   1.162473   .0397287    29.26   0.000     1.084606     1.24034
                  -------------+----------------------------------------------------------------
                       sigma_u |  .36664367
                       sigma_e |  .30300411
                           rho |  .59418375   (fraction of variance due to u_i)
                  ------------------------------------------------------------------------------
                  
                  . xtoverid
                  68b:  operator invalid
                  r(198);
                  
                  .
                  As usual (and per FAQ), posting what you typed and what Stata gave you back is the best way to help interested listers help you. Thanks.
                  Kind regards,
                  Carlo
                  (Stata 18.0 SE)


                  • #10
                    Hi Carlo,

                    Please kindly see the output below:

                    xi: xtreg LNNeonatalnew l.LNlow_rate_f l.LNDPTnew l.LNMeaslesnew l.LNPopulationnew l.LNSecondary_Grossnew l.LNFertilitynew l.LNGDPnew i.Year, robust

                    i.Year            _IYear_2005-2010    (naturally coded; _IYear_2005 omitted)
                    note: _IYear_2010 omitted because of collinearity

                    Random-effects GLS regression                   Number of obs     =        203
                    Group variable: Id                              Number of groups  =         48

                    R-sq:                                           Obs per group:
                         within  = 0.7313                                         min =          1
                         between = 0.7218                                         avg =        4.2
                         overall = 0.7218                                         max =          5

                                                                    Wald chi2(11)     =     369.26
                    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                                                                (Std. Err. adjusted for 48 clusters in Id)
                    --------------------------------------------------------------------------------------
                                         |               Robust
                           LNNeonatalnew |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    ---------------------+----------------------------------------------------------------
                            LNlow_rate_f |
                                     L1. |  -.0233584   .0257418    -0.91   0.364    -.0738115    .0270947
                                         |
                                LNDPTnew |
                                     L1. |  -.0330512   .0227593    -1.45   0.146    -.0776586    .0115562
                                         |
                            LNMeaslesnew |
                                     L1. |  -.0528075   .0241425    -2.19   0.029     -.100126   -.0054891
                                         |
                         LNPopulationnew |
                                     L1. |  -.0444169   .0303463    -1.46   0.143    -.1038945    .0150608
                                         |
                    LNSecondary_Grossnew |
                                     L1. |  -.1016777   .0449872    -2.26   0.024     -.189851   -.0135044
                                         |
                          LNFertilitynew |
                                     L1. |     .45537   .1131286     4.03   0.000      .233642     .677098
                                         |
                                LNGDPnew |
                                     L1. |  -.0054544   .0071695    -0.76   0.447    -.0195063    .0085975
                                         |
                             _IYear_2006 |   .0887855   .0412678     2.15   0.031      .007902    .1696689
                             _IYear_2007 |   .1307056   .0302211     4.32   0.000     .0714733    .1899379
                             _IYear_2008 |   .1269416   .0213035     5.96   0.000     .0851875    .1686956
                             _IYear_2009 |   .0707181   .0128396     5.51   0.000     .0455529    .0958832
                             _IYear_2010 |          0  (omitted)
                                   _cons |   3.222562   .2388929    13.49   0.000      2.75434    3.690783
                    ---------------------+----------------------------------------------------------------
                                 sigma_u |  .20081473
                                 sigma_e |  .08415948
                                     rho |    .850603   (fraction of variance due to u_i)
                    --------------------------------------------------------------------------------------

                    .
                    . xtoverid
                    o. operator not allowed
                    r(101);



                    Thanks


                    • #11
                      Khair:
                      thanks for posting your output (for the future, please paste it within CODE delimiters. Thanks).
                      You have a second year dummy omitted due to collinearity.
                      As far as I know, the only fix is to create separate year dummies and omit one (eg, year 2010) by hand.
                      Code:
                      use "http://www.stata-press.com/data/r15/nlswork.dta"
                      . tab year, gen(year_dum)
                      
                        interview |
                             year |      Freq.     Percent        Cum.
                      ------------+-----------------------------------
                               68 |      1,375        4.82        4.82
                               69 |      1,232        4.32        9.14
                               70 |      1,686        5.91       15.05
                               71 |      1,851        6.49       21.53
                               72 |      1,693        5.93       27.47
                               73 |      1,981        6.94       34.41
                               75 |      2,141        7.50       41.91
                               77 |      2,171        7.61       49.52
                               78 |      1,964        6.88       56.40
                               80 |      1,847        6.47       62.88
                               82 |      2,085        7.31       70.18
                               83 |      1,987        6.96       77.15
                               85 |      2,085        7.31       84.45
                               87 |      2,164        7.58       92.04
                               88 |      2,272        7.96      100.00
                      ------------+-----------------------------------
                            Total |     28,534      100.00
                      
                      . xi: xtreg ln_wage age year_dum2-year_dum15
                      
                      Random-effects GLS regression                   Number of obs     =     28,510
                      Group variable: idcode                          Number of groups  =      4,710
                      
                      R-sq:                                           Obs per group:
                           within  = 0.1060                                         min =          1
                           between = 0.0918                                         avg =        6.1
                           overall = 0.0807                                         max =         15
                      
                                                                      Wald chi2(15)     =    3253.70
                      corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
                      
                      ------------------------------------------------------------------------------
                           ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                               age |   .0137208   .0018898     7.26   0.000     .0100169    .0174247
                         year_dum2 |   .0744312    .012506     5.95   0.000     .0499199    .0989425
                         year_dum3 |   .0453659   .0120494     3.77   0.000     .0217496    .0689822
                         year_dum4 |   .0819949   .0125373     6.54   0.000     .0574222    .1065676
                         year_dum5 |   .0827461   .0136074     6.08   0.000      .056076    .1094162
                         year_dum6 |   .0840751   .0143598     5.85   0.000     .0559304    .1122198
                         year_dum7 |   .0707387   .0167492     4.22   0.000     .0379108    .1035665
                         year_dum8 |   .1032639   .0197156     5.24   0.000      .064622    .1419059
                         year_dum9 |   .1279039   .0214888     5.95   0.000     .0857866    .1700211
                        year_dum10 |    .108871   .0247933     4.39   0.000      .060277     .157465
                        year_dum11 |    .098831   .0280824     3.52   0.000     .0437906    .1538714
                        year_dum12 |   .1127655   .0298539     3.78   0.000     .0542529    .1712781
                        year_dum13 |   .1380611   .0333412     4.14   0.000     .0727135    .2034087
                        year_dum14 |   .1264818   .0369222     3.43   0.001     .0541156     .198848
                        year_dum15 |   .1640382   .0393563     4.17   0.000     .0869012    .2411752
                             _cons |   1.162473     .03784    30.72   0.000     1.088308    1.236638
                      -------------+----------------------------------------------------------------
                           sigma_u |  .36664367
                           sigma_e |  .30300411
                               rho |  .59418375   (fraction of variance due to u_i)
                      ------------------------------------------------------------------------------
                      
                      . xtoverid
                      
                      Test of overidentifying restrictions: fixed vs random effects
                      Cross-section time-series model: xtreg re  
                      Sargan-Hansen statistic  88.037  Chi-sq(15)   P-value = 0.0000
                      
                      .
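                      If you prefer to keep -xi:- and only want to choose which year serves as the reference, -xi- reads the omit characteristic of the variable. A minimal sketch on the same dataset, assuming 88 is the year you want as the base; note that this changes the reference category only and does not remove a dummy that Stata drops for collinearity.
                      Code:
                      use "http://www.stata-press.com/data/r15/nlswork.dta", clear
                      * tell -xi- to omit year 88 instead of the lowest value
                      char year[omit] 88
                      xi: xtreg ln_wage age i.year, re vce(robust)
                      xtoverid
                      * clear the characteristic to restore the default behaviour
                      char year[omit]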
                      Kind regards,
                      Carlo
                      (Stata 18.0 SE)


                      • #12
                        Hi Carlo,

                        Thanks so much.
                        I tried creating the dummies by hand and yet still, one of the years is omitted due to collinearity even after excluding one of the year dummies.


                        • #13
                          Khair:
                          then I think you have to live with it and manually omit the dummy that causes the collinearity, in addition to the reference year dummy.
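                          A minimal sketch of that manual step on the nlswork data. It assumes the extra omission arises because no observation from the reference year enters the estimation sample; your output is consistent with that (six years, 2005-2010, lagged regressors, at most 5 observations per group), but only -tab Year if e(sample)- on your own data can confirm it. Here the 1968 observations are excluded with an -if- condition to mimic the situation.
                          Code:
                          use "http://www.stata-press.com/data/r15/nlswork.dta", clear
                          * -xi- builds _Iyear_69 to _Iyear_88 and omits 68 as the reference
                          xi: xtreg ln_wage age i.year if year > 68, re vce(robust)
                          * if 68 is absent from the sample, the remaining dummies sum to 1 and one more is dropped
                          tab year if e(sample)
                          * omit a second dummy by hand (here _Iyear_69) and drop the -xi:- prefix
                          xtreg ln_wage age _Iyear_70-_Iyear_88 if year > 68, re vce(robust)
                          xtoverid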
                          Kind regards,
                          Carlo
                          (Stata 18.0 SE)


                          • #14
                            Ok, thanks.


                            • #15
                              Hi everybody,

                              I'm using panel data to examine the effect of CEO age on the tone and readability of annual reports over 5 years (2013-2017), and I have used both fe and re. I am including year dummies; however, Stata is omitting the 2013 year dummy because of collinearity, which makes the total number of observations low. I am using the command:

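                              * regressor of interest
                              local RHS1 "ceoage"

                              * control variables (defined here but not used in the command below)
                              local controls "liquidityratio netincome firmsize PriceToBookValue boardsize firmage"

                              * one random-effects regression per dependent variable
                              foreach depvar in tone readability {
                                  forval i = 1/1 {
                                      xi: xtreg `depvar' `RHS`i'' i.year, re robust
                                  }
                              }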


                              Any help would be hugely appreciated.

                              Many thanks,

                              dan billy
