  • very high VIF in FE: An issue?

    Hi,

    I am currently running model fit diagnostics for my FE model (the Hausman test led me there) on my unbalanced panel data (over 20 years for 300 firm observations). My main variables of interest are ROA (continuous DV) and Firmstatus (family/non-family, hence a binary IV). As I would lose my time-invariant binary Firmstatus variable using the FE command, I instead added dummies for industry fixed effects and year fixed effects to the normal regress command (which, if I believe there to be only those two kinds of FE, should lead to the same result, right?!). Please find the code and output below. When applying the VIF test afterwards, quite a few dummies show a very high VIF. I believe I have heard, however, that a high VIF in the fixed effects (hence in the dummies) is not an issue as long as my main variables of interest have a low VIF. Is that correct?
    Also, my model excludes quite a few of the dummies due to collinearity. Is that something I can be relaxed about, or should I exclude those dummies from the model myself? That, however, would lead to an inconsistent FE, wouldn't it? Including dummies for only some years and some industries seems very counter-intuitive to me.

    Code:
    reg ROA Family_Firm_Identifier Founder_Identity FirmAge FirmSize Indebtedness dSIC* Dummy*
    note: dSIC_21 omitted because of collinearity
    note: dSIC_25 omitted because of collinearity
    note: dSIC_27 omitted because of collinearity
    note: dSIC_28 omitted because of collinearity
    note: dSIC_29 omitted because of collinearity
    note: dSIC_211 omitted because of collinearity
    note: dSIC_212 omitted because of collinearity
    note: dSIC_218 omitted because of collinearity
    note: dSIC_236 omitted because of collinearity
    note: dSIC_245 omitted because of collinearity
    note: dSIC_254 omitted because of collinearity
    note: Dummy1997 omitted because of collinearity
    
          Source |       SS           df       MS      Number of obs   =     1,941
    -------------+----------------------------------   F(70, 1870)     =      7.53
           Model |  5.25591976        70  .075084568   Prob > F        =    0.0000
        Residual |   18.653481     1,870  .009975124   R-squared       =    0.2198
    -------------+----------------------------------   Adj R-squared   =    0.1906
           Total |  23.9094008     1,940  .012324433   Root MSE        =    .09988
    
    ----------------------------------------------------------------------------------------
                       ROA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -----------------------+----------------------------------------------------------------
    Family_Firm_Identifier |   .0188036   .0085665     2.20   0.028     .0020027    .0356044
          Founder_Identity |  -.0427618   .0098529    -4.34   0.000    -.0620856   -.0234379
                   FirmAge |   .1853727   .1287851     1.44   0.150    -.0672048    .4379503
                  FirmSize |    .003569   .0019351     1.84   0.065    -.0002263    .0073642
              Indebtedness |  -.0573934   .0159563    -3.60   0.000    -.0886874   -.0260993
                   dSIC_21 |          0  (omitted)
                   dSIC_22 |  -.1449583    .073005    -1.99   0.047    -.2881381   -.0017784
                   dSIC_23 |   .0009146   .0764311     0.01   0.990    -.1489847    .1508138
                   dSIC_24 |  -.0674274   .1009501    -0.67   0.504    -.2654141    .1305592
                   dSIC_25 |          0  (omitted)
                   dSIC_26 |  -.0012879   .0744527    -0.02   0.986    -.1473071    .1447313
                   dSIC_27 |          0  (omitted)
                   dSIC_28 |          0  (omitted)
                   dSIC_29 |          0  (omitted)
                  dSIC_210 |  -.0025597   .0872761    -0.03   0.977    -.1737286    .1686092
                  dSIC_211 |          0  (omitted)
                  dSIC_212 |          0  (omitted)
                  dSIC_213 |   .0368844    .072704     0.51   0.612    -.1057052    .1794739
                  dSIC_214 |  -.0374997   .0718448    -0.52   0.602    -.1784041    .1034047
                  dSIC_215 |  -.0157173   .0762414    -0.21   0.837    -.1652445    .1338099
                  dSIC_216 |  -.0230872   .0734984    -0.31   0.753    -.1672348    .1210603
                  dSIC_217 |   .0176328   .0748015     0.24   0.814    -.1290703     .164336
                  dSIC_218 |          0  (omitted)
                  dSIC_219 |   .0173708   .0747813     0.23   0.816    -.1292929    .1640345
                  dSIC_220 |  -.0658194   .0718646    -0.92   0.360    -.2067626    .0751238
                  dSIC_221 |  -.0647588   .0718314    -0.90   0.367     -.205637    .0761194
                  dSIC_222 |   .0349215   .0725374     0.48   0.630    -.1073412    .1771841
                  dSIC_223 |  -.0128589    .072365    -0.18   0.859    -.1547835    .1290657
                  dSIC_224 |  -.0398613   .0751402    -0.53   0.596    -.1872288    .1075062
                  dSIC_225 |  -.0558447   .0743697    -0.75   0.453     -.201701    .0900116
                  dSIC_226 |  -.0873475     .08709    -1.00   0.316    -.2581514    .0834563
                  dSIC_227 |   .0127991   .0735681     0.17   0.862    -.1314851    .1570833
                  dSIC_228 |   .0115055   .0752447     0.15   0.878     -.136067    .1590779
                  dSIC_229 |  -.0619794   .0747614    -0.83   0.407    -.2086039     .084645
                  dSIC_230 |   .0092115   .0731454     0.13   0.900    -.1342437    .1526668
                  dSIC_231 |  -.0776747   .0724366    -1.07   0.284    -.2197397    .0643904
                  dSIC_232 |  -.0282482   .0724722    -0.39   0.697    -.1703831    .1138868
                  dSIC_233 |  -.0375641   .0725277    -0.52   0.605    -.1798078    .1046796
                  dSIC_234 |   .0437672   .0746224     0.59   0.558    -.1025847    .1901191
                  dSIC_235 |   .0005581   .0741014     0.01   0.994    -.1447721    .1458882
                  dSIC_236 |          0  (omitted)
                  dSIC_237 |   .0545687   .0726822     0.75   0.453     -.087978    .1971155
                  dSIC_238 |  -.0074123   .0764702    -0.10   0.923    -.1573882    .1425637
                  dSIC_239 |   .0703232   .0719653     0.98   0.329    -.0708175    .2114638
                  dSIC_240 |  -.0552216   .0724247    -0.76   0.446    -.1972633    .0868201
                  dSIC_241 |  -.1479057   .0721074    -2.05   0.040    -.2893251   -.0064862
                  dSIC_242 |  -.0484005   .0813355    -0.60   0.552    -.2079184    .1111175
                  dSIC_243 |  -.0688708    .072801    -0.95   0.344    -.2116505    .0739089
                  dSIC_244 |  -.1035505   .0742998    -1.39   0.164    -.2492696    .0421687
                  dSIC_245 |          0  (omitted)
                  dSIC_246 |   .1040758   .1011427     1.03   0.304    -.0942886    .3024402
                  dSIC_247 |  -.0621619   .0805607    -0.77   0.440    -.2201601    .0958364
                  dSIC_248 |  -.0381583   .0924176    -0.41   0.680    -.2194108    .1430942
                  dSIC_249 |  -.0188679   .0715424    -0.26   0.792    -.1591792    .1214434
                  dSIC_250 |  -.0536758   .0791216    -0.68   0.498    -.2088517    .1015001
                  dSIC_251 |  -.0338603   .1010242    -0.34   0.738    -.2319923    .1642717
                  dSIC_252 |  -.0104392   .0723414    -0.14   0.885    -.1523176    .1314392
                  dSIC_253 |  -.0396922    .075325    -0.53   0.598    -.1874221    .1080377
                  dSIC_254 |          0  (omitted)
                 Dummy1996 |  -.0074366   .0160545    -0.46   0.643    -.0389232    .0240499
                 Dummy1997 |          0  (omitted)
                 Dummy1998 |   .0033208   .0158193     0.21   0.834    -.0277046    .0343462
                 Dummy1999 |   .0085642   .0155393     0.55   0.582    -.0219119    .0390403
                 Dummy2000 |   .0076011   .0154038     0.49   0.622    -.0226093    .0378115
                 Dummy2001 |  -.0204896   .0153838    -1.33   0.183    -.0506608    .0096817
                 Dummy2002 |   -.013163   .0153123    -0.86   0.390     -.043194     .016868
                 Dummy2003 |  -.0112213   .0153591    -0.73   0.465    -.0413441    .0189014
                 Dummy2004 |  -.0097841   .0153916    -0.64   0.525    -.0399706    .0204023
                 Dummy2005 |  -.0102948   .0154268    -0.67   0.505    -.0405503    .0199608
                 Dummy2006 |  -.0059925   .0157086    -0.38   0.703    -.0368008    .0248158
                 Dummy2007 |  -.0121173   .0157905    -0.77   0.443    -.0430863    .0188516
                 Dummy2008 |  -.0196699    .015766    -1.25   0.212    -.0505907     .011251
                 Dummy2009 |  -.0219917    .015882    -1.38   0.166      -.05314    .0091566
                 Dummy2010 |  -.0055589   .0160647    -0.35   0.729    -.0370655    .0259477
                 Dummy2011 |  -.0026359    .015913    -0.17   0.868    -.0338449    .0285731
                 Dummy2012 |  -.0220094   .0159773    -1.38   0.169    -.0533446    .0093258
                 Dummy2013 |  -.0035355   .0162617    -0.22   0.828    -.0354285    .0283574
                 Dummy2014 |  -.0041097   .0161099    -0.26   0.799    -.0357051    .0274856
                 Dummy2015 |  -.0189518   .0161629    -1.17   0.241    -.0506509    .0127473
                 Dummy2016 |  -.0173793   .0162946    -1.07   0.286    -.0493368    .0145782
                 Dummy2017 |  -.0176761   .0167466    -1.06   0.291    -.0505202     .015168
                 Dummy2018 |  -.0204806   .0217803    -0.94   0.347    -.0631969    .0222356
                     _cons |  -1.232561   .9828261    -1.25   0.210    -3.160112    .6949908
    ----------------------------------------------------------------------------------------
    
    .
    end of do-file
    
    . estat vif
    
        Variable |       VIF       1/VIF  
    -------------+----------------------
        dSIC_249 |    104.81    0.009541
        dSIC_214 |     68.10    0.014684
        dSIC_221 |     65.86    0.015183
        dSIC_241 |     56.85    0.017590
        dSIC_220 |     51.88    0.019274
        dSIC_239 |     50.17    0.019930
        dSIC_231 |     41.79    0.023928
        dSIC_213 |     35.76    0.027968
        dSIC_230 |     33.20    0.030125
        dSIC_240 |     32.05    0.031199
        dSIC_223 |     31.02    0.032240
        dSIC_252 |     30.50    0.032782
        dSIC_233 |     30.17    0.033149
        dSIC_232 |     29.63    0.033754
        dSIC_243 |     27.39    0.036508
        dSIC_237 |     27.30    0.036628
        dSIC_222 |     27.19    0.036774
         dSIC_22 |     21.96    0.045547
        dSIC_216 |     17.57    0.056924
        dSIC_227 |     17.08    0.058561
        dSIC_225 |     14.22    0.070309
        dSIC_235 |     14.12    0.070819
        dSIC_234 |     13.23    0.075574
        dSIC_244 |     12.58    0.079505
         dSIC_26 |     12.09    0.082734
        dSIC_228 |     11.79    0.084815
        dSIC_217 |     11.65    0.085823
        dSIC_224 |     11.20    0.089257
        dSIC_219 |     11.10    0.090116
        dSIC_229 |     11.09    0.090164
        dSIC_253 |      9.03    0.110794
        dSIC_215 |      8.10    0.123467
        dSIC_238 |      7.57    0.132102
         dSIC_23 |      7.56    0.132237
        dSIC_250 |      5.62    0.177871
        dSIC_247 |      5.18    0.192919
        dSIC_242 |      4.63    0.216187
    Founder_Id~y |      4.11    0.243097
    Family_Fir~r |      3.49    0.286474
        dSIC_210 |      3.05    0.328068
        dSIC_226 |      3.04    0.329471
        dSIC_248 |      2.56    0.389906
       Dummy2002 |      2.19    0.457205
       Dummy2003 |      2.18    0.458861
       Dummy2005 |      2.18    0.459329
       Dummy2004 |      2.17    0.461435
       Dummy2001 |      2.16    0.461901
       Dummy2000 |      2.15    0.465301
       Dummy2006 |      2.10    0.476032
       Dummy2007 |      2.10    0.476203
       Dummy1999 |      2.10    0.476288
       Dummy2008 |      2.09    0.477685
       Dummy2009 |      2.08    0.481163
       Dummy2014 |      2.07    0.483763
       Dummy2016 |      2.07    0.484014
       Dummy2011 |      2.06    0.484668
       Dummy2015 |      2.06    0.486200
       Dummy2012 |      2.06    0.486235
        dSIC_246 |      2.05    0.488053
        dSIC_251 |      2.04    0.489199
        FirmSize |      2.04    0.489520
         dSIC_24 |      2.04    0.489917
       Dummy2010 |      2.03    0.492160
       Dummy1998 |      2.02    0.495994
       Dummy2013 |      2.01    0.497744
       Dummy2017 |      2.00    0.499703
         FirmAge |      1.97    0.507707
       Dummy1996 |      1.96    0.510676
    Indebtedness |      1.51    0.662036
       Dummy2018 |      1.45    0.689321
    -------------+----------------------
        Mean VIF |     14.92
    
    .
    Thank you very much for your advice in advance.
    Last edited by Jon Hoefer; 07 Feb 2020, 03:44.

  • #2
    Jon:
    it is not clear to me what you've done.
    You mentioned a fixed effects regression (which makes me think of a panel data structure), but you did not use -xtreg-.
    As far as your current code is concerned, you ran an OLS that ignores the panel structure of your data and treats all the observations as independent (whereas they are grouped within their panel if you actually have a panel dataset).
    Hence, before worrying yourself with -estat vif-, I would first fix the previous points.
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Dear Carlo, thank you for your answer. Maybe I did not emphasise the point above enough, but I did not use xtreg because that would make me lose my main IV, the binary and time-invariant Family_Firm_Identifier (just as you showed in another post with the time-invariant binary variable "race", https://www.statalist.org/forums/for...ata-regression in #12). As my main hypothesis is that this very IV influences ROA, it would obviously make no sense to apply this technique. I have read in another post in this forum that one can also use the simple regress command as long as one "manually" includes the fixed effects. I follow other authors in assuming fixed effects for industry and year, and hence include a dummy for every 2-digit SIC (industry) and every year. To get this straight before I explore any further: is that incorrect?
      Last edited by Jon Hoefer; 07 Feb 2020, 03:58.


      • #4
        Maybe one thing to add: Before I run the regression,
        I sorted my data as panel data by:
        Code:
         xtset GVKEY Fiscal_Year
               panel variable:  GVKEY (unbalanced)
                time variable:  Fiscal_Year, 1996 to 2018, but with gaps
                        delta:  1 unit
        Here GVKEY is a firm identifier and Fiscal_Year is my time variable of interest. Having set the data as panel data and run the Hausman test, which clearly argues against RE (p<0.0001), I would expect that running the above OLS with dummies for each year and industry (minus one each, of course, to avoid the dummy variable trap) leads to the same results as xtreg, fe, while allowing me to keep my time-invariant IV. Is that right? Otherwise, what procedure would you suggest?


        • #5
          Jon:
          thanks for clarifying.
          Some comments about your last post:
          1) why create dummies yourself when you can rely upon the wonderful capabilities of -fvvarlist- notation (which also omits a reference category of the categorical variable by default)?;
          2) if you go -regress-, there's no need to -xtset- your data first;
          3) if you want to investigate a panel data regression with a time-invariant predictor you're particularly interested in, see -help mundlak-;
          4) if -hausman- points you towards the -re- specification, a fixed effect OLS would not be consistent anyhow.
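          A minimal sketch of points 1) and 3), under stated assumptions: sic2 is a hypothetical numeric two-digit industry variable (it does not appear in this thread), the mean_* variables are likewise made up for illustration, and the second block is a hand-rolled Mundlak-style specification rather than the -mundlak- command itself:
          Code:
          * 1) factor-variable notation instead of hand-made dummies
          *    (sic2 is a hypothetical numeric 2-digit SIC variable)
          regress ROA Family_Firm_Identifier Founder_Identity FirmAge FirmSize ///
              Indebtedness i.sic2 i.Fiscal_Year

          * 3) Mundlak-style RE: add the firm-level means of the time-varying regressors
          foreach v of varlist FirmAge FirmSize Indebtedness {
              bysort GVKEY: egen double mean_`v' = mean(`v')
          }
          xtset GVKEY Fiscal_Year
          xtreg ROA Family_Firm_Identifier Founder_Identity FirmAge FirmSize ///
              Indebtedness mean_FirmAge mean_FirmSize mean_Indebtedness       ///
              i.sic2 i.Fiscal_Year, re vce(cluster GVKEY)
          In an unbalanced panel the firm means should ideally be computed over the estimation sample only; the sketch ignores that detail.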
          Kind regards,
          Carlo
          (Stata 19.0)


          • #6
            Grazie mille for your reply, Carlo.
            I am referring to your 4 remarks individually:
            1. I am afraid I do not really get what you meant by your statement. Despite there (if I get you correctly) being a smoother way of creating dummies, mine should still do the job, shouldn't they?
            2. Ok thanks
            3. In other words: you would suggest looking into hybrid models?
            4. Hausman - unfortunately - points clearly towards a FE specification.
            Code:
             quietly xtreg ROA Family_Firm_Identifier Founder_Identity FirmAge FirmSize Indebtedness dSIC* Dummy*,fe
            
            .  estimates store fixed
            
            .  quietly xtreg ROA Family_Firm_Identifier Founder_Identity FirmAge FirmSize Indebtedness dSIC* Dummy*,re
            
            .  estimates store random
            
            .  hausman fixed random
            
            Note: the rank of the differenced variance matrix (6) does not equal the number of coefficients being tested (27); be sure this is what you expect, or there may be
                    problems computing the test.  Examine the output of your estimators for anything unexpected and possibly consider scaling your variables so that the
                    coefficients are on a similar scale.
            
                             ---- Coefficients ----
                         |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
                         |     fixed        random       Difference          S.E.
            -------------+----------------------------------------------------------------
            Family_Fir~r |    .0024099     .0041036       -.0016937         .004278
            Founder_Id~y |   -.0022414    -.0105292        .0082878          .00457
                 FirmAge |   -3.894968    -.1777954       -3.717173        9.004528
                FirmSize |   -.0004543     .0008761       -.0013304        .0034097
            Indebtedness |   -.2012069    -.1851324       -.0160745        .0060527
               Dummy1996 |    .0223855     .0228469       -.0004614        .0079928
               Dummy1997 |    .0266767     .0281036       -.0014269        .0073347
               Dummy1998 |    .0252141     .0278399       -.0026258        .0065317
               Dummy1999 |    .0246568     .0292352       -.0045784          .00613
               Dummy2000 |    .0182915     .0233822       -.0050907        .0055048
               Dummy2001 |   -.0063129    -.0024861       -.0038268        .0053867
               Dummy2002 |    .0015466     .0051724       -.0036258        .0053229
               Dummy2003 |    .0029164     .0065888       -.0036724        .0050955
               Dummy2004 |    .0010688     .0050439       -.0039752        .0047036
               Dummy2005 |    .0008576     .0048051       -.0039475        .0044071
               Dummy2006 |    .0048292      .009085       -.0042558        .0041151
               Dummy2007 |    -.000525     .0029007       -.0034256        .0037287
               Dummy2008 |   -.0064345    -.0031635       -.0032711        .0036727
               Dummy2009 |   -.0161623     -.012405       -.0037573        .0035329
               Dummy2010 |    -.003255     .0011726       -.0044275        .0032054
               Dummy2011 |   -.0018434     .0024622       -.0043056        .0028133
               Dummy2012 |   -.0202648    -.0161874       -.0040774        .0025327
               Dummy2013 |   -.0028271     .0011176       -.0039448        .0021046
               Dummy2014 |     .000733     .0034577       -.0027246         .001594
               Dummy2015 |   -.0081857    -.0061809       -.0020048        .0015425
               Dummy2016 |   -.0081859    -.0055372       -.0026487        .0012423
               Dummy2017 |   -.0083356    -.0059177       -.0024179        .0008024
            ------------------------------------------------------------------------------
                                       b = consistent under Ho and Ha; obtained from xtreg
                        B = inconsistent under Ha, efficient under Ho; obtained from xtreg
            
                Test:  Ho:  difference in coefficients not systematic
            
                              chi2(6) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                                      =       19.86
                            Prob>chi2 =      0.0029
                            (V_b-V_B is not positive definite)
            The one thing you did not address was my question as to whether or not the two approaches should lead to the same results.
            However I assume by your comment 3) that my current approach is not achieving what I intended to, correct?


            • #7
              Maybe one more thing to add, which puzzles me:
              I am following Anderson and Reeb (2003) (https://onlinelibrary.wiley.com/doi/...540-6261.00567), who write:
              "We use a two-way fixed effects model for our regression analysis. The fixed effects are dummy variables for each year of the sample and dummy variables for each two-digit SIC code."
              This is exactly what I did with my dummies above. They, however, are then able to report significant coefficients for Family Firm Identity (again: time-invariant and binary), just as I have so far when using the OLS command with my dummies. This is a top-notch paper, so their model cannot be misspecified. So is there any other possible trick they may have used? They clearly state two-way fixed effects and not hybrid, so I assume they must have used the classic FE model somehow?


              • #8
                Jon:
                you're welcome.
                1) your handmade categorical variables do the job but are error prone;
                3) yes;
                4) I agree and, in all likelihood, I misread one part of your previous post. The -hausman- outcome points you toward -fe- (not -re-, as I erroneously surmised in my previous reply. Sorry for that.), but the -fe- estimator wipes out time-invariant predictors (as it should). If the omission of a time-invariant categorical variable is problematic for your research goal, going -mundlak- is the only option that springs to my mind.
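                A minimal sketch of why this happens (sic2 is a hypothetical numeric two-digit SIC variable, not one from this thread): -xtreg, fe- absorbs one intercept per firm, so a regressor that is truly constant within GVKEY is collinear with the firm effects, whereas industry and year dummies leave the between-firm variation within an industry untouched. The two are therefore different models and need not give the same coefficients:
                Code:
                * (a) firm fixed effects: the within estimator
                xtset GVKEY Fiscal_Year
                xtreg ROA Family_Firm_Identifier FirmSize Indebtedness i.Fiscal_Year, fe

                * (b) the same slopes via dummies need one intercept per FIRM;
                *     a regressor constant within firm is omitted here as well
                areg ROA Family_Firm_Identifier FirmSize Indebtedness i.Fiscal_Year, ///
                    absorb(GVKEY)

                * (c) industry and year dummies only: pooled OLS, not firm FE
                *     (sic2 is a hypothetical variable name)
                regress ROA Family_Firm_Identifier FirmSize Indebtedness ///
                    i.sic2 i.Fiscal_Year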

                Kind regards,
                Carlo
                (Stata 19.0)


                • #9
                  Jon:
                  it may well be that they did not compare -fe- vs -re- specification via -hausman-.
                  Kind regards,
                  Carlo
                  (Stata 19.0)


                  • #10
                    Originally posted by Carlo Lazzaro
                    Jon:
                    it may well be that they did not compare -fe- vs -re- specification via -hausman-.
                    But they still use a (two-way) FE model and are able to report a coefficient for the time-invariant dummy . Maybe they just did what I do, namely use an OLS which they tried to adjust towards FE adding the dummies, as they also state:
                    The regression equation we employ for our multivariate analysis takes the form [...]
                    (maybe this is a hint towards OLS). I wonder if I can find the other post on this forum where someone pointed out that this is often the case in management papers, but the more I think about it, the more it worries me that this does not really make sense statistically. Because either way, if the OLS adjusted towards FE really does the same job, it should also lead to the same results, right? Then how does the one drop the IV (namely the FE) while the other doesn't?! Hence my confusion.


                    • #11
                      But Carlo:
                      To come back to the initial issue of the topic, assuming that the model works correctly, how would you deal with the high VIF and multicollinearity?


                      • #12
                        Jon:
                        my gut-feeling is that most of the problem depends on the categorical variables.
                        Hence:
                        1) I would use -testparm- to check whether or not they are necessary (I usually do not discard non-significant predictors on an a priori basis, since my opinion is that significant as well as non-significant results are equally informative. But when problems creep up, measures should be taken);
                        2) I would investigate whether any of your continuous predictors actually shows evidence of a non-linear relationship with the regressand;
                        3) I would check whether your model is correctly specified or not (see -estat ovtest- and/or -linktest-). By the way, point 2), if proved, is a form of misspecification;
                        4) I would check whether serial correlation in the idiosyncratic error within panel exists, as your time dimension stretches over a 20-year timespan. If that were the case, just -cluster- your standard errors on your -panelid-;
                        5) if you do not detect serial correlation but heteroskedasticity only (see -estat hettest-), go -robust- instead of -cluster-. If you detect both serial correlation and heteroskedasticity, go -cluster-.
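                        The checks in 1), 3) and 5) could be sketched as follows. This is only an illustration: sic2 is a hypothetical numeric two-digit SIC variable that does not appear in this thread, and no test for serial correlation (point 4) is shown:
                        Code:
                        * pooled OLS with default standard errors
                        * (sic2 is a hypothetical variable name)
                        regress ROA Family_Firm_Identifier Founder_Identity FirmAge FirmSize ///
                            Indebtedness i.sic2 i.Fiscal_Year

                        testparm i.sic2           // 1) joint test of the industry effects
                        testparm i.Fiscal_Year    // 1) joint test of the year effects
                        estat ovtest              // 3) Ramsey RESET test
                        estat hettest             // 5) Breusch-Pagan/Cook-Weisberg test
                        linktest                  // 3) run last: it replaces the estimation results

                        * 5) heteroskedasticity only: robust standard errors
                        *    (swap in vce(cluster GVKEY) if serial correlation is also present)
                        regress ROA Family_Firm_Identifier Founder_Identity FirmAge FirmSize ///
                            Indebtedness i.sic2 i.Fiscal_Year, vce(robust)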
                        Kind regards,
                        Carlo
                        (Stata 19.0)


                        • #13
                          Grazie mille, anzi duemila, Carlo, this is highly valuable. I will do what you suggested and in the meantime contact my supervisor about the paper, as he is familiar with it. Thank you and have a nice weekend.


                          • #14
                            Jon:
                            Italian speaker or Google Translate user (which makes me proficient in many languages)?
                            More seriously, I reciprocate with all the best for you and your research (leisure weekend included).
                            If your research is the core part of your dissertation/thesis, discuss your whole regression strategy with your supervisor and update her/him frequently, just to avoid unpleasant surprises during the last mile of your academic run.
                            Kind regards,
                            Carlo
                            (Stata 19.0)


                            • #15
                              Dear Carlo,

                              indeed Italian-speaking (unfortunately non-native, but fluent). Just to clear up the puzzle from above: Mr. Anderson himself just answered me via mail. Sometimes daring to ask is the best one can do. Here is the reply:
                              In the regressions, we used dummy variables for time and industry and then clustered on firm-level identifier. These were pooled OLS regressions. Some of the issue deals with changing nomenclature that we use in describing econometric techniques. By today's standards, we would call it pooled OLS with dummies for industry and time.
                              Will hence do the same and potentially come back with the model specifications at a later stage.
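                              A sketch of the specification described in that reply, using only the variable names already shown in this thread; treating GVKEY as the firm-level clustering variable is an assumption based on the -xtset- declaration in #4:
                              Code:
                              * pooled OLS with industry and year dummies,
                              * standard errors clustered on the firm identifier
                              regress ROA Family_Firm_Identifier Founder_Identity FirmAge FirmSize ///
                                  Indebtedness dSIC* Dummy*, vce(cluster GVKEY)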
                              Buon fine settimana
