
  • Overlapping dummies, multicollinearity issue?

    Dear Statalists, a quick heads-up: the following question is conceptual rather than Stata-specific. It is probably an easy one, but I cannot see the forest for the trees at the moment (I have been overthinking this for too long) and could use your help. I did not find this question asked previously on this platform, so others might benefit from the answer, too.
    In the analysis for my thesis, I use several dummies:
    1) Founder dummy = 1 if the founder of a firm is present in the firm or holds >5% of the shares
    2) Family member dummy = 1 if family members (of the founder) are present in the firm or hold >5% of the shares
    3) Family Firm Status = 1 if 1) + 2) >= 1 (in plain English: either the founder, or family members, or both groups satisfy the conditions)
    Now I want to include both the Family Firm Status and the Founder dummy as independent variables in my model. However, technically 1) is a subset of 3) (every founder firm is also a family firm, but not every family firm is a founder firm), so including both dummies should lead to issues, correct? I do not have VIF issues, but I still wonder whether this is actually correct, or whether the low VIFs simply reflect a small overlap (that is, the vast majority of firms in my sample might be family firms without the founder still on board). Thank you very much in advance for your help, Jon.
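
    A minimal sketch of how the overlap could be inspected directly, rather than inferred from the VIF alone. The names Founder_Identity and Family_Firm_Identifier are taken from the regression output posted in #5; Family_Member_Dummy is a hypothetical name for dummy 2), and it is assumed that none of the three dummies has missing values.

    ```stata
    * Cross-tabulate the two dummies: the cell (family firm = 0, founder = 1)
    * should be empty if every founder firm is also a family firm.
    tabulate Family_Firm_Identifier Founder_Identity

    * Stop with an error if any founder firm is not coded as a family firm.
    assert Family_Firm_Identifier == 1 if Founder_Identity == 1

    * Check definition 3): family firm status is 1 if either dummy is 1.
    * Family_Member_Dummy is a hypothetical variable name for dummy 2).
    assert Family_Firm_Identifier == (Founder_Identity == 1 | Family_Member_Dummy == 1)
    ```

    Given the subset relation, only three of the four combinations can occur. The two dummies are then correlated but not perfectly collinear, provided each of the three combinations actually appears in the sample.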

  • #2
    Jon:
    my first attempt would be to include both predictors and see whether a perfect or near-perfect multicollinearity issue arises.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Dear Carlo, thank you, as always, for your prompt feedback. I have indeed included both, and subsequent VIF tests do not indicate issues (far below 10), despite the correlation between the two being 0.82 and significant at p<0.01. Furthermore, the results change dramatically once I include both (before, Family Firm Status is insignificant), as the founder effect and the family firm effect seem to go in opposite directions (both are significant, one positive, the other negative). However, what is the interpretation of Family Firm Status after also including the founder dummy, which is itself part of Family Firm Status? Can I still say that, c.p., family firms (that is, firms with the founder or family members or both present) differ by beta*1 units from their non-family counterparts? Or does the Family Firm Status coefficient now refer only to those firms without a founder present (hence technically 2))?

      Edit: Attached the results for the VIF
      Code:
       vif
      
          Variable |       VIF       1/VIF  
      -------------+----------------------
        _ISIC_2_73 |      5.83    0.171665
      Founder_Id~y |      4.43    0.225612
        _ISIC_2_28 |      4.35    0.229678
        _ISIC_2_36 |      4.31    0.232036
      Family_Fir~r |      3.69    0.271112
      Last edited by Jon Hoefer; 12 Feb 2020, 06:40.
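
      A rough back-of-the-envelope check, assuming the reported 0.82 is the pairwise correlation between the two dummies: in a regression with only these two regressors, the VIF of each would be 1/(1 - r^2). The variable names below are the ones shown in the regression output in #5.

      ```stata
      * Pairwise correlation between the two dummies
      correlate Family_Firm_Identifier Founder_Identity

      * VIF implied by r = 0.82 if these were the only two regressors
      display 1/(1 - 0.82^2)
      ```

      The second command gives roughly 3.05. The reported VIFs of 4.43 and 3.69 are somewhat higher because they also reflect correlation with the remaining covariates.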



      • #4
        Jon:
        can you please post the regression output? Thanks.
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Sure:
          Output with both included:
          Code:
          Linear regression                               Number of obs     =      1,771
                                                          F(56, 180)        =          .
                                                          Prob > F          =          .
                                                          R-squared         =     0.2536
                                                          Root MSE          =     .09752
          
                                                    (Std. Err. adjusted for 181 clusters in GVKEY)
          ----------------------------------------------------------------------------------------
                                 |               Robust
                             ROA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -----------------------+----------------------------------------------------------------
          Family_Firm_Identifier |   .0272986   .0129049     2.12   0.036     .0018344    .0527629
                Founder_Identity |  -.0405587   .0143123    -2.83   0.005    -.0688002   -.0123172
                         FirmAge |   .3204976    .322732     0.99   0.322    -.3163272    .9573223
                        FirmSize |   .0036477   .0042386     0.86   0.391    -.0047161    .0120114
                       Growthopp |  -.1409543   .0286455    -4.92   0.000    -.1974786   -.0844301
                    Indebtedness |  -.0500975   .0781068    -0.64   0.522    -.2042202    .1040251
          Output with only Family Firm Status:
          Code:
          Linear regression                               Number of obs     =      1,771
                                                          F(55, 180)        =          .
                                                          Prob > F          =          .
                                                          R-squared         =     0.2469
                                                          Root MSE          =     .09792
          
                                                    (Std. Err. adjusted for 181 clusters in GVKEY)
          ----------------------------------------------------------------------------------------
                                 |               Robust
                             ROA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -----------------------+----------------------------------------------------------------
          Family_Firm_Identifier |   -.000121   .0122647    -0.01   0.992    -.0243221    .0240802
                         FirmAge |   .1546479   .3463861     0.45   0.656    -.5288518    .8381476
                        FirmSize |   .0040671   .0043195     0.94   0.348    -.0044563    .0125905
                       Growthopp |  -.1438089   .0296015    -4.86   0.000    -.2022195   -.0853982
                    Indebtedness |  -.0525091   .0786527    -0.67   0.505    -.2077089    .1026908
          As you can see, the results are very different, and it appears that including both (as long as this is statistically acceptable) reveals more information.



          • #6
            Jon:
            I would get rid of the only Family Firm Status regression model.
            Please note that results may sound worse than they actually are: in your second regression model -Family_Firm_Identifier- actually plays no role (and the flipped sign conveys no information).
            That said, I would check whether your first model is correctly specified (see -estat ovtest-; -linktest-): for instance, have you already checked that -FirmAge- has no quadratic relationship with the regressand?
            Kind regards,
            Carlo
            (Stata 19.0)
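
            A sketch of the checks suggested above, assuming the covariates visible in the posted output plus the industry dummies (_ISIC_2_*) from the VIF table. Any further controls in the original model are not shown in the thread and are left out here. The model is fitted with default standard errors for these diagnostics only, and the /// line continuations assume the code is run from a do-file.

            ```stata
            * Baseline model (only the covariates visible in the thread)
            regress ROA Family_Firm_Identifier Founder_Identity ///
                FirmAge FirmSize Growthopp Indebtedness _ISIC_2_*

            * Ramsey RESET test and link test for functional-form misspecification
            estat ovtest
            linktest

            * Same model with a quadratic term in FirmAge
            regress ROA Family_Firm_Identifier Founder_Identity ///
                c.FirmAge##c.FirmAge FirmSize Growthopp Indebtedness _ISIC_2_*

            * Test whether the squared term is needed
            test c.FirmAge#c.FirmAge
            ```

            If the squared term is jointly irrelevant and both tests still reject, the misspecification probably lies elsewhere (for example in omitted predictors).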



            • #7
              Ok thank you,
              do you have an opinion on:
              However, what is the interpretation of Family Firm Status after also including the founder dummy, which is itself part of Family Firm Status? Can I still say that, c.p., family firms (that is, firms with the founder or family members or both present) differ by beta*1 units from their non-family counterparts? Or does the Family Firm Status coefficient now refer only to those firms without a founder present (hence technically 2))?
              This is what worries me the most, as I am not sure how to interpret my results.
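
              One way to see what each coefficient measures is to write out the conditional means for the three groups that can occur, assuming the model is linear and additive in the two dummies as in the posted regression. The symbols are notation introduced here: FF is Family_Firm_Identifier, F is Founder_Identity, and x'γ collects the remaining covariates.

              ```latex
              \begin{aligned}
              \text{non-family firm } (FF=0,\ F=0): &\quad E[ROA \mid x] = \beta_0 + x'\gamma \\
              \text{family firm, no founder } (FF=1,\ F=0): &\quad E[ROA \mid x] = \beta_0 + \beta_{FF} + x'\gamma \\
              \text{founder firm } (FF=1,\ F=1): &\quad E[ROA \mid x] = \beta_0 + \beta_{FF} + \beta_{F} + x'\gamma
              \end{aligned}
              ```

              Under these assumptions, other things equal, β_FF compares family firms without a founder to non-family firms, β_F compares founder firms to family firms without a founder, and β_FF + β_F compares founder firms to non-family firms.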



              • #8
                Jon:
                how is ROA expressed (log; percentage; else)?
                Kind regards,
                Carlo
                (Stata 19.0)



                • #9
                  As EBITDA/Book value of total assets, hence a fraction/percentage



                  • #10
                    Jon:
                    thanks for clarifying (my last experience with corporate finance dates back more than 30 years!).
                    That said:
                    1) ceteris paribus (c.p.) (also known as: other things being equal), when -Family_Firm_Identifier- takes on 1, ROA increases by about 3 percentage points;
                    2) c.p., when -Founder_Identity- takes on 1, ROA decreases by about 4 percentage points.
                    Kind regards,
                    Carlo
                    (Stata 19.0)



                    • #11
                      Great, thank you. As for the omitted variable test (RESET) this seems to be an issue:
                      Code:
                      . estat ovtest
                      
                      Ramsey RESET test using powers of the fitted values of ROA
                             Ho:  model has no omitted variables
                                      F(3, 1698) =     27.31
                                        Prob > F =      0.0000
                      
                      . linktest
                      
                            Source |       SS           df       MS      Number of obs   =     1,771
                      -------------+----------------------------------   F(2, 1768)      =    303.28
                             Model |  5.53576864         2  2.76788432   Prob > F        =    0.0000
                          Residual |  16.1354861     1,768  .009126406   R-squared       =    0.2554
                      -------------+----------------------------------   Adj R-squared   =    0.2546
                             Total |  21.6712547     1,770  .012243647   Root MSE        =    .09553
                      
                      ------------------------------------------------------------------------------
                               ROA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                              _hat |   .9412379   .0469053    20.07   0.000     .8492422    1.033234
                            _hatsq |   .3920246   .1541232     2.54   0.011     .0897417    .6943074
                             _cons |  -.0011915   .0065296    -0.18   0.855    -.0139981    .0116152
                      ------------------------------------------------------------------------------
                      However, adding various squared terms of the variables changes little to nothing, so I would leave the model as it is and discuss the shortcomings in the limitations section?
                      The issue surely stems from unavailable data, most importantly CEO ability/skill. Furthermore, previous research has used additional variables that need to be hand-collected, which is not feasible in the permitted time frame. As such, I would again address this in the limitations section of the thesis. As I can only work with the data that is available to me or collectable within the short time frame of my thesis, would this be sufficient, or is there another option that I am currently overlooking? I already cluster my data, which should at least counteract heteroscedasticity and endogeneity; both seemed to be an issue, too, when tested!
                      P.S. Just FYI, this is only one way of calculating ROA; one could just as well use EBIT instead of EBITDA, or net income / BV of total assets...
                      Last edited by Jon Hoefer; 12 Feb 2020, 08:21.



                      • #12
                        Jon:
                        the main issue here is that a misspecified model is basically unreliable, because misspecification may hide endogeneity (and non-default standard errors cannot take endogeneity into account).
                        Obviously, the lack of other predictors could be the reason for that: I would discuss this issue with your supervisor (focusing on the most palatable way to address the model's limitations in your dissertation), to avoid unexpected criticism (especially from discussants) when you present your results.
                        Kind regards,
                        Carlo
                        (Stata 19.0)



                        • #13
                          Thanks for the suggestions. Would you confirm the following statement: a RESET test with a lower F value indicates a better model fit, regardless of the p-value? I can indeed reduce the F from 27 to 12, but the test remains highly significant, which leads me to reject the H0 that no variables are omitted.



                          • #14
                            Jon:
                            not quite.
                            If the regression model does not pass -estat ovtest- or -linktest-, it is misspecified.
                            Kind regards,
                            Carlo
                            (Stata 19.0)



                            • #15
                              Originally posted by Carlo Lazzaro
                              Jon:
                              thanks for clarifying (my last experience with corporate finance dates back more than 30 years!).
                              That said:
                              1) ceteris paribus (c.p.) (also known as: other things being equal), when -Family_Firm_Identifier- takes on 1, ROA increases by about 3 percentage points;
                              2) c.p., when -Founder_Identity- takes on 1, ROA decreases by 4 percentage points.
                              Does that mean that for a founder firm, i.e. a firm with the founder present but no other family members, ROA is 1 percentage point lower (i.e. +3 percentage points for being a family firm and -4 percentage points for being a founder firm)?
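
                              A sketch of how the combined effect asked about here could be obtained together with its standard error, assuming the model with both dummies is refitted first. The control list is abbreviated to the covariates visible in the thread, so the numbers from this exact command may differ from the output in #5; the /// continuation assumes a do-file.

                              ```stata
                              * Model with both dummies, standard errors clustered on GVKEY
                              regress ROA Family_Firm_Identifier Founder_Identity ///
                                  FirmAge FirmSize Growthopp Indebtedness _ISIC_2_*, vce(cluster GVKEY)

                              * Founder firm vs. non-family firm: sum of the two coefficients
                              lincom Family_Firm_Identifier + Founder_Identity
                              ```

                              With the coefficients posted in #5, the point estimate is .0272986 - .0405587 = -.0132601, i.e. about -1.3 percentage points of ROA when ROA is measured as a fraction. -lincom- adds the standard error and confidence interval for that sum.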

