
  • Categorical Variable has Collinearity with Fixed Effects? Help!

    Hi all, I'm in urgent need of help and would seriously appreciate your time! I am regressing a dependent variable (Firm Value) on a few independent variables with company-specific fixed effects. However, one of my independent variables is collinear with the fixed effects. This variable takes the value 0 or 1 depending on whether the company has a sunset clause on its dual-class shares (i.e. the dual-class shares must retire after a certain period of time). I cannot remove this variable because it is the whole point of my study. I used -xtreg firmvalue {the different indep vars}, fe-. Is there any way to resolve this?

    P.S. I'm completely new to Stata.
    Last edited by Emily Heng; 05 Apr 2019, 10:18.

  • #2
    Hi Emily,
    The simple answer is no. If the variable is collinear with the fixed effects, you cannot distinguish the firm FE from the variable of interest.
    A more complex answer is: perhaps, under specific assumptions. Look into the random-effects model (xtreg, re). That will technically let you include variables that are collinear with the fixed effects.
    For now, I suggest exploring that option, looking into the manual, formulas and remarks (they do a good job explaining the assumptions of each model), and taking a look at any intro to econometrics book that distinguishes between both strategies. It's always a good idea to know what xtreg does before using it.
    HTH
    Fernando
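    As a quick check of why this happens, here is a minimal sketch. It assumes the panel is already declared with -xtset-, and that the indicator is named Sunsetclause and the firm identifier company1 (names taken from later posts in this thread; adjust to your data). If the indicator never changes within a firm, its within-firm standard deviation is zero, and the within (fe) transformation wipes it out.
    Code:
    * within-firm variation: look at the "within" row of the output
    xtsum Sunsetclause
    * stops with an error if any firm changes status over time
    * (or has the indicator missing in some years but not others)
    bysort company1: assert Sunsetclause == Sunsetclause[1]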



    • #3
      Hi Fernando,

      Thanks so much for your reply! Unfortunately, the company-specific effects are likely correlated with my independent variables (return on assets, leverage etc.). Hence, the random-effects model is probably not suitable for my case.

      The regression I'm running is almost identical to the one run in this report: https://www.sec.gov/files/case-again...a-appendix.pdf (See page 3 & 10). What really confuses me is that the authors managed to have both industry fixed effects (I opted for company fixed effects) and the categorical variable in their regression. Am I missing something here?

      Btw, I set my panel data at:
      Code:
      xtset company1 YearsinceIPO
      Last edited by Emily Heng; 05 Apr 2019, 20:46.



      • #4
        Their model is different from yours. They do not use firm-level fixed effects: they use industry-level fixed effects. The dual shares variable will, as you have found, be constant over time within a firm, but it will not be constant across an entire industry. That's why that variable can remain in their model.

        That said, the price they have paid for this is that they have largely ignored the problem of nesting of observations within firms. They use standard errors clustered at the firm (not the industry) level--which partly, but not completely, deals with this problem. I have certainly seen this partial solution used frequently. I'm not a fan, because I think that model mis-specification can be a greater sin than inconsistent estimation. To fully deal with it requires including firm level fixed effects instead of industry level, and that would kill the dual shares variable for them, as it had for you.
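        For reference, a specification of that kind (industry and time indicators, standard errors clustered on firm) could be sketched as below. This is only an illustration using the variable names that appear elsewhere in this thread, not the report's actual code. It uses -regress-, so no firm-level effects enter the model at all; -xtreg- with no option defaults to random effects, and -xtreg, fe- always absorbs the panel variable declared in -xtset-.
        Code:
        * run from a do-file: /// continues a command onto the next line
        * industry and years-since-IPO indicators, firm-clustered standard errors
        regress FirmValue size roa cap_ex r_and_d cash leverage Sunsetclause ///
            i.industry i.YearsinceIPO, vce(cluster company1)
        * same coefficients, with the industry indicators absorbed instead of listed
        areg FirmValue size roa cap_ex r_and_d cash leverage Sunsetclause ///
            i.YearsinceIPO, absorb(industry) vce(cluster company1)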

        As Fernando Rios has pointed out, a random effects model would not have this limitation. I understand why you feel it may not be suitable. Another alternative is a hybrid model. -xthybrid-, available from SSC, estimates these rather conveniently and might be suitable here. It gives you some of the advantages of both fixed and random effects models all wrapped up in a single analysis. And another alternative is to use -xtreg, be-, though if the effects you are looking at operate over time the loss of the time dimension in an -xtreg, be- analysis would be a serious problem.
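        A rough sketch of these two alternatives, assuming the data are -xtset- by firm and using variable names from elsewhere in this thread. The -xthybrid- call is written from memory of its help file, so treat the option name as an assumption and check -help xthybrid- after installing.
        Code:
        * between-effects regression on firm means (the time dimension is averaged away)
        xtreg FirmValue size roa cap_ex r_and_d cash leverage Sunsetclause, be
        * hybrid model: within and between effects for the time-varying regressors,
        * plus a coefficient for the firm-level Sunsetclause indicator
        ssc install xthybrid
        xthybrid FirmValue size roa cap_ex r_and_d cash leverage Sunsetclause, clusterid(company1)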



        • #5
          Hi Clyde, that definitely cleared up my confusion (= Will definitely look into -xthybrid-.

          In the meantime, I tried switching from company-specific to industry-specific fixed effects instead.

          Code:
          xtreg FirmValue size roa cap_ex r_and_d cash leverage Sunsetclause i.industry i.YearsinceIPO
          However, Stata automatically conducts a "random-effects GLS regression" for me instead of the fixed-effects model I wanted. Why is that so? And is there a way to resolve this as I want to avoid using the random-effects model?

          Many thanks!



          • #6
            Emily:
            the last part of your query is easily solved by adding -, fe- at the end of your code:
            Code:
            xtreg FirmValue size roa cap_ex r_and_d cash leverage Sunsetclause i.industry i.YearsinceIPO, fe
            Kind regards,
            Carlo
            (Stata 18.0 SE)



            • #7
              Hi Carlo,

              Thank you for your reply! If I add -fe- to the end of my code, there will be company-specific fixed effects as well which I'm trying to avoid due to its collinearity with my Sunsetclause categorical variable. I only wish to include industry and time fixed effects.

              Is there any way to address this?



              • #8
                Emily:
                the problem then lies in the predictors of your model and the difference from the regression reported in the paper you mentioned (as others wisely commented previously).
                As Clyde helpfully pointed out, the community-contributed command -xthybrid- may be a solution.
                Kind regards,
                Carlo
                (Stata 18.0 SE)



                • #9
                  Hi Carlo,

                  To clarify, I'm keeping my regression the same as the one in the paper I mentioned by using industry instead of company fixed effects. That should technically allow me to avoid the collinearity and fit the fixed-effects model, but Stata performs a "random-effects GLS regression" instead. Are you saying that there is no way for Stata to fit the fixed-effects model without also automatically including the company fixed effects?



                  • #10
                    Try Correlated Random Effects (well explained in Wooldridge's Introductory Econometrics textbook). If you do, you don't need to avoid company fixed effects. Here is a simple example where ldist (log of distance) and ldistsq (square of the log of distance) are time invariant. In my example, concen (concentration) is the only regressor, apart from the year dummies, that varies over time. So I create a variable called concenbar, which is the mean value of concen for every route.
                    Code:
                    . use airfare
                    . xtreg lfare concen ldist ldistsq y98 y99 y00, fe cluster(id)
                    note: ldist omitted because of collinearity
                    note: ldistsq omitted because of collinearity
                    
                    Fixed-effects (within) regression               Number of obs     =      4,596
                    Group variable: id                              Number of groups  =      1,149
                    
                    R-sq:                                           Obs per group:
                         within  = 0.1352                                         min =          4
                         between = 0.0576                                         avg =        4.0
                         overall = 0.0083                                         max =          4
                    
                                                                    F(4,1148)         =     120.06
                    corr(u_i, Xb)  = -0.2033                        Prob > F          =     0.0000
                    
                                                     (Std. Err. adjusted for 1,149 clusters in id)
                    ------------------------------------------------------------------------------
                                 |               Robust
                           lfare |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                          concen |    .168859   .0494587     3.41   0.001     .0718194    .2658985
                           ldist |          0  (omitted)
                         ldistsq |          0  (omitted)
                             y98 |   .0228328    .004163     5.48   0.000     .0146649    .0310007
                             y99 |   .0363819   .0051275     7.10   0.000     .0263215    .0464422
                             y00 |   .0977717   .0055054    17.76   0.000     .0869698    .1085735
                           _cons |   4.953331   .0296765   166.91   0.000     4.895104    5.011557
                    -------------+----------------------------------------------------------------
                         sigma_u |  .43389176
                         sigma_e |  .10651186
                             rho |  .94316439   (fraction of variance due to u_i)
                    ------------------------------------------------------------------------------
                    
                    . * Correlated Random Effects
                    . 
                    . egen concenbar = mean(concen), by(id)
                    . 
                    . xtreg lfare concen concenbar ldist ldistsq y98 y99 y00, re cluster(id)
                    
                    Random-effects GLS regression                   Number of obs     =      4,596
                    Group variable: id                              Number of groups  =      1,149
                    
                    R-sq:                                           Obs per group:
                         within  = 0.1352                                         min =          4
                         between = 0.4216                                         avg =        4.0
                         overall = 0.4068                                         max =          4
                    
                                                                    Wald chi2(7)      =    1273.17
                    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
                    
                                                     (Std. Err. adjusted for 1,149 clusters in id)
                    ------------------------------------------------------------------------------
                                 |               Robust
                           lfare |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                          concen |    .168859   .0494749     3.41   0.001       .07189    .2658279
                       concenbar |   .2136346   .0816403     2.62   0.009     .0536227    .3736466
                           ldist |  -.9089297   .2721637    -3.34   0.001    -1.442361   -.3754987
                         ldistsq |   .1038426   .0201911     5.14   0.000     .0642688    .1434164
                             y98 |   .0228328   .0041643     5.48   0.000     .0146708    .0309947
                             y99 |   .0363819   .0051292     7.09   0.000     .0263289    .0464349
                             y00 |   .0977717   .0055072    17.75   0.000     .0869777    .1085656
                           _cons |   6.207889   .9118109     6.81   0.000     4.420773    7.995006
                    -------------+----------------------------------------------------------------
                         sigma_u |  .31933841
                         sigma_e |  .10651186
                             rho |  .89988885   (fraction of variance due to u_i)
                    ------------------------------------------------------------------------------
                    Note that all the other coefficients are the same as in FE.
                    If you have several time-varying regressors, you will have to create a mean variable for each of them.
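                    With several regressors, the means can be built in a loop. A minimal sketch, assuming a firm identifier company1 and the regressor names used earlier in this thread (placeholders for your own data). It also assumes no observations are dropped for missing values, since the match with FE relies on the means being computed over the same observations that enter the regression. In an unbalanced panel, the firm means of the time dummies would be needed as well for the coefficients to match FE exactly.
                    Code:
                    * firm-level mean of each time-varying regressor
                    foreach v of varlist size roa cap_ex r_and_d cash leverage {
                        egen double `v'_bar = mean(`v'), by(company1)
                    }
                    * CRE: originals + firm means + time-invariant indicator + time dummies
                    * (run from a do-file: /// continues a command onto the next line)
                    xtreg FirmValue size roa cap_ex r_and_d cash leverage *_bar ///
                        Sunsetclause i.YearsinceIPO, re vce(cluster company1)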



                    • #11
                      Hi Eric,

                      Should we subtract the mean you calculated from the original concen values? concen_within should then be added to the regression, am I right? Also, why is concen added to your regression?

                      Code:
                      generate concen_within = concen - concenbar
                      However, the -re- specification assumes that the company effects are not correlated with the independent variables, right? As I mentioned in earlier posts, there's a high chance of correlation between the company effects and my independent variables. I'm not sure how I can explicitly address this.
                      Last edited by Emily Heng; 06 Apr 2019, 23:17.



                      • #12
                        The estimated coefficients you get from CRE as formulated in my example are exactly what you would get from FE, with the addition that you now have coefficients estimated for the time-invariant variables. By adding concenbar you are, in effect, subtracting the mean from concen (look up the maths of the effect of adding a variable to a regression). The addition of concenbar controls for the correlation between the fixed effect and the time-varying variables, which is why we can then use RE.
                        Run the regression as I formulated it in my example.
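                        To see the equivalence, the same model can be refit with the demeaned variable in place of concen. This is a sketch that assumes the airfare data and concenbar from the example above are in memory; concen_within is a name made up here. Because concen = concen_within + concenbar, the coefficient on concen_within should match the FE coefficient on concen, while the coefficient on concenbar becomes the sum of the two coefficients from the original CRE fit.
                        Code:
                        * within-route deviation of concen from its route mean
                        generate double concen_within = concen - concenbar
                        * same fit, reparameterized: within and between effects reported separately
                        xtreg lfare concen_within concenbar ldist ldistsq y98 y99 y00, re cluster(id)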



                        • #13
                          Hi Eric,

                          Thanks so much for your reply. I also wish to add time fixed effects; how would you do that?

                          Could you kindly explain how I should interpret the coefficient of concen vs concenbar? Which one would be equivalent to the coefficient of the time-varying independent variable in a fixed-effects model? (Pardon me, as I have no background in econometrics.)



                          • #14
                            Not much time available now; I'm trying to catch up with my work. Essentially, CRE is a synthesis of RE and FE. If the coefficient on concenbar equals zero, the model reduces to RE; if not, the estimates are those of FE. The coefficient on concenbar itself should not be interpreted.
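                            One way to check this in practice is a Wald test on the mean variable(s) right after the CRE regression, sketched here under the assumption that the airfare data and concenbar from the example above are in memory. A small p-value suggests the plain RE assumption is violated, so the FE-type estimates are the ones to rely on. Time effects enter as ordinary dummies, as y98 y99 y00 do in the example.
                            Code:
                            xtreg lfare concen concenbar ldist ldistsq y98 y99 y00, re cluster(id)
                            * H0: the coefficient on concenbar is zero (plain RE would be adequate)
                            test concenbar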
