  • Pooled OLS for panel data

    Hello all,
    I'm estimating the effect of a time-invariant variable "DLA" on the profitability of firms. DLA is a dummy variable. My data cover 96 companies over 11 years. I understand that a fixed-effects model won't work because DLA will be perfectly collinear with i.firm. The code I used to test this was
    Code:
    xtreg margin DLA size leverage growth current i.year, fe vce(cluster company1)
    The DLA estimate was omitted. I then turned to pooled OLS using the code
    Code:
    reg margin DLA size leverage growth current rd i.company1 i.year, vce(cluster company1)
    To my surprise, Stata produced an estimate for DLA by omitting one company.

    I thought pooled OLS with dummies was the same as a fixed-effects model with xtreg, but evidently it is not. I'm not sure whether pooled OLS is the right method and whether the estimated coefficient on DLA is unbiased.

  • #2
    The coefficient on DLA is not identified, even using regress. To see this, note that if you change the reference category for company1, the coefficient on DLA changes as well. Compare, e.g.,

    Code:
    reg margin DLA size leverage growth current rd ib10.company1 i.year, vce(cluster company1)
    and

    Code:
    reg margin DLA size leverage growth current rd ib29.company1 i.year, vce(cluster company1)
    Granted, it is more difficult to identify the collinearity with regress, but that does not mean that the command resolves this.



    • #3
      Your reg command isn't really pooled OLS. Pooled OLS would not include any representation of company.

      Putting the semantics aside, OLS with indicators ("dummies") is equivalent to a fixed-effects model. You just aren't perceiving the equivalence because the two outputs you are getting are based on different parameterizations of the same model, so the algebraic transformation that equates them is not immediately apparent.

      Let's start from the fundamental problem that you recognized: DLA is perfectly collinear with i.firm. In linear-algebra-speak: the set of variables consisting of DLA and i.firm is linearly dependent. The set of variables i.firm by itself is not. So the {DLA, i.firm} set contains one extra, linearly redundant variable. And of course a model with a perfectly collinear set of variables is unidentified. So to identify the model we must do something to remove the collinearity. One way of doing that is to remove one of the variables from the set {DLA, i.firm}. And it doesn't matter which of the variables that enter the linear dependency you remove: you will get a linearly independent set of variables and the resulting model will be identifiable. Well, -xtreg, fe- solves this problem by removing DLA. -reg ...DLA...i.firm- resolves it by removing one of the firm indicators (in addition to the one routinely omitted as the reference level of i.firm, which itself is done because, if no reference level were omitted, {_cons, i.firm, reference level of firm} would be a perfectly collinear set).

      If you look at the outputs from your two analyses you will notice that, possibly with the exception of some minuscule rounding errors, the coefficients of all of the variables other than those involved in the collinearity are exactly the same, provided both models include the same regressors. This is the first clue that you may be dealing with two models that are just algebraic transforms of each other. Better still, get fitted values after both models (-predict, xbu- after -xtreg, fe-, so that the estimated firm effects are included, and -predict, xb- after -reg-) and you will see that you get exactly the same predicted values of margin from both models (again, perhaps with some minuscule rounding errors). This verifies that the models are equivalent. In principle, you could even write down the algebraic equations that relate the coefficients of the variables in one model to the coefficients of the variables in the other model, but I don't want to get involved in inverting matrices to do that here.
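      The check described above can be sketched as follows. This is only an illustration under stated assumptions: both commands use the same regressor list (the xtreg call in #1 did not include rd), and the time variable is called year, which is inferred from i.year rather than stated in the thread. After -xtreg, fe-, the prediction that includes the estimated firm effect is requested with xbu, so that is what is compared with the -regress- fitted values. The names yhat_fe, yhat_lsdv and diff are hypothetical.

      ```stata
      * Declare the panel (assumes the time variable is named year).
      xtset company1 year

      * Within (fixed-effects) estimator; DLA is omitted as collinear with the firm effects.
      xtreg margin DLA size leverage growth current rd i.year, fe vce(cluster company1)
      predict double yhat_fe if e(sample), xbu      // a + xb + u_i

      * Least-squares dummy-variable version of the same model.
      regress margin DLA size leverage growth current rd i.company1 i.year, vce(cluster company1)
      predict double yhat_lsdv if e(sample), xb

      * The two sets of fitted values should agree up to rounding error.
      generate double diff = abs(yhat_fe - yhat_lsdv)
      summarize diff
      ```

      If the maximum of diff is essentially zero, the two commands are fitting the same model under different parameterizations.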

      So, you were, in fact, correct in your belief that regression with indicators is equivalent to a fixed-effects regression (for linear models). However, you need to understand that in situations like this, with collinearities that get broken by omitting a variable, the coefficients of all the variables that participate in the collinear relationship are meaningless numbers. In fact, it is possible to prove mathematically that you could prespecify the coefficient you wanted for any one of them, and there is a way of breaking the collinearity that would produce that desired result. (Evidently this can't be just about choosing which variable to remove, because there are only a finite number of those. Collinearities can also be broken by adding linear constraints to the model. In fact, from a linear algebra perspective, "removing" a variable is best thought of as constraining its coefficient to zero.)
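      A short way to see why any value can be obtained, written in notation that is not in the original post and is only assumed here for illustration ($\delta$ for the DLA coefficient, $\alpha_i$ for the firm effects, $c$ for an arbitrary constant):

      ```latex
      \begin{aligned}
      y_{it} &= \delta\,\mathrm{DLA}_i + \alpha_i + x_{it}'\beta + \varepsilon_{it} \\
             &= (\delta + c)\,\mathrm{DLA}_i + \bigl(\alpha_i - c\,\mathrm{DLA}_i\bigr) + x_{it}'\beta + \varepsilon_{it}
             \qquad \text{for any } c .
      \end{aligned}
      ```

      Because DLA does not vary within a firm, the term $c\,\mathrm{DLA}_i$ can be moved freely between the DLA coefficient and the firm effects without changing any fitted value. The data therefore cannot distinguish $\delta$ from $\delta + c$, and the number reported for DLA depends only on which constraint was used to pin down $c$.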

      So the point is that the coefficients of DLA and i.company1 in the -reg- model are not meaningful, and in particular, the coefficient of DLA is not interpretable as an effect of DLA. It's just an arbitrary number. It is mathematically impossible to get an estimate of the effect of DLA using a fixed-effects model here, or using anything that is algebraically equivalent to a fixed-effects model. To get an estimate of the effect of DLA you must do something else.

      Added: Crossed with #2.



      • #4
        This is super helpful! Thank you so much, Clyde! It appears my choices are between a random effects and a hybrid model.



        • #5
          Following Clyde's response, I’m choosing between a pooled OLS
          Code:
          reg margin DL $firmcontrol, vce(cluster company1)
          and a random effect model
          Code:
          xtreg margin_2 DL $firmcontrol, re vce(cluster company1)
          I got very different estimates of the coefficient on the time-invariant variable DL from the two methods. I used the Breusch-Pagan LM test to see which method is appropriate, and it seems that the random-effects model is.
          Code:
          Breusch and Pagan Lagrangian multiplier test for random effects
          
                  margin_2[company1,t] = Xb + u[company1] + e[company1,t]
          
                  Estimated results:
                                   |       Var     SD = sqrt(Var)
                          ---------+-----------------------------
                          margin_2 |   .4770867       .6907146
                                 e |   .0563001       .2372764
                                 u |   .4469155       .6685174
          
                  Test: Var(u) = 0
                                       chibar2(01) =  1291.49
                                    Prob > chibar2 =   0.0000
          I'm wondering:
          • Am I interpreting this correctly that the random effects model is preferred because my model exhibits firm-level variance (cross-sectional heterogeneity)? But why would firm-level variance bias the pooled OLS estimation?
          • I understand that the assumption of the random effects model is that the unobserved factors are not correlated with my independent variables. How can I test whether this condition is met? I believe the Hausman test would not be appropriate here since I’m estimating a time-invariant variable and cannot run a fixed-effects model.



          • #6
            Although it is not correct mathematically, intuitively you should view the random effect as an additional variable in the model. It's like a fixed effect, but without being problematic with respect to collinearity; think of it that way. So the bias of the coefficients in the OLS model is just like omitted variable bias. There is company-level variation in the outcome, which, if not accounted for by the random intercepts, is variance that is shared with the other predictors and gets incorrectly attributed to them.

            I don't know of any way you can try to test the assumption of independence of the unobserved factors with your independent variables in this situation. It is probably better to do a correlated random effects model using -xtreg, cre- if you have StataNow, or -xthybrid-, from SSC, if you don't.



            • #7
              Understood. Appreciate your response!



              • #8
                Stevie:
                take a look at The Stata Blog » Fixed effects or random effects: The Mundlak approach
                Kind regards,
                Carlo
                (Stata 19.0)
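                The Mundlak approach named in that blog title can be sketched by hand as below. This is a rough illustration, not the blog's code: the regressor list is assumed from #1, since the contents of $firmcontrol are not shown in the thread, and the m_* variable names are hypothetical. The firm means are computed over all nonmissing observations, so they match the estimation sample only if no observations are dropped for missing values. Note also that the test speaks only to the time-varying regressors; whether DLA itself is correlated with the unobserved firm effect cannot be checked this way.

                ```stata
                * Firm-level means of the time-varying regressors (hypothetical names m_*).
                foreach v of varlist size leverage growth current rd {
                    bysort company1: egen double m_`v' = mean(`v')
                }

                * Random-effects model augmented with the firm means (Mundlak device).
                xtreg margin DLA size leverage growth current rd ///
                    m_size m_leverage m_growth m_current m_rd i.year, ///
                    re vce(cluster company1)

                * Joint test that the firm means have zero coefficients; rejection suggests
                * the time-varying regressors are correlated with the firm effect.
                test m_size m_leverage m_growth m_current m_rd
                ```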



                • #9
                  In addition to the great responses posted here, I thought the OP might find it useful to refer to the Frisch-Waugh-Lovell theorem and the equivalence between least-squares dummy-variable models and fixed-effects models. One reference can be found here: https://www.hbs.edu/research-computi...fwltheorem.pdf.
