  • Fixed effects specification: dummies (LSDV) vs. within estimator

    Hello everyone,


    I have a question about the various specifications of a fixed effects model. I always thought that within estimators using areg, xtreg, or reghdfe should (be able to) give results identical to those from simply including dummies in a reg specification. In other words, I thought that the within estimator is identical to least squares dummy variables (LSDV) estimation.

    In the mock example below, I want to regress impshare on uncertainty. Both vary only over time (quarter). Now suppose I want to include quarter fixed effects. These should absorb all variation in impshare and uncertainty, so no coefficient should be reported for uncertainty. Indeed, that's what I get when I use areg.
    However, when I regress impshare on uncertainty and include quarter dummies (which, again, I thought was identical to using areg), I do get a coefficient for uncertainty, which puzzles me. Why is uncertainty now not perfectly collinear with the quarter dummies, and how do I interpret the uncertainty coefficient in this case?


    See data and code below. Please ignore that it's an oversimplified dataset, which results in missing standard errors etc. That's because I collapsed a panel dataset to this time-series format for simplicity. The coefficients and the results for areg versus dummies are the same as in the full panel.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long quarter double impshare float uncertainty
    201501 .08978211473287988    35
    201502  .0908441642096677    35
    201503 .09298928428359604    35
    201504 .09066902977290353    35
    201601  .0975096476492042    35
    201602 .09609810364723077    35
    201603 .09644018437561988 37.86
    201604 .09476250311679726 38.13
    201701 .09643089107585387 38.13
    201702 .09585398951901894 35.82
    201703    .09537700327528 38.89
    201704 .09612222201218018 38.64
    201801 .09723944204344515 38.54
    201802 .09668838998878439 36.94
    201803 .09647264244667098 43.23
    201804  .0966329961359868 53.52
    201901 .09828297726092321 56.51
    201902 .09214517032365498 49.93
    201903 .09308704924572131 54.45
    end

    Code:
    areg impshare uncertainty, absorb(quarter)  // uncertainty omitted because of collinearity with quarter, as expected

    reg impshare uncertainty i.quarter  // coefficient of 0.000169 on uncertainty despite the quarter fixed effects: how to interpret?
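
    For comparison, here is a minimal sketch of the case where the equivalence does hold: a regressor that varies within the absorbed groups. It uses Stata's shipped auto data rather than the example above, and the scalar names b_within and b_lsdv are only illustrative. By the Frisch-Waugh-Lovell theorem, the two mpg coefficients should agree up to rounding.

    ```stata
    * Within estimator vs. LSDV when mpg varies within rep78 groups
    sysuse auto, clear

    * Within estimator: rep78 indicators are absorbed
    areg price mpg, absorb(rep78)
    scalar b_within = _b[mpg]

    * LSDV: rep78 indicators entered explicitly as factor variables
    regress price mpg i.rep78
    scalar b_lsdv = _b[mpg]

    display "within: " scalar(b_within) "   LSDV: " scalar(b_lsdv)
    ```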

  • #2
    Look carefully at the note shown as the first line of output from each of these three regressions run on your example data.
    Code:
    . areg impshare uncertainty, absorb(quarter)
    note: uncertainty omitted because of collinearity.
    Code:
    . reg impshare uncertainty i.quarter
    note: 201903.quarter omitted because of collinearity.
    Code:
    . reg impshare i.quarter uncertainty 
    note: uncertainty omitted because of collinearity.
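
    The same pattern can be sketched on Stata's shipped auto data. Here mean_mpg is a hypothetical regressor built to vary only across rep78 groups, just as uncertainty varies only across quarters. Regressing it on the indicators is a quick collinearity check: an R-squared of 1 means it is an exact linear combination of the constant and the rep78 indicators. Once that holds, the order of the regressors decides which term is reported as omitted.

    ```stata
    sysuse auto, clear
    drop if missing(rep78)

    * Hypothetical regressor that is constant within each rep78 group
    bysort rep78: egen double mean_mpg = mean(mpg)

    * Collinearity check: R-squared of 1 => fully explained by the indicators
    regress mean_mpg i.rep78
    display "R-squared: " e(r2)

    * mean_mpg listed first: expect one rep78 indicator to be omitted
    regress price mean_mpg i.rep78

    * mean_mpg listed after the indicators: expect mean_mpg to be omitted
    regress price i.rep78 mean_mpg
    ```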



    • #3
      Ha! So if I change the order of the variables, I get what I expected. Cool, thanks!

      I had noticed this note before:

      Code:
       note: 201903.quarter omitted because of collinearity.
      but didn't know what to make of it. Why does Stata exclude just one level of quarter in that specification rather than the entire uncertainty variable?



      • #4
        Because in general Stata has no way to determine what the user would prefer.

        With categorical variables, the user can specify which category is treated as the base level and excluded. In the absence of guidance from the user, the first category (201501.quarter in your data) is quietly excluded, as you will see from the output of either reg command. There's no good way of instructing Stata "I'd prefer this nominally continuous variable to be excluded if in fact it is collinear with some combination of other variables."
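
        A sketch of the factor-variable syntax for choosing the base level, again on the shipped auto data. Whichever level serves as the base, the mpg coefficient should not change; only the indicator coefficients are re-expressed relative to a different reference category. Changing the base does not make Stata drop a continuous regressor instead; for that, list it after the factor variables as in #2. The scalar names are illustrative.

        ```stata
        sysuse auto, clear

        regress price mpg i.rep78            // default base: lowest level (1)
        scalar b_first = _b[mpg]

        regress price mpg ib3.rep78          // level 3 as the base

        regress price mpg ib(last).rep78     // highest level (5) as the base
        scalar b_last = _b[mpg]

        fvset base 3 rep78                   // make level 3 the default base for later commands
        regress price mpg i.rep78
        fvset clear rep78                    // restore the default base

        display "mpg, base 1: " scalar(b_first) "   mpg, base 5: " scalar(b_last)
        ```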



        • #5
          Dear William,

          Many thanks again for your answer. So I've learned that if a set of “independent” variables is collinear, Stata will drop one of them, and unless told otherwise it drops the one specified last among the collinear variables (in this case 201903.quarter).

          That still leaves me wondering how to interpret the coefficient on uncertainty (0.000169) when the quarter dummies are included as specified above. Does this coefficient refer only to a specific time period? Or is it still a sort of overall, pooled effect, with the dummy variables not acting as true fixed effects but merely providing level shifts in impshare for the different periods?

          Or perhaps I should simply not overthink this, accept that this is not the way to specify a fixed effects model anyway, and move on?



          • #6
            The perfect fit to your example data is instructive.

            For all the quarters other than 201501 and 201903, there is a fixed effects dummy whose fitted coefficient is exactly the difference between the actual value of impshare and the predicted value of impshare given the value of uncertainty in that quarter and the fitted constant and fitted coefficient on uncertainty. So regardless of the values of the fitted constant and the fitted coefficient on uncertainty, all of these quarters can be fit perfectly.

            For 201501 and 201903, no fixed effects dummy is included, so the predicted values of impshare depend only on the value of uncertainty in those two quarters and on the fitted constant and fitted coefficient on uncertainty. With just two points, a line determined by two coefficients can be fitted exactly, and the lack of fit in all the other quarters is absorbed by the fixed effects dummies for those quarters.
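
            Working the two-point fit through with the example data, and assuming (per the notes in #2 and #4) that 201501 is the base and 201903 is omitted, the two quarters without a dummy pin down the constant and the slope:

            ```latex
            \begin{aligned}
            \text{impshare}_{201501} &= \hat\alpha + \hat\beta \cdot 35 \\
            \text{impshare}_{201903} &= \hat\alpha + \hat\beta \cdot 54.45 \\
            \hat\beta &= \frac{0.0930870 - 0.0897821}{54.45 - 35}
                       = \frac{0.0033049}{19.45} \approx 0.0001699
            \end{aligned}
            ```

            This reproduces the 0.000169 reported in the first post. The slope is determined entirely by those two quarters, so it says nothing about the relationship in the other 17 quarters.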

            I leave it as an exercise to the reader how to think of this. I come down on the side of not trying to express the meaning of a misspecified fixed effects model using terminology designed for expressing the meaning of a correctly specified fixed effects model.
            Last edited by William Lisowski; 05 Nov 2021, 09:41.



            • #7
              This was a very useful note. Thank you for explaining this! Please correct me if I am wrong: does that mean absorb() effectively moves the fixed effects to the start of the regression equation, i.e. are these two equivalent?

              Code:
              areg y x, absorb(x1 x2)
              
              regress y i.x1 i.x2 x
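
              A minimal sketch of what absorb() does for a single absorbed variable, on the shipped auto data: demean price and mpg within rep78 groups, then regress one on the other. By the Frisch-Waugh-Lovell theorem, the slope should match areg's. The standard errors will not match, because this manual regression does not subtract the degrees of freedom used up by the group means. Variable and scalar names are illustrative.

              ```stata
              sysuse auto, clear
              drop if missing(rep78)

              * Within transformation by hand: subtract group means
              bysort rep78: egen double price_bar = mean(price)
              bysort rep78: egen double mpg_bar = mean(mpg)
              generate double price_w = price - price_bar
              generate double mpg_w = mpg - mpg_bar

              * Demeaned regression (no constant needed after demeaning)
              regress price_w mpg_w, noconstant
              scalar b_manual = _b[mpg_w]

              * Same model with the rep78 indicators absorbed
              areg price mpg, absorb(rep78)
              scalar b_areg = _b[mpg]

              display "manual: " scalar(b_manual) "   areg: " scalar(b_areg)
              ```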



              • #8
                Fahad:
                Not quite, as -areg- can absorb only one categorical variable:
                Code:
                . use "C:\Program Files\Stata17\ado\base\a\auto.dta"
                (1978 automobile data)
                
                . regress price mpg i.foreign i.rep78
                
                      Source |       SS           df       MS      Number of obs   =        69
                -------------+----------------------------------   F(6, 62)        =      3.94
                       Model |   159087839         6  26514639.9   Prob > F        =    0.0021
                    Residual |   417709119        62  6737243.86   R-squared       =    0.2758
                -------------+----------------------------------   Adj R-squared   =    0.2057
                       Total |   576796959        68  8482308.22   Root MSE        =    2595.6
                
                ------------------------------------------------------------------------------
                       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                -------------+----------------------------------------------------------------
                         mpg |  -299.6068   63.34525    -4.73   0.000    -426.2322   -172.9815
                             |
                     foreign |
                    Foreign  |   1102.334   901.7772     1.22   0.226    -700.2928    2904.961
                             |
                       rep78 |
                          2  |   841.3622   2055.452     0.41   0.684    -3267.428    4950.153
                          3  |   1285.116   1901.486     0.68   0.502    -2515.901    5086.132
                          4  |   1155.571   1984.561     0.58   0.562     -2811.51    5122.652
                          5  |   2353.179   2130.577     1.10   0.274    -1905.784    6612.142
                             |
                       _cons |   10856.24   2266.757     4.79   0.000      6325.06    15387.43
                ------------------------------------------------------------------------------
                
                . areg price mpg, abs(foreign rep78)
                absorb():  too many variables specified
                r(103);
                
                . areg price mpg, abs(foreign)
                
                Linear regression, absorbing indicators          Number of obs     =        74
                Absorbed variable: foreign                       No. of categories =         2
                                                                 F(1, 71)          =     27.91
                                                                 Prob > F          =    0.0000
                                                                 R-squared         =    0.2838
                                                                 Adj R-squared     =    0.2637
                                                                 Root MSE          = 2530.9456
                
                ------------------------------------------------------------------------------
                       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                -------------+----------------------------------------------------------------
                         mpg |  -294.1955   55.69172    -5.28   0.000    -405.2417   -183.1494
                       _cons |   12430.83    1222.03    10.17   0.000     9994.169    14867.48
                ------------------------------------------------------------------------------
                F test of absorbed indicators: F(1, 71) = 6.371               Prob > F = 0.014
                
                
                
                . areg price mpg, abs(rep78)
                
                Linear regression, absorbing indicators          Number of obs     =        69
                Absorbed variable: rep78                         No. of categories =         5
                                                                 F(1, 63)          =     20.72
                                                                 Prob > F          =    0.0000
                                                                 R-squared         =    0.2584
                                                                 Adj R-squared     =    0.1995
                                                                 Root MSE          = 2605.7822
                
                ------------------------------------------------------------------------------
                       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                -------------+----------------------------------------------------------------
                         mpg |  -280.2615   61.57666    -4.55   0.000    -403.3126   -157.2103
                       _cons |   12112.77   1347.968     8.99   0.000      9419.07    14806.47
                ------------------------------------------------------------------------------
                F test of absorbed indicators: F(4, 63) = 1.072               Prob > F = 0.378
                
                .
                Kind regards,
                Carlo
                (Stata 19.0)
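
                For more than one set of fixed effects, the community-contributed reghdfe command mentioned in the first post accepts several variables in absorb(). The sketch below assumes reghdfe and its dependency ftools are installed from SSC. Its mpg coefficient should match the regress output above (-299.6068), because neither foreign nor rep78 has singleton groups in this sample, so no observations are dropped. Scalar names are illustrative.

                ```stata
                * ssc install ftools
                * ssc install reghdfe
                sysuse auto, clear

                * LSDV with both sets of indicators
                regress price mpg i.foreign i.rep78
                scalar b_lsdv = _b[mpg]

                * Both sets absorbed
                reghdfe price mpg, absorb(foreign rep78)
                scalar b_hdfe = _b[mpg]

                display "LSDV: " scalar(b_lsdv) "   reghdfe: " scalar(b_hdfe)
                ```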
