  • Mundlak procedure

    Dear all,

    When using the Mundlak procedure, if one wishes to include year fixed effects (i.e. a dummy variable for each year except one) to control for year-specific shocks, should one also include the mean of each year's dummy variable? If I use the "mundlak" command, Stata automatically includes these means when I have year dummies as right-hand-side variables. However, I'm not sure if this is correct.

    Thank you!

  • #2
    Yes, in principle you must include the time averages of year dummies.

    However, if your panel is balanced, that is, every cross-sectional unit i is observed in the same T periods, then the time averages of the year dummies will be dropped due to multicollinearity: each of them equals 1/T for every unit, so it is collinear with the constant. In this case you do not need to bother calculating the time averages of the year dummies.

    But if your panel is not balanced, you need to include the time averages of the dummies, and in general at least some of them will not be dropped due to multicollinearity.
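
    The reason can be written out in one line. The notation is introduced here only for illustration and is not taken from the thread: S_i is the set of years in which unit i is observed, T_i is the number of those years, and d^s_it is the dummy for year s.

    ```latex
    \bar{d}^{\,s}_{i}
      = \frac{1}{T_i} \sum_{t \in S_i} \mathbf{1}[t = s]
      = \frac{\mathbf{1}[s \in S_i]}{T_i},
    \qquad
    \text{balanced panel: } S_i = \{1,\dots,T\}
      \;\Rightarrow\; \bar{d}^{\,s}_{i} = \frac{1}{T} \text{ for every } i .
    ```

    In a balanced panel the average is the same constant for every unit, hence collinear with the intercept. In an unbalanced panel it varies across units, both through T_i and through whether year s is observed for unit i at all.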

    • #3
      This question appears often around here, so I will provide a detailed analysis.

      Case 1: Balanced panel data

      Code:
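      ***Balanced sample
      webuse grunfeld, clear
      keep if year<1940

      * Year dummies year1-year5, one per value of time
      qui tab time, gen(year)

      * Company-level time averages of every variable, including the year dummies
      foreach X of varlist invest mvalue kstock year1-year5 {
          egen `X'mean = mean(`X'), by(company)
      }

      * Fixed effects (within) estimator
      xtreg invest mvalue kstock year2-year5, fe

      * Mundlak regression: pooled OLS with the time averages added
      reg invest mvalue kstock year2-year5 mvaluemean kstockmean year2mean-year5mean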
      Resulting in

      Code:
      . ***Balanced sample
      . 
      . webuse grunfeld, clear
      
      . 
      . keep if year<1940
      (150 observations deleted)
      
      . 
      . qui tab time, gen(year)
      
      . 
      . foreach X of varlist invest mvalue kstock year1-year5 {
        2. 
      . egen `X'mean = mean(`X'), by(company)
        3. 
      . }
      
      . 
      . xtreg invest mvalue kstock year2-year5, fe
      
      Fixed-effects (within) regression               Number of obs     =         50
      Group variable: company                         Number of groups  =         10
      
      R-sq:                                           Obs per group:
           within  = 0.6380                                         min =          5
           between = 0.5577                                         avg =        5.0
           overall = 0.5628                                         max =          5
      
                                                      F(6,34)           =       9.99
      corr(u_i, Xb)  = 0.1638                         Prob > F          =     0.0000
      
      ------------------------------------------------------------------------------
            invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
            mvalue |   .0599551   .0117349     5.11   0.000      .036107    .0838032
            kstock |  -.3416076   .1099002    -3.11   0.004    -.5649518   -.1182634
             year2 |   10.61359   12.52288     0.85   0.403    -14.83597    36.06315
             year3 |   24.93282   14.52805     1.72   0.095    -4.591731    54.45737
             year4 |    22.7716   14.48812     1.57   0.125    -6.671813      52.215
             year5 |   17.27716   16.03763     1.08   0.289    -15.31523    49.86954
             _cons |   51.61509   13.40755     3.85   0.000     24.36768    78.86251
      -------------+----------------------------------------------------------------
           sigma_u |  83.765483
           sigma_e |  26.122686
               rho |  .91136638   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------
      F test that all u_i=0: F(9, 34) = 31.22                      Prob > F = 0.0000
      
      . 
      . reg invest mvalue kstock year2-year5 mvaluemean kstockmean year2mean-year5mean
      note: year2mean omitted because of collinearity
      note: year3mean omitted because of collinearity
      note: year4mean omitted because of collinearity
      note: year5mean omitted because of collinearity
      
            Source |       SS           df       MS      Number of obs   =        50
      -------------+----------------------------------   F(8, 41)        =     14.42
             Model |  557641.992         8  69705.2489   Prob > F        =    0.0000
          Residual |  198233.399        41  4834.96094   R-squared       =    0.7377
      -------------+----------------------------------   Adj R-squared   =    0.6866
             Total |   755875.39        49  15426.0284   Root MSE        =    69.534
      
      ------------------------------------------------------------------------------
            invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
            mvalue |   .0599551   .0312361     1.92   0.062    -.0031274    .1230376
            kstock |  -.3416075   .2925347    -1.17   0.250    -.9323932    .2491782
             year2 |   10.61358   33.33366     0.32   0.752    -56.70511    77.93227
             year3 |   24.93281   38.67105     0.64   0.523    -53.16497    103.0306
             year4 |   22.77159   38.56477     0.59   0.558    -55.11155    100.6547
             year5 |   17.27714   42.68928     0.40   0.688    -68.93561    103.4899
        mvaluemean |   .0201061   .0322834     0.62   0.537    -.0450916    .0853039
        kstockmean |   .5480482   .3185861     1.72   0.093    -.0953494    1.191446
         year2mean |          0  (omitted)
         year3mean |          0  (omitted)
         year4mean |          0  (omitted)
         year5mean |          0  (omitted)
             _cons |  -27.32733   30.81893    -0.89   0.380    -89.56743    34.91276
      ------------------------------------------------------------------------------
      
      .
      Moral of the story: the time averages of the time dummies were dropped due to multicollinearity in the balanced sample, but the computer did not break. The estimates on our two main variables of interest, mvalue and kstock, are still the same in the fixed effects and the Mundlak regressions (up to rounding in the last digit), although the standard errors differ.
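
      The collinearity can also be checked directly rather than read off the omitted-variable notes. The lines below are a minimal sketch that rebuilds the balanced sample and summarizes the time averages of the year dummies; with T = 5 each of them should have mean .2 and standard deviation 0.

      ```stata
      * Sketch: in a balanced panel every year-dummy mean equals 1/T for all companies
      webuse grunfeld, clear
      keep if year < 1940
      quietly tabulate time, generate(year)
      foreach X of varlist year1-year5 {
          egen `X'mean = mean(`X'), by(company)
      }
      * A variable with zero standard deviation is collinear with the constant
      summarize year1mean-year5mean
      ```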

      • #4
        Case 2: Unbalanced panel data

        Code:
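        ***Unbalanced sample
        webuse grunfeld, clear
        keep if year<1940

        * Create an unbalanced panel: companies 1-2 keep 1937-1939,
        * companies 3-4 keep 1935-1936, companies 5-6 keep 1935 only
        drop if (company==1 | company==2) & year<1937
        drop if (company==3 | company==4) & year>1936
        drop if (company==5 | company==6) & year>1935

        * Year dummies and company-level time averages, as in Case 1
        qui tab time, gen(year)
        foreach X of varlist invest mvalue kstock year1-year5 {
            egen `X'mean = mean(`X'), by(company)
        }

        * Fixed effects, then the Mundlak regression
        xtreg invest mvalue kstock year2-year5, fe
        reg invest mvalue kstock year2-year5 mvaluemean kstockmean year2mean-year5mean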
        Resulting in

        Code:
        . ***Unbalanced sample
        .
        . webuse grunfeld, clear
        
        .
        . keep if year<1940
        (150 observations deleted)
        
        .
        . drop if (company==1 | company==2) & year<1937
        (4 observations deleted)
        
        .
        . drop if (company==3 | company==4) & year>1936
        (6 observations deleted)
        
        .
        . drop if (company==5 | company==6) & year>1935
        (8 observations deleted)
        
        .
        .
        . qui tab time, gen(year)
        
        .
        . foreach X of varlist invest mvalue kstock year1-year5 {
          2.
        . egen `X'mean = mean(`X'), by(company)
          3.
        . }
        
        .
        . xtreg invest mvalue kstock year2-year5, fe
        
        Fixed-effects (within) regression               Number of obs     =         32
        Group variable: company                         Number of groups  =         10
        
        R-sq:                                           Obs per group:
             within  = 0.9038                                         min =          1
             between = 0.0001                                         avg =        3.2
             overall = 0.0159                                         max =          5
        
                                                        F(6,16)           =      25.04
        corr(u_i, Xb)  = -0.4799                        Prob > F          =     0.0000
        
        ------------------------------------------------------------------------------
              invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              mvalue |   .0415765   .0103648     4.01   0.001     .0196042    .0635488
              kstock |  -.9826048   .1415481    -6.94   0.000    -1.282673   -.6825361
               year2 |   4.082768   10.23045     0.40   0.695    -17.60482    25.77035
               year3 |   22.32429    11.8815     1.88   0.079    -2.863374    47.51195
               year4 |    24.6022   11.68686     2.11   0.051    -.1728256    49.37723
               year5 |   33.88411   13.44504     2.52   0.023     5.381908    62.38632
               _cons |   132.3256   18.28009     7.24   0.000     93.57356    171.0777
        -------------+----------------------------------------------------------------
             sigma_u |  148.40276
             sigma_e |  16.729725
                 rho |    .987451   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        F test that all u_i=0: F(9, 16) = 31.97                      Prob > F = 0.0000
        
        .
        . reg invest mvalue kstock year2-year5 mvaluemean kstockmean year2mean-year5mean
        note: year4mean omitted because of collinearity
        note: year5mean omitted because of collinearity
        
              Source |       SS           df       MS      Number of obs   =        32
        -------------+----------------------------------   F(10, 21)       =     19.27
               Model |  447979.654        10  44797.9654   Prob > F        =    0.0000
            Residual |  48829.5933        21  2325.21873   R-squared       =    0.9017
        -------------+----------------------------------   Adj R-squared   =    0.8549
               Total |  496809.248        31  16026.1048   Root MSE        =    48.221
        
        ------------------------------------------------------------------------------
              invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              mvalue |   .0415765   .0298746     1.39   0.179    -.0205512    .1037042
              kstock |  -.9826048   .4079879    -2.41   0.025    -1.831062   -.1341476
               year2 |   4.082767   29.48749     0.14   0.891    -57.23983    65.40536
               year3 |   22.32429   34.24637     0.65   0.522    -48.89493     93.5435
               year4 |    24.6022   33.68533     0.73   0.473    -45.45027    94.65468
               year5 |   33.88411   38.75298     0.87   0.392    -46.70712    114.4753
          mvaluemean |   .0298491   .0310428     0.96   0.347    -.0347078    .0944061
          kstockmean |    1.17485   .4251216     2.76   0.012     .2907617    2.058939
           year2mean |  -134.1457   86.95541    -1.54   0.138    -314.9794    46.68799
           year3mean |   34.52447   149.3492     0.23   0.819    -276.0643    345.1132
           year4mean |          0  (omitted)
           year5mean |          0  (omitted)
               _cons |  -.8818338   35.81467    -0.02   0.981    -75.36252    73.59885
        ------------------------------------------------------------------------------
        Moral of the story: we again replicated the fixed effects estimates on the main variables of interest with the Mundlak regression, and notice that two of the time averages of the time dummies were not dropped. Two were dropped because I generated a fairly elementary pattern of unbalancedness: every company observed in 1937 is also observed in 1938 and 1939, so year3mean, year4mean and year5mean are identical.
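
        To see which time averages coincide in this sample, one can list them once per company. This is only a sketch; the helper variables T_i and first are names made up for the illustration.

        ```stata
        * Sketch: rebuild the unbalanced sample and inspect the year-dummy means
        webuse grunfeld, clear
        keep if year < 1940
        drop if (company==1 | company==2) & year<1937
        drop if (company==3 | company==4) & year>1936
        drop if (company==5 | company==6) & year>1935
        quietly tabulate time, generate(year)
        foreach X of varlist year1-year5 {
            egen `X'mean = mean(`X'), by(company)
        }
        * One row per company: number of years observed and the dummy means
        bysort company: generate T_i = _N
        bysort company: generate first = (_n == 1)
        list company T_i year2mean-year5mean if first, noobs
        * Identical columns are the ones regress has to omit
        assert year4mean == year3mean
        assert year5mean == year3mean
        ```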

        What would have happened had we omitted the time averages of the time dummies? Here:

        Code:
        . reg invest mvalue kstock year2-year5 mvaluemean kstockmean
        
              Source |       SS           df       MS      Number of obs   =        32
        -------------+----------------------------------   F(8, 23)        =     21.68
               Model |  438652.336         8   54831.542   Prob > F        =    0.0000
            Residual |  58156.9119        23  2528.56139   R-squared       =    0.8829
        -------------+----------------------------------   Adj R-squared   =    0.8422
               Total |  496809.248        31  16026.1048   Root MSE        =    50.285
        
        ------------------------------------------------------------------------------
              invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              mvalue |   .0397905   .0304812     1.31   0.205    -.0232648    .1028457
              kstock |  -1.088032   .4070048    -2.67   0.014    -1.929985    -.246078
               year2 |  -3.964683   28.29357    -0.14   0.890     -62.4944    54.56503
               year3 |   33.02513   31.11164     1.06   0.299    -31.33419    97.38445
               year4 |    38.4382   29.83349     1.29   0.210    -23.27709    100.1535
               year5 |   49.85638   33.72022     1.48   0.153    -19.89921     119.612
          mvaluemean |   .0347599   .0312413     1.11   0.277    -.0298677    .0993875
          kstockmean |   1.352167   .4171152     3.24   0.004     .4892986    2.215036
               _cons |  -36.11844   20.56722    -1.76   0.092    -78.66497    6.428094
        ------------------------------------------------------------------------------
        The estimates on the main variables of interest are no longer equivalent in the fixed effects and the Mundlak regressions: mvalue is now .0397905 instead of .0415765, and kstock is -1.088032 instead of -.9826048.
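
        The comparison can be made without reading three tables side by side. The sketch below stores the kstock coefficient from each regression in a scalar; the scalar names b_fe, b_full and b_short are made up for the illustration.

        ```stata
        * Sketch: compare the kstock coefficient across the three regressions
        webuse grunfeld, clear
        keep if year < 1940
        drop if (company==1 | company==2) & year<1937
        drop if (company==3 | company==4) & year>1936
        drop if (company==5 | company==6) & year>1935
        xtset company year
        quietly tabulate time, generate(year)
        foreach X of varlist mvalue kstock year1-year5 {
            egen `X'mean = mean(`X'), by(company)
        }
        * Fixed effects
        quietly xtreg invest mvalue kstock year2-year5, fe
        scalar b_fe = _b[kstock]
        * Mundlak regression with the year-dummy means
        quietly regress invest mvalue kstock year2-year5 mvaluemean kstockmean year2mean-year5mean
        scalar b_full = _b[kstock]
        * Mundlak regression without the year-dummy means
        quietly regress invest mvalue kstock year2-year5 mvaluemean kstockmean
        scalar b_short = _b[kstock]
        display "FE: " b_fe "   with dummy means: " b_full "   without: " b_short
        ```

        The first two numbers should agree and the third should not, matching the tables above.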
        Last edited by Joro Kolev; 11 Sep 2021, 02:34.

        • #5
          Overall moral of the story: Always include the time averages of the time dummies in your Mundlak regression, unless you are absolutely sure that your panel is balanced. Depending on your pattern of unbalancedness, some of the time averages might be dropped due to multicollinearity, but omitting them altogether results in a procedure that is not equivalent to the fixed effects estimator.
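
          If you are not sure whether the panel is balanced, xtset will tell you. A small sketch follows; it assumes, as I understand the documentation, that xtset leaves its verdict in r(balanced). The local names before and after are made up for the illustration.

          ```stata
          * Sketch: ask xtset whether the panel is balanced
          webuse grunfeld, clear
          keep if year < 1940
          xtset company year
          local before "`r(balanced)'"
          * Dropping a few company-years changes the answer
          drop if (company==1 | company==2) & year<1937
          xtset company year
          local after "`r(balanced)'"
          display "before: `before'   after: `after'"
          ```

          Keep in mind that this describes the data in memory. Missing values in the regressors can make the estimation sample unbalanced even when xtset reports a balanced panel.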

          • #6
            Thank you Joro, that is super helpful!
