  • Estimating fixed-effects panel linear regression using gmm

    I am trying to use `gmm` to replicate the results of using `xtreg, fe`. In short, I'm looking for a way to control for group fixed-effects in `gmm`. Here is an example using Stata's built-in data:


    Code:
    webuse nlswork, clear
    xtset idcode
    //OLS regression
    regress ln_wage age collgrad 
    //fixed-effects linear regression
    xtreg ln_wage age collgrad, fe
    //gmm estimation: results match those of OLS; how to include idcode fixed effects to match xtreg?
    gmm (ln_wage - {xb: age collgrad _cons}), instruments(age collgrad)

    Of course, in this simple example I can use `xtreg`, but I need `gmm` for a more involved case with multiple equations and endogenous variables in panel data.

    The results of the code above are pasted below.



    Code:
    . webuse nlswork, clear
    (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
    
    . xtset idcode
           panel variable:  idcode (unbalanced)
    
    . //OLS regression
    . regress ln_wage age collgrad 
    
          Source |       SS           df       MS      Number of obs   =    28,510
    -------------+----------------------------------   F(2, 28507)     =   2965.19
           Model |  1122.22339         2  561.111694   Prob > F        =    0.0000
        Residual |  5394.46676    28,507  .189233057   R-squared       =    0.1722
    -------------+----------------------------------   Adj R-squared   =    0.1721
           Total |  6516.69015    28,509  .228583611   Root MSE        =    .43501
    
    ------------------------------------------------------------------------------
         ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .0162795   .0003895    41.79   0.000      .015516     .017043
        collgrad |   .3988395   .0069794    57.15   0.000     .3851595    .4125194
           _cons |   1.135051    .011479    98.88   0.000     1.112552    1.157551
    ------------------------------------------------------------------------------
    
    . //fixed-effects linear regression
    . xtreg ln_wage age collgrad, fe
    note: collgrad omitted because of collinearity
    
    Fixed-effects (within) regression               Number of obs     =     28,510
    Group variable: idcode                          Number of groups  =      4,710
    
    R-sq:                                           Obs per group:
         within  = 0.1026                                         min =          1
         between = 0.0877                                         avg =        6.1
         overall = 0.0774                                         max =         15
    
                                                    F(1,23799)        =    2720.20
    corr(u_i, Xb)  = 0.0314                         Prob > F          =     0.0000
    
    ------------------------------------------------------------------------------
         ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .0181349   .0003477    52.16   0.000     .0174534    .0188164
        collgrad |          0  (omitted)
           _cons |   1.148214   .0102579   111.93   0.000     1.128107     1.16832
    -------------+----------------------------------------------------------------
         sigma_u |  .40635023
         sigma_e |  .30349389
             rho |  .64192015   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(4709, 23799) = 8.81                 Prob > F = 0.0000
    
    . //gmm estimations, results match these of OLS, how to include idcode fixed-effects?
    . gmm (ln_wage - {xb: age collgrad _cons}), instruments(age collgrad)
    
    Step 1
    Iteration 0:   GMM criterion Q(b) =  2.8447984  
    Iteration 1:   GMM criterion Q(b) =  7.453e-28  
    Iteration 2:   GMM criterion Q(b) =  8.448e-33  
    
    Step 2
    Iteration 0:   GMM criterion Q(b) =  4.585e-32  
    Iteration 1:   GMM criterion Q(b) =  4.585e-32  (backed up)
    
    note: model is exactly identified
    
    GMM estimation 
    
    Number of parameters =   3
    Number of moments    =   3
    Initial weight matrix: Unadjusted                 Number of obs   =     28,510
    GMM weight matrix:     Robust
    
    ------------------------------------------------------------------------------
                 |               Robust
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .0162795   .0004112    39.59   0.000     .0154737    .0170854
        collgrad |   .3988395   .0071752    55.59   0.000     .3847763    .4129026
           _cons |   1.135051   .0114914    98.77   0.000     1.112529    1.157574
    ------------------------------------------------------------------------------
    Instruments for equation 1: age collgrad _cons

  • #2
    You can estimate the fixed-effects model using OLS (the so-called least squares dummy variables, or LSDV, estimator). OLS requires inverting a matrix, as the OLS estimator of beta is

    $$\hat{\beta} = \left(X^{\prime}X\right)^{-1} X^{\prime}Y.$$

    So you need to include the group dummies in the gmm specification. The difficulty is that, with many groups, this means inverting a large matrix (LSDV) or carrying a large set of instruments (GMM).
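
    Written out, the LSDV regression behind this is the following (a sketch in notation of my own, not taken from the Stata manual: \(d_{ij}\) is a hypothetical dummy equal to 1 when \(i = j\), with group 1 as the base category, and \(K\) is the number of regressors in \(x_{it}\)):

    $$y_{it} = \beta_{0} + \beta^{\prime}x_{it} + \sum_{j=2}^{N} \alpha_{j} d_{ij} + u_{it}$$

    This has \(K + N\) parameters, and using the same regressors, dummies and constant as instruments gives \(K + N\) moment conditions. The gmm problem is therefore exactly identified and reproduces the OLS point estimates. With \(K = 2\) and \(N = 10\) that is 12 parameters and 12 moments.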


    Code:
    webuse grunfeld
    xtreg invest mvalue kstock, fe
    gmm (invest - {xb: mvalue kstock i.company} - {b0}), ///
    instruments(mvalue kstock i.company)

    Result:

    Code:
    . xtreg invest mvalue kstock, fe
    
    Fixed-effects (within) regression               Number of obs     =        200
    Group variable: company                         Number of groups  =         10
    
    R-sq:                                           Obs per group:
         within  = 0.7668                                         min =         20
         between = 0.8194                                         avg =       20.0
         overall = 0.8060                                         max =         20
    
                                                    F(2,188)          =     309.01
    corr(u_i, Xb)  = -0.1517                        Prob > F          =     0.0000
    
    ------------------------------------------------------------------------------
          invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          mvalue |   .1101238   .0118567     9.29   0.000     .0867345    .1335131
          kstock |   .3100653   .0173545    17.87   0.000     .2758308    .3442999
           _cons |  -58.74393   12.45369    -4.72   0.000    -83.31086     -34.177
    -------------+----------------------------------------------------------------
         sigma_u |  85.732501
         sigma_e |  52.767964
             rho |  .72525012   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(9, 188) = 49.18                     Prob > F = 0.0000
    
    .
    . gmm (invest - {xb: mvalue kstock i.company} - {b0}), instruments(mvalue kstock i.company)
    
    Step 1
    Iteration 0:   GMM criterion Q(b) =   65486.14  
    Iteration 1:   GMM criterion Q(b) =  1.134e-19  
    Iteration 2:   GMM criterion Q(b) =  9.790e-27  
    
    Step 2
    Iteration 0:   GMM criterion Q(b) =  3.332e-28  
    Iteration 1:   GMM criterion Q(b) =  6.322e-32  
    
    note: model is exactly identified
    
    GMM estimation
    
    Number of parameters =  12
    Number of moments    =  12
    Initial weight matrix: Unadjusted                 Number of obs   =        200
    GMM weight matrix:     Robust
    
    ------------------------------------------------------------------------------
                 |               Robust
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          mvalue |   .1101238   .0187878     5.86   0.000     .0733004    .1469472
          kstock |   .3100653   .0414913     7.47   0.000     .2287439    .3913868
                 |
         company |
              2  |   172.2025   43.95731     3.92   0.000     86.04776    258.3573
              3  |  -165.2751   42.02834    -3.93   0.000    -247.6492   -82.90111
              4  |    42.4874   63.02744     0.67   0.500    -81.04411    166.0189
              5  |  -44.32013    70.1248    -0.63   0.527    -181.7622    93.12195
              6  |   47.13539   68.03018     0.69   0.488    -86.20131    180.4721
              7  |   3.743212    71.2363     0.05   0.958    -135.8774    143.3638
              8  |   12.75103   64.24155     0.20   0.843    -113.1601    138.6621
              9  |  -16.92558   68.03471    -0.25   0.804    -150.2712      116.42
             10  |   63.72884   75.32962     0.85   0.398    -83.91449    211.3722
    -------------+----------------------------------------------------------------
             /b0 |  -70.29669   76.65958    -0.92   0.359    -220.5467    79.95333
    ------------------------------------------------------------------------------
    Instruments for equation 1: mvalue kstock 1b.company 2.company 3.company 4.company 5.company 6.company 7.company 8.company
        9.company 10.company _cons
    Last edited by Andrew Musau; 16 Jun 2019, 14:57.



    • #3
      Thank you for your answer. Indeed, I have a large number of dummies (1357). gmm is choking when I add them explicitly to the equations and instruments.

      I'm wondering if it is possible to avoid going this route by writing a custom moment program. I found one online for xtpoisson (Slide 47), but I don't know how, or whether, it can be adapted to a linear link function.



      • #4
        In the case of a balanced panel, it is not difficult to construct the within estimator. The fixed effects model is

        $$y_{it}= \beta^{\prime}x_{it}+\gamma^{\prime}z_{i}+ \eta_{i}+ u_{it}\;\;(i=1, \cdots, N; \;t= 1, \cdots, T)$$

        where the \(z\) variables are time invariant, the \(x\) variables are time varying and \(\eta\) are unobserved individual characteristics. The within estimator takes deviations from the individual specific means

        $$\bar{y_{i}} = \frac{1}{T}\sum_{t=1}^{T} y_{it} \;\text{and}\: \bar{x_{i}} = \frac{1}{T}\sum_{t=1}^{T} x_{it}$$

        to wipe out the unobserved individual characteristics \(\eta_{i}\) and, with them, the time-invariant \(z_{i}\). Applying the within transformation, the model becomes

        $$y_{i}^{\ast}= X_{i}^{\ast}\beta + u_{i}^{\ast}\; (i= 1, \cdots, N)$$

        where $$y_{i}^{\ast} = y_{i}-\bar{y_{i}}l_{T}\; \text{and}\; X_{i}^{\ast} = X_{i}-l_{T}\bar{x_{i}}^{\prime}$$

        and \(l_{T}\) is a \(T \times 1\) vector of ones. Therefore, we can simply apply the within transformation beforehand and estimate the transformed equation by OLS or GMM.
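
        To see why this works, here is the scalar form of the step above (a sketch; \(\bar{u}_{i}\) is my own notation for the individual mean of \(u_{it}\), defined like \(\bar{y_{i}}\)). Averaging the model over \(t\) for each \(i\) gives

        $$\bar{y_{i}} = \beta^{\prime}\bar{x_{i}} + \gamma^{\prime}z_{i} + \eta_{i} + \bar{u}_{i}$$

        and subtracting this from the original equation gives

        $$y_{it} - \bar{y_{i}} = \beta^{\prime}\left(x_{it} - \bar{x_{i}}\right) + \left(u_{it} - \bar{u}_{i}\right)$$

        Both \(\eta_{i}\) and \(\gamma^{\prime}z_{i}\) drop out, which is consistent with xtreg, fe omitting collgrad in the first post.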


        Code:
        webuse grunfeld
        *WITHIN TRANSFORMATION
        foreach var in invest mvalue kstock{
        bys company: egen m`var'= mean(`var')
        gen md`var'=`var' -m`var'
        }
        *FIXED EFFECTS UNTRANSFORMED VARS (XTREG)
        xtreg invest mvalue kstock, fe
        *GMM MEAN DEVIATED VARS
        gmm (mdinvest - {xb: mdmvalue mdkstock} - {b0}), instruments(mdmvalue mdkstock)

        Result:

        Code:
        . xtreg invest mvalue kstock, fe
        
        Fixed-effects (within) regression               Number of obs     =        200
        Group variable: company                         Number of groups  =         10
        
        R-sq:                                           Obs per group:
             within  = 0.7668                                         min =         20
             between = 0.8194                                         avg =       20.0
             overall = 0.8060                                         max =         20
        
                                                        F(2,188)          =     309.01
        corr(u_i, Xb)  = -0.1517                        Prob > F          =     0.0000
        
        ------------------------------------------------------------------------------
              invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              mvalue |   .1101238   .0118567     9.29   0.000     .0867345    .1335131
              kstock |   .3100653   .0173545    17.87   0.000     .2758308    .3442999
               _cons |  -58.74393   12.45369    -4.72   0.000    -83.31086     -34.177
        -------------+----------------------------------------------------------------
             sigma_u |  85.732501
             sigma_e |  52.767964
                 rho |  .72525012   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        F test that all u_i=0: F(9, 188) = 49.18                     Prob > F = 0.0000
        
        .  
        . gmm (mdinvest - {xb: mdmvalue mdkstock} - {b0}), instruments(mdmvalue mdkstock)
        
        Step 1
        Iteration 0:   GMM criterion Q(b) =  8604.3705  
        Iteration 1:   GMM criterion Q(b) =  6.568e-23  
        Iteration 2:   GMM criterion Q(b) =  2.256e-29  
        
        Step 2
        Iteration 0:   GMM criterion Q(b) =  1.507e-33  
        Iteration 1:   GMM criterion Q(b) =  1.458e-33  
        
        note: model is exactly identified
        
        GMM estimation
        
        Number of parameters =   3
        Number of moments    =   3
        Initial weight matrix: Unadjusted                 Number of obs   =        200
        GMM weight matrix:     Robust
        
        ------------------------------------------------------------------------------
                     |               Robust
                     |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
            mdmvalue |   .1101238   .0187877     5.86   0.000     .0733006     .146947
            mdkstock |   .3100653   .0414913     7.47   0.000     .2287439    .3913868
        -------------+----------------------------------------------------------------
                 /b0 |   1.93e-08    3.61759     0.00   1.000    -7.090345    7.090345
        ------------------------------------------------------------------------------
        Instruments for equation 1: mdmvalue mdkstock _cons
        Last edited by Andrew Musau; 17 Jun 2019, 04:57.



        • #5
          Again, thank you for your detailed response. I applied the technique to my data, and the gmm results are very close to those of xtreg, fe but slightly off. I'm assuming that this is because the panel is unbalanced.
          1. What are the hazards of applying this technique with an unbalanced panel?
          2. Why does the intercept (_cons) differ widely across the two estimations?

          Code:
          . xtdescribe
          
                 p:  1, 2, ..., 1357                                   n =       1357
                 q:  1996q2, 1996q3, ..., 2016q4                       T =         83
                     Delta(q) = 1 quarter
                     Span(q)  = 83 periods
                     (p*q uniquely identifies each observation)
          
          Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                                   1       8      12        17        25      45      79
          
               Freq.  Percent    Cum. |  Pattern
           ---------------------------+-------------------------------------------------------------------------------------
                 42      3.10    3.10 |  ....................................................................111111111111111
                 41      3.02    6.12 |  ......................................................................1111111111111
                 40      2.95    9.06 |  .......................................................................111111111111
                 39      2.87   11.94 |  ...........................................................................11111111
                 37      2.73   14.66 |  ........................................................................11111111111
                 37      2.73   17.39 |  .................................................................111111111111111111
                 35      2.58   19.97 |  ..............................................................111111111111111111111
                 34      2.51   22.48 |  ...................................................................1111111111111111
                 33      2.43   24.91 |  ................................................................1111111111111111111
               1019     75.09  100.00 | (other patterns)
           ---------------------------+-------------------------------------------------------------------------------------
               1357    100.00         |  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
          
          . xtreg DEBT COORD EVOL, fe
          
          Fixed-effects (within) regression               Number of obs     =     25,219
          Group variable: p                               Number of groups  =      1,344
          
          R-sq:                                           Obs per group:
               within  = 0.1172                                         min =          1
               between = 0.1118                                         avg =       18.8
               overall = 0.0945                                         max =         78
          
                                                          F(2,23873)        =    1584.10
          corr(u_i, Xb)  = 0.1391                         Prob > F          =     0.0000
          
          ------------------------------------------------------------------------------
                  DEBT |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                 COORD |    .045273   .0071443     6.34   0.000     .0312696    .0592763
                  EVOL |  -.9192439   .0166203   -55.31   0.000    -.9518207   -.8866671
                 _cons |   9.609221   .0085186  1128.02   0.000     9.592524    9.625918
          -------------+----------------------------------------------------------------
               sigma_u |  1.3278641
               sigma_e |  .69662955
                   rho |  .78417198   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          F test that all u_i=0: F(1343, 23873) = 60.54                Prob > F = 0.0000
          
          . gmm (mdDEBT - {x: mdCOORD mdEVOL _cons}), instruments(mdCOORD mdEVOL)
          note: 58 missing values returned for equation 1 at initial values
          
          Step 1
          Iteration 0:   GMM criterion Q(b) =   .0653803  
          Iteration 1:   GMM criterion Q(b) =  4.890e-27  
          Iteration 2:   GMM criterion Q(b) =  2.564e-34  
          
          Step 2
          Iteration 0:   GMM criterion Q(b) =  4.461e-34  
          Iteration 1:   GMM criterion Q(b) =  3.827e-34  
          
          note: model is exactly identified
          
          GMM estimation 
          
          Number of parameters =   3
          Number of moments    =   3
          Initial weight matrix: Unadjusted                 Number of obs   =     25,219
          GMM weight matrix:     Robust
          
          ------------------------------------------------------------------------------
                       |               Robust
                       |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
               mdCOORD |   .0447505   .0080898     5.53   0.000     .0288947    .0606063
                mdEVOL |  -.9188268   .0233375   -39.37   0.000    -.9645675   -.8730861
                 _cons |    .064924   .0043247    15.01   0.000     .0564477    .0734004
          ------------------------------------------------------------------------------
          Instruments for equation 1: mdCOORD mdEVOL _cons



          • #6
            For unbalanced panels, demean using xtreg's estimation sample to get the same results.

            Code:
            webuse nlswork, clear
            xtset idcode year
            xtreg ln_w age ttl_exp tenure not_smsa south, fe
            preserve
            keep if e(sample)
            foreach var in ln_wage age ttl_exp tenure not_smsa south{
            bys idcode: egen m`var'= mean(`var')
            gen md`var'=`var' -m`var'
            }
            regress mdln_wage mdage mdttl_exp mdtenure mdnot_smsa mdsouth
            restore

            Quote: "Why does the intercept (cons) differ widely across the two estimations?"
            The constant is meaningless in fixed effects. Bill Gould has written an FAQ on how the constant in xtreg,fe is computed, but this is just Stata's arbitrary way of computing it.
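
            Since the original question was about gmm rather than regress, here is a minimal sketch of the same idea with gmm on the demeaned estimation sample. It reuses the variable names from the code above and the gmm syntax from #4. Dropping preserve/restore so that the demeaned variables stay in memory is my own choice for illustration.

            ```stata
            webuse nlswork, clear
            xtset idcode year
            * Reference fit; this defines the estimation sample
            xtreg ln_wage age ttl_exp tenure not_smsa south, fe
            * Keep only the observations xtreg actually used
            keep if e(sample)
            * Within transformation on exactly that sample
            foreach var in ln_wage age ttl_exp tenure not_smsa south {
                bysort idcode: egen m`var' = mean(`var')
                generate md`var' = `var' - m`var'
            }
            * Exactly identified gmm on the demeaned variables
            gmm (mdln_wage - {xb: mdage mdttl_exp mdtenure mdnot_smsa mdsouth} - {b0}), ///
                instruments(mdage mdttl_exp mdtenure mdnot_smsa mdsouth)
            ```

            The point estimates should line up with xtreg, fe. The standard errors will not, as in the output in #2 and #4, where gmm reports robust standard errors and xtreg reports conventional ones.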



            • #7
              In addition to Andrew's many good comments:
              • xtreg, fe computes the constant as the sample average for the residuals of the untransformed model (before taking deviations from within-group means).
              • gmm in your case effectively computes the constant as the sample average for the residuals of the transformed model, which is zero by construction if you have computed the within-group means correctly. The reason that it is nonzero in your case is most likely that you have missing values in some, but not all, of your variables DEBT COORD EVOL. As a consequence, you have computed the within-group means for at least one variable based on a sample that is larger than the estimation sample. The deviations from this within-group mean then do not necessarily have mean zero in the estimation sample.
              https://www.kripfganz.de/stata/
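
              One way to act on the second point is to restrict the demeaning to rows where all model variables are observed. The following is only a sketch: DEBT, COORD, EVOL and the panel identifier p are taken from #5, while nmiss and the cm/cd prefixes are hypothetical names chosen here to avoid clashing with the existing md variables. It cannot be run without the poster's data.

              ```stata
              * Count missing values per row across the model variables
              egen nmiss = rowmiss(DEBT COORD EVOL)
              * Demean using complete rows only, so that the group means
              * are computed on the same sample that gmm will use
              foreach var in DEBT COORD EVOL {
                  bysort p: egen cm`var' = mean(`var') if nmiss == 0
                  generate cd`var' = `var' - cm`var' if nmiss == 0
              }
              * Each demeaned variable should now average to (numerically) zero
              summarize cdDEBT cdCOORD cdEVOL
              gmm (cdDEBT - {xb: cdCOORD cdEVOL} - {b0}), instruments(cdCOORD cdEVOL)
              ```

              If the means reported by summarize are not close to zero, the demeaning sample and the estimation sample still differ somewhere.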



              • #8
                Thank you Andrew and Sebastian,

                I have three follow-up questions on gmm:
                1. Is it possible to transform the data to get results similar to random-effects xtreg, re?
                2. It seems that Hansen's J is the only goodness-of-fit measure that gmm returns. Is it acceptable to use it to assess the overall fit of the model? It is usually used to determine the quality of the instruments. Also, what if Hansen's J rejects H0? This means the instruments are not good because they correlate with the error term, but what does it mean for the model overall?
                3. Finally, I'm experimenting with sem and gsem, which allow for specifying a system of equations just like gmm. I'm not an expert in SEM, but the manual mentions that the method adf fits the model using gmm. If so, can one claim that a model fitted this way is a gmm model? I'm asking because gsem allows for including a latent variable at the panel level.



                • #9
                  FYI

                  It looks like Stata has a built-in command implementing the within transformation coded above

                  Code:
                  xtdata, fe
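
                  A minimal sketch of how this could replace the manual demeaning loop from #4, using the Grunfeld data from that post. The gmm line follows the syntax used earlier in the thread. Note that xtdata overwrites the variables in memory, hence the clear option.

                  ```stata
                  webuse grunfeld, clear
                  xtset company year
                  * Reference fit on the untransformed data
                  xtreg invest mvalue kstock, fe
                  * Replace the listed variables by their within-transformed versions
                  xtdata invest mvalue kstock, fe clear
                  * Exactly identified gmm on the transformed data
                  gmm (invest - {xb: mvalue kstock} - {b0}), instruments(mvalue kstock)
                  ```

                  As far as I understand, xtdata, fe also adds the overall mean back after demeaning, so the constant should be on the same scale as in xtreg, fe rather than near zero. Treat that as something to check against the xtreg output, not as a guarantee.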
