  • David Roodman's xtabond2: why the number of observations differs between GMMdiff and GMMsys

    Using the dataset abdata.dta, I ran two commands, one with the noleveleq option and one without. The sample size changes from 751 with the noleveleq option to 891 without it.
    Code:
    . webuse abdata

    . xtabond2 n L.n L(0/1).(w k) yr1978-yr1984, robust small gmmstyle(L.n w k) ivstyle(yr1978-yr1984, equation(level)) noleveleq h(2)

    Group variable: id                              Number of obs      =       751
    Time variable : year                            Number of groups   =       140
    Number of instruments = 98                      Obs per group: min =         5
    F(12, 140)    =     70.71                                      avg =      5.36
    Prob > F      =     0.000                                      max =         7

    . xtabond2 n L.n L(0/1).(w k) yr1978-yr1984, robust small gmmstyle(L.n w k) ivstyle(yr1978-yr1984, equation(level)) h(2)

    Group variable: id                              Number of obs      =       891
    Time variable : year                            Number of groups   =       140
    Number of instruments = 129                     Obs per group: min =         6
    F(12, 139)    =    374.43                                      avg =      6.36
    Prob > F      =     0.000                                      max =         8

  • #2
    The reported number of observations in system GMM corresponds to that used by the level equation (in establishing orthogonality conditions). For the FD equation, you lose one time period per group (one cross-section) when differencing. So you have 140 firms (groups) in your example, and the level equation uses 891 observations, implying that the FD equation uses 891 - 140 = 751 observations.
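
The counting argument can be sketched in a few lines of Python. This is a toy illustration, not xtabond2's code: it assumes each group's data are consecutive (no gaps), that the model has `n_lags` lags of the dependent variable, and that every group has at least `n_lags + 2` periods; `count_obs` is a hypothetical helper.

```python
def count_obs(periods_per_group, n_lags=1):
    """Return (levels_obs, diff_obs) for a gap-free, possibly unbalanced panel.

    periods_per_group: number of consecutive periods with data, one entry per group.
    A levels observation needs the current value plus n_lags lags; a differenced
    observation needs one earlier period as well, so each group loses exactly one
    observation (assuming every group has at least n_lags + 2 periods).
    """
    levels = sum(T - n_lags for T in periods_per_group)
    diffs = sum(T - n_lags - 1 for T in periods_per_group)
    return levels, diffs


# Toy panel: three firms observed for 7, 8 and 9 consecutive years
levels, diffs = count_obs([7, 8, 9])
print(levels, diffs)   # 21 18
print(levels - diffs)  # 3, i.e. one observation per group
```

Applied to the 140 firms above, the same one-per-group loss gives 891 - 140 = 751.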

    • #3
      To add a bit of confusion, the following command line replicates the xtabond2 results for the difference GMM estimator with xtdpdgmm:
      Code:
      . xtdpdgmm n L.n L(0/1).(w k) yr1978-yr1984, vce(robust) gmm(L.n w k, lag(1 .) model(diff)) nocons
      
      Group variable: id                           Number of obs         =       891
      Time variable: year                          Number of groups      =       140
      
      Moment conditions:     linear =      98      Obs per group:    min =         6
                          nonlinear =       0                        avg =  6.364286
                              total =      98                        max =         8
      The reported number of observations is 891, the same as for the system GMM estimator. The argument here is that the difference GMM moment conditions can be rewritten as moment conditions for the level model, utilizing the full number of observations available in levels.
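
One way to see the rewriting argument is summation by parts. For a single group i, write $u_{i1},\dots,u_{iT}$ for the level-model errors in the periods where the level equation is defined, and $z_{it}$ for the (vector of) instruments of the differenced equation in periods $t = 2,\dots,T$; this notation is introduced here for illustration only. The difference-GMM sample moment rearranges exactly into a moment in levels with transformed instruments $\tilde z_{it}$:

```latex
\sum_{t=2}^{T} z_{it}\,(u_{it} - u_{i,t-1})
  = -z_{i2}\,u_{i1} + \sum_{t=2}^{T-1} (z_{it} - z_{i,t+1})\,u_{it} + z_{iT}\,u_{iT}
  = \sum_{t=1}^{T} \tilde z_{it}\,u_{it},
\qquad
\tilde z_{it} =
\begin{cases}
  -z_{i2}, & t = 1,\\
  z_{it} - z_{i,t+1}, & 2 \le t \le T-1,\\
  z_{iT}, & t = T.
\end{cases}
```

The left side uses T - 1 differenced observations per group, the right side T level observations, with the same information content, which is consistent with counting the sample in levels. This is a sketch of the algebra only, not a description of xtdpdgmm's internals.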

      I personally find the number of observations reported by xtabond2 misleading because it sometimes refers to the levels and sometimes to the differences. For example, imagine you are estimating the same model but with a constant term in the level model. You can achieve this with xtabond2 as follows:
      Code:
      . xtabond2 n L.n L(0/1).(w k) yr1978-yr1984, robust small gmmstyle(L.n w k, equation(diff)) h(2)
      
      Group variable: id                              Number of obs      =       891
      Time variable : year                            Number of groups   =       140
      Number of instruments = 99                      Obs per group: min =         6
      F(12, 139)    =     70.80                                      avg =      6.36
      Prob > F      =     0.000                                      max =         8
      The coefficient estimates are identical to the pure difference GMM estimator because the extra moment condition for the intercept is orthogonal to the other moment conditions. Yet xtabond2 now again reports 891 observations.
      https://twitter.com/Kripfganz

      • #4
        I had read Andrew's reply but was not convinced, because the moment conditions for GMMsys include the moment conditions for GMMdiff. I just prepared a simple illustration with a balanced panel (-webuse grunfeld-) and a simple AR(1) model and got a similar difference in the observation count as in my original post. Looking at the list of instruments at the bottom of the results shows that xtabond2 is miscounting the observations.
        Code:
        . webuse grunfeld
        . xtabond2 invest L.invest, robust small gmmstyle(L.invest)  noleveleq h(2)
        Dynamic panel-data estimation, one-step difference GMM
        ------------------------------------------------------------------------------
        Group variable: company                         Number of obs      =       180
        Time variable : year                            Number of groups   =        10
        Number of instruments = 135                     Obs per group: min =        18
        F(1, 10)      =     41.26                                      avg =     18.00
        Prob > F      =     0.000                                      max =        18
        ------------------------------------------------------------------------------
                     |               Robust
              invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              invest |
                 L1. |   1.064796   .1657782     6.42   0.000     .6954187    1.434173
        ------------------------------------------------------------------------------
        Instruments for first differences equation
          GMM-type (missing=0, separate instruments for each period unless collapsed)
            L(1/19).L.invest
        ------------------------------------------------------------------------------
         
        . xtabond2 invest L.invest, robust small gmmstyle(L.invest) h(2)
        Dynamic panel-data estimation, one-step system GMM
        ------------------------------------------------------------------------------
        Group variable: company                         Number of obs      =       190
        Time variable : year                            Number of groups   =        10
        Number of instruments = 154                     Obs per group: min =        19
        F(1, 9)       =    169.08                                      avg =     19.00
        Prob > F      =     0.000                                      max =        19
        ------------------------------------------------------------------------------
                     |               Robust
              invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              invest |
                 L1. |   1.110371   .0853918    13.00   0.000     .9172014    1.303541
                     |
               _cons |  -4.786261    9.99476    -0.48   0.643    -27.39598    17.82346
        ------------------------------------------------------------------------------
        Instruments for first differences equation
          GMM-type (missing=0, separate instruments for each period unless collapsed)
            L(1/19).L.invest
        Instruments for levels equation
          Standard
            _cons
          GMM-type (missing=0, separate instruments for each period unless collapsed)
            D.L.invest
        ------------------------------------------------------------------------------

        • #5
          Eric, in observations where a "GMM-style" instrument cannot be computed, the value 0 is imputed, as noted in your output. So the seeming unavailability of a GMM-style instrument does not restrict sample size.
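
A minimal sketch of the zero-imputation idea, in Python rather than Stata. It builds "separate instruments for each period" rows for one group's difference equation from levels lagged two or more periods, and puts 0 wherever a lag is unavailable or missing. It is a simplified illustration under those assumptions, not xtabond2's actual implementation; `gmm_style_rows` is a hypothetical name and periods are indexed from 0.

```python
from typing import List, Optional


def gmm_style_rows(y: List[Optional[float]], min_lag: int = 2) -> List[List[float]]:
    """Build GMM-style instrument rows (one per period) for a single group.

    y[t] is the level of the instrumenting variable in period t (None = missing).
    Column (s, lag) is nonzero only in period s, where it holds y[s - lag].
    Any unavailable or missing value is imputed as 0, so no row is removed
    just because its instruments cannot be computed.
    """
    T = len(y)
    cols = [(s, lag) for s in range(T) for lag in range(min_lag, s + 1)]
    rows = []
    for t in range(T):
        row = []
        for s, lag in cols:
            if s == t and y[t - lag] is not None:
                row.append(y[t - lag])
            else:
                row.append(0.0)  # zero imputation
        rows.append(row)
    return rows


for row in gmm_style_rows([1.0, None, 3.0, 4.0]):
    print(row)
```

The early rows come out all zero, and in the last row the missing y[1] also becomes 0. Periods where the differenced equation itself cannot be formed would still be excluded in a real estimation, but an all-zero instrument row by itself does not shrink the sample.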

          • #6
            David, thanks for coming in. I don't see why the number of observations per group should differ between the two estimations in my post #4. In both cases, one observation will be dropped because of the first difference and another because of the lagged dependent variable. This means that in both cases there should be 18 effective observations per group.

            • #7
              In the first case, the data only enter the regression in differences, so the lagged, differenced dependent variable is not available till the third period. In the second case, the data also enter undifferenced, so the lagged dependent variable is available in the second period. The "GMM-style" instruments for the levels observations in the second period might be all zero--but whether they are or are not depends on precisely how they are defined. And "IV-style" instruments, including the constant term, can also be available in the second period. So the second period is included in the definition of the "sample."
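
The timing described above can be made concrete for one Grunfeld company (annual data 1935 to 1954) and an AR(1) model. A minimal Python sketch, assuming no gaps in the data; `usable_periods` is a hypothetical helper, not part of xtabond2:

```python
def usable_periods(years, n_lags=1):
    """Years in which each equation of an AR(n_lags) model can be formed for one group.

    years: consecutive years with data (no gaps assumed).
    The levels equation in year t needs y_t and y_{t-1}, ..., y_{t-n_lags}.
    The differenced equation also needs D.y lagged n_lags times,
    which costs one more early year.
    """
    levels = years[n_lags:]
    diffs = years[n_lags + 1:]
    return levels, diffs


years = list(range(1935, 1955))          # 20 annual observations per company
levels, diffs = usable_periods(years)
print(len(levels), len(diffs))           # 19 18
print(sorted(set(levels) - set(diffs)))  # [1936]: the second period enters only in levels
```

Multiplied by the 10 companies, this gives the 190 and 180 observations reported in post #4.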

              I think the interesting question is how "effective observations" could and should be defined. System GMM complicates the abstraction, "number of observations." Is it the number of observations in the difference equation? In the levels equation? The sum of the two? Almost invariably when an abstraction gets complicated upon operationalization, the "right" way to operationalize it depends on what you're trying to do with the result. In this case, we're actually doing almost nothing with the output, i.e., the number of observations! The only place it enters is in one of the factors in the small-sample correction to the error covariance matrix, which is G/(G-1) * N/(N-k) where G is the number of clusters, N the number of observations, and k the number of parameters. So switching between 180 and 190 here won't make much difference. Moreover, this formula is a Stata convention--a convention precisely because there are good arguments for it, and good arguments for alternatives. A similarly reasonable but debatable convention applies to how the standard errors so computed are interpreted--as producing t statistics with a particular number of degrees of freedom. See https://www.stata.com/support/faqs/s...te-of-variance.
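
To gauge how little the choice between 180 and 190 matters, here is a small Python sketch that evaluates the correction factor exactly as written above, G/(G-1) * N/(N-k). The exact finite-sample factor varies across Stata commands, so treat this as illustrative; `small_sample_factor` is a hypothetical name, and k = 2 assumes the two parameters of the system GMM example (L.invest and _cons).

```python
def small_sample_factor(G, N, k):
    """Variance scaling factor G/(G-1) * N/(N-k), as written in the post.

    G: number of clusters, N: number of observations, k: number of parameters.
    """
    return G / (G - 1) * N / (N - k)


# Grunfeld example: 10 companies, 2 parameters
for N in (180, 190):
    print(N, round(small_sample_factor(10, N, 2), 4))
# 180 1.1236
# 190 1.1229
```

The two factors differ by well under 0.1 percent, and standard errors scale with the square root of the factor, so the effect on inference is negligible here.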

              A deep answer might try to articulate how much information is added by going from difference GMM to system GMM. Even that gets complicated because while for each group/individual, we may be adding more information, we also usually cluster observations within groups/individuals to adjust for the possibility that observations within them, and across equations, are not fully independent.

              In view of this muddiness, I just made xtabond2 report the sample as the set of observations that enter in some way, and the number of observations is the size of that sample. In particular, in difference GMM it reports the number in the difference equation. In system GMM it reports the number in the levels equation.

              --David

              • #8
                David: Thanks for the detail. I shall read it carefully. Eric
