David Roodman's xtabond2: why does the number of observations differ between GMMdiff and GMMsys?

Eric de Souza

Join Date: Mar 2014
Posts: 587

04 Feb 2019, 09:49

Using the dataset abdata.dta I ran two commands, one with the noleveleq option and one without. The sample size changes from 751 with the noleveleq option to 891 without it.

Code:

. webuse abdata
. xtabond2 n L.n L(0/1).(w k) yr1978-yr1984, robust small gmmstyle(L.n w k) ivstyle(yr1978-yr1984, equation(level)) noleveleq h(2)

Group variable: id                              Number of obs      =       751
Time variable : year                            Number of groups   =       140
Number of instruments = 98                      Obs per group: min =         5
F(12, 140)    =     70.71                                      avg =      5.36
Prob > F      =     0.000                                      max =         7

. xtabond2 n L.n L(0/1).(w k) yr1978-yr1984, robust small gmmstyle(L.n w k) ivstyle(yr1978-yr1984, equation(level)) h(2)

Group variable: id                              Number of obs      =       891
Time variable : year                            Number of groups   =       140
Number of instruments = 129                     Obs per group: min =         6
F(12, 139)    =    374.43                                      avg =      6.36
Prob > F      =     0.000                                      max =         8

Andrew Musau

Join Date: Oct 2014

Posts: 10285
#2

04 Feb 2019, 15:29

The number of observations reported for system GMM is the number used by the level equation (in establishing the orthogonality conditions). For the first-difference (FD) equation, differencing costs one observation per group. Your example has 140 firms (groups) and the level equation uses 891 observations, so the FD equation uses 891 - 140 = 751 observations.
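
As a quick check, the same arithmetic can be set against the "Obs per group" averages in the two outputs of the first post. The symbols N_level, N_diff and G are shorthand introduced here only for illustration; they are not xtabond2 notation.

```latex
\[
N_{\text{diff}} = N_{\text{level}} - G = 891 - 140 = 751,
\qquad
\frac{N_{\text{level}}}{G} = \frac{891}{140} \approx 6.36,
\qquad
\frac{N_{\text{diff}}}{G} = \frac{751}{140} \approx 5.36 .
\]
```

The two averages agree with the reported avg = 6.36 and avg = 5.36, which is consistent with exactly one observation per group being lost.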
Sebastian Kripfganz

Join Date: May 2014

Posts: 2606
#3

05 Feb 2019, 07:19

To add a bit of confusion, the following command line replicates the xtabond2 results for the difference GMM estimator with xtdpdgmm:

Code:

. xtdpdgmm n L.n L(0/1).(w k) yr1978-yr1984, vce(robust) gmm(L.n w k, lag(1 .) model(diff)) nocons

Group variable: id                           Number of obs         =       891
Time variable: year                          Number of groups      =       140

Moment conditions:     linear =      98      Obs per group:    min =         6
                    nonlinear =       0                        avg =  6.364286
                        total =      98                        max =         8

The reported number of observations is 891, the same as for the system GMM estimator. The argument here is that the difference GMM moment conditions can be rewritten as moment conditions for the level model, utilizing the full number of observations available in levels.

I personally find the number of observations reported by xtabond2 misleading because it sometimes refers to the levels and sometimes to the differences. For example, imagine you are estimating the same model but with a constant term in the level model. You can achieve this with xtabond2 as follows:

Code:

. xtabond2 n L.n L(0/1).(w k) yr1978-yr1984, robust small gmmstyle(L.n w k, equation(diff)) h(2)

Group variable: id                              Number of obs      =       891
Time variable : year                            Number of groups   =       140
Number of instruments = 99                      Obs per group: min =         6
F(12, 139)    =     70.80                                      avg =      6.36
Prob > F      =     0.000                                      max =         8

The coefficient estimates are identical to those of the pure difference GMM estimator because the extra moment condition for the intercept is orthogonal to the other moment conditions. Yet xtabond2 now reports 891 observations again.

https://www.kripfganz.de/stata/

Eric de Souza

Join Date: Mar 2014
Posts: 587
#4

05 Feb 2019, 08:10

I had read Andrew's reply but was not convinced, because the moment conditions for GMMsys include the moment conditions for GMMdiff. I just prepared a simple illustration with a balanced panel (-webuse grunfeld-) and a simple AR(1) model, and got a similar difference in the observation count as in my original post. Looking at the list of instruments at the bottom of the results suggests that xtabond2 is miscounting the observations.

Code:

. webuse grunfeld
. xtabond2 invest L.invest, robust small gmmstyle(L.invest)  noleveleq h(2)
Dynamic panel-data estimation, one-step difference GMM
------------------------------------------------------------------------------
Group variable: company                         Number of obs      =       180
Time variable : year                            Number of groups   =        10
Number of instruments = 135                     Obs per group: min =        18
F(1, 10)      =     41.26                                      avg =     18.00
Prob > F      =     0.000                                      max =        18
------------------------------------------------------------------------------
             |               Robust
      invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      invest |
         L1. |   1.064796   .1657782     6.42   0.000     .6954187    1.434173
------------------------------------------------------------------------------
Instruments for first differences equation
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(1/19).L.invest
------------------------------------------------------------------------------
 
. xtabond2 invest L.invest, robust small gmmstyle(L.invest) h(2)
Dynamic panel-data estimation, one-step system GMM
------------------------------------------------------------------------------
Group variable: company                         Number of obs      =       190
Time variable : year                            Number of groups   =        10
Number of instruments = 154                     Obs per group: min =        19
F(1, 9)       =    169.08                                      avg =     19.00
Prob > F      =     0.000                                      max =        19
------------------------------------------------------------------------------
             |               Robust
      invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      invest |
         L1. |   1.110371   .0853918    13.00   0.000     .9172014    1.303541
             |
       _cons |  -4.786261    9.99476    -0.48   0.643    -27.39598    17.82346
------------------------------------------------------------------------------
Instruments for first differences equation
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(1/19).L.invest
Instruments for levels equation
  Standard
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    D.L.invest
------------------------------------------------------------------------------

David Roodman

Join Date: Jul 2014

Posts: 475
#5

05 Feb 2019, 08:36

Eric, in observations where a "GMM-style" instrument cannot be computed, the value 0 is imputed, as noted in your output. So the seeming unavailability of a GMM-style instrument does not restrict sample size.
Eric de Souza

Join Date: Mar 2014

Posts: 587
#6

05 Feb 2019, 09:11

David, thanks for coming in. I don't see why the observations per group should differ between the two estimations in my post #4. In both cases, one observation will be dropped because of the first difference and another because of the lagged dependent variable. This means that in both cases there should be 18 effective observations per group.
David Roodman

Join Date: Jul 2014

Posts: 475
#7

05 Feb 2019, 09:49

In the first case, the data only enter the regression in differences, so the lagged, differenced dependent variable is not available till the third period. In the second case, the data also enter undifferenced, so the lagged dependent variable is available in the second period. The "GMM-style" instruments for the levels observations in the second period might be all zero--but whether they are or are not depends on precisely how they are defined. And "IV-style" instruments, including the constant term, can also be available in the second period. So the second period is included in the definition of the "sample."
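
The two counts can be reproduced by hand, outside xtabond2. What follows is a minimal sketch, assuming the Grunfeld data from post #4 load as a balanced panel of 10 companies over 20 years with no missing values of invest. It only counts rows where the needed variables are nonmissing; it is not how xtabond2 builds its sample internally.

```stata
* Illustrative only: count usable rows by hand, outside xtabond2
webuse grunfeld, clear
xtset company year

* Levels equation: invest and L.invest must both be nonmissing,
* so only the first year of each company is lost
count if !missing(invest, L.invest)

* Differenced equation: D.invest and its lag LD.invest must both be nonmissing,
* so the first two years of each company are lost
count if !missing(D.invest, LD.invest)
```

Under those assumptions the counts would be 10 x 19 = 190 and 10 x 18 = 180, which line up with the sample sizes reported for the system and difference GMM runs.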

I think the interesting question is how "effective observations" could and should be defined. System GMM complicates the abstraction, "number of observations." Is it the number of observations in the difference equation? In the levels equation? The sum of the two? Almost invariably when an abstraction gets complicated upon operationalization, the "right" way to operationalize it depends on what you're trying to do with the result. In this case, we're actually doing almost nothing with the output, i.e., the number of observations! The only place it enters is in one of the factors in the small-sample correction to the error covariance matrix, which is G/(G-1) * N/(N-k) where G is the number of clusters, N the number of observations, and k the number of parameters. So switching between 180 and 190 here won't make much difference. Moreover, this formula is a Stata convention--a convention precisely because there are good arguments for it, and good arguments for alternatives. A similarly reasonable but debatable convention applies to how the standard errors so computed are interpreted--as producing t statistics with a particular number of degrees of freedom. See https://www.stata.com/support/faqs/s...te-of-variance.
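
To put a rough number on "won't make much difference", the factor can be evaluated at both observation counts. This is only an illustration: it assumes G = 10 and holds k = 2 fixed (lagged dependent variable plus constant, as in the system GMM run), whereas the difference GMM run in post #4 has k = 1.

```latex
\[
\left.\frac{G}{G-1}\cdot\frac{N}{N-k}\right|_{N=180}
= \frac{10}{9}\cdot\frac{180}{178} \approx 1.1236,
\qquad
\left.\frac{G}{G-1}\cdot\frac{N}{N-k}\right|_{N=190}
= \frac{10}{9}\cdot\frac{190}{188} \approx 1.1229 .
\]
```

The two factors differ by about 0.06%. Because the factor multiplies the covariance matrix, standard errors scale with its square root, so they would move by roughly half that.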

A deep answer might try to articulate how much information is added by going from difference GMM to system GMM. Even that gets complicated because while for each group/individual, we may be adding more information, we also usually cluster observations within groups/individuals to adjust for the possibility that observations within them, and across equations, are not fully independent.

In view of this muddiness, I just made xtabond2 report the sample as the set of observations that enter in some way, and the number of observations is the size of that sample. In particular, in difference GMM it reports the number in the difference equation. In system GMM it reports the number in the levels equation.

--David
Eric de Souza

Join Date: Mar 2014

Posts: 587
#8

05 Feb 2019, 09:55

David: Thanks for the detail. Shall read carefully. Eric