Hi everyone,
I am using xtabond2 to run difference and system GMM regressions. I'm working in Stata version 15.1, and I'm using, as far as I can tell, the most recent version of xtabond2 (3.6.3 30 September 2015).
I was hoping someone would be able to help me understand why results change when I run a system GMM regression and drop the first 2 periods in the data. In the example below I use twice lagged levels as instruments for the differenced equations, and once lagged differences as instruments for the equations in levels. Since constructing these instruments requires observations in t-2, I assumed that excluding the first two periods would not affect results. However, the example below, based on Arellano and Bond’s data, seems to suggest otherwise. This is an annual dataset of firms for the period 1976-1984. As you can see, excluding years 1976 and 1977 changes results.
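One quick check on whether the two runs use the same observations is to count the usable 1977 firm-years directly. This is only a sketch. It assumes the levels equation uses every firm-year with non-missing n, L.n, w and k; that is an assumption, not something taken from the xtabond2 documentation.

```stata
* Sketch: count firm-years that could enter the levels equation, by year.
* Assumption: a levels observation is usable when n, L.n, w and k are all non-missing.
webuse abdata, clear

* Usable observations in 1977 only
count if year == 1977 & !missing(n, L.n, w, k)

* Usable observations for every year, for comparison with the reported sample sizes
tabulate year if !missing(n, L.n, w, k)
```

If the 1977 count equals the gap between the two reported sample sizes (891 - 811 = 80), that would suggest the full-sample run uses the 1977 levels equation for firms observed in 1976.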
I looked at the instrument matrix e(Z) in both cases. When the first two years are not dropped, the constant shows up as an instrument for the equation in levels in 1977 (_cons has a value of 1) while all the other instruments are "zeroed out". When the first two years are dropped, _cons takes on a value of 0 for the equation in levels in 1977, as do the other instruments. I paste an extract of the instrument matrices below, for the firm with id = 5 (as this firm has data going back to 1976). The discrepancy is the _cons entry in the levels-equation row for 1977: it is 1 in the first estimation and 0 in the second.
First estimation:
Now with the first two periods excluded (second estimation):
How should I interpret what is happening? I realise that GMM "zeroes out" missing values for the instruments, but I thought this left the moment conditions, and hence results, unaffected. Or does this only leave results unchanged asymptotically? I was not expecting the zeroing out to bring an extra year of data into play. It seems as if the equation in levels for 1977 is used in the estimation, with only the constant as an instrument. Here the effect on results is small, but I have other examples where the changes are more substantial (e.g. a static version of the above model).
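To make the question concrete, here is one way to write the sample moment behind a single collapsed instrument column in the levels equation. The notation is introduced only for illustration and is not taken from the xtabond2 documentation: z_{it}^{(j)} is the entry of instrument column j in the levels-equation row for firm i and year t of e(Z), \hat{u}_{it} is the corresponding levels residual, and \mathcal{T}_i is the set of years in which firm i's levels row has _cons = 1.

```latex
% Sample moment for collapsed instrument column j (levels-equation rows only)
\bar{g}_j = \frac{1}{N} \sum_{i=1}^{N} \sum_{t} z_{it}^{(j)} \, \hat{u}_{it}

% For the constant, z_{it}^{(j)} = 1 exactly on the rows in \mathcal{T}_i and 0 elsewhere
\bar{g}_{\mathrm{cons}} = \frac{1}{N} \sum_{i=1}^{N} \sum_{t \in \mathcal{T}_i} \hat{u}_{it}
```

On this reading, a row that is zero in every column adds nothing to any moment, while a row with _cons = 1 adds its residual to the constant's moment. For firm 5, the extracts give \mathcal{T}_i = {1977, ..., 1982} in the first estimation and {1978, ..., 1982} in the second, so the constant's moment would be computed over a different set of observations in the two runs.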
For difference GMM, excluding the first two years makes no difference to results. Of course, when I use xtabond2 to estimate only the equation in levels, results again change when I drop the first two periods.
Perhaps this is also relevant: using xtabond2 to estimate a static model with system GMM on only the first two years in the dataset returns some results:
Here, the only available instrument is the constant for the equations in levels in 1976 and 1977 -- see this extract from the instrument matrix e(Z) (again for the firm with id = 5):
I would be grateful for any comments/answers that can help me understand this issue better.
Best,
Nicolas
Code:
. clear all
. webuse abdata
. xtabond2 n L.n w k, gmm(n w k, laglimits(2 2) collapse) twostep robust small svmat
Favoring speed over space. To switch, type or click on mata: mata set matafavor space, perm.

Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: id                              Number of obs      =       891
Time variable : year                            Number of groups   =       140
Number of instruments = 7                       Obs per group: min =         6
F(3, 139)     =    196.28                                      avg =      6.36
Prob > F      =     0.000                                      max =         8
------------------------------------------------------------------------------
             |              Corrected
           n |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           n |
         L1. |   .6422413   .2478509     2.59   0.011     .1521961    1.132286
             |
           w |  -1.031833    .402151    -2.57   0.011    -1.826957   -.2367086
           k |   .2615404   .0976378     2.68   0.008     .0684931    .4545878
       _cons |   3.701805   1.545091     2.40   0.018     .6468852    6.756725
------------------------------------------------------------------------------
Instruments for first differences equation
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L2.(n w k) collapsed
Instruments for levels equation
  Standard
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    DL.(n w k) collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -2.66  Pr > z =  0.008
Arellano-Bond test for AR(2) in first differences: z =  -1.35  Pr > z =  0.176
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(3)    =   6.86  Prob > chi2 =  0.077
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(3)    =   2.40  Prob > chi2 =  0.494
  (Robust, but weakened by many instruments.)

Difference-in-Hansen tests of exogeneity of instrument subsets:
  GMM instruments for levels
    Hansen test excluding group:     chi2(0)    =   0.00  Prob > chi2 =      .
    Difference (null H = exogenous): chi2(3)    =   2.40  Prob > chi2 =  0.494

* MODEL WITH FIRST TWO PERIODS EXCLUDED
. xtabond2 n L.n w k if year > 1977, gmm(n w k, laglimits(2 2) collapse) twostep robust small svmat
Favoring speed over space. To switch, type or click on mata: mata set matafavor space, perm.

Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: id                              Number of obs      =       811
Time variable : year                            Number of groups   =       140
Number of instruments = 7                       Obs per group: min =         5
F(3, 139)     =    214.70                                      avg =      5.79
Prob > F      =     0.000                                      max =         7
------------------------------------------------------------------------------
             |              Corrected
           n |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           n |
         L1. |   .6765348   .2384935     2.84   0.005     .2049907    1.148079
             |
           w |  -.9759851   .3861351    -2.53   0.013    -1.739443   -.2125274
           k |   .2468775   .0970336     2.54   0.012     .0550248    .4387303
       _cons |   3.476404   1.482807     2.34   0.020     .5446311    6.408178
------------------------------------------------------------------------------
Instruments for first differences equation
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L2.(n w k) collapsed
Instruments for levels equation
  Standard
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    DL.(n w k) collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -2.73  Pr > z =  0.006
Arellano-Bond test for AR(2) in first differences: z =  -1.40  Pr > z =  0.161
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(3)    =   8.12  Prob > chi2 =  0.044
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(3)    =   2.95  Prob > chi2 =  0.399
  (Robust, but weakened by many instruments.)

Difference-in-Hansen tests of exogeneity of instrument subsets:
  GMM instruments for levels
    Hansen test excluding group:     chi2(0)    =   0.00  Prob > chi2 =      .
    Difference (null H = exogenous): chi2(3)    =   2.95  Prob > chi2 =  0.399
First estimation:
Code:
                       Diff eq:    Diff eq:    Diff eq:  Levels eq:  Levels eq:  Levels eq:
                            L2.         L2.         L2.         LD.         LD.         LD.
              _cons           n           w           k           n           w           k
5, 1976           0           0           0           0           0           0           0
5, 1977           0           0           0           0           0           0           0
5, 1978           0   4.4621887   3.0268579   3.1081855           0           0           0
5, 1979           0   4.4670568    2.905709   3.1032538           0           0           0
5, 1980           0   4.4659081   2.8979485   3.2255337           0           0           0
5, 1981           0   4.5042443   2.9008501   3.2328379           0           0           0
5, 1982           0    4.490881   2.9595506   3.3407183           0           0           0
5, 1983           0           0           0           0           0           0           0
5, 1984           0           0           0           0           0           0           0
5, 1976           0           0           0           0           0           0           0
5, 1977           1           0           0           0           0           0           0
5, 1978           1           0           0           0   .00486803  -.12114882  -.00493169
5, 1979           1           0           0           0   -.0011487  -.00776052   .12227988
5, 1980           1           0           0           0   .03833628   .00290155   .00730419
5, 1981           1           0           0           0  -.01336336   .05870056   .10788035
5, 1982           1           0           0           0  -.07566118   .02735281  -.09050274
5, 1983           0           0           0           0           0           0           0
5, 1984           0           0           0           0           0           0           0
Now with the first two periods excluded (second estimation):
Code:
                       Diff eq:    Diff eq:    Diff eq:  Levels eq:  Levels eq:  Levels eq:
                            L2.         L2.         L2.         LD.         LD.         LD.
              _cons           n           w           k           n           w           k
5, 1976           0           0           0           0           0           0           0
5, 1977           0           0           0           0           0           0           0
5, 1978           0   4.4621887   3.0268579   3.1081855           0           0           0
5, 1979           0   4.4670568    2.905709   3.1032538           0           0           0
5, 1980           0   4.4659081   2.8979485   3.2255337           0           0           0
5, 1981           0   4.5042443   2.9008501   3.2328379           0           0           0
5, 1982           0    4.490881   2.9595506   3.3407183           0           0           0
5, 1983           0           0           0           0           0           0           0
5, 1984           0           0           0           0           0           0           0
5, 1976           0           0           0           0           0           0           0
5, 1977           0           0           0           0           0           0           0
5, 1978           1           0           0           0   .00486803  -.12114882  -.00493169
5, 1979           1           0           0           0   -.0011487  -.00776052   .12227988
5, 1980           1           0           0           0   .03833628   .00290155   .00730419
5, 1981           1           0           0           0  -.01336336   .05870056   .10788035
5, 1982           1           0           0           0  -.07566118   .02735281  -.09050274
5, 1983           0           0           0           0           0           0           0
5, 1984           0           0           0           0           0           0           0
Code:
. xtabond2 n w k if year < 1978, gmm(w k, laglimits(2 2) collapse) twostep robust small svmat
Favoring speed over space. To switch, type or click on mata: mata set matafavor space, perm.
Warning: Two-step estimated covariance matrix of moments is singular.
  Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
  Difference-in-Sargan/Hansen statistics may be negative.

Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: id                              Number of obs      =       218
Time variable : year                            Number of groups   =       138
Number of instruments = 1                       Obs per group: min =         1
F(2, 137)     =     47.72                                      avg =      1.58
Prob > F      =     0.000                                      max =         2
------------------------------------------------------------------------------
             |              Corrected
           n |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           w |     .37529   .0384159     9.77   0.000     .2993251    .4512549
           k |          0  (omitted)
       _cons |          0  (omitted)
------------------------------------------------------------------------------
Instruments for first differences equation
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L2.(w k) collapsed
Instruments for levels equation
  Standard
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    DL.(w k) collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =      .  Pr > z =      .
Arellano-Bond test for AR(2) in first differences: z =      .  Pr > z =      .
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(-2)   =   0.00  Prob > chi2 =      .
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(-2)   =   0.00  Prob > chi2 =      .
  (Robust, but weakened by many instruments.)
Code:
                       Diff eq:    Diff eq:  Levels eq:  Levels eq:
                            L2.         L2.         LD.         LD.
              _cons           w           k           w           k
5, 1976           0           0           0           0           0
5, 1977           0           0           0           0           0
5, 1978           0           0           0           0           0
5, 1979           0           0           0           0           0
5, 1980           0           0           0           0           0
5, 1981           0           0           0           0           0
5, 1982           0           0           0           0           0
5, 1983           0           0           0           0           0
5, 1984           0           0           0           0           0
5, 1976           1           0           0           0           0
5, 1977           1           0           0           0           0
5, 1978           0           0           0           0           0
5, 1979           0           0           0           0           0
5, 1980           0           0           0           0           0
5, 1981           0           0           0           0           0
5, 1982           0           0           0           0           0
5, 1983           0           0           0           0           0
5, 1984           0           0           0           0           0