  • xtdpdgmm estimation and overidentification test

    Dear,

    I am estimating a dynamic panel model with T=14 and N>10,000. I am interested in price elasticities of energy consumption, and I am using the Arellano-Bond one-step GMM estimator with strictly exogenous covariates and curtailed/collapsed instruments from the xtdpdgmm package. The estimated coefficients are in line with my expectations; however, I think I have a problem with the
    Code:
    estat overid
    test. I have read Roodman's (2008) paper, and it mainly tells me to be aware of overidentification problems when the number of instruments is large compared to the number of observations. However, since I have a lot of observations available (e.g. 6,500, or many more, but I mainly use subsamples of the entire dataset in my analysis), I do not understand why the test gives p-values of 0.000.

    This is the code I use:

    Code:
    clear all
    
    cd "C:\Users\wille\OneDrive\Data\Enexis"
    insheet using "hoge_woz.csv", comma clear
    
    format year %ty
    encode postcode, generate(id)
    egen newid = group(id)
    global id id
    global year year
    sort $id $year
    xtset $id $year
    
    
    tempfile holding
    save `holding'
    set seed 1234
    
    forval i = 1/1 {
        use `holding', clear
        * draw a random subsample of 500 groups (postcodes) and keep all their observations
        keep id
        duplicates drop
        sample 500, count
        merge 1:m id using `holding', assert(match using) keep(match) nogenerate
        sort id
        xtdpdgmm L(0/2).consumption gas gdp, gmm(L.consumption, l(2 5)) iv(gas gdp, d) m(d) nocons overid serial
    }
    And these are the results

    Code:
    
    Generalized method of moments estimation
    
    Fitting full model:
    Step 1         f(b) =  .00615169
    
    Fitting reduced model 2:
    Step 1         f(b) =  .00478666
    
    Group variable: id                           Number of obs         =      6000
    Time variable: year                          Number of groups      =       500
    
    Moment conditions:     linear =      40      Obs per group:    min =        12
                        nonlinear =       0                        avg =        12
                            total =      40                        max =        12
    
    ------------------------------------------------------------------------------
     consumption | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
     consumption |
             L1. |   .4605972   .0377965    12.19   0.000     .3865174    .5346771
             L2. |   .2246857   .0321022     7.00   0.000     .1617665    .2876048
                 |
             gas |  -.1170956   .0059441   -19.70   0.000    -.1287458   -.1054454
             gdp |   .1466371   .0239001     6.14   0.000     .0997937    .1934805
    ------------------------------------------------------------------------------
    And for the overid test:

    Code:
    Sargan-Hansen test of the overidentifying restrictions
    H0: overidentifying restrictions are valid
    
    1-step moment functions, 1-step weighting matrix       chi2(36)    =  306.6397
    note: *                                                Prob > chi2 =    0.0000
    
    1-step moment functions, 2-step weighting matrix       chi2(36)    =  132.0808
    note: *                                                Prob > chi2 =    0.0000
    
    * asymptotically invalid if the one-step weighting matrix is not optimal
    Also, the test statistic grows even larger when I increase the number of observations, which I don't understand.

    Any help is welcome!

    Sebastian Kripfganz
    Last edited by Hein Willems; 21 Mar 2023, 10:37.

  • #2
    That's not a rare phenomenon when you have many observations. The power of the test to detect model misspecification increases with the sample size: if there is even just a small misspecification - say, a small correlation of (one of) the instruments with the error term - many observations allow us to detect it. With few observations, we may not be able to separate such a small violation from the general estimation uncertainty. This can explain why the test statistic becomes larger with increasing sample size.

    More observations allow us to estimate the parameters of interest more precisely. This comes at the "cost" that it can also become more difficult to pass misspecification tests. On the other hand, if specification tests do not reject the correct specification even with a large number of observations, this would provide strong confidence in the specification, whereas with relatively few observations we would need to wonder whether we simply cannot reject the correct specification because there is so much noise in the data.
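    As a rough illustration - a sketch only, reusing the subsampling setup and variable names from your posted code, with arbitrary subsample sizes - you could re-estimate the same specification on random subsamples of increasing size and watch how the Hansen statistic and its p-value evolve:
    Code:
    * sketch: how the overidentification test behaves as the subsample grows
    tempfile holding
    save `holding'
    foreach n of numlist 500 2000 8000 {
        use `holding', clear
        keep id
        duplicates drop
        sample `n', count
        merge 1:m id using `holding', assert(match using) keep(match) nogenerate
        quietly xtdpdgmm L(0/2).consumption gas gdp, gmm(L.consumption, l(2 5)) iv(gas gdp, d) m(d) nocons overid serial
        estat overid
    }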
    https://www.kripfganz.de/stata/



    • #3
      Sebastian Kripfganz Thank you for the very clear explanation.

      So if I understand correctly, the very large test statistic could be the result of a very small misspecification combined with the large number of observations? However, this large test statistic could also be caused by a serious misspecification, right? Is there a way to check this? Should I try to run the regression with a low (what is low? maybe N=T?) number of observations and see whether it rejects the null in that case?

      I tried this, first with N=T=14; both tests now indeed fail to reject the null:

      Code:
      . estat overid
      
      Sargan-Hansen test of the overidentifying restrictions
      H0: overidentifying restrictions are valid
      
      1-step moment functions, 1-step weighting matrix       chi2(36)    =   43.9629
      note: *                                                Prob > chi2 =    0.1700
      
      1-step moment functions, 2-step weighting matrix       chi2(36)    =   14.0000
      note: *                                                Prob > chi2 =    0.9996
      
      * asymptotically invalid if the one-step weighting matrix is not optimal
      The first test is still rejecting the null at N = 80:

      Code:
      . estat overid
      
      Sargan-Hansen test of the overidentifying restrictions
      H0: overidentifying restrictions are valid
      
      1-step moment functions, 1-step weighting matrix       chi2(36)    =  128.3450
      note: *                                                Prob > chi2 =    0.0000
      
      1-step moment functions, 2-step weighting matrix       chi2(36)    =   49.8498
      note: *                                                Prob > chi2 =    0.0622
      
      * asymptotically invalid if the one-step weighting matrix is not optimal
      This all seems a little bit ad hoc. How can I use these results? What is your opinion?
      Last edited by Hein Willems; 22 Mar 2023, 03:51.



      • #4
        I see your point, but I think it would be difficult to justify using the insignificance of the test in a small sample to argue that the misspecification is not serious. As a minimum, you would need to do the same for many different small samples (different choices of the units) to demonstrate that the result is not an artifact of a particular selection of units. And then, what does it actually mean that a misspecification is not serious? The seriousness of the misspecification depends on the resulting bias, but you do not know how large (economically) the bias is without estimating a correctly specified model.
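        A rough sketch of what I mean - assuming the setup from your first post, with the tempfile `holding' containing the full dataset - would be to repeat the random draw of 500 units many times and inspect the Hansen test for each draw:
        Code:
        * sketch: repeat the random subsample draw and look at the overidentification test each time
        forvalues i = 1/20 {
            use `holding', clear
            keep id
            duplicates drop
            sample 500, count
            merge 1:m id using `holding', assert(match using) keep(match) nogenerate
            quietly xtdpdgmm L(0/2).consumption gas gdp, gmm(L.consumption, l(2 5)) iv(gas gdp, d) m(d) nocons overid serial
            estat overid
            * see -return list- after estat overid for any stored results you may want to collect
        }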

        I am afraid I do not have a good suggestion for what to do in your situation.
        https://www.kripfganz.de/stata/



        • #5
          Sebastian Kripfganz Thank you for your response. This is very helpful. I will run many different regressions selecting different units and use this as an argument for correct specification.

          Furthermore, in your lecture you state that 'If u_it is serially uncorrelated, then Δu_it has negative first-order serial correlation, but no higher-order serial correlation. Absence of higher-order serial correlation of Δu_it is crucial for the validity of y_i,t-2, y_i,t-3, ... as instruments, and similarly for the instruments of predetermined and endogenous x_it.'

          Now with the following regression:

          Code:
          xtdpdgmm L(0/2).consumption gas gdp, gmm(L.consumption, l(2 5)) iv(gas gdp, d) m(d) nocons overid serial  
          
          . estat serial
          
          Arellano-Bond test for autocorrelation of the first-differenced residuals
          H0: no autocorrelation of order 1      z =   -8.0470   Prob > |z|  =    0.0000
          H0: no autocorrelation of order 2      z =   -0.3493   Prob > |z|  =    0.7269
          1. I find that there is autocorrelation of order 1 but no autocorrelation of order 2. How exactly should I interpret these results? Should I not expect to also reject the second hypothesis, since I am using the first two lags of the dependent variable?

          2. Also, when I include a lagged term of the independent variable gas, the results of this autocorrelation test differ considerably. Why is this?

          Code:
          xtdpdgmm L(0/2).consumption L(0/1).gas gdp, gmm(L.consumption, l(2 5)) iv(L.gas gdp, d) m(d) nocons overid serial
          
          . estat serial
          
          Arellano-Bond test for autocorrelation of the first-differenced residuals
          H0: no autocorrelation of order 1      z =   -6.5253   Prob > |z|  =    0.0000
          H0: no autocorrelation of order 2      z =   -1.6681   Prob > |z|  =    0.0953

          3. Lastly, I have a somewhat unrelated question. I am now using xtdpdgmm to estimate long-run price elasticities of energy consumption, but is it also possible to estimate/analyse short-run dynamics in a short-T panel? I do not seem to find much research on this topic.

          Kind regards,

          Hein Willems



          • #6
            1. The test is on serial correlation of the residuals, not the dependent variable. Adding multiple lags of the dependent variable as regressors can help to remove serial correlation from the errors. Your test clearly rejects the null of no first-order autocorrelation of the residuals and does not reject the null of no second-order autocorrelation, both exactly as it should be.

            2. We would not normally expect the inclusion of an additional lag to induce serial correlation in the errors. However, especially if the sample size is small, the test statistic might not be estimated very precisely, which could explain the observed variation between these two specifications.

            3. If you define long-run elasticities as the coefficient of the independent variable divided by 1 minus the sum of the coefficients of the lagged dependent variable, then the short-run elasticities would simply be the coefficients of the independent variables themselves.
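            For example - as a sketch, based on the specification from your first post - the delta-method standard errors for both can be obtained with nlcom after estimation:
            Code:
            * short-run elasticity of gas: the contemporaneous coefficient itself
            nlcom (shortrun_gas: _b[gas])
            * long-run elasticity: divide by 1 minus the sum of the autoregressive coefficients
            nlcom (longrun_gas: _b[gas] / (1 - _b[L1.consumption] - _b[L2.consumption]))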
            https://www.kripfganz.de/stata/



            • #7
              Sebastian Kripfganz Thank you,

              I am not exactly sure whether I should apply diff-GMM or sys-GMM. If I understand correctly from your slides, sys-GMM adds extra moment conditions to obtain a more efficient estimator? Does this mean it is always the better choice? I am asking because specifying either model(diff) or model(level) gives very different coefficient estimates; also, for model(level) the autocorrelation test gives nice results even when the number of groups is very large, whereas it gives p-values of 0 for model(diff):

              Code:
              xtdpdgmm L(0/2).consumption L(0/1).(gas gdp), model(diff) gmm(consumption, lag(2 .)) gmm(gas, lag(1 .)) gmm(gdp, lag(1 .)) overid serial
              
              Group variable: id                           Number of obs         =     12000
              Time variable: year                          Number of groups      =      1000
              
              Moment conditions:     linear =      89      Obs per group:    min =        12
                                  nonlinear =       0                        avg =        12
                                      total =      89                        max =        12
              
              ------------------------------------------------------------------------------
               consumption | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
              -------------+----------------------------------------------------------------
               consumption |
                       L1. |   .2756678    .016646    16.56   0.000     .2430422    .3082933
                       L2. |   .0841815   .0101433     8.30   0.000     .0643011    .1040619
                           |
                       gas |
                       --. |  -.0811305    .008687    -9.34   0.000    -.0981567   -.0641043
                       L1. |  -.3222405   .0277951   -11.59   0.000    -.3767179    -.267763
                           |
                       gdp |
                       --. |   .1543101   .0203537     7.58   0.000     .1144176    .1942026
                       L1. |   .2154226    .027199     7.92   0.000     .1621136    .2687316
                           |
                     _cons |   2.388899   .2873927     8.31   0.000      1.82562    2.952178
              ------------------------------------------------------------------------------
              
              . estat overid
              
              Sargan-Hansen test of the overidentifying restrictions
              H0: overidentifying restrictions are valid
              
              1-step moment functions, 1-step weighting matrix       chi2(82)    = 1053.1172
              note: *                                                Prob > chi2 =    0.0000
              
              1-step moment functions, 2-step weighting matrix       chi2(82)    =  468.7231
              note: *                                                Prob > chi2 =    0.0000
              
              * asymptotically invalid if the one-step weighting matrix is not optimal
              
              . estat serial
              
              Arellano-Bond test for autocorrelation of the first-differenced residuals
              H0: no autocorrelation of order 1      z =  -23.8497   Prob > |z|  =    0.0000
              H0: no autocorrelation of order 2      z =   -4.0646   Prob > |z|  =    0.0000
              And for model(level):

              Code:
              Group variable: id                           Number of obs         =     12000
              Time variable: year                          Number of groups      =      1000
              
              Moment conditions:     linear =      90      Obs per group:    min =        12
                                  nonlinear =       0                        avg =        12
                                      total =      90                        max =        12
              
              ------------------------------------------------------------------------------
               consumption | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
              -------------+----------------------------------------------------------------
               consumption |
                       L1. |   .8879421   .0196637    45.16   0.000     .8494019    .9264824
                       L2. |   .1129796   .0194574     5.81   0.000     .0748437    .1511155
                           |
                       gas |
                       --. |  -.1105826   .0107692   -10.27   0.000    -.1316898   -.0894754
                       L1. |   .2212282   .0255482     8.66   0.000     .1711545    .2713018
                           |
                       gdp |
                       --. |   .0333805   .0253412     1.32   0.188    -.0162873    .0830483
                       L1. |  -.1861235   .0287386    -6.48   0.000    -.2424502   -.1297969
                           |
                     _cons |   1.108836   .3539777     3.13   0.002     .4150523    1.802619
              
              
              
              . estat overid
              
              Sargan-Hansen test of the overidentifying restrictions
              H0: overidentifying restrictions are valid
              
              1-step moment functions, 1-step weighting matrix       chi2(83)    =  779.9269
              note: *                                                Prob > chi2 =    0.0000
              
              1-step moment functions, 2-step weighting matrix       chi2(83)    =  372.9522
              note: *                                                Prob > chi2 =    0.0000
              
              * asymptotically invalid if the one-step weighting matrix is not optimal
              
              . estat serial
              
              Arellano-Bond test for autocorrelation of the first-differenced residuals
              H0: no autocorrelation of order 1      z =  -22.3557   Prob > |z|  =    0.0000
              H0: no autocorrelation of order 2      z =    0.6264   Prob > |z|  =    0.5310
              The coefficient values of the difference model make more sense to me, but I do not have arguments for why I should use the differenced model. Could you help?



              • #8
                The extra moment conditions for the system GMM estimator do not come for free. They require an additional assumption to ensure that the instruments for the level model are uncorrelated with the error term. For the conventional system GMM estimator, this is essentially a mean stationarity assumption for your variables.

                Note that you should not just replace model(diff) with model(level). The same instruments which are valid for the differenced model are usually not valid for the level model. For example, lags of the dependent variable by construction are correlated with the unit-specific error component. You would need to add the diff suboption to all the instruments in order to use first differences of the variables as instruments for the level model. This is not done automatically.

                https://www.kripfganz.de/stata/



                • #9
                  Sebastian Kripfganz Okay, clear. Does this imply that sys-GMM is not very suitable for short-T panels? Because it is quite hard to show that a variable is stationary when there are only, for example, 5 time periods, right?

                  I added the diff suboption in the following manner, but the results are still very odd...

                  Code:
                  xtdpdgmm L(0/2).consumption L(0/2).(gas gdp) heatdays, model(level) gmm(consumption, lag(2 .) d) gmm(gas, lag(1 .) d) gmm(gdp, lag(1 .) d) overid serial
                  Also, to apply sys-GMM, do we have to assume that our variables are I(0)? Or is a combination of I(0) and I(1) variables allowed?
                  Last edited by Hein Willems; 24 Mar 2023, 04:52.



                  • #10
                    No, system GMM is specifically designed for short-T panels. Instead of testing the mean stationarity of the variables directly, which would indeed be hard with short T, you would test the implications on the validity of the additional moment conditions with a difference-in-Hansen test; see my presentation slides. The distinction between I(0) and I(1) variables does not play much of a role in a short-T context, and it would be hard to establish the integration orders anyway. Generally, looking at it from a time-series perspective, dynamic models can accommodate both I(0) and I(1) variables.
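                    After estimating with the overid option, these incremental tests can be obtained with something along the following lines (see the xtdpdgmm postestimation help file for the exact syntax and output):
                    Code:
                    * difference-in-Hansen tests for the separate sets of moment conditions
                    estat overid, difference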

                    Okay, so now you have actually specified a level GMM estimator, not a system GMM estimator. The latter combines moment conditions for the difference and level model, e.g.:
                    Code:
                    xtdpdgmm L(0/2).consumption L(0/2).(gas gdp) heatdays, gmm(consumption, lag(2 .) model(diff)) gmm(gas, lag(1 .) model(diff)) gmm(gdp, lag(1 .) model(diff)) gmm(consumption, lag(1 1) diff model(level)) gmm(gas, lag(0 0) diff model(level)) gmm(gdp, lag(0 0) diff model(level)) overid
                    Option serial is not having any effect on the model specification. If you want to add nonlinear moment conditions valid under absence of serial correlation, you need to add option nl(noserial), but those are usually redundant for the system GMM estimator and should only be used with the difference GMM estimator.
                    Also, you have not specified any instruments specifically for the regressor heatdays. This might lead to poor identification.
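                    For example - and this is only a sketch, resting on the assumption that heatdays is strictly exogenous and uncorrelated with the unit-specific effects - you could add it as a standard instrument for the level model:
                    Code:
                    xtdpdgmm L(0/2).consumption L(0/2).(gas gdp) heatdays, gmm(consumption, lag(2 .) model(diff)) gmm(gas, lag(1 .) model(diff)) gmm(gdp, lag(1 .) model(diff)) gmm(consumption, lag(1 1) diff model(level)) gmm(gas, lag(0 0) diff model(level)) gmm(gdp, lag(0 0) diff model(level)) iv(heatdays, model(level)) overid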
                    https://www.kripfganz.de/stata/



                    • #11
                      Sebastian Kripfganz Thank you for your response,

                      I adapted the code that you suggested, but the results are very similar to before. The long-run price elasticities have values up to 27, which is not what we expect, since they should be in the range [-1, 0].
                      The results from the diff-GMM are as expected, but I would really like to try and replicate the results with the system-GMM.

                      This is the code I applied for one of the regressions that leads to an elasticity of 27:

                      Code:
                      clear all
                      
                      cd "C:\Users\wille\OneDrive\Documenten\MSc Econometrics\MSc Econometrics & Mathematical Economics\Graduation\Energy Poverty\Stata\xtdpdgmm2"
                      insheet using "b100.csv", comma clear
                      
                      format year %ty
                      encode postcode, generate(id)
                      egen newid = group(id)
                      global id id
                      global year year
                      sort $id $year
                      xtset $id $year
                      
                      matrix elast = (.)
                      matrix shortrun = (.)
                      
                      tempfile holding
                      save `holding'
                      set seed 1234
                      
                      forval i = 1/1 {
                          use `holding', clear
                          keep id
                          duplicates drop
                          sample 500, count
                          merge 1:m id using `holding', assert (match using) keep(match) nogenerate
                          sort id
                          xtdpdgmm L(0/2).consumption L(0/2).(gas gdp) heatdays, gmm(consumption, lag(2 .) model(diff)) gmm(gas, lag(1 .) model(diff)) gmm(gdp, lag(1 .) model(diff)) gmm(consumption, lag(1 1) diff model(level)) gmm(gas, lag(0 0) diff model(level)) gmm(gdp, lag(0 0) diff model(level)) overid
                      
                      }
                      estat overid
                      estat serial
                      
                      matrix elast = (elast \ (_b[gas] + _b[L1.gas] + _b[L2.gas])/(1-_b[L1.consumption] - _b[L2.consumption]))
                      matrix shortrun= (shortrun \ _b[L1.gas])
                      
                      matrix list elast
                      Do you know why the results can be so different from the diff-GMM model specified as:

                      Code:
                       xtdpdgmm L(0/2).consumption L(0/2).(gas gdp) heatdays, model(level) gmm(consumption, lag(2 .) d) gmm(gas, lag(1 .) d) gmm(gdp, lag(1 .) d) overid
                      Also, thanks for the hint about the instrument for heatdays; I should indeed include iv() for this variable.



                      • #12
                        There are two potential explanations for the observed differences:
                        1. The difference GMM estimator can be seriously biased - downward biased for the coefficients of the lagged dependent variables - when the process is very persistent.
                        2. The system GMM estimator might be inconsistent if the additional moment conditions for the level model are invalid, which would be the case if the (initial) change in consumption is correlated with the unobserved unit-specific effects - loosely speaking: if units with a larger (or smaller) fixed effect systematically grow faster (or slower), or the other way round.
                        As an alternative, you could run the difference GMM estimator with the additional nl(noserial) option. This reduces the bias mentioned in point 1, as the estimator remains consistent under large persistence. It also does not rely on the additional assumption needed for the system GMM estimator.
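                        Based on the difference GMM command you posted, this would look something like the following (a sketch; instruments for heatdays would still need to be added, as discussed above):
                        Code:
                        xtdpdgmm L(0/2).consumption L(0/2).(gas gdp) heatdays, model(diff) gmm(consumption, lag(2 .)) gmm(gas, lag(1 .)) gmm(gdp, lag(1 .)) nl(noserial) overid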

                        There is another reason for the large long-run coefficients in the system GMM case: The sum of the lagged dependent variables' coefficients essentially equals 1. Then, when calculating long-run effects, you are dividing by something close to 0. If we think about this model as a partial adjustment model, this sum of autoregressive coefficients equal to 1 effectively implies that no adjustment is happening over time. Any transitory shock has a permanent effect. The long-run elasticity goes to infinity. If you compute the long-run effects with Stata's nlcom command, you will probably find a huge confidence interval around the already very large estimate of the long-run coefficient.
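                        For your specification with the current value and two lags of gas, that computation would look something like this:
                        Code:
                        * delta-method confidence interval for the long-run gas elasticity
                        nlcom (longrun_gas: (_b[gas] + _b[L1.gas] + _b[L2.gas]) / (1 - _b[L1.consumption] - _b[L2.consumption]))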
                        https://www.kripfganz.de/stata/



                        • #13
                          When I add
                          Code:
                          nl(noserial)
                          I indeed find very similar results, where the coefficients of the lagged dependent variable sum to almost 1 and thus result in a very large long-run effect. As you said, this means that the diff-GMM is seriously downward biased, but what should I do now? I imagine that it is not favourable to apply sys-GMM to estimate the elasticities, since the values will have huge confidence intervals...

                          Also, could you maybe give me some more intuition on why the diff-GMM is downward biased when the dependent variable is very persistent, or do you have any recommended literature for this?



                          • #14
                            See, for example, Blundell and Bond (forthcoming), especially Section 2.4, and the other references in that paper. In a nutshell, the instruments for the lagged dependent variable in the differenced model become very weak if the persistence is high. It is difficult for me to advise on how you might proceed. The model either does not seem suitable for your research purpose, or - taking the results at face value - it refutes the prior theory you had about the price elasticities.
                            https://www.kripfganz.de/stata/



                            • #15
                              Dear Sebastian Kripfganz ,

                              I have an additional question about the xtdpdgmm package. As I explained above, I am trying to calculate long-run estimates.
                              These are computed as follows (after the first-difference GMM estimation):

                              Code:
                              (_b[gas] + _b[L1.gas] + _b[L2.gas])/(1-_b[L1.consumption] - _b[L2.consumption])
                              Now, I want to compute the standard error for this long-run elasticity. I have done it using nlcom, but since it is a nonlinear function I guess bootstrapping is more accurate.
                              What I tried was:

                              Code:
                              bootstrap theta=(_b[gas] + _b[L1.gas] + _b[L2.gas])/(1-_b[L1.consumption] - _b[L2.consumption]), reps(10) nodrop: xtdpdgmm L(0/2).consumption L(0/2).(gas gdp) heatdays, model(diff) gmm(consumption, lag(2 .)) gmm(gas, lag(1 .)) gmm(gdp, lag(1 .)) overid
                              But it gives me the error 'insufficient observations to compute bootstrap standard errors; no results will be saved', even after adding the nodrop option as suggested in other threads.

                              How would you suggest to compute these standard errors?

                              Thanks in advance!

                              Hein Willems

