Dynamic Panel data using xtabond2 and/or xtdpdgmm

Tariku Getaneh

Join Date: Nov 2021

Posts: 53
#1


08 Oct 2023, 13:27

Dear Statalists,
I am using panel data on 120 countries from 1990 to 2020; some countries have missing data on some variables. I want to estimate a system GMM model with either of the user-written Stata commands xtabond2 or xtdpdgmm. My model is:
Y_it = a*Y_i,t-1 + b*R_i,t-1 + c*X_i,t-1 + d*Z_it + V_i + S_t + E_it
In this model, Y_i,t-1, R_i,t-1 and X_i,t-1 should be treated as endogenous in the command.
I want to use the second lag as instruments for the differenced equation and the first lag for the level equation.
Questions:
(1). Could you please help me with the exact syntax to perform system GMM?
(2). As a further check, I would like to conduct the same analysis on non-overlapping five-year data, both by taking values every five years and by averaging all my variables over five years. How should the code in (1) change? My confusion is that the level and the lagged value of a variable could be the same when using five-year averages.
Please help. I have tried to read the help files, but I still could not follow the different terminology involved.

Thank you in advance,
Sebastian Kripfganz

Join Date: May 2014

Posts: 2606
#2

10 Oct 2023, 07:45

Please have a look at the examples listed in the xtdpdgmm help file and the presentation referenced below. Also note that the latest version of the xtdpdgmm package contains the new xtdpdgmmfe command, which has a simpler syntax for specifying such models: It allows you to simply specify which variables are endogenous etc. Again, see the help file for details as well as the following Statalist topic: https://www.statalist.org/forums/for...84#post1675484

A lagged variable for data with five-year averages obviously has a different implication than a lagged variable with annual data. Whether one or the other is appropriate is something you might find in the related literature of your field. Personally, I am not a big fan of taking five-year averages.
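For readers who still want to try the five-year version, here is a minimal sketch of how non-overlapping averages might be built. It uses synthetic data, and all names (country, year, Y, period, n_years) are placeholders rather than anything from the original data set. Note that 1990-2020 spans 31 years, so the last window contains only 2020; whether to keep or drop it is a judgment call.

```stata
* Sketch with synthetic data; all variable names are placeholders.
clear
set obs 62                               // 2 countries x 31 years
gen country = ceil(_n / 31)
bysort country: gen year = 1989 + _n     // 1990, ..., 2020
gen Y = year                             // dummy series, just something to average

* Non-overlapping five-year windows: 1990-94 -> 0, 1995-99 -> 1, ..., 2020 -> 6
gen period = floor((year - 1990) / 5)

* Average within each window; n_years records how many annual values went in
collapse (mean) Y (count) n_years = Y, by(country period)

* Declare the panel on the new time index, so that L.Y is the previous
* window's average and never overlaps with the current one
xtset country period
```

After the collapse, a lag operator refers to the previous five-year window, so the level and its lag are built from disjoint years. Taking one value every five years instead would amount to something like `keep if mod(year, 5) == 0` before re-indexing the time variable.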

More on GMM estimation of linear dynamic panel data models:
Kripfganz, S. (2019). Generalized method of moments estimation of linear dynamic panel data models. Proceedings of the 2019 London Stata Conference.

https://www.kripfganz.de/stata/
Tariku Getaneh

Join Date: Nov 2021

Posts: 53
#3

11 Oct 2023, 13:18

Hello Sebastian,
Thank you a lot. I have looked at your materials and your answers to others in this forum. My understanding is still poor, especially because I have seen different answers that take different approaches. When lagged variables are considered endogenous, I am uncertain how they should be specified in the gmm() option. Can you tell from the examples below whether they are equivalent?

To perform system GMM:

xtabond2 L.(0/1).Yit L.(Rit-1 Xit-1 ) Zit year*, ///
gmm(Yit Rit Xit, lag(2 2) collapse eq(diff)) ///
gmm(Yit Rit Xit, lag(1 1) collapse eq(level)) ///
iv(Zit year*, eq(level)) twostep robust nodiffsargan

xtdpdgmm L.(0/1).Yit L.(Rit-1 Xit-1 ) Zit year*, ///
model(diff) gmm(Yit Rit Xit, lag(2 2) collapse) iv(Zit year*, diff) small noconstant vce(robust) twostep

Are the two equivalent?

I really appreciate your feedback!

BW

Sebastian Kripfganz

Join Date: May 2014
Posts: 2606
#4

12 Oct 2023, 03:52

The following two specifications should be equivalent:

Code:

xtabond2 L(0/1).Y L.(R X) Z year*, ///
   gmm(Y R X, lag(2 2) collapse eq(diff)) ///
   gmm(Y R X, lag(1 1) collapse eq(level)) ///
   iv(Z year*, eq(level)) twostep robust nodiffsargan

xtdpdgmm L(0/1).Y L.(R X) Z year*, model(diff) ///
   gmm(Y R X, lag(2 2) collapse) ///
   gmm(Y R X, diff lag(1 1) collapse model(level)) ///
   iv(Z year*, model(level)) twostep vce(robust)

https://www.kripfganz.de/stata/


Tariku Getaneh

Join Date: Nov 2021

Posts: 53
#5

12 Oct 2023, 09:45

Thank you so much for this very helpful syntax. The two commands give similar results. I picked xtdpdgmm and noticed that the AR(2) assumption doesn't hold.
So, I added the second lag of the dependent variable to the model. I am uncertain whether I should also add the second lag of the other endogenous variables.
Does it make sense to add second lags of the endogenous variables too?
Is the following modification of the above code correct for this change?
New code:
xtdpdgmm L(0/2).Y L.(R X) Z year*, model(diff) ///
gmm(Y R X, lag(3 3) collapse) ///
gmm(Y R X, diff lag(2 2) collapse model(level)) ///
iv(Z year*, model(level)) twostep vce(robust)
Sebastian Kripfganz

Join Date: May 2014

Posts: 2606
#6

13 Oct 2023, 03:20

It can make sense to also add further lags of the other regressors.

With the extra lags of the lagged dependent variable (and other regressors), you might already get satisfying AR(2) test results. In that case, you would not need to change the lag orders for the instruments.

Furthermore, I would recommend using further lags of the instruments; e.g., lag(2 3) or lag(2 4). This is especially important when you add further lags as regressors, because you might otherwise have insufficient instruments for all of the regressors.
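A sketch of how these suggestions might be combined, using the placeholder names Y, R, X, Z and year* from the earlier posts. It assumes that xtdpdgmm is installed and that the data are already xtset; the particular choices L(1/2).(R X) and lag(2 4) are only illustrative, not a recommendation for any specific data set.

```stata
* Sketch only: Y, R, X, Z and year* are the placeholder names used earlier in this thread.
* Assumes the xtdpdgmm package is installed and the panel is declared with xtset.

* Second lags added for the dependent variable and the other endogenous regressors;
* lags 2 to 4 (collapsed) as instruments for the differenced model,
* first lag of the differences as instruments for the level model
xtdpdgmm L(0/2).Y L(1/2).(R X) Z year*, model(diff) ///
   gmm(Y R X, lag(2 4) collapse) ///
   gmm(Y R X, diff lag(1 1) collapse model(level)) ///
   iv(Z year*, model(level)) twostep vce(robust)

* Arellano-Bond serial correlation tests for orders 1 to 3
estat serial, ar(1/3)

* Overidentification test
estat overid
```

If the AR(2) test is already satisfactory after adding the extra lags as regressors, the starting lag of the instruments can stay where it is.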

https://www.kripfganz.de/stata/
Tariku Getaneh

Join Date: Nov 2021

Posts: 53
#7

16 Oct 2023, 02:55

Thank you so much, really. Surprisingly, the problem persists.

To be clear, I have 120 countries over the period 1990-2020, and I run:

xtdpdgmm L(0/2).MORTALITY L.(ihs_AID ihs_GDP fertility ihs_population) ihs_population_density Co2 age_dependency yr*, model(diff) ///
gmm(MORTALITY ihs_AID ihs_GDP fertility ihs_population, lag(2 4) collapse) ///
gmm(MORTALITY ihs_AID ihs_GDP fertility ihs_population, diff lag(1 1) collapse model(level)) ///
iv(ihs_population_density Co2 age_dependency yr*, model(level)) twostep vce(robust)

IHS: inverse hyperbolic sine transformation
Does this specification look alright? Would you suggest any improvements?
Thank you in advance for your time.
Sebastian Kripfganz

Join Date: May 2014

Posts: 2606
#8

16 Oct 2023, 05:22

I am afraid it is difficult to give a general answer as to why you might still get unsatisfying results for your serial correlation test. Every data set is different.

There could still be further omitted variables. You might try adding interaction terms between some of the variables, if that makes economic sense. You can of course also add further lags, but this approach would at some point run into the risk of overparameterization.

https://www.kripfganz.de/stata/
Tariku Getaneh

Join Date: Nov 2021

Posts: 53
#9

16 Oct 2023, 05:59

Thanks a lot. Is it right to use lag(1 1) in the level equation when the second lag of the dependent variable is added? Also, do you think the variables in iv() look right and are not causing any problem?
As a side question, why do the Sargan test of overidentifying restrictions (rejects H0) and the Hansen test of overidentifying restrictions (does not reject H0) lead me to different conclusions most of the time?

Kind regards
tg

Last edited by Tariku Getaneh; 16 Oct 2023, 06:07.
Sebastian Kripfganz

Join Date: May 2014

Posts: 2606
#10

16 Oct 2023, 06:56

lag(1 1) for the level model is fine if there is no serial correlation. If there is evidence of serial correlation, then you would potentially need to use the second (or even third) lag instead, similar to starting with higher lags for the differenced model. Given that you are not using all of the lags for the differenced model, you could also try lag(1 2) [or lag(2 3) in case of serial correlation].

For two-step system GMM estimation, the Sargan test is asymptotically invalid because it uses an incorrect weighting matrix. You should just ignore it.
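To spell out the weighting-matrix point a little, here is a rough sketch in standard GMM notation. The symbols (moment function g, instrument matrix Z_i, residuals u_i, weighting matrix W, L instruments, K parameters) are introduced here for illustration and do not come from either command's output.

```latex
% Sketch; all notation is introduced here for illustration only.
% Sample moments and the generic overidentification statistic:
\[
  \bar g(\theta) = \frac{1}{N}\sum_{i=1}^{N} Z_i' u_i(\theta), \qquad
  J(W) = N\,\bar g(\hat\theta)'\, W\, \bar g(\hat\theta)
\]
% Hansen test: W is the inverse of the estimated moment covariance,
% which yields the chi-squared limit under valid instruments:
\[
  \hat S = \frac{1}{N}\sum_{i=1}^{N} Z_i' \hat u_i \hat u_i' Z_i, \qquad
  J(\hat S^{-1}) \xrightarrow{d} \chi^2_{L-K}
\]
% Sargan test: W is a fixed one-step matrix W_1. The chi-squared limit
% requires W_1 to be proportional to S^{-1}, which generally does not
% hold for system GMM.
```

When the one-step weighting matrix is not proportional to the inverse moment covariance, the Sargan statistic has no chi-squared limit, so a rejection from it is not informative on its own.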

https://www.kripfganz.de/stata/
Tariku Getaneh

Join Date: Nov 2021

Posts: 53
#11

16 Oct 2023, 07:49

Thank you so much Sebastian.

BW