Replicating xtabond2 results with xtdpdgmm

Tom Ford

Join Date: Mar 2021

Posts: 85
#1

16 Mar 2021, 05:40

Hello,
I am using the following two-step difference GMM model:

Code:

xtabond2 y l.y l.x1 l.x2 l.x3 $controls yeardum* if group==1, gmm(y x1 x2 x3, lag(2 10) collapse ) iv(yeardum* $controls) noleveleq small noconstant robust twostep

Here y is my dependent variable; x1, x2 and x3 are endogenous variables that, for theoretical reasons, I include in lagged form in the main model; and $controls is the vector of exogenous controls. This model gives results that are in line with my expectations.

I would like to estimate the same model with xtdpdgmm. I tried the following code, but the estimated coefficients are completely different. What am I doing wrong?

Code:

xtdpdgmm y l.y l.x1 l.x2 l.x3 $controls yeardum* if group==1, model(diff) gmm(y x1 x2 x3, lag(2 10) collapse) iv($controls) twostep vce(r) small noconstant teffects

Finally, I would like to estimate the same model using an iterated GMM estimator. I tried the following, but I fear I am making the same mistake as in the two-step model:

Code:

xtdpdgmm y l.y l.x1 l.x2 l.x3 $controls yeardum* if group==1, model(diff) gmm(y x1 x2 x3, lag(2 10) collapse) iv($controls) igmm vce(r) small noconstant teffects igmmiterate(100)

Does anyone know how I can replicate my results using xtdpdgmm, and how I can estimate the same model as the one specified in xtabond2 using an iterated GMM estimator?

Thanks a lot in advance for your help

Best
Sebastian Kripfganz

Join Date: May 2014

Posts: 2581
#2

16 Mar 2021, 07:21

There are two reasons why the results do not coincide:
1. xtabond2 automatically transforms the instruments in the iv() option into first differences for the first-differenced model. With xtdpdgmm, you need to explicitly specify the diff suboption.
2. The teffects option with xtdpdgmm always creates instruments for the time dummies for the level model, even if model(diff) is specified. To replicate the xtabond2 results, specify time dummies explicitly in option iv() instead of the teffects option.

The following command line should replicate your xtabond2 results:

Code:

xtdpdgmm y l.y l.x1 l.x2 l.x3 $controls yeardum* if group==1, model(diff) gmm(y x1 x2 x3, lag(2 10) collapse) iv(yeardum* $controls, diff) small noconstant vce(robust) twostep

In the next step, you can then simply replace twostep by igmm.
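
As a minimal sketch of that last step, the iterated GMM version would be the same command line with twostep swapped for igmm. The igmmiterate(100) cap is carried over from post #1 and is optional; everything else is copied from the command above, so treat this as an illustration rather than a tested specification.

```stata
* Iterated GMM: same specification as the two-step command, with igmm in place of twostep.
* igmmiterate(100) caps the number of iterations (value taken from post #1; optional).
xtdpdgmm y l.y l.x1 l.x2 l.x3 $controls yeardum* if group==1, model(diff) ///
    gmm(y x1 x2 x3, lag(2 10) collapse) iv(yeardum* $controls, diff) ///
    small noconstant vce(robust) igmm igmmiterate(100)
```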

https://www.kripfganz.de/stata/

Tom Ford

Join Date: Mar 2021
Posts: 85

16 Mar 2021, 10:33

Dear Prof Kripfganz,
Thank you very much for your prompt and clarifying explanation.

I tried to follow your suggestions, but the estimated coefficients remain different. Surprisingly, xtdpdgmm appears to rely on more observations than xtabond2. If you have any idea what may be causing this, I would be extremely grateful.

I will show the actual code and results I ran in Stata 16 to illustrate the issue. I have renamed the key variables to y, x1, x2 and x3 to simplify interpretation. The lag_* variables are lags that I generated beforehand.

Code:

xtabond2 y lag_y lag_x1 lag_x2 lag_x3 $controls yeardum* if group==1, noleveleq ///
gmm(y x1 x2 x3, lag(2 16) collapse) iv(yeardum*  $controls) small noconstant robust twostep
eststo gmmab

xtdpdgmm y lag_y lag_x1 lag_x2 lag_x3 $controls yeardum* if group==1, model(diff) /// 
gmm(y x1 x2 x3, lag(2 16) collapse) iv(yeardum* $controls, diff) small noconstant vce(r) twostep
eststo gmmxt


esttab gmmab gmmxt, ///
 se star(* 0.10 ** 0.05 *** 0.01) b(3) se(3) compress ///
drop(yeardum*) mtitle ("xtabond2" "xtdpdgmm")

------------------------------------
                 (1)          (2)   
            xtabond2     xtdpdgmm   
------------------------------------
lag_y          0.359***     0.025   
             (0.096)      (0.042)   

lag_x1         2.176**      0.793   
             (1.020)      (0.482)   

lag_x2         0.305        0.051   
             (0.356)      (0.157)   

lag_x3        -0.714**     -0.239*  
             (0.318)      (0.138)   

lag_ex_s~P    -0.001       -0.001   
             (0.001)      (0.001)   

lag_FDI       -0.000        0.000   
             (0.000)      (0.000)   

lag_log_~p     0.349       -0.221*  
             (0.242)      (0.124)   

lag_log_~C    -0.059       -0.040** 
             (0.037)      (0.018)   

lag_Growth    -0.000        0.001   
             (0.001)      (0.001)   

lag_INGOs     -0.002       -0.000   
             (0.003)      (0.003)   

lag_left       0.017        0.014   
             (0.015)      (0.009)   

lag_poli~2     0.000       -0.001   
             (0.004)      (0.002)   

lag_C087      -0.070       -0.034   
             (0.095)      (0.068)   
------------------------------------
N               1346         1439   
------------------------------------
Standard errors in parentheses
* p<0.10, ** p<0.05, *** p<0.01

I am not sure what I am doing differently with the xtdpdgmm command. It appears to use more observations and surprisingly my lag_y is no longer significant.

I thank you very much in advance for your help

Best


Sebastian Kripfganz

Join Date: May 2014

Posts: 2581
#4

16 Mar 2021, 11:08

The difference in the number of observations is expected. xtabond2 with option noleveleq reports the number of observations for the first-differenced model, while xtdpdgmm always reports the number of observations for the level model. The difference between them should exactly equal your number of cross-sectional units.

From the code, I cannot see any reason why the results should not coincide. In the following example, the results are the same:

Code:

webuse abdata
xtabond2 n L.n L.w L.k ys yr*, noleveleq gmm(n w k, lag(2 .) collapse) iv(yr* ys) small noconstant robust twostep
xtdpdgmm n L.n L.w L.k ys yr*, model(diff) gmm(n w k, lag(2 .) collapse) iv(yr* ys, diff) small noconstant vce(r) twostep
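
The point about the number of observations can be checked on the same example data. The sketch below assumes that both commands are installed (for instance from SSC) and that both store the observation count in e(N) and the number of panel units in e(N_g); the scalar names are purely illustrative.

```stata
webuse abdata, clear

* Difference GMM with xtabond2: e(N) counts observations of the first-differenced model
xtabond2 n L.n L.w L.k ys yr*, noleveleq gmm(n w k, lag(2 .) collapse) ///
    iv(yr* ys) small noconstant robust twostep
scalar N_xtabond2 = e(N)

* Same model with xtdpdgmm: e(N) counts observations of the level model
xtdpdgmm n L.n L.w L.k ys yr*, model(diff) gmm(n w k, lag(2 .) collapse) ///
    iv(yr* ys, diff) small noconstant vce(r) twostep
scalar N_xtdpdgmm = e(N)
scalar N_groups   = e(N_g)

* The gap in reported observations should match the number of cross-sectional units
display "difference in N:  " N_xtdpdgmm - N_xtabond2
display "number of groups: " N_groups
```

If the two displayed numbers disagree in your own data, that may point to gaps in the panel or to a sample restriction that the two commands treat differently.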

Do the results still differ if you do not use the restriction if group==1?

If you have unbalanced panel data, there is the rare chance that the differences are due to the issue described in point 4 of my post in the following thread: https://www.statalist.org/forums/for...mm#post1576543

Do your xtabond2 results differ when you specify L.y with the lag operator notation instead of lag_y, where lag_y is a new variable created beforehand?

The magnitude of the differences between xtabond2 and xtdpdgmm is still surprising.


Tom Ford

Join Date: Mar 2021
Posts: 85

16 Mar 2021, 11:46

Indeed, I have an unbalanced panel with N=130 and T=28.
You are right, the problem seems to arise from the restriction (if group==1). I have tried using l.* rather than the previously defined lags (lag_*), and the xtabond2 coefficients remain the same.
Below are some estimations I ran that should illustrate the issue:

Code:

*=============*
* Full sample *
*=============*
xtabond2 y lag_y lag_x1 lag_x2 lag_x3 $controls yeardum*, noleveleq ///
gmm(y x1 x2 x3, lag(2 16) collapse) iv(yeardum*  $controls) small noconstant robust twostep
eststo gmmab_fs

xtdpdgmm y lag_y lag_x1 lag_x2 lag_x3 $controls yeardum*, model(diff) ///
gmm(y x1 x2 x3, lag(2 16) collapse) iv(yeardum* $controls, diff) small noconstant vce(r) twostep
eststo gmmxt_fs

*=============*
* if group==1 *
*=============*
xtabond2 y lag_y lag_x1 lag_x2 lag_x3 $controls yeardum* if group==1, noleveleq ///
gmm(y x1 x2 x3, lag(2 16) collapse) iv(yeardum*  $controls) small noconstant robust twostep
eststo gmmab

xtdpdgmm y lag_y lag_x1 lag_x2 lag_x3 $controls yeardum* if group==1, model(diff) ///
gmm(y x1 x2 x3, lag(2 16) collapse) iv(yeardum* $controls, diff) small noconstant vce(r) twostep
eststo gmmxt

*==================*
* keep if group==1 *
*==================*

keep if group==1

(2,568 observations deleted)

xtabond2 y lag_y lag_x1 lag_x2 lag_x3 $controls yeardum*, noleveleq ///
gmm(y x1 x2 x3, lag(2 16) collapse) iv(yeardum*  $controls) small noconstant robust twostep
eststo gmmab_g1

xtdpdgmm y lag_y lag_x1 lag_x2 lag_x3 $controls yeardum*, model(diff) ///
gmm(y x1 x2 x3, lag(2 16) collapse) iv(yeardum* $controls, diff) small noconstant vce(r) twostep
eststo gmmxt_g1

*=================*
* present results *
*=================*

esttab gmmab_fs gmmxt_fs gmmab gmmxt gmmab_g1 gmmxt_g1, ///
 se star(* 0.10 ** 0.05 *** 0.01) b(3) se(3) compress ///
drop(yeardum*) mtitle ("xtabond2-FS" "xtdpdgmm-FS" "xtabond2" "xtdpdgmm" "xtabond2-g1" "xtdpdgmm-g1")

----------------------------------------------------------------------------------------
                 (1)          (2)          (3)          (4)          (5)          (6)  
           xtabond~S    xtdpdgm~S     xtabond2     xtdpdgmm    xtabond~1    xtdpdgm~1  
----------------------------------------------------------------------------------------
lag_y          0.243***     0.243***     0.330***     0.017        0.013        0.013  
             (0.044)      (0.044)      (0.086)      (0.044)      (0.041)      (0.041)  

lag_x1         0.087        0.086        1.844*       0.897**      1.089        1.089  
             (0.747)      (0.747)      (0.985)      (0.372)      (0.791)      (0.787)  

lag_x2         0.125        0.125        0.298        0.126        0.213        0.213  
             (0.236)      (0.236)      (0.354)      (0.156)      (0.247)      (0.246)  

lag_x3        -0.029       -0.029       -0.606**     -0.263**     -0.331       -0.331  
             (0.231)      (0.231)      (0.302)      (0.115)      (0.246)      (0.245)  

ex_sh_GDP      0.000        0.000        0.000        0.000        0.001        0.001  
             (0.001)      (0.001)      (0.001)      (0.001)      (0.001)      (0.001)  

FDI            0.000        0.000        0.000       -0.000       -0.000       -0.000  
             (0.001)      (0.001)      (0.000)      (0.000)      (0.000)      (0.000)  

log_pop       -0.198*      -0.198*       0.699**     -0.167       -0.044       -0.044  
             (0.115)      (0.115)      (0.274)      (0.121)      (0.150)      (0.150)  

log_GDPxC      0.009        0.009        0.066        0.007        0.013        0.013  
             (0.029)      (0.029)      (0.040)      (0.022)      (0.025)      (0.025)  

Growth         0.002        0.002       -0.001       -0.000       -0.001       -0.001  
             (0.001)      (0.001)      (0.001)      (0.001)      (0.001)      (0.001)  

INGOs          0.000        0.000        0.002        0.003        0.001        0.001  
             (0.002)      (0.002)      (0.006)      (0.003)      (0.004)      (0.004)  

left           0.004        0.004        0.000       -0.007       -0.005       -0.005  
             (0.016)      (0.016)      (0.017)      (0.007)      (0.009)      (0.009)  

polity2        0.005        0.005        0.001       -0.002       -0.000       -0.000  
             (0.003)      (0.003)      (0.004)      (0.002)      (0.002)      (0.002)  
----------------------------------------------------------------------------------------
N               2435         2588         1336         1435         1085         1435  
----------------------------------------------------------------------------------------
Standard errors in parentheses

The results change a lot depending on how the sample is restricted. If I keep the full sample (models 1 and 2), or if I keep only the observations where group==1 (models 5 and 6), the results are consistent between xtabond2 and xtdpdgmm.
There are two inconsistencies I do not understand:
1) Why are the results so different between xtabond2 and xtdpdgmm when I use the restriction if group==1?
2) Why do the results differ, even within the same command, depending on whether I keep only the observations where group==1 or use the qualifier if group==1? Shouldn't models 3 and 5, both estimated with xtabond2, give identical results? Similarly, shouldn't models 4 and 6, both estimated with xtdpdgmm, give identical results? I am sorry if these are trivial questions, but I really do not know how to go about this.

Thanks a lot in advance for your help

P.S. I am also adding the codebook output for the group variable in case it helps.

Code:

-------------------------------------------------------------------------------------------------------------
group                     is the country in group 1? 1=yes, 0= otherwise
-------------------------------------------------------------------------------------------------------------

                  type:  numeric (double)

                 range:  [0,1]                        units:  1
         unique values:  2                        missing .:  1,162/4,823

            tabulation:  Freq.  Value
                         1,406  0
                         2,255  1
                         1,162  .

Last edited by Tom Ford; 16 Mar 2021, 12:10.


Sebastian Kripfganz

Join Date: May 2014

Posts: 2581
#6

16 Mar 2021, 12:47

This is not a trivial question. My first question would be: What exactly does the group variable capture? If your cross-sectional units (countries, I guess) are nested within groups, then there should not be any difference between the if-restriction and only keeping those observations. From this, I conjecture that the group membership must change over time, i.e. some countries fall into group 1 in some years but not in all.

In that case, the if-condition only restricts the sample for the dependent variable and regressors but not for the instruments. It then becomes quite tricky to tell which observations get used at what point in the estimation. Note that L.y may still exist for a given observation in the full data set but no longer after you have removed some observations with the keep command. xtabond2 and xtdpdgmm handle these if-conditions differently. (Also notice the different number of observations with xtabond2, while xtdpdgmm makes use of the same sample size.)

In my view, in most cases it is advisable to restrict the sample with the keep command to make sure that you are really only using those observations that you intend to use.
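
The conjecture about time-varying group membership can be checked directly. The following is only a sketch: it assumes the data are already xtset, and the helper variable names (grp_min, grp_max, grp_varies, one_per_panel) are made up for illustration.

```stata
* Assumes the data are already xtset; recover the panel identifier from xtset
quietly xtset
local id `r(panelvar)'

* Flag panels whose (non-missing) group value is not constant over time
bysort `id': egen double grp_min = min(group)
bysort `id': egen double grp_max = max(group)
generate byte grp_varies = (grp_min != grp_max) if !missing(grp_min)
egen byte one_per_panel = tag(`id')
tabulate grp_varies if one_per_panel

* Observations selected by "if group==1" whose lag of y comes from a period
* outside the group; after "keep if group==1" that lag no longer exists
count if group == 1 & !missing(L.y) & L.group != 1
```

A nonzero count in the last line would mean that the if-qualifier and the keep command cannot lead to the same estimation sample, because the lagged values for those observations disappear once the other rows are dropped.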

Tom Ford

Join Date: Mar 2021

Posts: 85
#7

18 Mar 2021, 12:14

Dear Prof Kripfganz,
Thank you very much for your clarification.
Indeed, you are right in supposing that my group variable is time-varying. It captures whether a state is under "intense trade competition", and hence it is time-varying for every state. My aim is to examine whether my variables of interest x1, x2 and x3 affect y when a state is under "intense trade competition". If I understood your post correctly, this means that in the xtabond2 specification I am using the lagged values of y, x1, x2 and x3 as instruments even if the state was not consistently in the "intense trade competition" group. I am not sure what this means substantively. My intuition is that I should still exploit the lagged instruments even if the state was not part of the group at that earlier time. My question is what this means in terms of inference and whether it violates some key assumption. I really am not sure whether I should go ahead with this, or whether it would be a threat to the validity of my findings.

Thanks a lot again for your help

Best Regards
Jane Quan

Join Date: Jun 2021

Posts: 60
#8

24 Sep 2023, 22:49

Hi Sebastian Kripfganz, Tom Ford

This is Jane, and it's a pleasure to e-meet you.

I've encountered an issue with Stata coding while using the xtdpdgmm command. I was wondering if you could offer some assistance.

My original code was:

Code:

xtabond2 logrealagritotal L.logrealagritotal logimmi logdist logecondist logexrate i.year, gmm(L.(logrealagritotal logimmi)) iv(logdist logecondist logexrate i.year, equation(diff)) robust h(2) nocons artests(3)

Now, I'd like to use the xtdpdgmm command instead of xtabond2. Here is my revised code:

Code:

xtdpdgmm logrealagritotal L.logrealagritotal logimmi logdist logecondist logexrate i.year, model(diff) gmm(L.(logrealagritotal logimmi)) iv(logdist logecondist logexrate i.year, diff) nocons vce(r)

However, the results are not the same, and I'm unsure where I went wrong. I'm hoping to receive some guidance from you.

Thank you^^
Sebastian Kripfganz

Join Date: May 2014

Posts: 2581
#9

25 Sep 2023, 02:42

You can compare the lists of instruments below the regression outputs. As you will notice, your xtabond2 specification creates GMM-type instruments for the level model, while your xtdpdgmm specification does not. I recommend always specifying the model/equation explicitly to make sure that you get what you want.
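
One possible way to follow that advice on the xtabond2 side is sketched below: the equation(diff) suboption, which post #8 already uses inside iv(), is added to gmm() as well so that no GMM-type instruments are created for the level model. This is an untested illustration, not a confirmed fix. It only aligns the instrument sets by equation, and other defaults that are not examined here (for example the lag ranges or the weighting matrix implied by h(2)) could still produce different estimates.

```stata
* xtabond2: restrict the GMM-type instruments to the differenced equation,
* mirroring the equation(diff) already used for the iv() instruments
xtabond2 logrealagritotal L.logrealagritotal logimmi logdist logecondist logexrate i.year, ///
    gmm(L.(logrealagritotal logimmi), equation(diff)) ///
    iv(logdist logecondist logexrate i.year, equation(diff)) robust h(2) nocons artests(3)
```

After running it, the instrument list printed below the xtabond2 output can be compared line by line with the list printed by xtdpdgmm.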
