XTSEQREG: new Stata command for sequential / two-stage (GMM) estimation of linear panel models

Sebastian Kripfganz replied

10 Oct 2021, 09:41
Yes, that is absolutely fine.
Jerry Kim replied

09 Oct 2021, 17:49
Dear Sebastian,

Thanks a lot! However, if the time-invariant regressor may be correlated with the unobserved effect alpha_i, can I still use this method with a different IV that is uncorrelated with alpha_i? The code would then be as follows, where third_party_IV is an instrument for sftlen, the time-invariant regressor.

Code:

eststo md_diffgmm_sl2, title("Estimator: FD-GMM including shift length"): ///
    qui xtdpdgmm s_it s_itlag1 eta_it elective ///
    afterchangeseq beforecancel age i.prcdr2 i.bin2hr ///
    sftlen, collapse model(diff) ///
    gmm(s_itlag1, lag(1 2)) gmm(eta_it, lag(2 3)) ///
    gmm(elective beforecancel afterchangeseq age i.prcdr2 i.bin2hr, lag(0 1)) ///
    iv(third_party_IV, model(level)) nocons two vce(clu surgeon2)
Last edited by Jerry Kim; 09 Oct 2021, 17:52.
Sebastian Kripfganz replied

09 Oct 2021, 06:23
In your case, where all instruments for the time-varying regressors are specified for the first-differenced model, and the coefficients of the time-invariant regressors are just-identified (i.e. there are as many instruments as time-invariant regressors), the two-stage approach with xtseqreg is not needed. The presence of the time-invariant regressors in this case does not affect the estimates of the coefficients for the time-varying regressors, and a second-stage regression would not yield any improvement. This is discussed in more technical terms in Appendix C.4 of the Supplementary Appendix for our JAE article:
Kripfganz, S. and C. Schwarz (2019). Estimation of linear dynamic panel data models with time-invariant regressors. Journal of Applied Econometrics 34 (4), 526-546.

So, yes, your approach is absolutely fine. Just keep in mind that your identifying assumption for the coefficients of the time-invariant regressors is that those are uncorrelated with the unobserved individual/group-specific effects.

Jerry Kim replied

09 Oct 2021, 00:06

Dear Sebastian,

The versions are all up-to-date as you indicated.

However, I reread your slides on the xtdpdgmm package from the 2019 conference (the longer, more detailed version). Page 86, "Estimation with time-invariant regressors in Stata", shows the estimation with an exogenous industry dummy (quoted below):

Code:

xtdpdgmm L(0/1).n w k i.ind, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) ///
> iv(i.ind, model(level)) nl(noserial) teffects igmm vce(r)
(Some output omitted)
Instruments corresponding to the linear moment conditions:
1, model(diff):
L2.n L3.n L4.n
2, model(diff):
L1.w L2.w L3.w L1.k L2.k L3.k
3, model(level):
2bn.ind 3.ind 4.ind 5.ind 6.ind 7.ind 8.ind 9.ind
4, model(level):
1978bn.year 1979.year 1980.year 1981.year 1982.year 1983.year 1984.year
5, model(level):
_cons

I am now using the following code, and everything seems to work.

Code:

eststo md_diffgmm_sl2: ///
    qui xtdpdgmm s_it s_itlag1 eta_it elective ///
    afterchangeseq beforecancel age i.prcdr2 i.bin2hr ///
    sftlen, collapse model(diff) ///
    gmm(s_itlag1, lag(1 2)) gmm(eta_it, lag(2 3)) ///
    gmm(elective beforecancel afterchangeseq age i.prcdr2 i.bin2hr, lag(0 1)) ///
    iv(sftlen, model(level)) nocons two vce(clu surgeon2)

Is it acceptable to estimate the coefficients of the time-invariant variables this way, using only xtdpdgmm?
Thank you!


Sebastian Kripfganz replied

07 Oct 2021, 04:04
Please check whether you have the latest version of both packages:

Code:

which xtdpdgmm
which xtseqreg

These should be 2.3.9 for xtdpdgmm and 1.2.4 for xtseqreg. If necessary, please update the commands:

Code:

net install xtdpdgmm, from(http://www.kripfganz.de/stata/) replace
net install xtseqreg, from(http://www.kripfganz.de/stata/) replace

If the problem persists with the latest versions, I am afraid I would need to see the actual data you used to replicate the problem. If you are able to share the data, please can you send it to me by e-mail?
Jerry Kim replied

06 Oct 2021, 13:06
Dear Sebastian,

Thanks for the suggestion! I revised the code, but this time I get another error:

Code:

not sorted
r(5);

Is it because the data are not sorted in a particular order?
Sebastian Kripfganz replied

06 Oct 2021, 04:02
The problem is with the nocons option. Since the first stage does not contain a constant, you need to specify first(md_gmm_s1, copy nocons). You then probably want to include a constant in your second stage, so do not specify nocons at the end of the command line, i.e.

Code:

xtseqreg s_it (s_itlag1 eta_it elective afterchangeseq beforecancel ///
    age) sftlen surgnum, first(md_gmm_s1, copy nocons) ///
    iv(sftlen surgnum) vce(clu surgeon2)

Jerry Kim replied

05 Oct 2021, 17:58

Dear Sebastian,

Thanks for the reply. However, I encountered the following error:

Code:

option first() incorrectly specified
r(322);

My first stage code is:

Code:

eststo md_gmm_s1: xtdpdgmm s_it s_itlag1 eta_it elective ///
    afterchangeseq beforecancel age, collapse model(diff) ///
    gmm(s_itlag1, lag(1 2)) gmm(eta_it, lag(2 3)) ///
    gmm(elective sftlen beforecancel afterchangeseq age, lag(0 1)) ///
    nocons two vce(clu surgeon2) auxiliary

and the Stata output is:

Code:

Generalized method of moments estimation

Fitting full model:
Step 1         f(b) =  5.2919369
Step 2         f(b) =   .0051382

Group variable: sftidx                       Number of obs         =      8961
Time variable: newseq                        Number of groups      =      2055

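Moment conditions:     linear =      13      Obs per group:    min =         2
                    nonlinear =       0                        avg =  4.360584
                        total =      13                        max =         9

--------------------------------------------------------------------------------
          s_it |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
     /s_itlag1 |   .9226237   .1198261     7.70   0.000     .6877688    1.157479
       /eta_it |   .8967318   .1621367     5.53   0.000     .5789497    1.214514
     /elective |  -.8163833   3.944977    -0.21   0.836    -8.548397     6.91563
/afterchange~q |  -.1239053   2.830054    -0.04   0.965    -5.670709    5.422898
 /beforecancel |   3.123461   4.865123     0.64   0.521    -6.412005    12.65893
          /age |   .3229168   .1191281     2.71   0.007     .0894301    .5564035
--------------------------------------------------------------------------------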

Instruments corresponding to the linear moment conditions:
1, model(diff):
L1.s_itlag1 L2.s_itlag1
2, model(diff):
L2.eta_it L3.eta_it
3, model(diff):
elective L1.elective sftlen beforecancel L1.beforecancel afterchangeseq
L1.afterchangeseq age L1.age

My second stage code is:

Code:

xtseqreg s_it (s_itlag1 eta_it elective afterchangeseq beforecancel ///
    age ) sftlen surgnum, first(md_gmm_s1, copy) ///
    iv(sftlen surgnum) vce(clu surgeon2) nocons

where sftlen and surgnum are two time-invariant variables.

Could you please let me know the potential errors in my code? Thank you!

Last edited by Jerry Kim; 05 Oct 2021, 18:04.


Sebastian Kripfganz replied

05 Oct 2021, 01:48
You need to add the option auxiliary to your xtdpdgmm estimation. Then it should work.

Code:

xtdpdgmm ..., auxiliary ...
eststo md_st1
xtseqreg ..., first(md_st1, copy) ...
Jerry Kim replied

05 Oct 2021, 01:40
Dear Prof. Kripfganz

Thanks for developing the xtseqreg command! I have a question about storing the first-stage results. Can I use eststo md_st1 to store the first-stage estimation results (from xtdpdgmm) and then pass md_st1 as first(md_st1, copy) in the first() option of the second-stage estimation?
Sebastian Kripfganz replied

02 Sep 2021, 07:55
I have fixed a minor bug in xtseqreg that could result in an incorrect error message in rare instances when using option vce(cluster) on a subsample of the data. The new version 1.2.4 is available on my website:

Code:

net install xtseqreg, from(http://www.kripfganz.de/stata/) replace
Sebastian Kripfganz replied

19 Aug 2021, 08:58
Following up on a discussion I just had elsewhere, here is a word of caution on predictions after the xtseqreg two-stage estimation: The postestimation predict command by default only creates predictions based on the first-stage estimates. To generate predictions based on the full model, predictions need to be obtained for both stages separately (using the equation() suboption of predict) and then added up:

Code:

webuse psidextract
xtseqreg lwage (wks south smsa ms exp exp2 occ ind union) fem blk ed, both
predict yhat1, xb equation(_first)
predict yhat2, xb equation(_second)
gen yhat = yhat1 + yhat2
Liza Vieira replied

05 Apr 2021, 05:56
Dear Sebastian,

Many thanks for the comments.

I will try the alternative you suggested.
Sebastian Kripfganz replied

03 Apr 2021, 06:09
You already looked at all my usual recommendations. I am afraid I do not see anything obvious that could still be done here. With such a large sample size, the Arellano-Bond test might already pick up small deviations from the null hypothesis of no serial correlation. This could be due to any omitted variables. It then requires your judgement whether you worry much about such a small misspecification. If you can show that your results are robust to different specifications (e.g. different lag orders), and ideally the Hansen test does not reject the overidentifying restrictions, it might be okay to nevertheless accept this specification.

An alternative might be to use higher-order starting lags, say lagrange(3 .) instead of lagrange(2 .) which would allow for second-order serial correlation in the first-differenced errors, although at the cost of using weaker instruments.
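
A rough sketch of that alternative, assuming the specification from the original post below with only the starting lag of the first-differenced-model instruments changed. The local macro name xvars is purely illustrative, and the level-model instruments are left exactly as posted; whether they should also start one lag later is not settled in this thread.

Code:

* illustrative macro holding the time-varying regressors from the original post
local xvars ln_pub_c1 ln_pub_c2 ln_icts ln_dhdi ln_dwgi ln_dspc ln_dtop10 ln_dneig

* same command as in the original post, but with lagrange(3 .) for model(difference)
xtseqreg L(0/3).ln_pub_col L(0/3).(`xvars'), ///
    gmmiv(ln_pub_col, model(difference) lagrange(3 .)) ///
    gmmiv(ln_pub_col, model(level) difference lagrange(1 .)) ///
    gmmiv(`xvars', model(difference) lagrange(3 .)) ///
    gmmiv(`xvars', model(level) difference lagrange(1 .)) ///
    twostep vce(robust) teffects

* rerun the Arellano-Bond test on the new specification
estat serial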

Liza Vieira replied

02 Apr 2021, 11:19

Dear Statalisters,

I am using the command xtseqreg in my research.

My data set has a large N (approximately 130 000) and a small T (years 2000 to 2017) relative to N. I have a set of time-varying independent variables (ln_pub_c1 ln_pub_c2 ln_icts ln_dhdi ln_dwgi ln_dspc ln_dtop10 ln_dneig) and time-invariant variables (ln_distcap ln_contig ln_comlang_off ln_colony ln_comcol). My dependent variable is ln_pub_col.

I have tried difference-GMM and system-GMM specifications and followed the sequential selection process adapted from Kiviet (2019), as explained by Sebastian Kripfganz in his presentation at the London Stata Conference in 2019.

As an example, I show the code used in one of the specifications and the obtained output.

Code:

xtseqreg L(0/3).ln_pub_col L(0/3).(ln_pub_c1 ln_pub_c2 ln_icts ln_dhdi ln_dwgi ln_dspc ln_dtop10 ln_dneig), ///
    gmmiv(ln_pub_col, model(difference) lagrange(2 .)) ///
    gmmiv(ln_pub_col, model(level) difference lagrange(1 .)) ///
    gmmiv(ln_pub_c1 ln_pub_c2 ln_icts ln_dhdi ln_dwgi ln_dspc ln_dtop10 ln_dneig, model(difference) lagrange(2 .)) ///
    gmmiv(ln_pub_c1 ln_pub_c2 ln_icts ln_dhdi ln_dwgi ln_dspc ln_dtop10 ln_dneig, model(level) difference lagrange(1 .)) ///
    twostep vce(robust) teffects

Group variable: pair_n                       Number of obs         =    110764
Time variable: year                          Number of groups      =      8372

                                             Obs per group:    min =         1
                                                               avg =  13.23029
                                                               max =        15

                                             Number of instruments =      2266

                             (Std. Err. adjusted for 8,372 clusters in pair_n)
------------------------------------------------------------------------------
             |              WC-Robust
  ln_pub_col |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  ln_pub_col |
         L1. |   .4688867   .0063557    73.77   0.000     .4564297    .4813436
         L2. |   .2857157   .0072898    39.19   0.000     .2714279    .3000035
         L3. |   .2246439   .0064371    34.90   0.000     .2120273    .2372605
             |
   ln_pub_c1 |
         --. |  -.0027787   .0067151    -0.41   0.679    -.0159401    .0103827
         L1. |  -.0046365   .0046309    -1.00   0.317    -.0137129    .0044399
         L2. |   .0124372   .0039266     3.17   0.002     .0047412    .0201332
         L3. |  -.0027408   .0037742    -0.73   0.468    -.0101381    .0046565
             |
   ln_pub_c2 |
         --. |  -.0253773   .0059413    -4.27   0.000    -.0370221   -.0137325
         L1. |   .0097872   .0043259     2.26   0.024     .0013086    .0182657
         L2. |   .0116265   .0037105     3.13   0.002     .0043541     .018899
         L3. |   .0096613   .0033984     2.84   0.004     .0030006     .016322
             |
     ln_icts |
         --. |   .0001502   .0109222     0.01   0.989    -.0212568    .0215573
         L1. |  -.0197857   .0112639    -1.76   0.079    -.0418626    .0022911
         L2. |  -.0008872   .0068086    -0.13   0.896    -.0142318    .0124574
         L3. |   .0079444   .0050573     1.57   0.116    -.0019677    .0178565
             |
     ln_dhdi |
         --. |   .0515274   .3173365     0.16   0.871    -.5704408    .6734956
         L1. |  -.2789475    .412341    -0.68   0.499    -1.087121     .529226
         L2. |   .4715926   .2849979     1.65   0.098     -.086993    1.030178
         L3. |  -.3044555   .1959573    -1.55   0.120    -.6885248    .0796138
             |
     ln_dwgi |
         --. |   .0627715   .0237404     2.64   0.008     .0162412    .1093018
         L1. |  -.0416349   .0228634    -1.82   0.069    -.0864464    .0031766
         L2. |   .0023067   .0135162     0.17   0.864    -.0241846     .028798
         L3. |   .0129192   .0094862     1.36   0.173    -.0056733    .0315118
             |
     ln_dspc |
         --. |   .0098149   .0047202     2.08   0.038     .0005635    .0190663
         L1. |    .006029   .0030625     1.97   0.049     .0000265    .0120314
         L2. |   .0115451   .0030407     3.80   0.000     .0055854    .0175047
         L3. |   .0034157   .0029659     1.15   0.249    -.0023974    .0092289
             |
   ln_dtop10 |
         --. |   .0208839   .0030799     6.78   0.000     .0148473    .0269205
         L1. |   .0088835   .0017673     5.03   0.000     .0054198    .0123473
         L2. |  -.0003043   .0017246    -0.18   0.860    -.0036845     .003076
         L3. |   .0062354    .001771     3.52   0.000     .0027643    .0097064
             |
    ln_dneig |
         --. |  -2.427052   .0530159   -45.78   0.000    -2.530961   -2.323143
         L1. |   .7764577   .0392983    19.76   0.000     .6994344    .8534809
         L2. |   .4338752   .0387189    11.21   0.000     .3579875    .5097629
         L3. |   .3515533   .0376412     9.34   0.000     .2777778    .4253288
             |
        year |
       2004  |  -.0049636   .0049394    -1.00   0.315    -.0146446    .0047175
       2005  |   .0017011   .0053225     0.32   0.749    -.0087308     .012133
       2006  |  -.0053781   .0057684    -0.93   0.351    -.0166839    .0059277
       2007  |    .007022   .0056924     1.23   0.217    -.0041349    .0181789
       2008  |  -.0036337   .0053883    -0.67   0.500    -.0141946    .0069272
       2009  |    .001857   .0058449     0.32   0.751    -.0095987    .0133127
       2010  |  -.0038453   .0057092    -0.67   0.501    -.0150352    .0073446
       2011  |   .0010227   .0058642     0.17   0.862     -.010471    .0125164
       2012  |  -.0052501   .0061473    -0.85   0.393    -.0172985    .0067983
       2013  |  -.0173922   .0062736    -2.77   0.006    -.0296883   -.0050962
       2014  |   -.007815   .0066605    -1.17   0.241    -.0208694    .0052393
       2015  |   .0108063   .0069097     1.56   0.118    -.0027364     .024349
       2016  |   .0111071   .0073656     1.51   0.132    -.0033291    .0255433
       2017  |  -.0446102   .0077037    -5.79   0.000    -.0597092   -.0295112
             |
       _cons |   .2825371   .0291722     9.69   0.000     .2253607    .3397135
------------------------------------------------------------------------------

. estat serial

Arellano-Bond test for autocorrelation of the first-differenced residuals
H0: no autocorrelation of order 1:     z =  -55.4828   Prob > |z|  =    0.0000
H0: no autocorrelation of order 2:     z =    6.0365   Prob > |z|  =    0.0000

However, I found the same behaviour in every specification I tried: the Arellano-Bond test always rejects the null hypothesis of no second-order autocorrelation. To address this, I used the collapse option, reduced the number of lags of the instruments, and also considered the nonlinear Ahn and Schmidt (1995) moment conditions. I also tried including different lags of both the dependent and the independent variables. Still, in all cases, the null hypothesis continues to be rejected.

Comments and help regarding these results are very welcome.

Thanks in advance.

Kiviet, J. F. (2019). Microeconometric dynamic panel data methods: Model specification and selection issues. MPRA Paper 95159, Munich Personal RePEc Archive.

Ahn, S. C., and P. Schmidt (1995). Efficient estimation of models for dynamic panel data. Journal of Econometrics 68 (1): 5–27
