XTSEQREG: new Stata command for sequential / two-stage (GMM) estimation of linear panel models

Sebastian Kripfganz



12 Feb 2017, 15:15

Dear Statalisters,

I have just released a new Stata command for the estimation of linear panel data models. The main purpose of the xtseqreg command is to implement the two-stage estimation procedure described in my working paper with Claudia Schwarz for linear (dynamic) panel data models with time-invariant regressors. In that paper, we suggest running a first-stage regression of the dependent variable on the time-varying regressors only, and subsequently regressing the first-stage residuals on the time-invariant regressors in a second stage. Instruments can be used at both stages, and efficient estimation can be achieved with two-step GMM. At the second stage, the usual standard errors are invalid because they ignore the estimation uncertainty from the first stage, and they need to be corrected. Providing this analytical standard-error correction is the key feature of the new command. For full details about the methodology and its benefits, please have a look at the paper.
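To make the two-stage idea concrete, here is a deliberately simplified sketch in Python. It is not what xtseqreg does internally: it assumes a static panel with one time-varying regressor x and one time-invariant regressor z, no lagged dependent variable, no instruments, and no standard-error correction. The function names and the toy data are hypothetical. Stage 1 is a within (fixed-effects) regression of y on x; stage 2 is a pooled OLS regression of the stage-1 residuals y - x*beta on a constant and z.

```python
# Hypothetical, simplified illustration of the two-stage idea (static model,
# OLS instead of IV/GMM, no standard-error correction).

def within_slope(panel):
    """Fixed-effects (within) slope of y on x; panel maps unit -> [(x, y), ...]."""
    num = den = 0.0
    for obs in panel.values():
        xbar = sum(x for x, _ in obs) / len(obs)
        ybar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            num += (x - xbar) * (y - ybar)
            den += (x - xbar) ** 2
    return num / den


def ols_simple(pairs):
    """OLS of v on a constant and w; pairs = [(w, v), ...]; returns (intercept, slope)."""
    n = len(pairs)
    wbar = sum(w for w, _ in pairs) / n
    vbar = sum(v for _, v in pairs) / n
    slope = (sum((w - wbar) * (v - vbar) for w, v in pairs)
             / sum((w - wbar) ** 2 for w, _ in pairs))
    return vbar - slope * wbar, slope


def two_stage(panel, z):
    """Stage 1: within regression on x. Stage 2: residuals on constant and z."""
    beta = within_slope(panel)
    # First-stage residuals y - x*beta still contain the time-invariant part.
    resid = [(z[i], y - beta * x) for i, obs in panel.items() for x, y in obs]
    const, gamma = ols_simple(resid)
    return beta, const, gamma


# Toy noiseless data: y = 1 + 0.5*x + 2*z (hypothetical values)
z = {1: 0.0, 2: 1.0, 3: 3.0}
xs = {1: [1, 2, 4], 2: [0, 3, 5], 3: [2, 2.5, 7]}
panel = {i: [(x, 1 + 0.5 * x + 2 * z[i]) for x in xs[i]] for i in xs}

beta, const, gamma = two_stage(panel, z)
print(beta, const, gamma)
```

With noiseless data both stages recover the true coefficients (0.5, 1, 2) up to rounding. With real data, the second-stage residuals also carry the first-stage estimation error, which is exactly why the naive second-stage standard errors need the correction described above.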

Yet, the new command itself is much more flexible because it can also be used for IV/GMM estimation of a single stage only. It then mimics (part of) the behavior of existing commands for instrumental-variables and GMM estimation of linear panel data models, in particular xtdpd and xtabond2 in the context of dynamic models. The other commands can do some things that mine cannot, but mine also adds some flexibility that they do not offer. However, I want to emphasize that I do not intend this new command as a competitor to the existing ones. Re-implementing these GMM estimators was simply necessary to achieve the above-mentioned standard-error correction.

The new command is currently only available for installation from my own website and not yet from SSC:

Code:

. net install xtseqreg, from(http://www.kripfganz.de/stata/)

After the installation, detailed documentation of the syntax and available options can be found in the help files:

Code:

. help xtseqreg
. help xtseqreg postestimation

As always, comments and suggestions are welcome and highly appreciated.

Here is a brief example for a two-stage estimation of a dynamic Mincer equation. At the first stage, the log-wages are regressed on the time-varying regressors. The estimator is a two-step difference-GMM estimator (Arellano/Bond) with collapsed GMM-type instruments for the 2 lags of the dependent variable, standard instruments for the strictly exogenous regressors, and Windmeijer-corrected robust standard errors.

Code:

. xtseqreg L(0/2).lwage exp exp2 occ ind union, gmmiv(L.lwage, model(difference) lagrange(1 4) collapse) iv(exp exp2 occ ind union, difference model(difference)) twostep vce(robust)

Group variable: id                           Number of obs         =      2975
Time variable: t                             Number of groups      =       595

                                             Obs per group:    min =         5
                                                               avg =         5
                                                               max =         5

                                             Number of instruments =        10

                                     (Std. Err. adjusted for clustering on id)
------------------------------------------------------------------------------
             |              WC-Robust
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwage |
         L1. |    .365887   .1722314     2.12   0.034     .0283197    .7034543
         L2. |   .1009276   .0732219     1.38   0.168    -.0425848    .2444399
             |
         exp |   .0501576   .0282205     1.78   0.076    -.0051536    .1054688
        exp2 |   -.000206    .000148    -1.39   0.164     -.000496     .000084
         occ |  -.0428486   .0283624    -1.51   0.131    -.0984379    .0127406
         ind |   .0481791   .0305408     1.58   0.115    -.0116798     .108038
       union |    .006991   .0288093     0.24   0.808    -.0494742    .0634562
       _cons |   2.737719   1.088102     2.52   0.012     .6050775     4.87036
------------------------------------------------------------------------------
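As a quick cross-check outside Stata: the z statistics, p-values, and 95% confidence intervals in this table follow from the coefficient and the (WC-robust) standard error alone, using the standard normal distribution. A minimal Python sketch (the function name wald_row is made up for this illustration):

```python
from statistics import NormalDist


def wald_row(coef, se, level=0.95):
    """z statistic, two-sided p-value, and normal-based CI for one table row."""
    z = coef / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))           # P>|z|
    crit = NormalDist().inv_cdf(0.5 + level / 2)     # about 1.959964 for 95%
    return z, p, (coef - crit * se, coef + crit * se)


# Row "L1." of the lwage table above: coefficient and WC-robust std. err.
z, p, (lo, hi) = wald_row(0.365887, 0.1722314)
print(f"z = {z:.2f}, P>|z| = {p:.3f}, 95% CI = [{lo:.7f}, {hi:.7f}]")
```

For the L1. row this reproduces z = 2.12, P>|z| = 0.034, and the interval [.0283197, .7034543].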

With the following syntax, we can then run a second-stage instrumental-variables regression of the first-stage residuals on some time-invariant regressors. The first-stage results are automatically taken from the previous estimation. Just as an illustration, ed is assumed to be endogenous and instrumented with occ.

Code:

. xtseqreg lwage (L(1/2).lwage exp exp2 occ ind union) ed fem blk, iv(occ fem blk, model(level)) vce(robust)

Group variable: id                           Number of obs         =      2975
Time variable: t                             Number of groups      =       595

------------------------------------------------------------------------------
Equation _first                              Equation _second
Number of obs         =      2975            Number of obs         =      2975
Number of groups      =       595            Number of groups      =       595

Obs per group:    min =         5            Obs per group:    min =         5
                  avg =         5                              avg =         5
                  max =         5                              max =         5

Number of instruments =        10            Number of instruments =         4

                                     (Std. Err. adjusted for clustering on id)
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_first       |
       lwage |
         L1. |    .365887   .1722314     2.12   0.034     .0283197    .7034543
         L2. |   .1009276   .0732219     1.38   0.168    -.0425848    .2444399
             |
         exp |   .0501576   .0282205     1.78   0.076    -.0051536    .1054688
        exp2 |   -.000206    .000148    -1.39   0.164     -.000496     .000084
         occ |  -.0428486   .0283624    -1.51   0.131    -.0984379    .0127406
         ind |   .0481791   .0305408     1.58   0.115    -.0116798     .108038
       union |    .006991   .0288093     0.24   0.808    -.0494742    .0634562
       _cons |   2.737719   1.088102     2.52   0.012     .6050775     4.87036
-------------+----------------------------------------------------------------
_second      |
          ed |   .0634885   .0348497     1.82   0.068    -.0048158    .1317927
         fem |  -.0967082   .0575629    -1.68   0.093    -.2095295     .016113
         blk |  -.1531252   .1010073    -1.52   0.130     -.351096    .0448456
       _cons |  -.7936727   .4419754    -1.80   0.073    -1.659929    .0725831
------------------------------------------------------------------------------

Instead of specifying the two stages one after the other, the same results can also be obtained with a single, somewhat more complicated command line:

Code:

. xtseqreg lwage (L(1/2).lwage exp exp2 occ ind union) ed fem blk, gmmiv(L.lwage, model(difference) lagrange(1 4) collapse equation(#1)) iv(exp exp2 occ ind union, difference model(difference) equation(#1)) iv(occ fem blk, model(level) equation(#2)) twostep vce(robust) both

As a postestimation command, estat overid provides Hansen's J-test for the validity of the overidentifying restrictions for both stages. (In the current example, the second stage is exactly identified.)

Code:

. estat overid

Hansen's J-test for equation _first                    chi2(2)     =    0.2935
H0: overidentifying restrictions are valid             Prob > chi2 =    0.8635

Hansen's J-test for equation _second                   chi2(0)     =    0.0000
note: coefficients are exactly identified              Prob > chi2 =         .
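The degrees of freedom of the J-test are the number of instruments minus the number of estimated coefficients: 10 - 8 = 2 at the first stage (L1., L2., exp, exp2, occ, ind, union, _cons) and 4 - 4 = 0 at the exactly identified second stage (instruments occ, fem, blk, _cons; coefficients ed, fem, blk, _cons). A small Python sketch, for illustration only, that reproduces the reported p-value from the chi2 statistic; the function names are hypothetical, and the chi-squared tail is computed with a standard recurrence to avoid a SciPy dependency:

```python
import math


def chi2_sf(x, df):
    """P(chi2(df) > x) for integer df >= 1, via the recurrence
    Q(x; k+2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)."""
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2                  # Q(x; 2) = exp(-x/2)
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1       # Q(x; 1)
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def hansen_p(j_stat, n_instruments, n_params):
    """Degrees of freedom and p-value of Hansen's J-test."""
    df = n_instruments - n_params
    if df == 0:
        # Exactly identified: no overidentifying restrictions to test.
        return df, float("nan")
    return df, chi2_sf(j_stat, df)


print(hansen_p(0.2935, 10, 8))   # equation _first
print(hansen_p(0.0, 4, 4))       # equation _second
```

For the first stage this gives df = 2 and p ≈ 0.8635, matching the output; for the second stage df = 0 and the p-value is undefined, which Stata displays as a missing value.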

The following command line exactly replicates the above results for the first stage with xtabond2, including Hansen's J-test:

Code:

. xtabond2 L(0/2).lwage exp exp2 occ ind union, gmmstyle(L.lwage, equation(diff) laglimits(1 4) collapse) ivstyle(exp exp2 occ ind union, equation(diff)) twostep robust

Notice that the reported Hansen J-test results would differ between xtseqreg and xtabond2 if the one-step GMM estimator were used (i.e., the above example without option twostep). For this purpose, xtabond2 silently still computes the two-step estimator, whereas xtseqreg evaluates the moment functions at the one-step estimates while still using the optimal weighting matrix that would have been used in a second step.

Finally, also notice that xtabond2 might in some situations report an incorrect number of instruments because it does not always detect a linear relationship between instruments specified for the first-differenced and those for the levels model. This can happen in particular with time dummies if they are specified for both the first-differenced and the levels model (which is something that actually should not be done). In the following example (output omitted), xtabond2 reports 17 instruments, while xtseqreg obtains the identical result but correctly reports only 13 instruments. (This can happen when xtabond2 is used with either the option h(1) or h(2). It does not happen with the default option h(3). The default weighting matrix of xtseqreg corresponds to h(2) of xtabond2.) This has an important consequence because the degrees of freedom used for Hansen's J-test depend on the number of instruments. Hence, the reported J-test by xtabond2 might be misleading.

Code:

. xtseqreg L(0/2).lwage exp2 occ ind union tdum4-tdum7, gmmiv(L.lwage, model(difference) lagrange(1 4) collapse) iv(exp2 occ ind union, difference model(difference)) iv(tdum4-tdum7, model(diff)) iv(tdum4-tdum7, model(level)) twostep vce(robust)
. xtabond2 L(0/2).lwage exp2 occ ind union tdum4-tdum7, gmmstyle(L.lwage, equation(diff) laglimits(1 4) collapse) ivstyle(exp2 occ ind union, equation(diff)) ivstyle(tdum4-tdum7, equation(diff)) ivstyle(tdum4-tdum7, equation(level)) twostep robust h(2)
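To see why the instrument count matters, here is a hypothetical Python illustration. Assuming the model above has 11 coefficients (L1., L2., exp2, occ, ind, union, tdum4-tdum7, _cons), the J-test has 17 - 11 = 6 degrees of freedom with xtabond2's instrument count but only 13 - 11 = 2 with the correct count. The J value of 5.0 below is made up purely for illustration; it is not the output of either command.

```python
import math


def chi2_sf_even(x, df):
    """P(chi2(df) > x) for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    assert df > 0 and df % 2 == 0
    h = x / 2
    return math.exp(-h) * sum(h ** i / math.factorial(i) for i in range(df // 2))


N_PARAMS = 11          # assumption: L1., L2., exp2, occ, ind, union, tdum4-tdum7, _cons
J_HYPOTHETICAL = 5.0   # made-up J statistic, for illustration only

for label, n_instr in [("xtabond2", 17), ("xtseqreg", 13)]:
    df = n_instr - N_PARAMS
    p = chi2_sf_even(J_HYPOTHETICAL, df)
    print(f"{label}: {n_instr} instruments, chi2({df}), p = {p:.4f}")
```

For the same J value, the p-value is about 0.54 with 6 degrees of freedom but about 0.08 with 2, so an overstated instrument count makes the test look much more favorable to the overidentifying restrictions than it should.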

You can read more about this last observation in another Statalist topic: System GMM - Time Dummies.

Reference:

Kripfganz, S., and C. Schwarz (2015). Estimation of linear dynamic panel data models with time-invariant regressors. ECB Working Paper 1838, European Central Bank.

https://twitter.com/Kripfganz

Tags: fixed effects, gmm, panel data, standard errors, two-stage estimation

Sebastian Kripfganz

#2

12 Feb 2017, 18:28

I need to add that I have used the following data set to generate the above examples:

Code:

. webuse psidextract

Also, my statement about the default weighting matrix used by xtseqreg in the last paragraph was wrong. (It is easy to get lost with all the available options.) The default is the same as with xtabond2. In the last example above, for the two commands to be equivalent, the option wmatrix(independent) needs to be added to the xtseqreg command line. (In this particular example, however, the estimates remain the same.)

Sebastian Kripfganz

#3

14 Feb 2017, 17:19

There is already a first update available:

Code:

adoupdate xtseqreg, update

In my experience, many (or at least some) people struggle with the correct specification of time-fixed effects in estimation commands for (dynamic) panel data models. Motivated by this and by the discussion mentioned at the very end of my opening post, I have added the teffects option to my xtseqreg command. This option adds time-fixed effects to the model and ensures that the correct number of dummy variables is added, together with the correct type and number of corresponding instruments.


Sebastian Kripfganz


27 Feb 2017, 15:29

Another update is available that adds the Arellano-Bond test for the absence of serial correlation in the first-differenced errors as a postestimation command. To continue the example from above:

Code:

. webuse psidextract

. quietly xtseqreg L(0/2).lwage exp exp2 occ ind union, gmmiv(L.lwage, model(difference) lagrange(1 4) collapse) iv(exp exp2 occ ind union, difference model(difference)) twostep vce(robust)

. estat serial, ar(1/3)

Arellano-Bond test for autocorrelation of the first-differenced residuals
H0: no autocorrelation of order 1:     z =   -3.3576   Prob > |z|  =    0.0008
H0: no autocorrelation of order 2:     z =   -0.4852   Prob > |z|  =    0.6275
H0: no autocorrelation of order 3:     z =    0.2946   Prob > |z|  =    0.7683
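The reported p-values are two-sided standard normal probabilities, Prob > |z|. The usual reading, standard in the Arellano-Bond literature rather than stated in the output itself: first-order correlation of the first-differenced residuals is expected by construction, whereas second-order correlation would indicate that instruments dated t-2 (such as L2.lwage) are invalid. A short Python sketch that recomputes the p-values and applies that rule at an assumed 5% level (the function name ab_pvalue is hypothetical):

```python
from statistics import NormalDist


def ab_pvalue(z):
    """Two-sided p-value of an Arellano-Bond z statistic, Prob > |z|."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# z statistics from the estat serial output above, by order
ab_z = {1: -3.3576, 2: -0.4852, 3: 0.2946}
for order, z in ab_z.items():
    p = ab_pvalue(z)
    verdict = "reject H0" if p < 0.05 else "do not reject H0"   # assumed 5% level
    # Rejection at order 1 is expected; rejection at order >= 2 would be a warning sign.
    print(f"AR({order}): z = {z:+.4f}, p = {p:.4f}, {verdict}")
```

Here only the order-1 test rejects, which is the pattern one hopes to see.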


Nadia Oue

#5

20 Mar 2017, 06:30

Originally posted by Sebastian Kripfganz:

There is already a first update available:

Code:

adoupdate xtseqreg, update

As I have experienced that many (or at least some) people seem to struggle with the correct specification of time-fixed effects in estimation commands for (dynamic) panel data models and motivated by the discussion mentioned at the very end of my opening post, I have added the teffects option to my xtseqreg command. This option adds time-fixed effects to the model and makes sure that the correct number of dummy variables is added as well as the correct type and number of corresponding instruments.

Dear Sebastian,

Thanks a lot for this very helpful thread.
I am one of those people who struggle with the correct specification of time fixed effects. I am trying to replicate the xtabond2 estimation (for a different data set). However, I struggle to specify the tdum variables and so cannot run the estimation.
I basically copied your code with my own variables, and Stata does not recognize the tdum variables. I noticed that the psidextract data set has tdum1-tdum7 in its variable list. How can I generate or specify the time fixed effects?

Thanks for your help.
BR

Nadia
Sebastian Kripfganz

#6

21 Mar 2017, 04:43

Nadia Oue,
Welcome to Statalist. Could you please show us the command lines that you have typed in Stata as well as Stata's output when you type xtset. Otherwise, it is hard to give specific advice. (Please see the FAQ of this forum, in particular Section 12: http://www.statalist.org/forums/help#stata)

Nadia Oue

#7

21 Mar 2017, 05:59

Originally posted by Sebastian Kripfganz:

Nadia Oue,
Welcome to Statalist. Could you please show us the command lines that you have typed in Stata as well as Stata's output when you type xtset. Otherwise, it is hard to give specific advice. (Please see the FAQ of this forum, in particular Section 12: http://www.statalist.org/forums/help#stata)

Dear Sebastian,

Thanks a lot for your quick reply. Thanks for the FAQ link.

Here is my xtset output:

xtset
panel variable: ccode (unbalanced)
time variable: year, 1946 to 2016
delta: 1 unit

And my code:

Code:

xtabond2 L(0/2).fh_polity2 Quantity2 wdi_gdpcapcur eu_demd3dens wdi_imigs al_ethnic ross_oil_prod tdum4-tdum7, gmmstyle(L.fh_polity2, equation(diff) lag limits (1 4) collapse) ivstyle (Quantity2 wdi_gdpcapcur eu_demd3dens wdi_imigs al_ethnic ross_oil_prod, difference equation(diff)) ivstyle (tdum4-tdum7, equation(diff)) ivstyle (tdum4-tdum7, equation(level)) twostep robust h(2)

Many thanks for your kind assistance

Nadia

Last edited by Nadia Oue; 21 Mar 2017, 06:02.
Sebastian Kripfganz

#8

21 Mar 2017, 06:13

Thanks for the additional information.

If you do not have time dummies yet in your data set, you can generate them with the following command:

Code:

tabulate year, generate(tdum)

Please note two further remarks:
Your time span ranges from 1946 to 2016. This is a rather large time dimension and the GMM estimators as implemented by xtabond2, xtdpd, and xtseqreg are usually not appropriate for such "large T" circumstances.

When you read again the last paragraph of my opening post and follow the link that follows the example there, please notice in particular the recommendation NOT to use instruments for the time dummies in both the first-differenced and the levels equation:

Originally posted by Sebastian Kripfganz:

Finally, also notice that xtabond2 might in some situations report an incorrect number of instruments because it does not always detect a linear relationship between instruments specified for the first-differenced and those for the levels model. This can happen in particular with time dummies if they are specified for both the first-differenced and the levels model (which is something that actually should not be done). In the following example (output omitted), xtabond2 reports 17 instruments, while xtseqreg obtains the identical result but correctly reports only 13 instruments. (This can happen when xtabond2 is used with either the option h(1) or h(2). It does not happen with the default option h(3). The default weighting matrix of xtseqreg corresponds to h(2) of xtabond2.) This has an important consequence because the degrees of freedom used for Hansen's J-test depend on the number of instruments. Hence, the reported J-test by xtabond2 might be misleading.

Code:

. xtseqreg L(0/2).lwage exp2 occ ind union tdum4-tdum7, gmmiv(L.lwage, model(difference) lagrange(1 4) collapse) iv(exp2 occ ind union, difference model(difference)) iv(tdum4-tdum7, model(diff)) iv(tdum4-tdum7, model(level)) twostep vce(robust)
. xtabond2 L(0/2).lwage exp2 occ ind union tdum4-tdum7, gmmstyle(L.lwage, equation(diff) laglimits(1 4) collapse) ivstyle(exp2 occ ind union, equation(diff)) ivstyle(tdum4-tdum7, equation(diff)) ivstyle(tdum4-tdum7, equation(level)) twostep robust h(2)

You can read more about this last observation in another Statalist topic: System GMM - Time Dummies.

Nadia Oue

#9

21 Mar 2017, 08:03

Dear Sebastian,
Thanks again for your kind assistance.
1. I took a smaller subset of my data (1990-2016)
2. I included the time dummies in the first-differenced equation only.
Now I get the error message "no observations" (r(2000)).

Code:

xtabond2 L(0/2).fh_polity2 Quantity Quantity2 wdi_gdpcapcur eu_demd3dens wdi_imigs al_ethnic ross_oil_prod, gmmstyle (L.fh_polity2, equation(diff) lag limits(1 4) collapse) ivstyle ( Quantity Quantity2 wdi_gdpcapcur eu_demd3dens wdi_imigs al_ethnic ross_oil_prod, difference equation(diff)) ivstyle (tdum4-tdum27, equation(diff)) ivstyle (equation(level)) twostep robust h(2)

I have two additional, possibly silly, questions: How do I determine the lag limits, and which lags of the time dummies should I use in this case?
Many thanks for your help

Br,
Nadia
Sebastian Kripfganz

#10

21 Mar 2017, 08:32

Did you really specify both of the following options?

Code:

ivstyle(tdum4-tdum27, equation(diff)) ivstyle(equation(level))

The second one should result in an error because no variables are specified. You should remove it if not needed. In addition, you also need to specify the time dummies explicitly as independent variables before the first comma in the command line.

It is hard to say why you get the r(2000) error message. Could you please report the Stata output of the tabulate command that you used to generate the time dummies? Also, does the xtabond2 command produce proper output without the time dummies?

The question about the selection of lag limits is less trivial to answer. I recommend that you have a look at David Roodman's paper "How to do xtabond2". The time dummies should not be lagged.

Nadia Oue

#11

22 Mar 2017, 08:39

Dear Sebastian,

Thanks.

1. The r(2000) error was caused by my choice of variables, which I have now corrected.

Now I get proper output (even for the longer time span 1946-2016).

2. On the time dummies, I meant: what guided your choice of tdum4-tdum7? I understand tdum7, but why start from tdum4?
Sebastian Kripfganz

#12

22 Mar 2017, 09:12

If you have time periods 1 to 7, the first two time dummies (tdum1 and tdum2) cannot be included because the first two periods are removed from the estimation sample due to the two lags of the dependent variable in the above example. The third time dummy (tdum3) cannot be included because otherwise there would be perfect collinearity of all time dummies together with the regression intercept ("dummy trap"). Hence, only tdum4 to tdum7 can be used. (Of course, instead of tdum3 any other time dummy could be excluded, which only changes the reference period.)
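The same reasoning can be written down as a small rule. The Python sketch below is hypothetical (the function name and the assumption of a balanced panel with consecutive periods are mine): with T periods and p lags of the dependent variable, periods 1 to p drop out, and one of the remaining dummies must be left out as the reference period because of the intercept. For the 1990-2016 subset mentioned above, this gives tdum4-tdum27, assuming tabulate numbered the years consecutively from tdum1 = 1990. (The teffects option of xtseqreg mentioned earlier in this topic is meant to take care of this automatically.)

```python
def usable_time_dummies(n_periods, n_lags, reference=None):
    """Indices of tdum variables (1-based, as created by
    `tabulate year, generate(tdum)`) that can enter a model with
    n_lags lags of the dependent variable and an intercept.
    Assumes a balanced panel with consecutive periods."""
    # The first n_lags periods are lost to the lags of the dependent variable.
    in_sample = list(range(n_lags + 1, n_periods + 1))
    # One in-sample dummy must be dropped to avoid the dummy trap with _cons;
    # by default, the first in-sample period is the reference period.
    if reference is None:
        reference = in_sample[0]
    return [t for t in in_sample if t != reference]


print(usable_time_dummies(7, 2))    # periods 1-7, as in psidextract
print(usable_time_dummies(27, 2))   # e.g. years 1990-2016
```

Passing a different reference period only changes which dummy is left out, as noted above.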

Nadia Oue

#13

23 Mar 2017, 09:11

Dear Sebastian,
Many thanks for your help throughout. This is the first time I am using xtabond2, and you have been very helpful.
May I ask you a favor: could you have a look at my code and tell me whether it looks good? Thanks

My dependent variable is democracy (fh_polity2), my regressors are lagged democracy and energy consumption, and my control variables are GDP per capita, oil production, and ethnic fragmentation.

panel variable: ccode (unbalanced)
time variable: year, 1990 to 2016
delta: 1 unit

my code is:

Code:

xtabond2 L(0/2).fh_polity2 Ener Ener2 wdi_gdpcapcur al_ethnic ross_oil_prod tdum4-tdum27, gmmstyle(L.fh_polity2, equation(diff) lag(1 .) collapse) ivstyle(Ener Ener2 wdi_gdpcapcur ross_oil_prod al_ethnic, equation(diff)) ivstyle(tdum4-tdum27, equation(level)) twostep robust h(2)

Many thanks.

BR,
Nadia
Sebastian Kripfganz

#14

23 Mar 2017, 09:48

Indeed, your specification looks good, provided that you can assume that your regressors and control variables (besides the lagged dependent variable) are strictly exogenous. If that is a good assumption or not depends on your underlying economic theory. In addition, you would need to check the usual specification tests (Arellano-Bond test, Hansen test).

If you have further questions that are specific to the xtabond2 command, I recommend starting a new Statalist topic, because this topic is primarily about the new xtseqreg command.

Sebastian Kripfganz

#15

07 Jun 2017, 16:25

xtseqreg has been updated to version 1.1.2.

Code:

adoupdate xtseqreg, update

In combination with my other new command xtdpdgmm, the xtseqreg command can now also be used for two-stage estimation based on a first-stage Ahn-Schmidt GMM estimator with nonlinear moment conditions. Continuing the example from the beginning of this topic based on the psidextract data set:

Code:

. xtdpdgmm L(0/2).lwage exp exp2 occ ind union, gmmiv(L.lwage, model(difference) lagrange(1 4) collapse) iv(exp exp2 occ ind union, difference model(difference)) twostep vce(robust) noserial aux
. xtseqreg lwage (L(1/2).lwage exp exp2 occ ind union) ed fem blk, first(, copy) iv(occ fem blk, model(level)) vce(robust)

With this new version, the Arellano-Bond test statistic (estat serial) after two-step robust estimation might slightly differ from previous versions (and from the one reported by xtabond2 or xtdpd), because xtseqreg now fully accounts for the finite-sample Windmeijer correction in the computation of this test statistic (while the other commands do not). Postestimation statistics now also include difference-in-Hansen tests and generalized Hausman tests. The help file and the Statalist topic on the xtdpdgmm command provide further information on these tests. (xtseqreg and xtdpdgmm produce identical results for one-stage GMM estimation with linear moment conditions only.)

Thanks to Kit Baum, this version of xtseqreg is now also available for installation from SSC:

Code:

ssc install xtseqreg

The SSC version of this command might be updated less frequently than the version on my own website. However, some users may not be able to install the package directly from my website, for example due to corporate firewall restrictions.

