XTSEQREG: new Stata command for sequential / two-stage (GMM) estimation of linear panel models

Sebastian Kripfganz started a topic XTSEQREG: new Stata command for sequential / two-stage (GMM) estimation of linear panel models

12 Feb 2017, 15:15

Dear Statalisters,

I have just released a new Stata command for the estimation of linear panel data models. The main purpose of the xtseqreg command is to implement the two-stage estimation procedure described in my working paper with Claudia Schwarz for linear (dynamic) panel data models with time-invariant regressors. In that paper, we suggest first regressing the dependent variable on the time-varying regressors only, and then regressing the first-stage residuals on the time-invariant regressors in a second stage. Instruments can be used at both stages, and efficient estimation can be achieved with two-step GMM. At the second stage, the usual standard errors are invalid because they ignore the first-stage estimation error, so they need to be corrected. Providing this analytical standard-error correction is the key contribution of the new command. For full details about the methodology and its benefits, please have a look at the paper.
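
In stylized notation (chosen here for illustration, not necessarily the paper's, and with a single lag for brevity), the two stages can be sketched as follows, where x_it collects the time-varying regressors, f_i the time-invariant regressors, and alpha_i the unobserved unit-specific effect:

```latex
\begin{align*}
\text{Model:} \quad & y_{it} = \lambda y_{i,t-1} + x_{it}'\beta + f_i'\gamma + \alpha_i + u_{it} \\
\text{Stage 1:} \quad & \text{estimate } (\hat\lambda, \hat\beta) \text{ from }
    y_{it} = \lambda y_{i,t-1} + x_{it}'\beta + e_{it},
    \qquad e_{it} = f_i'\gamma + \alpha_i + u_{it} \\
\text{Stage 2:} \quad & \text{estimate } \hat\gamma \text{ from }
    y_{it} - \hat\lambda y_{i,t-1} - x_{it}'\hat\beta = f_i'\gamma + v_{it}, \\
& v_{it} = \alpha_i + u_{it} - (\hat\lambda - \lambda)\, y_{i,t-1} - x_{it}'(\hat\beta - \beta)
\end{align*}
```

Because the second-stage error v_it contains the first-stage estimation error, unadjusted second-stage standard errors would be invalid.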

Yet, the new command itself is much more flexible because it can also be used for IV/GMM estimation of a single stage only. It then mimics (part of) the behavior of existing commands for instrumental variable and GMM estimation of linear panel data models, in particular xtdpd and xtabond2 in the context of dynamic models. In part, the other commands achieve things that my command cannot deliver, but mine also adds some flexibility that the others do not offer. However, I want to emphasize that it is not my intention to introduce this new command as a competitor for the existing ones. The re-implementation of these GMM estimators was simply a necessary requirement to achieve the above-mentioned standard-error correction.

The new command is currently only available for installation from my own website and not yet from SSC:

Code:

. net install xtseqreg, from(http://www.kripfganz.de/stata/)

After the installation, detailed documentation of the syntax and available options can be found in the help files:

Code:

. help xtseqreg
. help xtseqreg postestimation

As always, comments and suggestions are welcome and highly appreciated.

Here is a brief example for a two-stage estimation of a dynamic Mincer equation. At the first stage, the log-wages are regressed on the time-varying regressors. The estimator is a two-step difference-GMM estimator (Arellano/Bond) with collapsed GMM-type instruments for the 2 lags of the dependent variable, standard instruments for the strictly exogenous regressors, and Windmeijer-corrected robust standard errors.

Code:

. xtseqreg L(0/2).lwage exp exp2 occ ind union, gmmiv(L.lwage, model(difference) lagrange(1 4) collapse) iv(exp exp2 occ ind union, difference model(difference)) twostep vce(robust)

Group variable: id                           Number of obs         =      2975
Time variable: t                             Number of groups      =       595

                                             Obs per group:    min =         5
                                                               avg =         5
                                                               max =         5

                                             Number of instruments =        10

                                     (Std. Err. adjusted for clustering on id)
------------------------------------------------------------------------------
             |              WC-Robust
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwage |
         L1. |    .365887   .1722314     2.12   0.034     .0283197    .7034543
         L2. |   .1009276   .0732219     1.38   0.168    -.0425848    .2444399
             |
         exp |   .0501576   .0282205     1.78   0.076    -.0051536    .1054688
        exp2 |   -.000206    .000148    -1.39   0.164     -.000496     .000084
         occ |  -.0428486   .0283624    -1.51   0.131    -.0984379    .0127406
         ind |   .0481791   .0305408     1.58   0.115    -.0116798     .108038
       union |    .006991   .0288093     0.24   0.808    -.0494742    .0634562
       _cons |   2.737719   1.088102     2.52   0.012     .6050775     4.87036
------------------------------------------------------------------------------

With the following syntax, we can then run a second-stage instrumental-variables regression of the first-stage residuals on some time-invariant regressors. The first-stage results are automatically taken from the previous estimation. Just as an illustration, ed is assumed to be endogenous and instrumented with occ.

Code:

. xtseqreg lwage (L(1/2).lwage exp exp2 occ ind union) ed fem blk, iv(occ fem blk, model(level)) vce(robust)

Group variable: id                           Number of obs         =      2975
Time variable: t                             Number of groups      =       595

------------------------------------------------------------------------------
Equation _first                              Equation _second
Number of obs         =      2975            Number of obs         =      2975
Number of groups      =       595            Number of groups      =       595

Obs per group:    min =         5            Obs per group:    min =         5
                  avg =         5                              avg =         5
                  max =         5                              max =         5

Number of instruments =        10            Number of instruments =         4

                                     (Std. Err. adjusted for clustering on id)
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_first       |
       lwage |
         L1. |    .365887   .1722314     2.12   0.034     .0283197    .7034543
         L2. |   .1009276   .0732219     1.38   0.168    -.0425848    .2444399
             |
         exp |   .0501576   .0282205     1.78   0.076    -.0051536    .1054688
        exp2 |   -.000206    .000148    -1.39   0.164     -.000496     .000084
         occ |  -.0428486   .0283624    -1.51   0.131    -.0984379    .0127406
         ind |   .0481791   .0305408     1.58   0.115    -.0116798     .108038
       union |    .006991   .0288093     0.24   0.808    -.0494742    .0634562
       _cons |   2.737719   1.088102     2.52   0.012     .6050775     4.87036
-------------+----------------------------------------------------------------
_second      |
          ed |   .0634885   .0348497     1.82   0.068    -.0048158    .1317927
         fem |  -.0967082   .0575629    -1.68   0.093    -.2095295     .016113
         blk |  -.1531252   .1010073    -1.52   0.130     -.351096    .0448456
       _cons |  -.7936727   .4419754    -1.80   0.073    -1.659929    .0725831
------------------------------------------------------------------------------

Instead of specifying both stages one after the other, with some more complicated syntax the same results can also be obtained with a single command line:

Code:

. xtseqreg lwage (L(1/2).lwage exp exp2 occ ind union) ed fem blk, gmmiv(L.lwage, model(difference) lagrange(1 4) collapse equation(#1)) iv(exp exp2 occ ind union, difference model(difference) equation(#1)) iv(occ fem blk, model(level) equation(#2)) twostep vce(robust) both

As a postestimation command, estat overid provides Hansen's J-test for the validity of the overidentifying restrictions for both stages. (In the current example, the second stage is exactly identified.)

Code:

. estat overid

Hansen's J-test for equation _first                    chi2(2)     =    0.2935
H0: overidentifying restrictions are valid             Prob > chi2 =    0.8635

Hansen's J-test for equation _second                   chi2(0)     =    0.0000
note: coefficients are exactly identified              Prob > chi2 =         .
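
As a quick plausibility check of this output (simple arithmetic, not additional xtseqreg output): the degrees of freedom of the J-test equal the number of instruments minus the number of estimated coefficients, and the p-value is the upper-tail chi-squared probability of the test statistic. The following lines use only Stata's built-in display command and chi2tail() function and should reproduce the reported values.

```stata
* First stage: 10 instruments, 8 coefficients (including the constant)
display 10 - 8
* Second stage: 4 instruments, 4 coefficients, hence exactly identified
display 4 - 4
* p-value of the first-stage J statistic, chi2(2) = 0.2935
display %6.4f chi2tail(2, 0.2935)
```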

The following command line exactly replicates the above results for the first stage with xtabond2, including Hansen's J-test:

Code:

. xtabond2 L(0/2).lwage exp exp2 occ ind union, gmmstyle(L.lwage, equation(diff) laglimits(1 4) collapse) ivstyle(exp exp2 occ ind union, equation(diff)) twostep robust

Notice that the results reported for Hansen's J-test would differ between xtseqreg and xtabond2 if the one-step GMM estimator were used (the above example without the option twostep). For this purpose, xtabond2 silently still computes the two-step estimator, whereas xtseqreg evaluates the moment functions at the first-step estimates while still using an optimal weighting matrix (the one that would have been used in a second step).

Finally, notice that xtabond2 might in some situations report an incorrect number of instruments because it does not always detect a linear relationship between the instruments specified for the first-differenced model and those specified for the levels model. This can happen in particular with time dummies if they are specified for both models (which is something that actually should not be done). It can happen when xtabond2 is used with either the option h(1) or h(2), but not with the default option h(3); the default weighting matrix of xtseqreg corresponds to h(2) of xtabond2. In the following example (output omitted), xtabond2 reports 17 instruments, while xtseqreg obtains the identical result but correctly reports only 13 instruments. This has an important consequence because the degrees of freedom used for Hansen's J-test depend on the number of instruments. Hence, the J-test reported by xtabond2 might be misleading.

Code:

. xtseqreg L(0/2).lwage exp2 occ ind union tdum4-tdum7, gmmiv(L.lwage, model(difference) lagrange(1 4) collapse) iv(exp2 occ ind union, difference model(difference)) iv(tdum4-tdum7, model(diff)) iv(tdum4-tdum7, model(level)) twostep vce(robust)
. xtabond2 L(0/2).lwage exp2 occ ind union tdum4-tdum7, gmmstyle(L.lwage, equation(diff) laglimits(1 4) collapse) ivstyle(exp2 occ ind union, equation(diff)) ivstyle(tdum4-tdum7, equation(diff)) ivstyle(tdum4-tdum7, equation(level)) twostep robust h(2)
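
One way to read the two instrument counts is sketched below. This is an editorial tally, not output from either command. It assumes 4 collapsed GMM-type instruments, 4 standard instruments, 4 time dummies for each of the two models, and the constant, and it assumes that the 4 time-dummy instruments for the levels model are the redundant ones. It also shows the degrees of freedom that each count would imply for the J-test, given the 11 estimated coefficients.

```stata
* Tally without detecting the redundant time-dummy instruments
display 4 + 4 + 4 + 4 + 1
* Tally after dropping the 4 redundant time-dummy instruments
display 4 + 4 + 4 + 1
* Implied degrees of freedom of the J-test with 11 coefficients
display 17 - 11
display 13 - 11
```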

You can read more about this last observation in another Statalist topic: System GMM - Time Dummies.

Reference:

Kripfganz, S., and C. Schwarz (2015). Estimation of linear dynamic panel data models with time-invariant regressors. ECB Working Paper 1838, European Central Bank.

Tags: fixed effects, gmm, panel data, standard errors, two-stage estimation

Dario Maimone Ansaldo Patti replied

22 Jun 2023, 06:45
Sebastian Kripfganz Solved! In xtdpdqml I used time2-time19, while in xtseqreg I used time3-time19. The estimates are identical. I am posting this in case someone else encounters a similar problem. I guess you are more expert than I am and can understand why it works this way. Thanks for your great help with all my problems.
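
Put together, the fix described here would look roughly like the sketch below. It only recombines the command lines posted earlier in this thread with the two dummy ranges mentioned above. The variables investment, mtb, efi, and time2-time19 are the poster's own, and the sketch has not been run on that data.

```stata
* First stage by QML, with the time dummies listed as time2-time19
xtdpdqml investment L.mtb time2-time19, fe mlparams vce(robust)
estimates store qml
* Second stage, with the time dummies listed as time3-time19
xtseqreg investment (L.investment L.mtb time3-time19) L.efi, first(qml, equation(#1) noconstant)
```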
Sebastian Kripfganz replied

22 Jun 2023, 06:43
I suspect that something odd is happening due to the severely unbalanced nature of the panel. The QML estimator is probably not ideal under these circumstances.
Dario Maimone Ansaldo Patti replied

22 Jun 2023, 06:39
Sebastian Kripfganz Indeed, the problem is with the time dummies, as everything runs smoothly if I remove them altogether.
Sebastian Kripfganz replied

22 Jun 2023, 06:25
I am afraid in that case I am out of ideas.
Dario Maimone Ansaldo Patti replied

22 Jun 2023, 06:16
Sebastian Kripfganz Thanks. I tried but still I get the same error message. Apparently in the estimation only time3 to time19 are included. I tried using time3-time19 and I also listed all the included time dummies but the result does not change.
Sebastian Kripfganz replied

22 Jun 2023, 06:10
That is probably because of the omitted time dummies. Try replacing time* in the xtseqreg command line with the unabbreviated list of non-omitted dummies.

Dario Maimone Ansaldo Patti replied

22 Jun 2023, 05:45

Hi All,

I am afraid I still have an issue. I ran the following commands, as suggested in the most recent posts:

Code:

xtdpdqml investment l.mtb time*, fe mlparams vce(robust)

note: time1 omitted because of collinearity.
note: time20 omitted because of collinearity.
note: 824 groups are dropped due to gaps or insufficient number of observations
note: time2 omitted because of collinearity

Quasi-maximum likelihood estimation
Iteration 0:  f(p) =  30369.288  
Iteration 1:  f(p) =  30390.753  
Iteration 2:  f(p) =   30439.82  
Iteration 3:  f(p) =  30444.373  
Iteration 4:  f(p) =  30444.472  
Iteration 5:  f(p) =  30444.472  

Group variable: id                           Number of obs         =     13898
Time variable: year                          Number of groups      =      1245

Fixed effects                                Obs per group:    min =         2
                                                               avg =  11.16305
                                                               max =        18
                                     (Std. err. adjusted for clustering on id)
------------------------------------------------------------------------------
             |               Robust
D.investment | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
_model       |
  investment |
         LD. |   .4813976   .0250801    19.19   0.000     .4322415    .5305538
             |
         mtb |
         LD. |   .0036694   .0004569     8.03   0.000      .002774    .0045649

est store qml

Then I run:

Code:

xtseqreg investment (L.investment l.mtb time*) l.efi, first(qml, equation(#1) nocons)

option first() incorrectly specified -- variable names do not match
r(322); t=0.10 13:44:48

What is the reason for the error message above? I cannot figure it out.

Thanks for your suggestions.

Dario

Last edited by Dario Maimone Ansaldo Patti; 22 Jun 2023, 06:01.

Dario Maimone Ansaldo Patti replied

21 Jun 2023, 06:14
Sebastian Kripfganz Thanks a lot!!
Sebastian Kripfganz replied

21 Jun 2023, 06:12
Indeed, xtdpdbc cannot (currently) be used with xtseqreg. (It is on my to-do list, but not as a matter of urgency.) You can use my xtdpdqml command instead, which performs ML estimation of linear dynamic panel data models and has been shown to perform very similarly to xtdpdbc. Example code:

Code:

xtdpdqml lwage wks south smsa ms exp exp2 occ ind union, fe mlparams
estimates store qml
xtseqreg lwage (L.lwage wks south smsa ms exp exp2 occ ind union) fem blk ed, first(qml, equation(#1) noconstant)
Dario Maimone Ansaldo Patti replied

21 Jun 2023, 06:04
Sebastian Kripfganz Thanks a lot! One last question and I will not bother you anymore. I tried to use xtdpdbc to correct the bias in the dynamic estimation. If I am correct, I cannot use it before xtseqreg. Is there any way to replicate the estimation of xtdpdbc, maybe using xtdpdgmm?

Sebastian Kripfganz replied

21 Jun 2023, 05:45

Your xtseqreg specification is just a pooled OLS estimator. You cannot directly replicate the FE estimator with xtseqreg (unless you explicitly throw in all the group-specific dummy variables). However, you can use the xtdpdgmm command for this task. Example:

Code:

. webuse psidextract

. xtdpdgmm lwage wks south smsa ms exp exp2 occ ind union, model(mdev) norescale iv(wks south smsa ms exp exp2 occ ind union) auxiliary
note: conventional one-step standard errors may not be valid

Generalized method of moments estimation

Fitting full model:
Step 1         f(b) =  9.041e-24

Group variable: id                           Number of obs         =      4165
Time variable: t                             Number of groups      =       595

Moment conditions:     linear =      10      Obs per group:    min =         7
                    nonlinear =       0                        avg =         7
                        total =      10                        max =         7

------------------------------------------------------------------------------
       lwage | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        /wks |   .0008359   .0005989     1.40   0.163    -.0003379    .0020098
      /south |  -.0018612    .034256    -0.05   0.957    -.0690018    .0652794
       /smsa |  -.0424691   .0194039    -2.19   0.029       -.0805   -.0044383
         /ms |  -.0297259   .0189596    -1.57   0.117     -.066886    .0074343
        /exp |   .1132083   .0024679    45.87   0.000     .1083712    .1180453
       /exp2 |  -.0004184   .0000545    -7.67   0.000    -.0005252   -.0003115
        /occ |  -.0214765   .0137663    -1.56   0.119    -.0484579     .005505
        /ind |   .0192101   .0154268     1.25   0.213    -.0110259    .0494461
      /union |   .0327849    .014904     2.20   0.028     .0035735    .0619962
      /_cons |   4.648767   .0459639   101.14   0.000      4.55868    4.738855
------------------------------------------------------------------------------
Instruments corresponding to the linear moment conditions:
 1, model(mdev):
   wks south smsa ms exp exp2 occ ind union
 2, model(level):
   _cons

. estimates store first

. xtseqreg lwage (wks south smsa ms exp exp2 occ ind union) fem blk ed, first(first, copy)

Group variable: id                           Number of obs         =      4165
Time variable: t                             Number of groups      =       595

------------------------------------------------------------------------------
Equation _first                              Equation _second
Number of obs         =      4165            Number of obs         =      4165
Number of groups      =       595            Number of groups      =       595

Obs per group:    min =         7            Obs per group:    min =         7
                  avg =         7                              avg =         7
                  max =         7                              max =         7

                                             Number of instruments =         4

------------------------------------------------------------------------------
       lwage | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
_first       |
         wks |   .0008359   .0005989     1.40   0.163    -.0003379    .0020098
       south |  -.0018612    .034256    -0.05   0.957    -.0690018    .0652794
        smsa |  -.0424691   .0194039    -2.19   0.029       -.0805   -.0044383
          ms |  -.0297259   .0189596    -1.57   0.117     -.066886    .0074343
         exp |   .1132083   .0024679    45.87   0.000     .1083712    .1180453
        exp2 |  -.0004184   .0000545    -7.67   0.000    -.0005252   -.0003115
         occ |  -.0214765   .0137663    -1.56   0.119    -.0484579     .005505
         ind |   .0192101   .0154268     1.25   0.213    -.0110259    .0494461
       union |   .0327849    .014904     2.20   0.028     .0035735    .0619962
       _cons |   4.648767   .0459639   101.14   0.000      4.55868    4.738855
-------------+----------------------------------------------------------------
_second      |
         fem |  -.1300288   .0511237    -2.54   0.011    -.2302295   -.0298281
         blk |  -.2750723   .0593419    -4.64   0.000    -.3913803   -.1587644
          ed |   .1443834   .0057975    24.90   0.000     .1330206    .1557462
       _cons |  -1.820138   .0767924   -23.70   0.000    -1.970648   -1.669628
------------------------------------------------------------------------------
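
As a cross-check (an editorial addition, not part of the original reply): with model(mdev) and the regressors serving as their own instruments, the first-stage slope coefficients above should coincide with those of the conventional within (fixed-effects) estimator. They can therefore be compared against xtreg with the fe option on the same example data. The standard errors may differ slightly because the two commands need not use the same degrees-of-freedom conventions.

```stata
* Conventional fixed-effects (within) regression on the same example data
webuse psidextract, clear
xtreg lwage wks south smsa ms exp exp2 occ ind union, fe
```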

Last edited by Sebastian Kripfganz; 21 Jun 2023, 05:47.

Dario Maimone Ansaldo Patti replied

21 Jun 2023, 05:12
Or maybe it is because the group fixed effects are removed in the first stage and the model is simply estimated by OLS?

Dario Maimone Ansaldo Patti replied

21 Jun 2023, 05:03

Dear All,

I read in one of the posts above (but I cannot find it anymore) that it is possible to reproduce the fixed-effects panel data estimator using xtseqreg, whose results I could then use with the first() option to estimate the coefficients of the time-invariant regressors. Suppose I estimate the following:

Code:

xtreg investment l.investment l.mtb time*, fe r

where time* is a set of time dummies.

I got the following:

Code:

  investment | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
  investment |
         L1. |   .3639427   .0172223    21.13   0.000     .3301698    .3977155
             |
         mtb |
         L1. |   .0029748   .0004972     5.98   0.000     .0019998    .0039497
             |
       time1 |          0  (omitted)
       time2 |   .0021145   .0011499     1.84   0.066    -.0001404    .0043694
       time3 |  -.0038704   .0011685    -3.31   0.001    -.0061618   -.0015789
       time4 |  -.0023118    .001148    -2.01   0.044    -.0045629   -.0000606
       time5 |   .0015994   .0011059     1.45   0.148    -.0005692     .003768
       time6 |   .0036055   .0010818     3.33   0.001     .0014841    .0057268
       time7 |    .006171    .001096     5.63   0.000     .0040218    .0083203
       time8 |   .0061567   .0011815     5.21   0.000     .0038397    .0084736
       time9 |   .0073502   .0011562     6.36   0.000     .0050829    .0096174
      time10 |  -.0072064   .0011901    -6.06   0.000    -.0095403   -.0048726
      time11 |   .0006876   .0010464     0.66   0.511    -.0013644    .0027396
      time12 |   .0061671    .001007     6.12   0.000     .0041924    .0081418
      time13 |   .0066317   .0010373     6.39   0.000     .0045974    .0086659
      time14 |    .004543   .0009889     4.59   0.000     .0026038    .0064822
      time15 |   .0061171   .0009873     6.20   0.000      .004181    .0080533
      time16 |    .004222   .0009958     4.24   0.000     .0022693    .0061748
      time17 |     .00086   .0009237     0.93   0.352    -.0009515    .0026715
      time18 |   .0029557   .0009289     3.18   0.001     .0011342    .0047773
      time19 |   .0041424   .0008427     4.92   0.000     .0024898     .005795
      time20 |          0  (omitted)
       _cons |   .0217482   .0013653    15.93   0.000     .0190709    .0244255
-------------+----------------------------------------------------------------
     sigma_u |  .03176246
     sigma_e |  .02558575
         rho |  .60647023   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Then I try to replicate the same estimation using xtseqreg:

Code:

xtseqreg investment l.investment l.mtb time*, vce(robust)

             |               Robust
  investment | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
  investment |
         L1. |    .806008   .0105316    76.53   0.000     .7853665    .8266496
             |
         mtb |
         L1. |   .0010061   .0003902     2.58   0.010     .0002413     .001771
             |
       time1 |          0  (omitted)
       time2 |          0  (omitted)
       time3 |  -.0048769   .0014796    -3.30   0.001    -.0077769    -.001977
       time4 |  -.0012647   .0012213    -1.04   0.300    -.0036584    .0011291
       time5 |   .0038623   .0012063     3.20   0.001     .0014979    .0062267
       time6 |    .004198    .001179     3.56   0.000     .0018872    .0065089
       time7 |   .0049791   .0011667     4.27   0.000     .0026925    .0072658
       time8 |   .0032672   .0012606     2.59   0.010     .0007965    .0057378
       time9 |   .0039502   .0013825     2.86   0.004     .0012406    .0066598
      time10 |  -.0121011   .0013418    -9.02   0.000     -.014731   -.0094713
      time11 |   .0030101    .001204     2.50   0.012     .0006504    .0053699
      time12 |   .0076012   .0011493     6.61   0.000     .0053487    .0098537
      time13 |   .0045574     .00116     3.93   0.000     .0022838    .0068309
      time14 |   .0016179   .0012408     1.30   0.192    -.0008139    .0040498
      time15 |   .0037481   .0012114     3.09   0.002     .0013737    .0061225
      time16 |   .0004825   .0012314     0.39   0.695    -.0019309    .0028959
      time17 |  -.0028496   .0012576    -2.27   0.023    -.0053145   -.0003848
      time18 |   .0023119   .0011754     1.97   0.049     8.17e-06    .0046157
      time19 |   .0037335   .0011869     3.15   0.002     .0014073    .0060598
      time20 |  -.0017875   .0011705    -1.53   0.127    -.0040816    .0005066
       _cons |   .0061575   .0010938     5.63   0.000     .0040137    .0083013
------------------------------------------------------------------------------

The results are totally different. I am probably missing something....but what? Any suggestion would be highly appreciated.

Thanks,

Dario
