  • XTSEQREG: new Stata command for sequential / two-stage (GMM) estimation of linear panel models

    Dear Statalisters,

    I have just released a new Stata command for the estimation of linear panel data models. The xtseqreg command implements the two-stage estimation procedure described in my working paper with Claudia Schwarz for linear (dynamic) panel data models with time-invariant regressors. In that paper, we suggest regressing the dependent variable on the time-varying regressors only in a first stage, and subsequently regressing the first-stage residuals on the time-invariant regressors in a second stage. Instruments can be used at both stages, and efficient estimation can be achieved with two-step GMM. At the second stage, the usual standard errors are invalid and need to be corrected. Providing the respective analytical standard-error correction is the main purpose of this new command. For full details about the methodology and its benefits, please have a look at the paper.

    Yet, the new command itself is much more flexible because it can also be used for IV/GMM estimation of a single stage only. It then mimics (part of) the behavior of existing commands for instrumental variable and GMM estimation of linear panel data models, in particular xtdpd and xtabond2 in the context of dynamic models. In part, the other commands achieve things that my command cannot deliver, but mine also adds some flexibility that the others do not offer. However, I want to emphasize that it is not my intention to introduce this new command as a competitor for the existing ones. The re-implementation of these GMM estimators was simply a necessary requirement to achieve the above-mentioned standard-error correction.

    The new command is currently only available for installation from my own website and not yet from SSC:
    Code:
    . net install xtseqreg, from(http://www.kripfganz.de/stata/)
    After the installation, detailed documentation of the syntax and available options can be found in the help files:
    Code:
    . help xtseqreg
    . help xtseqreg postestimation
    As always, comments and suggestions are welcome and highly appreciated.

    Here is a brief example for a two-stage estimation of a dynamic Mincer equation. At the first stage, the log-wages are regressed on the time-varying regressors. The estimator is a two-step difference-GMM estimator (Arellano/Bond) with collapsed GMM-type instruments for the 2 lags of the dependent variable, standard instruments for the strictly exogenous regressors, and Windmeijer-corrected robust standard errors.
    Code:
    . xtseqreg L(0/2).lwage exp exp2 occ ind union, gmmiv(L.lwage, model(difference) lagrange(1 4) collapse) iv(exp exp2 occ ind union, difference model(difference)) twostep vce(robust)
    
    Group variable: id                           Number of obs         =      2975
    Time variable: t                             Number of groups      =       595
    
                                                 Obs per group:    min =         5
                                                                   avg =         5
                                                                   max =         5
    
                                                 Number of instruments =        10
    
                                         (Std. Err. adjusted for clustering on id)
    ------------------------------------------------------------------------------
                 |              WC-Robust
           lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           lwage |
             L1. |    .365887   .1722314     2.12   0.034     .0283197    .7034543
             L2. |   .1009276   .0732219     1.38   0.168    -.0425848    .2444399
                 |
             exp |   .0501576   .0282205     1.78   0.076    -.0051536    .1054688
            exp2 |   -.000206    .000148    -1.39   0.164     -.000496     .000084
             occ |  -.0428486   .0283624    -1.51   0.131    -.0984379    .0127406
             ind |   .0481791   .0305408     1.58   0.115    -.0116798     .108038
           union |    .006991   .0288093     0.24   0.808    -.0494742    .0634562
           _cons |   2.737719   1.088102     2.52   0.012     .6050775     4.87036
    ------------------------------------------------------------------------------
    With the following syntax, we can then run a second-stage instrumental-variables regression of the first-stage residuals on some time-invariant regressors. The first-stage results are automatically taken from the previous estimation. Just as an illustration, ed is assumed to be endogenous and instrumented with occ.
    Code:
    . xtseqreg lwage (L(1/2).lwage exp exp2 occ ind union) ed fem blk, iv(occ fem blk, model(level)) vce(robust)
    
    Group variable: id                           Number of obs         =      2975
    Time variable: t                             Number of groups      =       595
    
    ------------------------------------------------------------------------------
    Equation _first                              Equation _second
    Number of obs         =      2975            Number of obs         =      2975
    Number of groups      =       595            Number of groups      =       595
    
    Obs per group:    min =         5            Obs per group:    min =         5
                      avg =         5                              avg =         5
                      max =         5                              max =         5
    
    Number of instruments =        10            Number of instruments =         4
    
                                         (Std. Err. adjusted for clustering on id)
    ------------------------------------------------------------------------------
                 |               Robust
           lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    _first       |
           lwage |
             L1. |    .365887   .1722314     2.12   0.034     .0283197    .7034543
             L2. |   .1009276   .0732219     1.38   0.168    -.0425848    .2444399
                 |
             exp |   .0501576   .0282205     1.78   0.076    -.0051536    .1054688
            exp2 |   -.000206    .000148    -1.39   0.164     -.000496     .000084
             occ |  -.0428486   .0283624    -1.51   0.131    -.0984379    .0127406
             ind |   .0481791   .0305408     1.58   0.115    -.0116798     .108038
           union |    .006991   .0288093     0.24   0.808    -.0494742    .0634562
           _cons |   2.737719   1.088102     2.52   0.012     .6050775     4.87036
    -------------+----------------------------------------------------------------
    _second      |
              ed |   .0634885   .0348497     1.82   0.068    -.0048158    .1317927
             fem |  -.0967082   .0575629    -1.68   0.093    -.2095295     .016113
             blk |  -.1531252   .1010073    -1.52   0.130     -.351096    .0448456
           _cons |  -.7936727   .4419754    -1.80   0.073    -1.659929    .0725831
    ------------------------------------------------------------------------------
    Instead of specifying both stages one after the other, with some more complicated syntax the same results can also be obtained with a single command line:
    Code:
    . xtseqreg lwage (L(1/2).lwage exp exp2 occ ind union) ed fem blk, gmmiv(L.lwage, model(difference) lagrange(1 4) collapse equation(#1)) iv(exp exp2 occ ind union, difference model(difference) equation(#1)) iv(occ fem blk, model(level) equation(#2)) twostep vce(robust) both
    As a postestimation command, estat overid provides Hansen's J-test for the validity of the overidentifying restrictions for both stages. (In the current example, the second stage is exactly identified.)
    Code:
    . estat overid
    
    Hansen's J-test for equation _first                    chi2(2)     =    0.2935
    H0: overidentifying restrictions are valid             Prob > chi2 =    0.8635
    
    Hansen's J-test for equation _second                   chi2(0)     =    0.0000
    note: coefficients are exactly identified              Prob > chi2 =         .
    The following command line exactly replicates the above results for the first stage with xtabond2, including Hansen's J-test:
    Code:
    . xtabond2 L(0/2).lwage exp exp2 occ ind union, gmmstyle(L.lwage, equation(diff) laglimits(1 4) collapse) ivstyle(exp exp2 occ ind union, equation(diff)) twostep robust
    Notice that the reported results for Hansen's J-test would differ between xtseqreg and xtabond2 if the one-step GMM estimator were used (the above example without the option twostep). For this purpose, xtabond2 silently still computes the two-step estimator, while xtseqreg evaluates the first-step moment functions but still uses the optimal weighting matrix (the one that would have been used in a second step).
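    As a minimal sketch of this comparison, the one-step versions of both commands can be run back to back. It assumes that the example data are the psidextract dataset shipped with Stata (the post does not name the dataset, but the variable names and the 595 groups match) and that both community-contributed commands are installed. No output is shown because none has been verified here.

    ```stata
    * Assumption: the example data are Stata's psidextract dataset.
    webuse psidextract, clear
    xtset id t

    * One-step difference GMM with xtseqreg (the first-stage command without twostep)
    xtseqreg L(0/2).lwage exp exp2 occ ind union, gmmiv(L.lwage, model(difference) lagrange(1 4) collapse) iv(exp exp2 occ ind union, difference model(difference)) vce(robust)
    estat overid

    * Same one-step specification with xtabond2; its Hansen test is part of the regular output
    xtabond2 L(0/2).lwage exp exp2 occ ind union, gmmstyle(L.lwage, equation(diff) laglimits(1 4) collapse) ivstyle(exp exp2 occ ind union, equation(diff)) robust
    ```
    Following the explanation above, the two reported J statistics are then expected to differ.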

    Finally, also notice that xtabond2 might in some situations report an incorrect number of instruments because it does not always detect a linear relationship between instruments specified for the first-differenced and those for the levels model. This can happen in particular with time dummies if they are specified for both the first-differenced and the levels model (which is something that actually should not be done). In the following example (output omitted), xtabond2 reports 17 instruments, while xtseqreg obtains the identical result but correctly reports only 13 instruments. (This can happen when xtabond2 is used with either the option h(1) or h(2). It does not happen with the default option h(3). The default weighting matrix of xtseqreg corresponds to h(2) of xtabond2.) This has an important consequence because the degrees of freedom used for Hansen's J-test depend on the number of instruments. Hence, the reported J-test by xtabond2 might be misleading.
    Code:
    . xtseqreg L(0/2).lwage exp2 occ ind union tdum4-tdum7, gmmiv(L.lwage, model(difference) lagrange(1 4) collapse) iv(exp2 occ ind union, difference model(difference)) iv(tdum4-tdum7, model(diff)) iv(tdum4-tdum7, model(level)) twostep vce(robust)
    . xtabond2 L(0/2).lwage exp2 occ ind union tdum4-tdum7, gmmstyle(L.lwage, equation(diff) laglimits(1 4) collapse) ivstyle(exp2 occ ind union, equation(diff)) ivstyle(tdum4-tdum7, equation(diff)) ivstyle(tdum4-tdum7, equation(level)) twostep robust h(2)
    You can read more about this last observation in another Statalist topic: System GMM - Time Dummies.
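    The command lines above do not show how tdum4-tdum7 were created. One plausible way, assuming the psidextract data with a time variable t running from 1 to 7, is to generate a full set of period dummies with tabulate. The stub name tdum is chosen here only to match the variable names used above.

    ```stata
    * Assumption: psidextract data, time variable t with values 1 to 7.
    webuse psidextract, clear
    xtset id t

    * Creates tdum1, ..., tdum7, one indicator per time period
    tabulate t, generate(tdum)

    * Quick check that the dummies used in the example exist
    describe tdum4-tdum7
    ```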

    Reference:
    • Kripfganz, S., and C. Schwarz (2015). Estimation of linear dynamic panel data models with time-invariant regressors. ECB Working Paper 1838, European Central Bank.

  • Dario Maimone Ansaldo Patti
    replied
    Sebastian Kripfganz Solved!! In xtdpdqml I used time2-time19, while in xtseqreg I used time3-time19. The estimates are identical. I am posting this in case someone else encounters a similar problem. I guess you are better placed than me to understand why it works this way. Thanks for your great help with all my problems.
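    Put together, the working sequence described here would look roughly as follows. The variable lists and options are taken from the earlier posts in this exchange. The dataset itself was not shared, so this is a sketch rather than a reproducible example.

    ```stata
    * First stage: QML estimation with the explicit dummy range time2-time19
    xtdpdqml investment l.mtb time2-time19, fe mlparams vce(robust)
    estimates store qml

    * Second stage: time3-time19 in the first-stage variable list
    * (time2 was reported as omitted in the xtdpdqml output)
    xtseqreg investment (L.investment l.mtb time3-time19) l.efi, first(qml, equation(#1) nocons)
    ```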


  • Sebastian Kripfganz
    replied
    I suspect that something odd is happening due to the severely unbalanced nature of the panel. The QML estimator is probably not ideal under these circumstances.


  • Dario Maimone Ansaldo Patti
    replied
    Sebastian Kripfganz Indeed, the problem is with the time dummies, as everything runs smoothly if I remove them altogether.


  • Sebastian Kripfganz
    replied
    I am afraid in that case I am out of ideas.


  • Dario Maimone Ansaldo Patti
    replied
    Sebastian Kripfganz Thanks. I tried, but I still get the same error message. Apparently, only time3 to time19 are included in the estimation. I tried using time3-time19, and I also listed all the included time dummies individually, but the result does not change.


  • Sebastian Kripfganz
    replied
    That is probably because of the omitted time dummies. Try replacing time* in the xtseqreg command line with the unabbreviated list of non-omitted dummies.
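    One way to find the non-omitted dummies is to inspect the coefficient names in the stored first-stage results. The following sketch uses only official Stata commands and assumes that the results were stored under the name qml, as in the post being answered.

    ```stata
    * Bring the stored xtdpdqml results back as the active estimates
    estimates restore qml

    * The column names of e(b) are the coefficient names that first() has to match
    matrix list e(b)

    * The same names as a single line of text
    local names : colfullnames e(b)
    display "`names'"
    ```
    Whether omitted variables still appear in e(b) may depend on the estimation command, so the listing should be compared with the note lines in the estimation output.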


  • Dario Maimone Ansaldo Patti
    replied
    Hi All,

    I am afraid I still have an issue. I ran the following commands as suggested in the previous posts:

    Code:
    xtdpdqml investment l.mtb time*, fe mlparams vce(robust)
    
    note: time1 omitted because of collinearity.
    note: time20 omitted because of collinearity.
    note: 824 groups are dropped due to gaps or insufficient number of observations
    note: time2 omitted because of collinearity
    
    Quasi-maximum likelihood estimation
    Iteration 0:  f(p) =  30369.288  
    Iteration 1:  f(p) =  30390.753  
    Iteration 2:  f(p) =   30439.82  
    Iteration 3:  f(p) =  30444.373  
    Iteration 4:  f(p) =  30444.472  
    Iteration 5:  f(p) =  30444.472  
    
    Group variable: id                           Number of obs         =     13898
    Time variable: year                          Number of groups      =      1245
    
    Fixed effects                                Obs per group:    min =         2
                                                                   avg =  11.16305
                                                                   max =        18
                                         (Std. err. adjusted for clustering on id)
    ------------------------------------------------------------------------------
                 |               Robust
    D.investment | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    _model       |
      investment |
             LD. |   .4813976   .0250801    19.19   0.000     .4322415    .5305538
                 |
             mtb |
             LD. |   .0036694   .0004569     8.03   0.000      .002774    .0045649
    
    est store qml
    Then I ran:

    Code:
    xtseqreg investment (L.investment l.mtb time*) l.efi, first(qml, equation(#1) nocons)
    
    option first() incorrectly specified -- variable names do not match
    r(322); t=0.10 13:44:48
    What is the reason for the error message above? I cannot figure it out.

    Thanks for your suggestions.

    Dario
    Last edited by Dario Maimone Ansaldo Patti; 22 Jun 2023, 06:01.


  • Dario Maimone Ansaldo Patti
    replied
    Sebastian Kripfganz Thanks a lot!!


  • Sebastian Kripfganz
    replied
    Indeed, xtdpdbc cannot (currently) be used with xtseqreg. (It is on my to-do list, but not as a matter of urgency.) You can use my xtdpdqml command instead, which performs ML estimation of linear dynamic panel data models and has been shown to perform very similarly to xtdpdbc. Example code:
    Code:
    xtdpdqml lwage wks south smsa ms exp exp2 occ ind union, fe mlparams
    estimates store qml
    xtseqreg lwage (L.lwage wks south smsa ms exp exp2 occ ind union) fem blk ed, first(qml, equation(#1) noconstant)


  • Dario Maimone Ansaldo Patti
    replied
    Sebastian Kripfganz Thanks a lot! One last question and I will not bother you anymore. I tried to use xtdpdbc to correct the bias in the dynamic estimation. If I am correct, I cannot use it before xtseqreg. Is there any way to replicate the xtdpdbc estimates, maybe using xtdpdgmm?


  • Sebastian Kripfganz
    replied
    Your xtseqreg specification is just a pooled OLS estimator. You cannot directly replicate the FE estimator with xtseqreg (unless you explicitly throw in all the group-specific dummy variables). However, you can use the xtdpdgmm command for this task. Example:
    Code:
    . webuse psidextract
    
    . xtdpdgmm lwage wks south smsa ms exp exp2 occ ind union, model(mdev) norescale iv(wks south smsa ms exp exp2 occ ind union) auxiliary
    note: conventional one-step standard errors may not be valid
    
    Generalized method of moments estimation
    
    Fitting full model:
    Step 1         f(b) =  9.041e-24
    
    Group variable: id                           Number of obs         =      4165
    Time variable: t                             Number of groups      =       595
    
    Moment conditions:     linear =      10      Obs per group:    min =         7
                        nonlinear =       0                        avg =         7
                            total =      10                        max =         7
    
    ------------------------------------------------------------------------------
           lwage | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            /wks |   .0008359   .0005989     1.40   0.163    -.0003379    .0020098
          /south |  -.0018612    .034256    -0.05   0.957    -.0690018    .0652794
           /smsa |  -.0424691   .0194039    -2.19   0.029       -.0805   -.0044383
             /ms |  -.0297259   .0189596    -1.57   0.117     -.066886    .0074343
            /exp |   .1132083   .0024679    45.87   0.000     .1083712    .1180453
           /exp2 |  -.0004184   .0000545    -7.67   0.000    -.0005252   -.0003115
            /occ |  -.0214765   .0137663    -1.56   0.119    -.0484579     .005505
            /ind |   .0192101   .0154268     1.25   0.213    -.0110259    .0494461
          /union |   .0327849    .014904     2.20   0.028     .0035735    .0619962
          /_cons |   4.648767   .0459639   101.14   0.000      4.55868    4.738855
    ------------------------------------------------------------------------------
    Instruments corresponding to the linear moment conditions:
     1, model(mdev):
       wks south smsa ms exp exp2 occ ind union
     2, model(level):
       _cons
    
    . estimates store first
    
    . xtseqreg lwage (wks south smsa ms exp exp2 occ ind union) fem blk ed, first(first, copy)
    
    Group variable: id                           Number of obs         =      4165
    Time variable: t                             Number of groups      =       595
    
    ------------------------------------------------------------------------------
    Equation _first                              Equation _second
    Number of obs         =      4165            Number of obs         =      4165
    Number of groups      =       595            Number of groups      =       595
    
    Obs per group:    min =         7            Obs per group:    min =         7
                      avg =         7                              avg =         7
                      max =         7                              max =         7
    
                                                 Number of instruments =         4
    
    ------------------------------------------------------------------------------
           lwage | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    _first       |
             wks |   .0008359   .0005989     1.40   0.163    -.0003379    .0020098
           south |  -.0018612    .034256    -0.05   0.957    -.0690018    .0652794
            smsa |  -.0424691   .0194039    -2.19   0.029       -.0805   -.0044383
              ms |  -.0297259   .0189596    -1.57   0.117     -.066886    .0074343
             exp |   .1132083   .0024679    45.87   0.000     .1083712    .1180453
            exp2 |  -.0004184   .0000545    -7.67   0.000    -.0005252   -.0003115
             occ |  -.0214765   .0137663    -1.56   0.119    -.0484579     .005505
             ind |   .0192101   .0154268     1.25   0.213    -.0110259    .0494461
           union |   .0327849    .014904     2.20   0.028     .0035735    .0619962
           _cons |   4.648767   .0459639   101.14   0.000      4.55868    4.738855
    -------------+----------------------------------------------------------------
    _second      |
             fem |  -.1300288   .0511237    -2.54   0.011    -.2302295   -.0298281
             blk |  -.2750723   .0593419    -4.64   0.000    -.3913803   -.1587644
              ed |   .1443834   .0057975    24.90   0.000     .1330206    .1557462
           _cons |  -1.820138   .0767924   -23.70   0.000    -1.970648   -1.669628
    ------------------------------------------------------------------------------
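    To check that the first stage indeed reproduces the conventional FE estimator, its slope coefficients can be compared with those from xtreg. This is a small sketch using an official Stata command on the same dataset; no output is shown because none has been verified here.

    ```stata
    * Assumption: same data as in the example above.
    webuse psidextract, clear

    * Conventional within (fixed-effects) estimator; compare the slope coefficients
    * with the _first equation reported above
    xtreg lwage wks south smsa ms exp exp2 occ ind union, fe
    ```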
    Last edited by Sebastian Kripfganz; 21 Jun 2023, 05:47.


  • Dario Maimone Ansaldo Patti
    replied
    Or maybe it is because the group fixed effects are removed in the first stage and the model is simply estimated by OLS?


  • Dario Maimone Ansaldo Patti
    replied
    Dear All,

    I read in one of the posts above (but I cannot find it anymore) that it is possible to reproduce a fixed-effects panel data model using xtseqreg, which I could then use with the first() option to estimate the coefficients of the time-invariant regressors. Suppose I estimate the following:

    Code:
    xtreg investment l.investment l.mtb time*, fe r
    where time* is a set of time dummies.

    I got the following:

    Code:
      investment | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
      investment |
             L1. |   .3639427   .0172223    21.13   0.000     .3301698    .3977155
                 |
             mtb |
             L1. |   .0029748   .0004972     5.98   0.000     .0019998    .0039497
                 |
           time1 |          0  (omitted)
           time2 |   .0021145   .0011499     1.84   0.066    -.0001404    .0043694
           time3 |  -.0038704   .0011685    -3.31   0.001    -.0061618   -.0015789
           time4 |  -.0023118    .001148    -2.01   0.044    -.0045629   -.0000606
           time5 |   .0015994   .0011059     1.45   0.148    -.0005692     .003768
           time6 |   .0036055   .0010818     3.33   0.001     .0014841    .0057268
           time7 |    .006171    .001096     5.63   0.000     .0040218    .0083203
           time8 |   .0061567   .0011815     5.21   0.000     .0038397    .0084736
           time9 |   .0073502   .0011562     6.36   0.000     .0050829    .0096174
          time10 |  -.0072064   .0011901    -6.06   0.000    -.0095403   -.0048726
          time11 |   .0006876   .0010464     0.66   0.511    -.0013644    .0027396
          time12 |   .0061671    .001007     6.12   0.000     .0041924    .0081418
          time13 |   .0066317   .0010373     6.39   0.000     .0045974    .0086659
          time14 |    .004543   .0009889     4.59   0.000     .0026038    .0064822
          time15 |   .0061171   .0009873     6.20   0.000      .004181    .0080533
          time16 |    .004222   .0009958     4.24   0.000     .0022693    .0061748
          time17 |     .00086   .0009237     0.93   0.352    -.0009515    .0026715
          time18 |   .0029557   .0009289     3.18   0.001     .0011342    .0047773
          time19 |   .0041424   .0008427     4.92   0.000     .0024898     .005795
          time20 |          0  (omitted)
           _cons |   .0217482   .0013653    15.93   0.000     .0190709    .0244255
    -------------+----------------------------------------------------------------
         sigma_u |  .03176246
         sigma_e |  .02558575
             rho |  .60647023   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    Then I try to replicate the same estimation using xtseqreg:

    Code:
    xtseqreg investment l.investment l.mtb time*, vce(robust)
    
                 |               Robust
      investment | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
      investment |
             L1. |    .806008   .0105316    76.53   0.000     .7853665    .8266496
                 |
             mtb |
             L1. |   .0010061   .0003902     2.58   0.010     .0002413     .001771
                 |
           time1 |          0  (omitted)
           time2 |          0  (omitted)
           time3 |  -.0048769   .0014796    -3.30   0.001    -.0077769    -.001977
           time4 |  -.0012647   .0012213    -1.04   0.300    -.0036584    .0011291
           time5 |   .0038623   .0012063     3.20   0.001     .0014979    .0062267
           time6 |    .004198    .001179     3.56   0.000     .0018872    .0065089
           time7 |   .0049791   .0011667     4.27   0.000     .0026925    .0072658
           time8 |   .0032672   .0012606     2.59   0.010     .0007965    .0057378
           time9 |   .0039502   .0013825     2.86   0.004     .0012406    .0066598
          time10 |  -.0121011   .0013418    -9.02   0.000     -.014731   -.0094713
          time11 |   .0030101    .001204     2.50   0.012     .0006504    .0053699
          time12 |   .0076012   .0011493     6.61   0.000     .0053487    .0098537
          time13 |   .0045574     .00116     3.93   0.000     .0022838    .0068309
          time14 |   .0016179   .0012408     1.30   0.192    -.0008139    .0040498
          time15 |   .0037481   .0012114     3.09   0.002     .0013737    .0061225
          time16 |   .0004825   .0012314     0.39   0.695    -.0019309    .0028959
          time17 |  -.0028496   .0012576    -2.27   0.023    -.0053145   -.0003848
          time18 |   .0023119   .0011754     1.97   0.049     8.17e-06    .0046157
          time19 |   .0037335   .0011869     3.15   0.002     .0014073    .0060598
          time20 |  -.0017875   .0011705    -1.53   0.127    -.0040816    .0005066
           _cons |   .0061575   .0010938     5.63   0.000     .0040137    .0083013
    ------------------------------------------------------------------------------
    The results are totally different. I am probably missing something, but what? Any suggestions would be highly appreciated.

    Thanks,

    Dario
