  • XTSEQREG: new Stata command for sequential / two-stage (GMM) estimation of linear panel models

    Dear Statalisters,

    I have just released a new Stata command for the estimation of linear panel data models. The main purpose of the xtseqreg command is to implement the two-stage estimation procedure described in my working paper with Claudia Schwarz for linear (dynamic) panel data models with time-invariant regressors. In that paper, we suggest first regressing the dependent variable on the time-varying regressors only, and then regressing the first-stage residuals on the time-invariant regressors in a second stage. Instruments can be used at both stages, and efficient estimation can be achieved with two-step GMM. At the second stage, the usual standard errors are invalid and need to be corrected. Providing this analytical standard-error correction is the main purpose of the new command. For full details about the methodology and its benefits, please have a look at the paper.

    Yet, the new command itself is much more flexible because it can also be used for IV/GMM estimation of a single stage only. It then mimics (part of) the behavior of existing commands for instrumental variable and GMM estimation of linear panel data models, in particular xtdpd and xtabond2 in the context of dynamic models. In part, the other commands achieve things that my command cannot deliver, but mine also adds some flexibility that the others do not offer. However, I want to emphasize that it is not my intention to introduce this new command as a competitor for the existing ones. The re-implementation of these GMM estimators was simply a necessary requirement to achieve the above-mentioned standard-error correction.

    The new command is currently only available for installation from my own website and not yet from SSC:
    Code:
    . net install xtseqreg, from(http://www.kripfganz.de/stata/)
    After the installation, detailed documentation of the syntax and available options can be found in the help files:
    Code:
    . help xtseqreg
    . help xtseqreg postestimation
    As always, comments and suggestions are welcome and highly appreciated.

    Here is a brief example for a two-stage estimation of a dynamic Mincer equation. At the first stage, the log-wages are regressed on the time-varying regressors. The estimator is a two-step difference-GMM estimator (Arellano/Bond) with collapsed GMM-type instruments for the 2 lags of the dependent variable, standard instruments for the strictly exogenous regressors, and Windmeijer-corrected robust standard errors.
    Code:
    . xtseqreg L(0/2).lwage exp exp2 occ ind union, gmmiv(L.lwage, model(difference) lagrange(1 4) collapse) iv(exp exp2 occ ind union, difference model(difference)) twostep vce(robust)
    
    Group variable: id                           Number of obs         =      2975
    Time variable: t                             Number of groups      =       595
    
                                                 Obs per group:    min =         5
                                                                   avg =         5
                                                                   max =         5
    
                                                 Number of instruments =        10
    
                                         (Std. Err. adjusted for clustering on id)
    ------------------------------------------------------------------------------
                 |              WC-Robust
           lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           lwage |
             L1. |    .365887   .1722314     2.12   0.034     .0283197    .7034543
             L2. |   .1009276   .0732219     1.38   0.168    -.0425848    .2444399
                 |
             exp |   .0501576   .0282205     1.78   0.076    -.0051536    .1054688
            exp2 |   -.000206    .000148    -1.39   0.164     -.000496     .000084
             occ |  -.0428486   .0283624    -1.51   0.131    -.0984379    .0127406
             ind |   .0481791   .0305408     1.58   0.115    -.0116798     .108038
           union |    .006991   .0288093     0.24   0.808    -.0494742    .0634562
           _cons |   2.737719   1.088102     2.52   0.012     .6050775     4.87036
    ------------------------------------------------------------------------------
    With the following syntax, we can then run a second-stage instrumental-variables regression of the first-stage residuals on some time-invariant regressors. The first-stage results are automatically taken from the previous estimation. Just as an illustration, ed is assumed to be endogenous and instrumented with occ.
    Code:
    . xtseqreg lwage (L(1/2).lwage exp exp2 occ ind union) ed fem blk, iv(occ fem blk, model(level)) vce(robust)
    
    Group variable: id                           Number of obs         =      2975
    Time variable: t                             Number of groups      =       595
    
    ------------------------------------------------------------------------------
    Equation _first                              Equation _second
    Number of obs         =      2975            Number of obs         =      2975
    Number of groups      =       595            Number of groups      =       595
    
    Obs per group:    min =         5            Obs per group:    min =         5
                      avg =         5                              avg =         5
                      max =         5                              max =         5
    
    Number of instruments =        10            Number of instruments =         4
    
                                         (Std. Err. adjusted for clustering on id)
    ------------------------------------------------------------------------------
                 |               Robust
           lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    _first       |
           lwage |
             L1. |    .365887   .1722314     2.12   0.034     .0283197    .7034543
             L2. |   .1009276   .0732219     1.38   0.168    -.0425848    .2444399
                 |
             exp |   .0501576   .0282205     1.78   0.076    -.0051536    .1054688
            exp2 |   -.000206    .000148    -1.39   0.164     -.000496     .000084
             occ |  -.0428486   .0283624    -1.51   0.131    -.0984379    .0127406
             ind |   .0481791   .0305408     1.58   0.115    -.0116798     .108038
           union |    .006991   .0288093     0.24   0.808    -.0494742    .0634562
           _cons |   2.737719   1.088102     2.52   0.012     .6050775     4.87036
    -------------+----------------------------------------------------------------
    _second      |
              ed |   .0634885   .0348497     1.82   0.068    -.0048158    .1317927
             fem |  -.0967082   .0575629    -1.68   0.093    -.2095295     .016113
             blk |  -.1531252   .1010073    -1.52   0.130     -.351096    .0448456
           _cons |  -.7936727   .4419754    -1.80   0.073    -1.659929    .0725831
    ------------------------------------------------------------------------------
    Instead of specifying both stages one after the other, with some more complicated syntax the same results can also be obtained with a single command line:
    Code:
    . xtseqreg lwage (L(1/2).lwage exp exp2 occ ind union) ed fem blk, gmmiv(L.lwage, model(difference) lagrange(1 4) collapse equation(#1)) iv(exp exp2 occ ind union, difference model(difference) equation(#1)) iv(occ fem blk, model(level) equation(#2)) twostep vce(robust) both
    As a postestimation command, estat overid provides Hansen's J-test for the validity of the overidentifying restrictions for both stages. (In the current example, the second stage is exactly identified.)
    Code:
    . estat overid
    
    Hansen's J-test for equation _first                    chi2(2)     =    0.2935
    H0: overidentifying restrictions are valid             Prob > chi2 =    0.8635
    
    Hansen's J-test for equation _second                   chi2(0)     =    0.0000
    note: coefficients are exactly identified              Prob > chi2 =         .
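    The degrees of freedom in this output can be checked by hand, assuming the usual convention that they equal the number of instruments minus the number of estimated coefficients. The following is a minimal sketch using only built-in Stata functions and the counts from the tables above (8 first-stage and 4 second-stage coefficients, constants included); the scalar names are illustrative.

```stata
* Degrees of freedom = number of instruments - number of coefficients
scalar df_first  = 10 - 8    // first stage: 10 instruments, 8 coefficients
scalar df_second = 4 - 4     // second stage: exactly identified
* Upper-tail chi-squared probability of the reported first-stage J statistic
scalar p_first = chi2tail(df_first, 0.2935)
display "df (first stage)      = " df_first
display "df (second stage)     = " df_second
display "p-value (first stage) = " %6.4f p_first
```

    The first-stage p-value computed this way agrees with the Prob > chi2 of 0.8635 reported by estat overid.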
    The following command line exactly replicates the above results for the first stage with xtabond2, including Hansen's J-test:
    Code:
    . xtabond2 L(0/2).lwage exp exp2 occ ind union, gmmstyle(L.lwage, equation(diff) laglimits(1 4) collapse) ivstyle(exp exp2 occ ind union, equation(diff)) twostep robust
    Notice that the reported results for Hansen's J-test would differ between xtseqreg and xtabond2 if the one-step GMM estimator were used (the above example without the option twostep). For this purpose, xtabond2 silently still computes the two-step estimator, whereas xtseqreg evaluates the moment functions at the one-step estimates while using the optimal weighting matrix (the one that would have been used in a second step).

    Finally, notice that xtabond2 might in some situations report an incorrect number of instruments because it does not always detect a linear dependence between the instruments specified for the first-differenced model and those specified for the levels model. This can happen in particular with time dummies if they are specified for both models (which is something that actually should not be done). In the following example (output omitted), xtabond2 reports 17 instruments, while xtseqreg obtains the identical result but correctly reports only 13 instruments. (This can happen when xtabond2 is used with either the option h(1) or h(2). It does not happen with the default option h(3). The default weighting matrix of xtseqreg corresponds to h(2) of xtabond2.) This has an important consequence because the degrees of freedom used for Hansen's J-test depend on the number of instruments. Hence, the J-test reported by xtabond2 might be misleading.
    Code:
    . xtseqreg L(0/2).lwage exp2 occ ind union tdum4-tdum7, gmmiv(L.lwage, model(difference) lagrange(1 4) collapse) iv(exp2 occ ind union, difference model(difference)) iv(tdum4-tdum7, model(diff)) iv(tdum4-tdum7, model(level)) twostep vce(robust)
    . xtabond2 L(0/2).lwage exp2 occ ind union tdum4-tdum7, gmmstyle(L.lwage, equation(diff) laglimits(1 4) collapse) ivstyle(exp2 occ ind union, equation(diff)) ivstyle(tdum4-tdum7, equation(diff)) ivstyle(tdum4-tdum7, equation(level)) twostep robust h(2)
    You can read more about this last observation in another Statalist topic: System GMM - Time Dummies.
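    To illustrate why the instrument count matters for the J-test, the same test statistic can be evaluated under both counts. This is only a sketch under stated assumptions: the J value of 5 is hypothetical (the output is omitted above), and the coefficient count of 11 (two lags, four regressors, four time dummies, and a constant) is tallied from the command line rather than read from any output.

```stata
* Same hypothetical J statistic, two different instrument counts
scalar J = 5                 // hypothetical value, for illustration only
scalar k = 11                // assumed number of estimated coefficients
scalar df_13 = 13 - k        // instruments as counted by xtseqreg
scalar df_17 = 17 - k        // instruments as counted by xtabond2
scalar p_13 = chi2tail(df_13, J)
scalar p_17 = chi2tail(df_17, J)
display "df = " df_13 ": p-value = " %6.4f p_13
display "df = " df_17 ": p-value = " %6.4f p_17
```

    With more degrees of freedom, the same statistic yields a larger p-value, so an overstated instrument count makes the test less likely to reject.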

    Reference:
    • Kripfganz, S., and C. Schwarz (2015). Estimation of linear dynamic panel data models with time-invariant regressors. ECB Working Paper 1838, European Central Bank.

  • Debajyoti Chakrabarty
    replied
    Hi Sebastian,
    Okay, thank you for your quick response.
    Regards,
    Debajyoti


  • Sebastian Kripfganz
    replied
    It depends how you use it. As long as you make sure that the number of instruments does not become too large, it should not be a problem.
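    As a rough illustration of how quickly the instrument count can grow with T, the sketch below tabulates textbook counts for the GMM-type instruments of a single lagged dependent variable in a first-differenced AR(1) model. These are standard formulas, not xtseqreg output, and the cap of 4 lags is an arbitrary choice echoing the collapse and lagrange(1 4) suboptions used earlier in this thread.

```stata
* Instrument counts for the lagged dependent variable with T time periods
foreach T of numlist 5 10 20 40 {
    local full      = (`T' - 1) * (`T' - 2) / 2   // all lags, not collapsed
    local collapsed = `T' - 2                     // all lags, collapsed
    local limited   = min(4, `T' - 2)             // collapsed, at most 4 lags
    display "T = `T': full = `full', collapsed = `collapsed', limited = `limited'"
}
```

    The uncollapsed count grows quadratically in T, whereas collapsing with a lag limit keeps it fixed.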


  • Debajyoti Chakrabarty
    replied
    Hi Sebastian,
    I wanted to ask if xtseqreg can be used for large N and large T panels.
    Regards,
    Debajyoti


  • Sebastian Kripfganz
    replied
    Yes, that is absolutely fine.


  • Jerry Kim
    replied
    Dear Sebastian,

    Thanks a lot! However, if the time-invariant regressor may be correlated with the unobserved effect alpha_i, can I still use this method with a different IV that is uncorrelated with alpha_i? The code would then be as follows, where third_party_IV is a separate instrument for sftlen, the time-invariant regressor.

    Code:
    eststo md_diffgmm_sl2, title("Estimator: FD-GMM including shift length"): ///
        qui xtdpdgmm s_it s_itlag1 eta_it elective ///
        afterchangeseq beforecancel age i.prcdr2 i.bin2hr ///
        sftlen, collapse model(diff) ///
        gmm(s_itlag1, lag(1 2)) gmm(eta_it, lag(2 3)) ///
        gmm(elective beforecancel afterchangeseq age i.prcdr2 i.bin2hr, lag(0 1)) ///
        iv(third_party_IV, model(level)) nocons two vce(clu surgeon2)
    Last edited by Jerry Kim; 09 Oct 2021, 17:52.


  • Sebastian Kripfganz
    replied
    In your case, where all instruments for the time-varying regressors are specified for the first-differenced model, and the coefficients of the time-invariant regressors are just-identified (i.e. there are as many instruments as time-invariant regressors), the two-stage approach with xtseqreg is not needed. The presence of the time-invariant regressors in this case does not affect the estimates of the coefficients for the time-varying regressors, and a second-stage regression would not yield any improvement. This is discussed in more technical terms in Appendix C.4 of the Supplementary Appendix for our JAE article. So, yes, your approach is absolutely fine. Just keep in mind that your identifying assumption for the coefficients of the time-invariant regressors is that those are uncorrelated with the unobserved individual/group-specific effects.


  • Jerry Kim
    replied
    Dear Sebastian,

    The versions are all up-to-date as you indicated.

    However, I reread your slides on the xtdpdgmm package from the 2019 conference (the longer and more detailed version). Page 86, "Estimation with time-invariant regressors in Stata", shows the estimation with exogenous industry dummies (quoted as follows):

    Code:
    xtdpdgmm L(0/1).n w k i.ind, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) ///
    > iv(i.ind, model(level)) nl(noserial) teffects igmm vce(r)
    (Some output omitted)
    Instruments corresponding to the linear moment conditions:
    1, model(diff):
    L2.n L3.n L4.n
    2, model(diff):
    L1.w L2.w L3.w L1.k L2.k L3.k
    3, model(level):
    2bn.ind 3.ind 4.ind 5.ind 6.ind 7.ind 8.ind 9.ind
    4, model(level):
    1978bn.year 1979.year 1980.year 1981.year 1982.year 1983.year 1984.year
    5, model(level):
    _cons
    I am now trying to use the following code and everything seems right now.

    Code:
    eststo md_diffgmm_sl2: ///
        qui xtdpdgmm s_it s_itlag1 eta_it elective ///
        afterchangeseq beforecancel age i.prcdr2 i.bin2hr ///
        sftlen, collapse model(diff) ///
        gmm(s_itlag1, lag(1 2)) gmm(eta_it, lag(2 3)) ///
        gmm(elective beforecancel afterchangeseq age i.prcdr2 i.bin2hr, lag(0 1)) ///
        iv(sftlen, model(level)) nocons two vce(clu surgeon2)
    Is this approach okay for estimating the coefficients of time-invariant variables using xtdpdgmm alone?
    Thank you!


  • Sebastian Kripfganz
    replied
    Please check whether you have the latest version of both packages:
    Code:
    which xtdpdgmm
    which xtseqreg
    These should be 2.3.9 for xtdpdgmm and 1.2.4 for xtseqreg. If necessary, please update the commands:
    Code:
    net install xtdpdgmm, from(http://www.kripfganz.de/stata/) replace
    net install xtseqreg, from(http://www.kripfganz.de/stata/) replace
    If the problem persists with the latest versions, I am afraid I would need to see the actual data you used to replicate the problem. If you are able to share the data, please can you send it to me by e-mail?


  • Jerry Kim
    replied
    Dear Sebastian,

    Thanks for the suggestion! I revised the code, but this time there is another error, as follows:

    Code:
    not sorted
    r(5);
    Is it because the data are not sorted in a certain order?


  • Sebastian Kripfganz
    replied
    The problem is with the nocons option. Since the first stage does not contain a constant, you need to specify first(md_gmm_s1, copy nocons). You then probably want to include a constant in your second stage, so do not specify nocons at the end of the command line, i.e.
    Code:
    xtseqreg s_it (s_itlag1 eta_it elective afterchangeseq beforecancel ///
        age) sftlen surgnum, first(md_gmm_s1, copy nocons) ///
        iv(sftlen surgnum) vce(clu surgeon2)


  • Jerry Kim
    replied
    Dear Sebastian,

    Thanks for the reply. However I encountered the error as follows:
    Code:
    option first() incorrectly specified
    r(322);
    My first stage code is:
    Code:
    eststo md_gmm_s1: xtdpdgmm s_it s_itlag1 eta_it elective ///
        afterchangeseq beforecancel age, collapse model(diff) ///
        gmm(s_itlag1, lag(1 2)) gmm(eta_it, lag(2 3)) ///
        gmm(elective sftlen beforecancel afterchangeseq age, lag(0 1)) ///
        nocons two vce(clu surgeon2) auxiliary
    and the output in Stata is like this:

    Code:
    Generalized method of moments estimation
    
    Fitting full model:
    Step 1         f(b) =  5.2919369
    Step 2         f(b) =   .0051382
    
    Group variable: sftidx                       Number of obs         =      8961
    Time variable: newseq                        Number of groups      =      2055
    
    Moment conditions:     linear =      13      Obs per group:    min =         2
                        nonlinear =       0                        avg =  4.360584
                            total =      13                        max =         9
    
    ---------------------------------------------------------------------------------
               s_it |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ----------------+----------------------------------------------------------------
          /s_itlag1 |   .9226237   .1198261     7.70   0.000     .6877688    1.157479
            /eta_it |   .8967318   .1621367     5.53   0.000     .5789497    1.214514
          /elective |  -.8163833   3.944977    -0.21   0.836    -8.548397     6.91563
     /afterchange~q |  -.1239053   2.830054    -0.04   0.965    -5.670709    5.422898
      /beforecancel |   3.123461   4.865123     0.64   0.521    -6.412005    12.65893
               /age |   .3229168   .1191281     2.71   0.007     .0894301    .5564035
    ---------------------------------------------------------------------------------
    
    Instruments corresponding to the linear moment conditions:
    1, model(diff):
    L1.s_itlag1 L2.s_itlag1
    2, model(diff):
    L2.eta_it L3.eta_it
    3, model(diff):
    elective L1.elective sftlen beforecancel L1.beforecancel afterchangeseq
    L1.afterchangeseq age L1.age
    My second stage code is:
    Code:
    xtseqreg s_it (s_itlag1 eta_it elective afterchangeseq beforecancel ///
        age ) sftlen surgnum, first(md_gmm_s1, copy) ///
        iv(sftlen surgnum) vce(clu surgeon2) nocons
    where the two variables
    Code:
    sftlen surgnum
    are two time-invariant variables.

    Could you please let me know the potential errors in my code? Thank you!
    Last edited by Jerry Kim; 05 Oct 2021, 18:04.


  • Sebastian Kripfganz
    replied
    You need to add the option auxiliary to your xtdpdgmm estimation. Then it should work.
    Code:
    xtdpdgmm ..., auxiliary ...
    eststo md_st1
    xtseqreg ..., first(md_st1, copy) ...


  • Jerry Kim
    replied
    Dear Prof. Kripfganz

    Thanks for developing the xtseqreg command! I have a question regarding the storage of the first-stage results. Can I use eststo md_st1 to store the first-stage estimation results (obtained with xtdpdgmm) and then pass md_st1 as first(md_st1, copy) in the first() option of the second-stage estimation?


  • Sebastian Kripfganz
    replied
    I have fixed a minor bug in xtseqreg that could result in an incorrect error message in rare instances when using option vce(cluster) on a subsample of the data. The new version 1.2.4 is available on my website:
    Code:
    net install xtseqreg, from(http://www.kripfganz.de/stata/) replace
