
  • XTDPDBC: new Stata command for bias-corrected estimation of linear dynamic panel data models

    Dear Statalisters,

    Linear dynamic panel data models are commonly estimated by GMM (for example with my command xtdpdgmm). Yet, when all regressors (besides the lagged dependent variable) are strictly exogenous, more efficient alternatives are available. Besides maximum likelihood estimation (for example with my command xtdpdqml), an estimator that directly corrects the dynamic panel data bias (a.k.a. Nickell bias) of the conventional fixed-effects (FE) estimator can be quite attractive because it typically retains the small variance of the FE estimator compared to GMM estimators.

    My new command, xtdpdbc, implements the bias-corrected method of moments estimator described by Breitung, Kripfganz, and Hayakawa (2021). It analytically corrects the first-order condition of the FE estimator, which leads to a set of nonlinear moment conditions that can be solved with conventional numerical methods (Gauss-Newton). Another advantage of this procedure is that a formula for the asymptotic variance-covariance matrix, used to calculate standard errors, is readily available, unlike for the bias-corrected estimator of Kiviet (1995) that is implemented in the community-contributed xtlsdvc command by Bruno (2005).
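
    As a quick illustration, here is a minimal simulation sketch that contrasts the Nickell-biased FE estimate with the bias-corrected one (the data-generating process is hypothetical, and xtdpdbc's defaults are assumed):
    Code:
    clear
    set seed 12345
    set obs 200                                // 200 panel groups
    generate id = _n
    generate alpha = rnormal()                 // unobserved group-specific effect
    expand 10                                  // T = 10 periods per group
    bysort id: generate t = _n
    xtset id t
    generate y = alpha + rnormal() if t == 1
    bysort id (t): replace y = 0.5*y[_n-1] + alpha + rnormal() if t > 1
    xtreg y L.y, fe                            // FE estimate of 0.5 is biased towards zero (Nickell bias)
    xtdpdbc y                                  // bias-corrected estimate should be close to 0.5 (default lags(1) assumed)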

    Yet another advantage is that the estimator can accommodate higher-order lags of the dependent variable. Moreover, the moment conditions can be adjusted to create a random-effects (RE) version of the estimator, assuming that all (or some) of the exogenous regressors are uncorrelated with the unobserved group-specific effects. This RE version is not yet implemented in xtdpdbc, but will be added in due course.

    It turns out that, under the FE assumption, the bias-corrected method of moments estimator is equivalent to the adjusted profile likelihood estimator of Dhaene and Jochmans (2016). Furthermore, if there is only a single lag of the dependent variable, it is also equivalent to the bias-corrected estimator of Bun and Carree (2005).

    It should be noted that, due to the nonlinearity of the bias-corrected moment functions, the estimator in general has multiple solutions, and the numerical algorithm may not always converge to the correct one. The correct solution is characterized by a negativity condition on the gradient, i.e. all eigenvalues of the gradient should be negative. The current version of xtdpdbc displays a note if the gradient has positive eigenvalues. In that case, the estimation should be repeated with different starting values (using the from() option) until the correct solution is found. Starting values for the coefficient of the lagged dependent variable should typically be varied over the interval [0, 1]; starting values for the exogenous regressors do not matter much.
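
    For instance, one hypothetical retry (with placeholder variables y, x1, x2; the exact syntax of from() is documented in the help file) could start from the previous solution and vary only the autoregressive coefficient:
    Code:
    xtdpdbc y x1 x2, lags(1)
    matrix b0 = e(b)                  // previous solution as a template for starting values
    matrix b0[1, 1] = 0.25            // vary the first element, assumed here to be the coefficient of L.y
    xtdpdbc y x1 x2, lags(1) from(b0)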

    In some cases, the numerical algorithm might not converge because the criterion function is almost flat. It can then help to simplify the optimization problem by concentrating out the coefficients of the exogenous regressors with the concentration option. If this still does not help, formal convergence can sometimes be achieved by specifying the nonrtolerance option, although the results should then be treated with caution.
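
    A sketch of this escalation, again with placeholder variables:
    Code:
    xtdpdbc y x1 x2, lags(1) concentration
    * only if convergence still fails; treat the results with caution
    xtdpdbc y x1 x2, lags(1) concentration nonrtolerance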

    Last but not least, the command also supports unbalanced panel data.

    To install the command, type the following in Stata's command window:
    Code:
    net install xtdpdbc, from(http://www.kripfganz.de/stata/)
    Please see the help file for the fairly standard command syntax and the available options:
    Code:
    help xtdpdbc
    Here is an example with second-order autoregressive dynamics:
    Code:
    . webuse psidextract
    
    . xtdpdbc lwage wks south smsa ms exp exp2 occ ind union, lags(2)
    
    Bias-corrected estimation
    Iteration 0:   f(b) =  .00415219  
    Iteration 1:   f(b) =  7.766e-06  
    Iteration 2:   f(b) =  2.040e-09  
    Iteration 3:   f(b) =  2.132e-16  
    
    Group variable: id                           Number of obs         =      2975
    Time variable: t                             Number of groups      =       595
    
                                                 Obs per group:    min =         5
                                                                   avg =         5
                                                                   max =         5
    
    ------------------------------------------------------------------------------
           lwage | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           lwage |
             L1. |   .2777891   .0713708     3.89   0.000     .1379049    .4176733
             L2. |   .0777857   .0411693     1.89   0.059    -.0029045     .158476
                 |
             wks |  -.0000815   .0014887    -0.05   0.956    -.0029992    .0028363
           south |   .0828634   .0950579     0.87   0.383    -.1034466    .2691735
            smsa |  -.0304335   .0293295    -1.04   0.299    -.0879182    .0270513
              ms |  -.0096381   .0294365    -0.33   0.743    -.0673326    .0480565
             exp |     .06042    .012486     4.84   0.000     .0359478    .0848921
            exp2 |  -.0002095   .0001089    -1.92   0.054    -.0004229    3.86e-06
             occ |   -.029654   .0222952    -1.33   0.183    -.0733517    .0140437
             ind |   .0189437    .025248     0.75   0.453    -.0305414    .0684289
           union |  -.0044655    .030205    -0.15   0.882    -.0636661    .0547351
           _cons |   3.283092   .5078034     6.47   0.000     2.287815    4.278368
    ------------------------------------------------------------------------------
    Any comments and suggestions are welcome.


  • #2
    Sebastian,

    Thanks for creating another new Stata command that I believe will gain wide interest and use.

    I have tried using xtdpdbc with a new project of mine that investigates the political determinants of various aspects of human rights in the developing world. However, I have run into a problem, one that I have encountered with other projects whenever I include a lagged dependent variable in my models, regardless of what methodology I use (e.g., xtreg, xtdpdgmm, xtdpdqml, and now xtdpdbc). The problem is that adding one or more LDVs causes the coefficients of the independent variables of interest to decline to implausibly low levels (compared to what I get without an LDV and what I expect from theory).

    I see from p. 91 of your 2019 London Stata Conference presentation explaining xtdpdgmm that you recommend adding one or more lags of the independent variables. Although that recommendation is made for a specific reason (possible correlation between lagged instruments and the error term), does it also hold, for more general reasons, when using xtdpdbc?

    I note that this journal article recommends trying one or more lags of independent variables when using LDVs to solve the problem I am describing: Wilkins, Arjun S. 2018. To Lag or Not to Lag?: Re-Evaluating the Use of Lagged Dependent Variables in Regression Analysis. Political Science Research and Methods 6(2): 393-411.

    I tried adding a first lag to each of my independent variables of interest and doing so brings the coefficients of the unlagged variables up to plausible levels. However, I am concerned about collinearity between the lagged and unlagged variables when I do this because the signs of the coefficients for the lagged variables flip to negative and I get very high VIF scores when I test for multicollinearity. I show below the regression results I get without and with lags for the independent variables and the VIF results.

    Code:
    . xtdpdbc y x1 x2 x3 if y_l1~=. & y_l2~=. & x1_l1~=. & x2_l1~=. & x3_l1~=. & year>1979, fe lags(2)
    
    Bias-corrected estimation
    Iteration 0:   f(b) =  .00031196  
    Iteration 1:   f(b) =  5.466e-07  
    Iteration 2:   f(b) =  5.337e-12  
    Iteration 3:   f(b) =  5.415e-22  
    
    Group variable: ccode                        Number of obs         =      5088
    Time variable: year                          Number of groups      =       144
    
                                                 Obs per group:    min =         5
                                                                   avg =  35.33333
                                                                   max =        39
    
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               y |
             L1. |   .9120774   .0230568    39.56   0.000     .8668869    .9572678
             L2. |  -.0380304    .017669    -2.15   0.031     -.072661   -.0033999
                 |
              x1 |   .0170865   .0051434     3.32   0.001     .0070056    .0271674
              x2 |   .0552508   .0160587     3.44   0.001     .0237763    .0867252
              x3 |   .0622974   .0196529     3.17   0.002     .0237785    .1008163
           _cons |  -.0005421    .010707    -0.05   0.960    -.0215275    .0204432
    ------------------------------------------------------------------------------
    
    . xtdpdbc y x1 x1_l1 x2 x2_l1 x3 x3_l1 if y_l1~=. & y_l2~=. & x1_l1~=. & x2_l1~=. & x3_l1~=. & year>1979, fe lags(2)
    
    Bias-corrected estimation
    Iteration 0:   f(b) =  .00031894  
    Iteration 1:   f(b) =  4.215e-08  
    Iteration 2:   f(b) =  3.750e-13  
    
    Group variable: ccode                        Number of obs         =      5088
    Time variable: year                          Number of groups      =       144
    
                                                 Obs per group:    min =         5
                                                                   avg =  35.33333
                                                                   max =        39
    
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               y |
             L1. |   .9382101   .0209673    44.75   0.000      .897115    .9793052
             L2. |  -.0031981   .0184993    -0.17   0.863     -.039456    .0330599
                 |
              x1 |   .0460023   .0128886     3.57   0.000     .0207411    .0712635
           x1_l1 |  -.0371751   .0117679    -3.16   0.002    -.0602397   -.0141104
              x2 |   .2712156   .0439953     6.16   0.000     .1849864    .3574447
           x2_l1 |  -.2536297   .0419765    -6.04   0.000    -.3359021   -.1713573
              x3 |   .3128306    .056701     5.52   0.000     .2016987    .4239625
           x3_l1 |  -.2910625   .0517423    -5.63   0.000    -.3924755   -.1896496
           _cons |    .013027   .0058651     2.22   0.026     .0015317    .0245223
    ------------------------------------------------------------------------------
    
    
    
    . collin y_l1 y_l2 x1 x1_l1 x2 x2_l1 x3 x3_l1 if year>1979
    (obs=5,088)
    
      Collinearity Diagnostics
    
                            SQRT                   R-
      Variable      VIF     VIF    Tolerance    Squared
    ----------------------------------------------------
          y_l1     38.99    6.24    0.0256      0.9744
          y_l2     37.66    6.14    0.0266      0.9734
            x1     14.56    3.82    0.0687      0.9313
         x1_l1     14.62    3.82    0.0684      0.9316
            x2     31.44    5.61    0.0318      0.9682
         x2_l1     31.51    5.61    0.0317      0.9683
            x3     29.84    5.46    0.0335      0.9665
         x3_l1     29.84    5.46    0.0335      0.9665
    ----------------------------------------------------
      Mean VIF     28.56
    Based on correspondence I had with Arjun Wilkins, I calculated AIC and BIC scores for the models without and with lagged independent variables. I had to do this using xtreg, fe, because xtdpdbc would not produce results when I ran estat ic. I show the results below. As can be seen, the AIC and BIC scores become substantially better when I add the lagged independent variables. As I understand it, if the AIC and BIC scores improve with the lagged variables present, this indicates a better model fit, thereby lending confidence that the higher coefficients of the unlagged variables are real and not mere artifacts of multicollinearity. But I'm not certain of this and not yet confident that I can rely on the coefficient results I obtain after adding the lags.


    Code:
    . qui xtreg y y_l1 y_l2 x1 x2 x3 if y_l1~=. & y_l2~=. & x1_l1~=. & x2_l1~=. & x3_l1~=. & year>1979, fe
    
    . estat ic
    
    Akaike's information criterion and Bayesian information criterion
    
    -----------------------------------------------------------------------------
           Model |          N   ll(null)  ll(model)      df        AIC        BIC
    -------------+---------------------------------------------------------------
               . |      5,088   3782.729   9240.922       6  -18469.84  -18430.64
    -----------------------------------------------------------------------------
    Note: BIC uses N = number of observations. See [R] BIC note.
    
    . qui xtreg y y_l1 y_l2 x1 x1_l1 x2 x2_l1 x3 x3_l1 if y_l1~=. & y_l2~=. & x1_l1~=. & x2_l1~=. & x3_l1~=. & y
    > ear>1979, fe 
    
    . estat ic
    
    Akaike's information criterion and Bayesian information criterion
    
    -----------------------------------------------------------------------------
           Model |          N   ll(null)  ll(model)      df        AIC        BIC
    -------------+---------------------------------------------------------------
               . |      5,088   3782.729   9569.201       9   -19120.4  -19061.59
    -----------------------------------------------------------------------------
    Note: BIC uses N = number of observations. See [R] BIC note.
    I would appreciate any advice you, or anyone else on Statalist, can provide to me.

    Joe



    • #3
      Originally posted by Joseph L. Staats View Post
      I have tried using xtdpdbc with a new project of mine that investigates the political determinants of various aspects of human rights in the developing world. However, I have run into a problem, one that I have encountered with other projects whenever I include a lagged dependent variable in my models, regardless of what methodology I use (e.g., xtreg, xtdpdgmm, xtdpdqml, and now xtdpdbc). The problem is that adding one or more LDVs causes the coefficients of the independent variables of interest to decline to implausibly low levels (compared to what I get without an LDV and what I expect from theory).
      The coefficients have a different interpretation in static models (without a lagged dependent variable) compared to dynamic models (with at least one lag of the dependent variable). In static models, they could be interpreted as long-run coefficients (although their estimates might suffer from omitted-variable bias due to the omitted LDV). In dynamic models, they would be interpreted as short-run coefficients. Corresponding long-run coefficients (which are typically again larger than the short-run coefficients) can be obtained as follows:
      Code:
      nlcom (_b[x] + _b[L.x]) / (1 - _b[L.y] - _b[L2.y])
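
      With the manually created lags from your posts (x1_l1, etc.), the same calculation would read, for x1:
      Code:
      nlcom (_b[x1] + _b[x1_l1]) / (1 - _b[L.y] - _b[L2.y])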
      Originally posted by Joseph L. Staats View Post
      I see from p. 91 of your 2019 London Stata Conference presentation explaining xtdpdgmm that you recommend adding one or more lags of the independent variables. Although that recommendation is made for a specific reason (possible correlation between lagged instruments and the error term), does it also hold, for more general reasons, when using xtdpdbc?
      For xtdpdbc, the absence of serial correlation in the error term is just as important as it is for xtdpdgmm. Otherwise, the lagged dependent variable would be correlated with the idiosyncratic error term, and the bias correction does not correct for that. Adding further lags of the dependent variable (and of the independent variables) aims to obtain a model that is "dynamically complete". With GMM, we could possibly still find instruments that are valid when the model is not dynamically complete, but the bias-correction approach critically depends on this assumption.

      Originally posted by Joseph L. Staats View Post
      I note that this journal article recommends trying one or more lags of independent variables when using LDVs to solve the problem I am describing: Wilkins, Arjun S. 2018. To Lag or Not to Lag?: Re-Evaluating the Use of Lagged Dependent Variables in Regression Analysis. Political Science Research and Methods 6(2): 393-411.
      Including lags of the independent variables serves the same purpose as adding further lags of the dependent variable: to obtain a dynamically complete model; in other words, to proxy for the serial correlation in the idiosyncratic error term.

      Originally posted by Joseph L. Staats View Post
      I tried adding a first lag to each of my independent variables of interest and doing so brings the coefficients of the unlagged variables up to plausible levels. However, I am concerned about collinearity between the lagged and unlagged variables when I do this because the signs of the coefficients for the lagged variables flip to negative and I get very high VIF scores when I test for multicollinearity. I show below the regression results I get without and with lags for the independent variables and the VIF results.
      Notice that the sum of the coefficients of the unlagged and lagged independent variables is of a similar magnitude as the coefficient in the model without the added lags. Adding lags allows for richer short-run dynamics: you might have a stronger contemporaneous effect and a balancing delayed effect, but the combined effect is still similar to the case where you only allow for a contemporaneous effect. The key quantity of interest often remains the long-run effect, as outlined above.

      Originally posted by Joseph L. Staats View Post
      Based on correspondence I had with Arjun Wilkins, I calculated AIC and BIC scores for the models without and with lagged independent variables. I had to do this using xtreg, fe, because xtdpdbc would not produce results when I ran estat ic. I show the results below. As can be seen, the AIC and BIC scores become substantially better when I add the lagged independent variables. As I understand it, if the AIC and BIC scores improve with the lagged variables present, this indicates a better model fit, thereby lending confidence that the higher coefficients of the unlagged variables are real and not mere artifacts of multicollinearity. But I'm not certain of this and not yet confident that I can rely on the coefficient results I obtain after adding the lags.
      Leaving aside the complications of obtaining AIC/BIC criteria after xtdpdbc, these criteria can indeed be useful for deciding about the inclusion of lagged regressors; this has a long-standing tradition in time-series econometrics. Similar criteria are available after GMM estimation (see estat mmsc after xtdpdgmm). Regarding the collinearity, you can think about it from a different perspective: you can obtain an equivalent regression by including D.x instead of L.x. This is merely a different parameterization of the same model, but clearly there would be no concern about collinearity between D.x and x. Note that in this model you would calculate the long-run coefficient based only on the coefficient of x, not D.x:
      Code:
      nlcom _b[x] / (1 - _b[L.y] - _b[L2.y])
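
      A sketch with your variables, creating the differences manually in case the command does not accept time-series operators for the regressors (if-condition abbreviated):
      Code:
      generate x1_d = x1 - x1_l1       // D.x1, built from the manually created lag
      generate x2_d = x2 - x2_l1
      generate x3_d = x3 - x3_l1
      xtdpdbc y x1 x1_d x2 x2_d x3 x3_d if year>1979, fe lags(2)
      nlcom _b[x1] / (1 - _b[L.y] - _b[L2.y])
      In this parameterization, the coefficient of x1 equals the sum of the former x1 and x1_l1 coefficients.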


      • #4
        Thanks so much for responding in such a clear and comprehensive way. I am now much better informed about using LDVs and the calculation of short- and long-term effects, and I feel much more confident going forward with my current project.

        In working with xtdpdbc, I notice that it (as well as xtdpdqml) doesn't allow use of factor-variable operators, whereas xtdpdgmm does. I often use factor-variable operators for handling interactions and creating higher-order variables, especially when using the margins command for various purposes. Is adding this feature to xtdpdbc at all feasible?

        Thanks again.



        • #5
          Factor variables are a programmer's nightmare, which is why I usually do not bother with them in the early stages of a new package. Yet, motivated by your request, I have now implemented support for factor variables in xtdpdbc. The latest update to version 1.1.0 also adds the new option small, which applies a small-sample degrees-of-freedom correction to the standard errors and reports small-sample test statistics.

          Code:
          adoupdate xtdpdbc, update
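
          For instance, both new features can be combined as follows (reusing the psidextract data from #1; c.exp##c.exp recreates the quadratic in experience, and the specification is purely illustrative):
          Code:
          xtdpdbc lwage wks i.union c.exp##c.exp i.south i.smsa, lags(2) small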
          Please let me know if you observe any unexpected behavior or error messages when using the command with factor variables.


          • #6
            I figured it probably wasn't that simple to add factor variables and applaud your willingness to do so for xtdpdbc. I just tried the new feature to create and then graph higher-order variables using margins and marginsplot and everything worked as it should. I'll let you know if I encounter any problems, but I don't expect I will.



            • #7
              Hi Sebastian,
              I really like your command. As far as I know, there is no built-in postestimation command for testing serial correlation in your package. Which command would you suggest for testing it?
              Thanks a lot,
              Israel



              • #8
                In theory, it would be possible to construct a serial correlation test in the spirit of the Arellano-Bond test. I might implement that for xtdpdbc at some point in the future.

                For the moment, if you are estimating a model with only one lag of the dependent variable, I would suggest using my xtdpdqml command. The QML estimator implemented in that command has very similar properties to the BC estimator implemented in xtdpdbc. After xtdpdqml, you can use the Arellano-Bond test with estat serial.
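
                A minimal sketch of that workflow, with placeholder variables (the fe option of xtdpdqml assumed to mirror a fixed-effects specification):
                Code:
                xtdpdqml y x1 x2 x3, fe
                estat serial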

                If you have higher-order lags of the dependent variable in your model, then you could estimate the model initially with a GMM estimator using my xtdpdgmm command, and again use estat serial for the Arellano-Bond test. When the test supports your specification, you could subsequently obtain the more efficient BC estimates with xtdpdbc.
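
                A rough sketch for the higher-order case; the gmm()/iv() choices below are purely illustrative and must be adapted to your data:
                Code:
                xtdpdgmm L(0/2).y x1 x2 x3, model(diff) collapse gmm(y, lag(2 4)) iv(x1 x2 x3) two vce(r)
                estat serial
                * if the test does not reject the specification, re-estimate more efficiently:
                xtdpdbc y x1 x2 x3, fe lags(2)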

                There are some other commands for alternative serial correlation tests, but those tests are usually derived under the assumption of strictly exogenous regressors, which excludes lags of the dependent variable.


                • #9
                  Thanks a lot Sebastian,
                  Given that the Arellano-Bond test after xtdpdqml also tests for autocorrelation of the first-differenced residuals, first-order autocorrelation is expected (and necessary, I suppose), and the test is interpreted as in the context of GMM estimation. Is this correct?



                  • #10
                    Originally posted by Israel Garcia View Post
                    Given that the Arellano-Bond test after xtdpdqml also tests for autocorrelation of the first-differenced residuals, first-order autocorrelation is expected (and necessary, I suppose), and the test is interpreted as in the context of GMM estimation. Is this correct?
                    Yes, that is correct.


                    • #11
                      Hello again, Sebastian,

                      I have another question that has arisen in connection with calculating long-run effects (LRE) when using xtdpdbc.

                      I mentioned in an earlier message the following article: Wilkins, Arjun S. 2018. To Lag or Not to Lag?: Re-Evaluating the Use of Lagged Dependent Variables in Regression Analysis. Political Science Research and Methods 6(2): 393-411.
                      The author suggests at p. 404 that the denominator of the equation for calculating the LRE (with two lags of the DV and one lag of the IV) should be (1-ρ)(1-α1-α2), where ρ is the coefficient from regressing the IV on its own lag. The denominator used by every other article I could find on LREs, and the one you suggested in an earlier message, does not include 1-ρ. When I follow Wilkins' recommendation, I get an implausibly high LRE for each of my IVs of interest. Here are the results for the three IVs, using the equation you suggested (without 1-ρ) and the one Wilkins suggests (with 1-ρ):

                      x1: ρ = .973; LRE without 1-ρ = .256; LRE with 1-ρ = 9.459

                      x2: ρ = .937; LRE without 1-ρ = .330; LRE with 1-ρ = 5.240

                      x3: ρ = .953; LRE without 1-ρ = .273; LRE with 1-ρ = 5.799
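
                      (The two versions are mechanically linked: dividing the LRE without 1-ρ by 1-ρ reproduces the LRE with it, e.g. .256/(1-.973) ≈ 9.5 for x1.)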

                      What is particularly strange about the results when I include 1-ρ is that the DV and the IVs are all continuous within the range of 0-1. Thus, for x1, a one-unit change in x1 results in an LRE that is over nine times the maximum value of the DV.

                      I wrote to Arjun Wilkins about my results, and he suggested I calculate the LRE using something less than a one-unit change in the IV, such as a standard-deviation change. That did not really help, because the LREs would then be 2.204, .865, and 1.693, respectively. Even if I reduce the change in the IVs to something as low as .1, I still get LREs that seem much greater than theory would predict. He also suggested I check for a unit-root problem, but that was not an issue.

                      Do you have any thoughts on this? In particular, is it appropriate to include 1-ρ when calculating LRE? And if it is, do you have any ideas about the results I am getting?



                      • #12
                        I think it is a matter of perspective: Are we considering
                        1. the long-run effect of a permanent change in X itself [without assuming any underlying process for X], or
                        2. the long-run effect of a permanent change in a determinant/disturbance e to the process of X [assuming an underlying AR(1) process for X in your case]?
                        In 1., we consider the effect of changing X by 1 unit and then keeping it constant afterwards. The long-run effect of that change of X on y will be β/(1-α1-α2).

                        In 2., we are taking into account that X itself will adjust over time to a permanent 1-unit level shift. The long-run effect of that change of e on X will be 1/(1-ρ), and the long-run effect on y will be the product of the two, i.e. β/[(1-ρ)(1-α1-α2)].
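
                        A sketch of the second calculation in Stata, using the variable names from your earlier posts (ρ is taken from an auxiliary regression and enters nlcom as a fixed constant, so its estimation uncertainty is ignored; if-conditions abbreviated):
                        Code:
                        quietly xtreg x1 x1_l1 if year>1979, fe      // auxiliary AR(1) process for x1
                        scalar rho = _b[x1_l1]
                        quietly xtdpdbc y x1 x1_l1 x2 x2_l1 x3 x3_l1 if year>1979, fe lags(2)
                        nlcom (_b[x1] + _b[x1_l1]) / ((1 - scalar(rho)) * (1 - _b[L.y] - _b[L2.y]))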

                        Calculating these long-run effects for binary dependent variables is problematic in a linear probability model. As you noticed, there is no upper bound for the effect, so it may exceed the range of plausible values for y. I am afraid the only solution I see is to consider a nonlinear binary response model, with all the complications that this brings in the case of dynamic panel models.


                        • #13
                          Thanks so much for your informative insights. One thing I didn't understand, however, was what you said in the first sentence of the last paragraph: "Calculating these long-run effects for binary dependent variables is problematic in a linear probability model." To clarify, all of my variables, including the DV, are continuous, not binary, and have values ranging between 0 and 1 (in their original form they have other values, but I normalized them to 0-1). Knowing that, would your answer be any different?



                          • #14
                            Apologies; I did not read your message carefully enough. I still believe that you would need to consider a model that takes the nature of your dependent variable into account (e.g., a fractional response model) if you want to calculate meaningful long-run effects. Alternatively, you might consider a smaller effect size: think about what would be an economically reasonable magnitude for a change in X.


                            • #15
                              Thanks again.

                              Your comments help me understand at a deeper level what Arjun Wilkins was trying to tell me about choosing realistic changes in the IVs rather than worrying about what happens with a one-unit change (or some other change) that pushes the DV above its maximum value. And I just remembered an article I read a few days ago (Mummolo, Jonathan, and Erik Peterson. 2018. Improving the Interpretation of Fixed Effects Regression Results. Political Science Research and Methods 6(4): 829-835) arguing that researchers running fixed-effects regressions should (but typically don't) use within-unit variation (rather than overall variation) when calculating substantive effects. When I do that, one-standard-deviation increases in my IVs result in LREs of approximately .735, .288, and .564, respectively, which are entirely plausible.
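
                              For reference, the within-unit standard deviation can be computed along these lines (hypothetical variable names from the earlier posts):
                              Code:
                              bysort ccode: egen x1_mean = mean(x1)      // country-specific mean
                              generate x1_within = x1 - x1_mean          // within-country variation
                              summarize x1_within                        // r(sd) is the within-unit SD to use as the effect size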

                              Without the combined help I received from you and Arjun Wilkins, I likely would not have reached the current state of my knowledge on the subject, so thank you very much.
