  • Sebastian Kripfganz
    replied
    Originally posted by Israel Garcia View Post
    Given that the Arellano-Bond test after xtdpdqml also tests for autocorrelation in the first-differenced residuals, first-order autocorrelation is expected (and necessary, I suppose). The test is interpreted as in the context of GMM estimation. Is this correct?
    Yes, that is correct.


  • Israel Garcia
    replied
    Thanks a lot Sebastian,
    Given that the Arellano-Bond test after xtdpdqml also tests for autocorrelation in the first-differenced residuals, first-order autocorrelation is expected (and necessary, I suppose). The test is interpreted as in the context of GMM estimation. Is this correct?


  • Sebastian Kripfganz
    replied
    In theory, it would be possible to construct a serial correlation test in the spirit of the Arellano-Bond test. I might implement that for xtdpdbc at some point in the future.

    For the moment, if you are estimating a model with only one lag of the dependent variable, I would suggest using my xtdpdqml command. The QML estimator implemented in that command has very similar properties to the BC estimator implemented in xtdpdbc. After xtdpdqml, you can run the Arellano-Bond test with estat serial.
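
    A minimal sketch of that workflow, assuming the panel has already been declared with xtset and using the placeholder variable names y, x1, x2, and x3 (xtdpdqml adds the first lag of the dependent variable itself):

    ```stata
    * Assumption: the data are xtset; y, x1, x2, x3 are placeholder names.
    * QML estimation of the fixed-effects model with one lag of y
    xtdpdqml y x1 x2 x3, fe
    * Arellano-Bond test on the first-differenced residuals
    estat serial
    ```

    As with GMM, a rejection at order 1 is expected because the residuals are first-differenced; a rejection at order 2 is what would call the specification into question.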

    If you have higher-order lags of the dependent variable in your model, then you could estimate the model initially with a GMM estimator using my xtdpdgmm command, and again use estat serial for the Arellano-Bond test. When the test supports your specification, you could subsequently obtain the more efficient BC estimates with xtdpdbc.
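
    A rough sketch of this two-step approach for a model with two lags of y. The variable names are placeholders, and the instrument choices (lag ranges, collapsing, treating x1 to x3 as strictly exogenous) are illustrative assumptions only, not a recommendation for any particular data set:

    ```stata
    * Assumptions: the data are xtset; instrument lag ranges are illustrative.
    * Step 1: GMM estimation with two lags of y, then the Arellano-Bond test
    xtdpdgmm L(0/2).y x1 x2 x3, model(diff) collapse gmm(y, lag(2 4)) gmm(x1 x2 x3, lag(0 2)) twostep vce(robust)
    estat serial, ar(1/3)
    * Step 2: if the test supports the specification, obtain the BC estimates
    xtdpdbc y x1 x2 x3, fe lags(2)
    ```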

    There are some other commands for alternative serial correlation tests, but those tests are usually derived under the assumption of strictly exogenous regressors, which excludes lags of the dependent variable.


  • Israel Garcia
    replied
    Hi Sebastian,
    I really like your command. As far as I know, it has no built-in postestimation command for testing serial correlation. Can you suggest a command that would be suitable for this test?
    Thanks a lot,
    Israel


  • Joseph L. Staats
    replied
    I figured it probably wasn't that simple to add factor variables and applaud your willingness to do so for xtdpdbc. I just tried the new feature to create and then graph higher-order variables using margins and marginsplot and everything worked as it should. I'll let you know if I encounter any problems, but I don't expect I will.


  • Sebastian Kripfganz
    replied
    Factor variables are a programmer's nightmare. This is why I usually do not bother with them in the early stages of a new package. Yet, motivated by your request, I have now implemented support for factor variables in xtdpdbc. The latest update to version 1.1.0 also adds the new option small, which applies a small-sample degrees-of-freedom correction to the standard errors and reports small-sample test statistics.

    Code:
    adoupdate xtdpdbc, update
    Please let me know if you observe any unexpected behavior or error messages when using the command with factor variables.
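
    As an illustration of what the update permits, the following sketch adds a squared term through factor-variable notation and uses the small option. The variable names and the quadratic specification are hypothetical, and margins relies on whatever default prediction xtdpdbc provides:

    ```stata
    * Hypothetical model: quadratic in x1 via factor-variable operators,
    * with small-sample corrected standard errors (option small).
    xtdpdbc y c.x1##c.x1 x2 x3, fe lags(2) small
    * Average marginal effect of x1, accounting for the squared term
    margins, dydx(x1)
    marginsplot
    ```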


  • Joseph L. Staats
    replied
    Thanks so much for responding in such a clear and comprehensive way. I am now much better informed about using LDVs and calculating short- and long-term effects, and I feel much more confident going forward with my current project.

    In working with xtdpdbc, I notice that it (as well as xtdpdqml) doesn't allow use of factor-variable operators, whereas xtdpdgmm does. I often use factor-variable operators for handling interactions and creating higher-order variables, especially when using the margins command for various purposes. Is adding this feature to xtdpdbc at all feasible?

    Thanks again.


  • Sebastian Kripfganz
    replied
    Originally posted by Joseph L. Staats View Post
    I have tried using xtdpdbc with a new project of mine that investigates the political determinants of various aspects of human rights in the developing world. However, I have run into a problem, one that I have encountered with other projects whenever I include a lagged dependent variable in my models, regardless of what methodology I use (e.g., xtreg, xtdpdgmm, xtdpdqml, and now, xtdpdbc). The problem is that adding one or more LDVs causes the coefficients of the independent variables of interest to decline to implausibly low levels (as compared to what I get without an LDV and what I expect from theory).
    The coefficients have a different interpretation in static models (without a lagged dependent variable) than in dynamic models (with at least one lag of the dependent variable). In static models, they could be interpreted as long-run coefficients (although their estimates might suffer from omitted-variable bias due to the omitted LDV). In dynamic models, they would be interpreted as short-run coefficients. The corresponding long-run coefficients (which are typically larger than the short-run coefficients) can be obtained as follows, here for a model with x, its first lag, and two lags of y:
    Code:
    nlcom (_b[x] + _b[L.x]) / (1 - _b[L.y] - _b[L2.y])
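
    With the variable names from the quoted output, where the lags of the regressors were created by hand as x1_l1, x2_l1, and x3_l1, the same expression might look as follows. This is a sketch that assumes the two-lag xtdpdbc model with lagged regressors was the last estimation command; the labels lr_x1 to lr_x3 are arbitrary:

    ```stata
    * Assumption: run directly after
    *   xtdpdbc y x1 x1_l1 x2 x2_l1 x3 x3_l1 ..., fe lags(2)
    * Long-run coefficients with delta-method standard errors
    nlcom (lr_x1: (_b[x1] + _b[x1_l1]) / (1 - _b[L.y] - _b[L2.y])) ///
          (lr_x2: (_b[x2] + _b[x2_l1]) / (1 - _b[L.y] - _b[L2.y])) ///
          (lr_x3: (_b[x3] + _b[x3_l1]) / (1 - _b[L.y] - _b[L2.y]))
    ```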
    Originally posted by Joseph L. Staats View Post
    I see from p. 91 of your 2019 London Stata Conference presentation explaining xtdpdgmm that you recommend adding one or more lags of the independent variables. Although this recommendation is for a specific reason (possible correlation between instrument lags and the error term), does this recommendation also hold for more general reasons when using xtdpdbc?
    For xtdpdbc, the absence of serial correlation in the error term is as important as it is for xtdpdgmm. Otherwise, the lagged dependent variable would be correlated with the idiosyncratic error term. The bias correction does not correct for that. Adding further lags of the dependent variable (and the independent variables) aims to obtain a model that is "dynamically complete". With GMM, we could possibly still find instruments that are valid when the model is not dynamically complete, but the bias correction approach critically depends on this assumption.

    Originally posted by Joseph L. Staats View Post
    I note that this journal article recommends trying one or more lags of independent variables when using LDVs to solve the problem I am describing: Wilkins, Arjun S. 2018. To Lag or Not to Lag?: Re-Evaluating the Use of Lagged Dependent Variables in Regression Analysis. Political Science Research and Methods 6(2): 393-411.
    Including lags of the independent variables serves the same purpose as adding further lags of the dependent variable: to obtain a dynamically complete model; in other words, to proxy for the serial correlation in the idiosyncratic error term.

    Originally posted by Joseph L. Staats View Post
    I tried adding a first lag to each of my independent variables of interest and doing so brings the coefficients of the unlagged variables up to plausible levels. However, I am concerned about collinearity between the lagged and unlagged variables when I do this because the signs of the coefficients for the lagged variables flip to negative and I get very high VIF scores when I test for multicollinearity. I show below the regression results I get without and with lags for the independent variables and the VIF results.
    Notice that the coefficient sum of the unlagged and lagged independent variables is of a similar magnitude as in the model without the added lags. Adding lags allows for richer short-term dynamics: You might have a stronger contemporaneous effect and a balancing delayed effect, but the combined effect is still similar to the case where you only allow for a contemporaneous effect. The key quantity of interest often remains the long-run effect as outlined above.

    Originally posted by Joseph L. Staats View Post
    Based on correspondence I had with Arjun Wilkins, I calculated AIC and BIC scores for the models without and with lagged independent variables. I had to do this using xtreg, fe, because xtdpdbc would not produce results when I ran estat ic. I show the results below. As can be seen, the AIC and BIC scores become substantially better when I add the lagged independent variables. As I understand it, if the AIC and BIC scores improve with the lagged variables present, this indicates a better model fit, thereby lending confidence that the higher coefficients of the unlagged variables are real and not mere artifacts of multicollinearity. But I'm not certain of this and not yet confident that I can rely on the coefficient results I obtain after adding the lags.
    Leaving aside the complications of obtaining AIC/BIC criteria after xtdpdbc, these criteria can indeed be useful for deciding whether to include lagged regressors. This has a long-standing tradition in time series econometrics. There are similar criteria available after GMM estimation (see estat mmsc after xtdpdgmm). Regarding the collinearity, you can think about it from a different perspective. You can obtain an equivalent regression by including D.x instead of L.x. This would merely be a different parameterization of the same model, but clearly there would be no concern about collinearity between D.x and x. Note: In this model, you would calculate the long-run coefficient based only on the coefficient of x, not D.x:
    Code:
    nlcom _b[x] / (1 - _b[L.y] - _b[L2.y])
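
    The equivalence follows from b0*x + b1*L.x = (b0 + b1)*x - b1*D.x, so the coefficient of x in the second parameterization already equals the coefficient sum from the first. A side-by-side sketch, assuming xtdpdbc accepts time-series operators in the regressor list and using placeholder variable names:

    ```stata
    * Parameterization 1: level and first lag of x1
    xtdpdbc y x1 L.x1, fe lags(2)
    nlcom (_b[x1] + _b[L.x1]) / (1 - _b[L.y] - _b[L2.y])

    * Parameterization 2: level and first difference of x1 (same model)
    xtdpdbc y x1 D.x1, fe lags(2)
    nlcom _b[x1] / (1 - _b[L.y] - _b[L2.y])
    ```

    Both nlcom calls target the same long-run coefficient; only the short-run coefficients are labeled differently.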


  • Joseph L. Staats
    replied
    Sebastian,

    Thanks for creating another new Stata command that I believe will gain wide interest and use.

    I have tried using xtdpdbc with a new project of mine that investigates the political determinants of various aspects of human rights in the developing world. However, I have run into a problem, one that I have encountered with other projects whenever I include a lagged dependent variable in my models, regardless of what methodology I use (e.g., xtreg, xtdpdgmm, xtdpdqml, and now, xtdpdbc). The problem is that adding one or more LDVs causes the coefficients of the independent variables of interest to decline to implausibly low levels (as compared to what I get without an LDV and what I expect from theory).

    I see from p. 91 of your 2019 London Stata Conference presentation explaining xtdpdgmm that you recommend adding one or more lags of the independent variables. Although this recommendation is for a specific reason (possible correlation between instrument lags and the error term), does this recommendation also hold for more general reasons when using xtdpdbc?

    I note that this journal article recommends trying one or more lags of independent variables when using LDVs to solve the problem I am describing: Wilkins, Arjun S. 2018. To Lag or Not to Lag?: Re-Evaluating the Use of Lagged Dependent Variables in Regression Analysis. Political Science Research and Methods 6(2): 393-411.

    I tried adding a first lag to each of my independent variables of interest and doing so brings the coefficients of the unlagged variables up to plausible levels. However, I am concerned about collinearity between the lagged and unlagged variables when I do this because the signs of the coefficients for the lagged variables flip to negative and I get very high VIF scores when I test for multicollinearity. I show below the regression results I get without and with lags for the independent variables and the VIF results.

    Code:
    . xtdpdbc y x1 x2 x3 if y_l1~=. & y_l2~=. & x1_l1~=. & x2_l1~=. & x3_l1~=. & year>1979, fe lags(2)
    
    Bias-corrected estimation
    Iteration 0:   f(b) =  .00031196  
    Iteration 1:   f(b) =  5.466e-07  
    Iteration 2:   f(b) =  5.337e-12  
    Iteration 3:   f(b) =  5.415e-22  
    
    Group variable: ccode                        Number of obs         =      5088
    Time variable: year                          Number of groups      =       144
    
                                                 Obs per group:    min =         5
                                                                   avg =  35.33333
                                                                   max =        39
    
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               y |
             L1. |   .9120774   .0230568    39.56   0.000     .8668869    .9572678
             L2. |  -.0380304    .017669    -2.15   0.031     -.072661   -.0033999
                 |
              x1 |   .0170865   .0051434     3.32   0.001     .0070056    .0271674
              x2 |   .0552508   .0160587     3.44   0.001     .0237763    .0867252
              x3 |   .0622974   .0196529     3.17   0.002     .0237785    .1008163
           _cons |  -.0005421    .010707    -0.05   0.960    -.0215275    .0204432
    ------------------------------------------------------------------------------
    
    . xtdpdbc y x1 x1_l1 x2 x2_l1 x3 x3_l1 if y_l1~=. & y_l2~=. & x1_l1~=. & x2_l1~=. & x3_l1~=. & year>1979, fe lags(2)
    
    Bias-corrected estimation
    Iteration 0:   f(b) =  .00031894  
    Iteration 1:   f(b) =  4.215e-08  
    Iteration 2:   f(b) =  3.750e-13  
    
    Group variable: ccode                        Number of obs         =      5088
    Time variable: year                          Number of groups      =       144
    
                                                 Obs per group:    min =         5
                                                                   avg =  35.33333
                                                                   max =        39
    
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               y |
             L1. |   .9382101   .0209673    44.75   0.000      .897115    .9793052
             L2. |  -.0031981   .0184993    -0.17   0.863     -.039456    .0330599
                 |
              x1 |   .0460023   .0128886     3.57   0.000     .0207411    .0712635
           x1_l1 |  -.0371751   .0117679    -3.16   0.002    -.0602397   -.0141104
              x2 |   .2712156   .0439953     6.16   0.000     .1849864    .3574447
           x2_l1 |  -.2536297   .0419765    -6.04   0.000    -.3359021   -.1713573
              x3 |   .3128306    .056701     5.52   0.000     .2016987    .4239625
           x3_l1 |  -.2910625   .0517423    -5.63   0.000    -.3924755   -.1896496
           _cons |    .013027   .0058651     2.22   0.026     .0015317    .0245223
    ------------------------------------------------------------------------------
    
    
    
    . collin y_l1 y_l2 x1 x1_l1 x2 x2_l1 x3 x3_l1 if year>1979
    (obs=5,088)
    
      Collinearity Diagnostics
    
                            SQRT                   R-
      Variable      VIF     VIF    Tolerance    Squared
    ----------------------------------------------------
          y_l1     38.99    6.24    0.0256      0.9744
          y_l2     37.66    6.14    0.0266      0.9734
            x1     14.56    3.82    0.0687      0.9313
         x1_l1     14.62    3.82    0.0684      0.9316
            x2     31.44    5.61    0.0318      0.9682
         x2_l1     31.51    5.61    0.0317      0.9683
            x3     29.84    5.46    0.0335      0.9665
         x3_l1     29.84    5.46    0.0335      0.9665
    ----------------------------------------------------
      Mean VIF     28.56
    Based on correspondence I had with Arjun Wilkins, I calculated AIC and BIC scores for the models without and with lagged independent variables. I had to do this using xtreg, fe, because xtdpdbc would not produce results when I ran estat ic. I show the results below. As can be seen, the AIC and BIC scores become substantially better when I add the lagged independent variables. As I understand it, if the AIC and BIC scores improve with the lagged variables present, this indicates a better model fit, thereby lending confidence that the higher coefficients of the unlagged variables are real and not mere artifacts of multicollinearity. But I'm not certain of this and not yet confident that I can rely on the coefficient results I obtain after adding the lags.


    Code:
    . qui xtreg y y_l1 y_l2 x1 x2 x3 if y_l1~=. & y_l2~=. & x1_l1~=. & x2_l1~=. & x3_l1~=. & year>1979, fe
    
    . estat ic
    
    Akaike's information criterion and Bayesian information criterion
    
    -----------------------------------------------------------------------------
           Model |          N   ll(null)  ll(model)      df        AIC        BIC
    -------------+---------------------------------------------------------------
               . |      5,088   3782.729   9240.922       6  -18469.84  -18430.64
    -----------------------------------------------------------------------------
    Note: BIC uses N = number of observations. See [R] BIC note.
    
    . qui xtreg y y_l1 y_l2 x1 x1_l1 x2 x2_l1 x3 x3_l1 if y_l1~=. & y_l2~=. & x1_l1~=. & x2_l1~=. & x3_l1~=. & y
    > ear>1979, fe 
    
    . estat ic
    
    Akaike's information criterion and Bayesian information criterion
    
    -----------------------------------------------------------------------------
           Model |          N   ll(null)  ll(model)      df        AIC        BIC
    -------------+---------------------------------------------------------------
               . |      5,088   3782.729   9569.201       9   -19120.4  -19061.59
    -----------------------------------------------------------------------------
    Note: BIC uses N = number of observations. See [R] BIC note.
    I would appreciate any advice you, or anyone else on Statalist, can provide to me.

    Joe
