  • Eliana Melo
    replied
Thank you so much, Professor Kripfganz, now it is clear!

  • haiyan lin
    replied
    Originally posted by Sebastian Kripfganz View Post
    Bootstrapping GMM estimates for dynamic panel models is not a straightforward task. After resampling the residuals, you would need to recursively reconstruct the data for the dependent variable using the estimate for the coefficient of the lagged dependent variable. The instruments used in the estimation also need to be updated accordingly. As far as I know, this cannot be readily done with the existing bootstrap functionality in Stata.
Thanks, Sebastian! I searched for this issue but could not find a solution. I feel relieved after receiving your answer :D

  • Sebastian Kripfganz
    replied
    Eliana Melo
    In your specification, ob_agre subnormal inadpf log_pib log_iasc are treated as strictly exogenous, not predetermined. Also, you implicitly assume that they are uncorrelated with the unobserved "fixed effects", because they are used as instruments without the first-difference transformation in the level model. You might want to change your code as follows:
    Code:
    xtdpdgmm pntbt L.pntbt ob_agre subnormal inadpf log_decapu log_pib log_iasc log_tarid, ///
    gmmiv(L.pntbt, lag(2 2) m(d) collapse) ///
    gmmiv(L.pntbt, lag(1 1) m(l) diff collapse) ///
    gmmiv(log_decapu, lag(2 2) m(d) collapse) ///
    gmmiv(log_decapu, lag(1 1) m(l) diff collapse) ///
    gmmiv(log_tarid, lag(2 2) m(d) collapse) ///
    gmmiv(log_tarid, lag(1 1) m(l) diff collapse) ///
    gmmiv(ob_agre subnormal inadpf log_pib log_iasc, lag(1 1) m(d) collapse) ///
    gmmiv(ob_agre subnormal inadpf log_pib log_iasc, lag(1 1) m(l) diff collapse) ///
    twostep vce(r) overid
    For the level model, it does not make a difference whether a variable is treated as endogenous or predetermined.

One possibility to deal with the differences in the overidentification tests would be to consider an iterated GMM estimator (option igmm instead of twostep), although this could aggravate any problems if identification is weak. I would suggest checking for weak identification with the underid command (available from SSC and explained in my presentation).

    For correct specification, you want the Difference-in-Hansen tests to not reject the null hypothesis. But for this test to be valid, it is initially required that the test in the "Excluding" column also does not reject the null hypothesis. In your case, none of the tests gives rise to an obvious concern, but the p-values are also not large enough to be entirely comfortable.

  • Sebastian Kripfganz
    replied
    Originally posted by haiyan lin View Post
    Is there a good way to get bootstrapped confidence intervals after GMM estimation?
    Bootstrapping GMM estimates for dynamic panel models is not a straightforward task. After resampling the residuals, you would need to recursively reconstruct the data for the dependent variable using the estimate for the coefficient of the lagged dependent variable. The instruments used in the estimation also need to be updated accordingly. As far as I know, this cannot be readily done with the existing bootstrap functionality in Stata.
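To make the recursive-reconstruction step concrete, here is a minimal sketch in Python (not Stata) of a residual bootstrap for a simple AR(1) panel without covariates. The function and variable names are hypothetical illustrations; a real dynamic panel application would redo the full GMM step, rebuilt instruments included, in each replication rather than the pooled OLS used here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_bootstrap_ar1(y0, rho_hat, resid, n_boot=499):
    """Recursive residual bootstrap for y_it = rho * y_i,t-1 + e_it.

    y0      : (N,) initial observation per group
    rho_hat : point estimate of the autoregressive coefficient
    resid   : (N, T) estimated residuals
    Returns a 95% percentile confidence interval for rho.
    """
    N, T = resid.shape
    rho_b = np.empty(n_boot)
    for b in range(n_boot):
        # 1) resample residuals (iid from the pooled sample, for simplicity)
        e_star = rng.choice(resid.ravel(), size=(N, T))
        # 2) recursively reconstruct the dependent variable from y0 and rho_hat
        y_star = np.empty((N, T + 1))
        y_star[:, 0] = y0
        for t in range(T):
            y_star[:, t + 1] = rho_hat * y_star[:, t] + e_star[:, t]
        # 3) re-estimate rho on the bootstrap sample (pooled OLS here; a GMM
        #    application must also rebuild the instruments at this point)
        ylag = y_star[:, :-1].ravel()
        ycur = y_star[:, 1:].ravel()
        rho_b[b] = ylag @ ycur / (ylag @ ylag)
    # percentile confidence interval from the bootstrap distribution
    return np.percentile(rho_b, [2.5, 97.5])
```

Step 2 is the part that has no off-the-shelf counterpart in Stata's bootstrap prefix: the dependent variable must be rebuilt period by period, which is why resampling the raw panel does not work.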

  • Eliana Melo
    replied
    Dear all,


I have doubts about how to embed predetermined variables in system GMM. I read Professor Sebastian's presentation and I am not sure whether I am doing it right. My dependent variable is the percentage of non-technical losses in electricity distribution (pntbt), i.e. electricity theft. I suspect endogeneity of two explanatory variables: the duration of interruptions in electricity distribution (log_decapu) and the electricity price (log_tarid). The other variables are predetermined (I have no evidence to think they are strictly exogenous).

Code:
    xtdpdgmm pntbt L.pntbt ob_agre subnormal inadpf log_decapu log_pib log_iasc log_tarid, ///
    gmmiv(L.pntbt, lag(2 2) m(d) collapse) ///
    gmmiv(L.pntbt, lag(2 2) m(l) diff collapse) ///
    gmmiv(log_decapu, lag(2 2) m(d) collapse) ///
    gmmiv(log_decapu, lag(3 3) m(l) diff collapse) ///
    gmmiv(log_tarid, lag(2 2) m(d) collapse) ///
    gmmiv(log_tarid, lag(2 2) m(l) diff collapse) ///
    gmmiv(ob_agre subnormal inadpf log_pib log_iasc, lag(0 1) m(d) collapse) ///
    gmmiv(ob_agre subnormal inadpf log_pib log_iasc, lag(0 1) m(l) collapse) ///
    twostep vce(r) overid

    Code:
    Group variable: id                           Number of obs         =       721
    Time variable: ano                           Number of groups      =        61
    
    Moment conditions:     linear =      27      Obs per group:    min =         8
                        nonlinear =       0                        avg =  11.81967
                            total =      27                        max =        12
    
                                        (Std. Err. adjusted for 61 clusters in id)
    ------------------------------------------------------------------------------
                 |              WC-Robust
           pntbt |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           pntbt |
             L1. |   .8545485   .0980974     8.71   0.000     .6622812    1.046816
                 |
         ob_agre |   .0000241   .0002129     0.11   0.910    -.0003931    .0004414
       subnormal |   .2545236   .1650646     1.54   0.123    -.0689971    .5780442
          inadpf |   .0712857   .2658358     0.27   0.789    -.4497428    .5923142
      log_decapu |   .0202332   .0179756     1.13   0.260    -.0149983    .0554647
         log_pib |   .0086632    .008496     1.02   0.308    -.0079886     .025315
        log_iasc |  -.0108519   .0179764    -0.60   0.546    -.0460851    .0243813
       log_tarid |   .0352162   .0273752     1.29   0.198    -.0184383    .0888706
           _cons |  -.2797855   .2705651    -1.03   0.301    -.8100833    .2505124
    ------------------------------------------------------------------------------
    Code:
. estat serial
    
    Arellano-Bond test for autocorrelation of the first-differenced residuals
    H0: no autocorrelation of order 1:     z =   -3.2453   Prob > |z|  =    0.0012
    H0: no autocorrelation of order 2:     z =    1.5184   Prob > |z|  =    0.1289
    
    . estat overid
    
    Sargan-Hansen test of the overidentifying restrictions
    H0: overidentifying restrictions are valid
    
    2-step moment functions, 2-step weighting matrix       chi2(18)    =   24.4723
                                                           Prob > chi2 =    0.1402
    
    2-step moment functions, 3-step weighting matrix       chi2(18)    =   30.4614
                                                           Prob > chi2 =    0.0332
    
    . estat overid, difference
    
    Sargan-Hansen (difference) test of the overidentifying restrictions
    H0: (additional) overidentifying restrictions are valid
    
    2-step weighting matrix from full model
    
                      | Excluding                   | Difference                  
    Moment conditions |       chi2     df         p |        chi2     df         p
    ------------------+-----------------------------+-----------------------------
       1, model(diff) |    24.4438     17    0.1079 |      0.0286      1    0.8658
      2, model(level) |    24.4721     17    0.1072 |      0.0002      1    0.9884
       3, model(diff) |    22.0325     17    0.1835 |      2.4398      1    0.1183
      4, model(level) |    24.4103     17    0.1087 |      0.0620      1    0.8033
       5, model(diff) |    22.2540     17    0.1751 |      2.2183      1    0.1364
      6, model(level) |    24.4709     17    0.1072 |      0.0014      1    0.9700
       7, model(diff) |    15.6926      8    0.0470 |      8.7797     10    0.5531
      8, model(level) |     8.6499      8    0.3727 |     15.8224     10    0.1048
          model(diff) |     8.0584      5    0.1530 |     16.4139     13    0.2275
         model(level) |     8.0584      5    0.1530 |     16.4139     13    0.2275

In an earlier post, Prof. Kripfganz mentioned that
It is usually sufficient to consider the overidentification test with the 2-step weighting matrix. The two tests are asymptotically equivalent. If they differ substantially, then this would be an indication that the weighting matrix is poorly estimated.
In my case, the two overidentification tests do differ. What should I do then? When are the 2-step and 3-step weighting matrices considered substantially different?

Also, I am not sure whether I am interpreting the Sargan-Hansen difference test correctly. In general, the (Difference-in-) Hansen tests do not reject the null hypothesis, so the instruments in all equations would be valid. Or should I be concerned that in some equations the p-values are relatively small?


    Thank you so much for any comment!!!
    Last edited by Eliana Melo; 04 Jul 2021, 12:02.

  • haiyan lin
    replied
    Dear all,

    Is there a good way to get bootstrapped confidence intervals after GMM estimation? Great thanks!

    Best,
    Haiyan

  • Sebastian Kripfganz
    replied
    I do not have a conclusive answer, but I suspect that the "small" number of clusters relative to the very large number of observations intensifies the differences that result from the different implementations of the Arellano-Bond tests. If you look into the literature on how many clusters you should at least have for reliable inference, you would often find that 57 should be sufficient. However, in my opinion these absolute thresholds are not very meaningful because the performance also depends on how many observations there are within each cluster.

  • Reid Taylor
    replied
    Sebastian Kripfganz I have 75,204 company-zipcode groupings with observations across a T=10 year panel (so total N=752,050). There are 57 companies in the panel, which are represented by group_code in the code I showed above.

The reason I feel compelled to cluster the standard errors is that the yearly treatment variable is common within each company across zipcodes (x_gt), while the unit of observation is (x_igt). Ideally I would cluster at the company-year level; however, xtdpdgmm does not allow clustering at the company-year level since the panel id is not nested within the cluster. Therefore, I cluster at the company level.

    As you noted, when running the two with unadjusted standard errors and onestep estimation, the two AB z scores match between xtdpdgmm and xtabond2 for both AR(1) and AR(2).


  • Sebastian Kripfganz
    replied
    Reid Taylor
    That is indeed a substantial difference for the AR(2) tests. What are the dimensions of your data set (number of time periods, number of groups, number of clusters)? Do the tests coincide if you do not specify (cluster-)robust standard errors?

    Jose Albrecht
Whether the panel is balanced or unbalanced should not be of much relevance here. xtivreg can deal with unbalanced panel data, too. The problem with your data set is rather that the number of groups is very small (20) - usually too small to expect reliable results from standard dynamic panel data GMM estimators. You might want to have a look at estimators for large-T data sets, e.g. those implemented by the community-contributed xtdcce2 command, but that would be a topic for a different thread. But even xtivreg might be good enough, given that with 35 time periods the dynamic panel data (Nickell) bias should not be a major worry.
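On the last point, a quick Monte Carlo (a hypothetical Python illustration, not part of the thread) shows why the Nickell bias of the within (fixed-effects) estimator shrinks as T grows, which is why a simple estimator becomes more defensible at T = 35 than in a short panel:

```python
import numpy as np

rng = np.random.default_rng(0)

def within_ar1_bias(N, T, rho, n_rep=200):
    """Simulated bias of the within (fixed-effects) estimator of rho in
    y_it = alpha_i + rho * y_i,t-1 + e_it  (the Nickell bias)."""
    est = np.empty(n_rep)
    for r in range(n_rep):
        alpha = rng.standard_normal(N)
        y = np.zeros((N, T + 1))
        for t in range(T):
            y[:, t + 1] = alpha + rho * y[:, t] + rng.standard_normal(N)
        # within transformation: demean each group's series, then pooled OLS
        ycur = y[:, 1:] - y[:, 1:].mean(axis=1, keepdims=True)
        ylag = y[:, :-1] - y[:, :-1].mean(axis=1, keepdims=True)
        est[r] = (ylag * ycur).sum() / (ylag ** 2).sum()
    return est.mean() - rho
```

With rho = 0.5 the simulated bias is roughly of order -(1 + rho)/(T - 1): large and negative at T = 5, but much smaller at T = 35.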

  • Reid Taylor
    replied
    Sebastian, If I may follow up with your response. The AR results are as follows:

    xtdpdgmm yields:
    Arellano-Bond test for autocorrelation of the first-differenced residuals
    H0: no autocorrelation of order 1: z = -1.1432 Prob > |z| = 0.2530
    H0: no autocorrelation of order 2: z = -0.8983 Prob > |z| = 0.3690

    xtabond2:
    Arellano-Bond test for AR(1) in first differences: z = -1.19 Pr > z = 0.233
    Arellano-Bond test for AR(2) in first differences: z = -6.18 Pr > z = 0.000

As you probably guessed, I am concerned, since the xtabond2 model provides strong evidence of AR(2). Does the difference in z-scores align with your prior on how much the two implementations can differ?

    Thanks,
    -Reid

  • Jose Albrecht
    replied
Dear Tugrul Cinar,

Thank you for your comment. What approach do you recommend then? I am aware that my panel data is unbalanced, I have a small number of observations (720), and standard panel regressions with endogeneity (xtivreg, etc.) are mostly used when the panel is strongly balanced.

By the way, thank you for your time. I am aware that my econometrics and Stata skills are limited, and it would be a great help to receive any kind of feedback.

    Regards,

    José Albrecht
    Last edited by Jose Albrecht; 15 Jun 2021, 05:32.

  • Tugrul Cinar
    replied
    Dear Jose Albrecht

    I would not prefer to use xtabond/xtabond2 or even xtdpdgmm in your case since GMM estimation is not a good option for your sample.

    If you want to get comprehensive answers to your questions please see the FAQ section for posting rules.

  • Sebastian Kripfganz
    replied
    xtabond2 and xtdpdgmm use different formulae for the Arellano-Bond test. The results should coincide for the one-step estimator with robust (but not cluster-robust) standard errors, and for the two-step estimator without robust standard errors. They differ for the non-robust one-step estimator because xtdpdgmm always computes the test in a robust way (using an influence function approach in a similar way as the suest command). They differ for the two-step robust estimator because xtdpdgmm accounts for the Windmeijer correction in all terms of the test statistic (while xtabond2 does so only for the main term), and for a similar reason they differ for cluster-robust standard errors. Usually, the differences should not be large.

  • Reid Taylor
    replied
I am estimating a model using the difference GMM estimator. I am trying to replicate my results from xtdpdgmm with xtabond2. The results replicate fine (coefficient estimates and standard errors). However, the Arellano-Bond AR tests are vastly different. I saw a comment from Sebastian Kripfganz earlier which stated that the overid tests may vary in the presence of time dummies, but I am not sure whether this holds true for the AR tests as well. I am also not sure whether this is an issue with the time dummies or with the clustering (perhaps xtabond2 does not adjust the test for clustering?).

    Thanks, and sorry if this is a repeat question.

    For reference, the two codes:

    Code:
xtdpdgmm log_pols treat rps_treat L1.drought L1.dr if everfire==0, ///
    model(diff) gmm(treat rps_treat, lag(1 3)) iv(drought dr, model(level)) ///
    teffects nocons vce(cluster group_code) twostep
xtabond2 log_pols treat rps_treat yr* L1.drought L1.dr if everfire==0, ///
    gmmstyle(treat rps_treat, lag(1 3) equation(diff)) ///
    ivstyle(yr* drought dr, equation(level) passthru) noconstant cluster(group_code) twostep
    Last edited by Reid Taylor; 11 Jun 2021, 19:14.

  • Dario Maimone Ansaldo Patti
    replied
Sebastian Kripfganz thanks a lot. Yes, I am aware of your last comment; the specification is not correct. But at the moment I did not pay attention to this: I just wanted to understand the "compatibility" of the two commands. Thanks a lot again.
