
  • Sarah Magd
    replied
    Dear Prof. Kripfganz,


    I tried to check my two-step system GMM model with the diagnostic checks you posted in #367. My GMM-type variables are Y, X2, X3, and X5, with a lag range from one to three. The standard-type instrumental variables are the first, second, and third lags of X1, X4, and X6.
    I obtained the following output. The p-values of estat overid are significant, which indicates that my model is misspecified. In addition, the Hausman test indicates cross-sectional dependence. What would you recommend to solve these two problems?



    . xtdpdgmm L(0/1).Y X1 X2 X3 X4 X5 X6, model(diff) collapse gmm(Y X2 X3 X5, lag(1 3)) gmm(X1 X4 X6, lag(1 3)) gmm(Y X2 X3 X5, lag(1 1) diff model(level)) gmm(X1 X4 X6, lag(0 0) diff model(level)) two vce(r) overid

    Generalized method of moments estimation

    Fitting full model:
    Step 1 f(b) = .01275011
    Step 2 f(b) = .94657417

    Fitting reduced model 1:
    Step 1 f(b) = .77326433

    Fitting reduced model 2:
    Step 1 f(b) = .84751107

    Fitting reduced model 3:
    Step 1 f(b) = .86874136

    Fitting reduced model 4:
    Step 1 f(b) = .89973548

    Fitting no-diff model:
    Step 1 f(b) = 2.604e-09

    Fitting no-level model:
    Step 1 f(b) = .78165801

    Group variable: iso_num                      Number of obs         =       336
    Time variable: year                          Number of groups      =        28

    Moment conditions:     linear =      29      Obs per group:    min =        12
                        nonlinear =       0                        avg =        12
                            total =      29                        max =        12

                                   (Std. Err. adjusted for 28 clusters in iso_num)
    ------------------------------------------------------------------------------
                 |              WC-Robust
               Y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               Y |
             L1. |   .7967178   .0652365    12.21   0.000     .6688566     .924579
                 |
              X1 |  -.1074083   .0301788    -3.56   0.000    -.1665576    -.048259
              X2 |   .0900069   .0329441     2.73   0.006     .0254376    .1545763
              X3 |  -.0131671   .0110706    -1.19   0.234     -.034865    .0085307
              X4 |    .226282   .0839039     2.70   0.007     .0618333    .3907307
              X5 |  -.0476106   .0942898    -0.50   0.614    -.2324151    .1371939
              X6 |  -.1638557   .0978363    -1.67   0.094    -.3556113    .0278999
           _cons |  -2.162016   .5205269    -4.15   0.000     -3.18223   -1.141802
    ------------------------------------------------------------------------------
    Instruments corresponding to the linear moment conditions:
    1, model(diff):
    L1.Y L2.Y L3.Y L1.X2 L2.X2 L3.X2 L1.X3 L2.X3 L3.X3 L1.X5 L2.X5 L3.X5
    2, model(diff):
    L1.X1 L2.X1 L3.X1 L1.X4 L2.X4 L3.X4 L1.X6 L2.X6 L3.X6
    3, model(level):
    L1.D.Y L1.D.X2 L1.D.X3 L1.D.X5
    4, model(level):
    D.X1 D.X4 D.X6
    5, model(level):
    _cons

    . estat overid, difference

    Sargan-Hansen (difference) test of the overidentifying restrictions
    H0: (additional) overidentifying restrictions are valid

    2-step weighting matrix from full model

                      | Excluding                   | Difference
    Moment conditions |       chi2     df         p |        chi2     df         p
    ------------------+-----------------------------+-----------------------------
       1, model(diff) |    21.6514      9    0.0101 |      4.8527     12    0.9627
       2, model(diff) |    23.7303     12    0.0221 |      2.7738      9    0.9726
      3, model(level) |    24.3248     17    0.1109 |      2.1793      4    0.7028
      4, model(level) |    25.1926     18    0.1197 |      1.3115      3    0.7264
          model(diff) |     0.0000      0         . |     26.5041     21    0.1879
         model(level) |    21.8864     14    0.0810 |      4.6177      7    0.7065
    . estimates store full

    . xtdpdgmm L(0/1).Y X1 X2 X3 X4 X5 X6, model(diff) collapse gmm(X1 X4 X6, lag(1 3)) gmm(X1 X4 X6, lag(0 0) diff model(level)) two vce(r)

    Generalized method of moments estimation

    Fitting full model:
    Step 1 f(b) = .0013473
    Step 2 f(b) = .35246126

    Group variable: iso_num                      Number of obs         =       336
    Time variable: year                          Number of groups      =        28

    Moment conditions:     linear =      13      Obs per group:    min =        12
                        nonlinear =       0                        avg =        12
                            total =      13                        max =        12

                                   (Std. Err. adjusted for 28 clusters in iso_num)
    ------------------------------------------------------------------------------
                 |              WC-Robust
               Y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               Y |
             L1. |   .6915241   .3140208     2.20   0.028     .0760547    1.306994
                 |
              X1 |  -.0796047   .0434806    -1.83   0.067    -.1648251    .0056158
              X2 |   .1766118   .1830601     0.96   0.335    -.1821794     .535403
              X3 |   .0940592   .0302717     3.11   0.002     .0347278    .1533906
              X4 |   .1856448   .1214403     1.53   0.126    -.0523739    .4236635
              X5 |  -.0955597   .1559156    -0.61   0.540    -.4011488    .2100293
              X6 |  -.1655075    .653214    -0.25   0.800    -1.445784    1.114769
           _cons |   -2.94722   2.484702    -1.19   0.236    -7.817146    1.922706
    ------------------------------------------------------------------------------
    Instruments corresponding to the linear moment conditions:
    1, model(diff):
    L1.X1 L2.X1 L3.X1 L1.X4 L2.X4 L3.X4 L1.X6 L2.X6 L3.X6
    2, model(level):
    D.X1 D.X4 D.X6
    3, model(level):
    _cons

    . estat overid

    Sargan-Hansen test of the overidentifying restrictions
    H0: overidentifying restrictions are valid

    2-step moment functions, 2-step weighting matrix chi2(5) = 9.8689
    Prob > chi2 = 0.0790

    2-step moment functions, 3-step weighting matrix chi2(5) = 14.9084
    Prob > chi2 = 0.0108

    . estat overid full

    Sargan-Hansen difference test of the overidentifying restrictions
    H0: additional overidentifying restrictions are valid

    2-step moment functions, 2-step weighting matrix chi2(16) = 16.6352
    Prob > chi2 = 0.4096

    2-step moment functions, 3-step weighting matrix chi2(16) = 13.0916
    Prob > chi2 = 0.6661

    . estat hausman full

    Generalized Hausman test chi2(7) = 30.3760
    H0: coefficients do not systematically differ Prob > chi2 = 0.0001
    Last edited by Sarah Magd; 26 Feb 2022, 02:21.



  • Sebastian Kripfganz
    replied
    I received a question by e-mail which is easier to answer here on the forum, and which might be of interest to others:
    How can we check for cross-sectional dependence after estimating the SYS-GMM with xtdpdgmm?
    According to your presentation, we can only do diagnostic checks with estat serial and estat overid. I wonder if we can run the Sarafidis et al. (2009) testing procedure for error cross-section dependence after estimating SYS-GMM with xtdpdgmm.
    Let us start with a DIF-GMM estimator. The proposed test for cross-sectional dependence is then simply an incremental Hansen test for the validity of the moment conditions for the lagged dependent variable, and it can be implemented using xtdpdgmm as follows:
    Code:
    . webuse abdata
    
    . xtdpdgmm L(0/1).n L(0/1).(w k), gmm(L.n, lag(1 5) model(diff)) gmm(w k, lag(2 6) model(diff)) teffects collapse twostep vce(robust) overid
    
    (estimation output partially omitted)
    ------------------------------------------------------------------------------
    Instruments corresponding to the linear moment conditions:
     1, model(diff):
       L1.L.n L2.L.n L3.L.n L4.L.n L5.L.n
     2, model(diff):
       L2.w L3.w L4.w L5.w L6.w L2.k L3.k L4.k L5.k L6.k
     3, model(level):
       1978bn.year 1979.year 1980.year 1981.year 1982.year 1983.year 1984.year
     4, model(level):
       _cons
    
    . estat overid, difference
    
    Sargan-Hansen (difference) test of the overidentifying restrictions
    H0: (additional) overidentifying restrictions are valid
    
    2-step weighting matrix from full model
    
                      | Excluding                   | Difference                  
    Moment conditions |       chi2     df         p |        chi2     df         p
    ------------------+-----------------------------+-----------------------------
       1, model(diff) |     9.1619      5    0.1028 |      2.9311      5    0.7106
       2, model(diff) |     0.0000      0         . |     12.0930     10    0.2789
      3, model(level) |     3.7749      3    0.2868 |      8.3180      7    0.3054
          model(diff) |          .     -5         . |           .      .         .
    The relevant test is the "Difference" test for the first set of instruments, labeled "1, model(diff)". If there was cross-sectional dependence, we would expect this test to reject the null hypothesis. Here, this is not the case given that we have a p-value of 0.71. (Note that this test is only applicable if the "Excluding" test does not reject the null hypothesis, which is the case here.)

    With a system GMM estimator, the test unfortunately requires a bit more effort because it involves jointly testing the validity of the moment conditions for the lagged dependent variable for the differenced and the level model. This is not (yet) directly possible with the above procedure. However, there is a workaround which involves estimating the full model and the reduced model (i.e. the model without the instruments under investigation) separately:
    Code:
    . xtdpdgmm L(0/1).n L(0/1).(w k), gmm(L.n, lag(1 5) model(diff)) gmm(w k, lag(2 6) model(diff)) gmm(L.n, lag(0 0) model(level)) gmm(w k, lag(1 1) model(level)) teffects collapse twostep vce(robust) overid
    
    (estimation output partially omitted)
    ------------------------------------------------------------------------------
    Instruments corresponding to the linear moment conditions:
     1, model(diff):
       L1.L.n L2.L.n L3.L.n L4.L.n L5.L.n
     2, model(diff):
       L2.w L3.w L4.w L5.w L6.w L2.k L3.k L4.k L5.k L6.k
     3, model(level):
       L.n
     4, model(level):
       L1.w L1.k
     5, model(level):
       1978bn.year 1979.year 1980.year 1981.year 1982.year 1983.year 1984.year
     6, model(level):
       _cons
    
    . estat overid, difference
    
    Sargan-Hansen (difference) test of the overidentifying restrictions
    H0: (additional) overidentifying restrictions are valid
    
    2-step weighting matrix from full model
    
                      | Excluding                   | Difference                  
    Moment conditions |       chi2     df         p |        chi2     df         p
    ------------------+-----------------------------+-----------------------------
       1, model(diff) |    10.3512      8    0.2412 |      2.9586      5    0.7064
       2, model(diff) |     1.0343      3    0.7930 |     12.2755     10    0.2670
      3, model(level) |    12.9565     12    0.3722 |      0.3533      1    0.5523
      4, model(level) |    12.2839     11    0.3427 |      1.0259      2    0.5987
      5, model(level) |     8.8893      6    0.1799 |      4.4204      7    0.7303
          model(diff) |          .     -2         . |           .      .         .
         model(level) |     3.7790      3    0.2863 |      9.5308     10    0.4826
    
    . estimates store full
    
    . xtdpdgmm L(0/1).n L(0/1).(w k), gmm(w k, lag(2 6) model(diff)) gmm(w k, lag(1 1) model(level)) teffects collapse twostep vce(robust)
    
    (estimation output partially omitted)
    ------------------------------------------------------------------------------
    Instruments corresponding to the linear moment conditions:
     1, model(diff):
       L2.w L3.w L4.w L5.w L6.w L2.k L3.k L4.k L5.k L6.k
     2, model(level):
       L1.w L1.k
     3, model(level):
       1978bn.year 1979.year 1980.year 1981.year 1982.year 1983.year 1984.year
     4, model(level):
       _cons
    
    . estat overid
    
    Sargan-Hansen test of the overidentifying restrictions
    H0: overidentifying restrictions are valid
    
    2-step moment functions, 2-step weighting matrix       chi2(7)     =   10.4055
                                                           Prob > chi2 =    0.1667
    (postestimation output partially omitted)
    
    . estat overid full
    
    Sargan-Hansen difference test of the overidentifying restrictions
    H0: additional overidentifying restrictions are valid
    
    2-step moment functions, 2-step weighting matrix       chi2(6)     =    2.9043
                                                           Prob > chi2 =    0.8208
    (postestimation output partially omitted)
    
    . estat hausman full
    
    Generalized Hausman test                               chi2(6)     =    3.7445
    H0: coefficients do not systematically differ          Prob > chi2 =    0.7112
    After estimating the full model, I first checked for any sign of misspecification of the level moment conditions, which could indicate that the additional Blundell-Bond assumption for the SYS-GMM estimator might not be satisfied. Here, all the p-values from the Difference-in-Hansen tests are acceptable.
    After estimating the reduced model, the first estat overid command checks for any misspecification in the reduced model. This is similar to the "Excluding" test in the earlier DIF-GMM example. Correct specification of the reduced model is again a prerequisite for the subsequent tests. For the next estat overid command, I have supplied the name of the stored estimation results from the full model. This postestimation command now computes a Difference-in-Hansen test by simply taking the difference of the two Hansen overidentification test statistics from the two models. This is conceptually the same as the "Difference" test in the earlier example, except that the two test statistics compared here are based on separate estimates of the variance-covariance matrix, while in the earlier example only the variance estimates from the full model were used. Asymptotically, both approaches are equivalent.
    Finally, the generalized Hausman test again compares the two models using the Hausman principle as an alternative to the Difference-in-Hansen test. However, the Hausman test tends to have poor finite-sample performance. In our case here, neither of the two tests rejects the null hypothesis that the full model is correctly specified. If there was evidence of cross-sectional dependence, we would expect to see a rejection by these tests.
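
    As a sketch, the difference test from the two separately estimated models can be reproduced by hand from the numbers reported above. The symbol J for the Hansen statistic is notation introduced here for illustration. The full-model statistic is not printed directly, but under the full-model weighting matrix the "Excluding" and "Difference" entries of any row add up to it (row 1 is used below):

    ```latex
    \begin{align*}
    J_{\text{full}} &= 10.3512 + 2.9586 = 13.3098, & df_{\text{full}} &= 8 + 5 = 13,\\
    J_{\text{reduced}} &= 10.4055, & df_{\text{reduced}} &= 7,\\
    J_{\text{full}} - J_{\text{reduced}} &= 13.3098 - 10.4055 = 2.9043, & df &= 13 - 7 = 6.
    \end{align*}
    ```

    This matches the chi2(6) = 2.9043 reported by estat overid full. The 6 degrees of freedom correspond to the six instruments for the lagged dependent variable that were dropped from the reduced model (five for the differenced model, one for the level model).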



  • Prateek Bedi
    replied
    Thank you so much, Prof. Kripfganz. I shall try to increase the time periods.



  • Sebastian Kripfganz
    replied
    You need at least 4 time periods to calculate the AR(2) test statistic. With just 3 time periods (in levels), you only have 2 time periods in first differences, such that the second lag cannot be computed.
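
    A schematic way to see this, assuming levels are observed for t = 1, ..., T and writing \Delta\hat{e}_{it} for the first-differenced residuals (notation introduced here for illustration; the actual test statistic additionally involves a variance estimate):

    ```latex
    \begin{align*}
    &\text{first differences exist for } t = 2, \dots, T,\\
    &\text{AR}(2) \text{ is based on products } \Delta\hat{e}_{it}\,\Delta\hat{e}_{i,t-2},
      \text{ which require } t - 2 \ge 2 \;\Longleftrightarrow\; t \ge 4,\\
    &T = 3:\ \text{only } \Delta\hat{e}_{i2},\ \Delta\hat{e}_{i3}
      \text{ are available, so no such product exists.}
    \end{align*}
    ```

    The AR(1) test only needs the products of adjacent first differences, which is why it can still be computed with T = 3.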



  • Prateek Bedi
    replied
    Thanks a lot, Prof. Kripfganz. I have another question. For the output below, please let me know why the AR(2) test statistic was not calculated.

    Code:
    Generalized method of moments estimation
    
    Fitting full model:
    Step 1         f(b) =  5555652.4
    Step 2         f(b) =  .04704686
    
    Group variable: CompanyID                    Number of obs         =      1170
    Time variable: Year                          Number of groups      =       396
    
    Moment conditions:     linear =      21      Obs per group:    min =         1
                        nonlinear =       0                        avg =  2.954545
                            total =      21                        max =         3
    
                                (Std. Err. adjusted for 396 clusters in CompanyID)
    ------------------------------------------------------------------------------
                 |              WC-Robust
             PAT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             PAT |
             L1. |    .973882   .1321274     7.37   0.000     .7149171    1.232847
                 |
             Lev |  -4838.181   2017.099    -2.40   0.016    -8791.623    -884.739
       Log_sales |  -80.49196   287.9182    -0.28   0.780    -644.8013    483.8174
                 |
            Year |
           2014  |  -130.4292   148.6886    -0.88   0.380    -421.8534     160.995
           2015  |   134.1587   180.3201     0.74   0.457    -219.2622    487.5795
                 |
           _cons |   2106.412    2522.11     0.84   0.404    -2836.834    7049.657
    ------------------------------------------------------------------------------
    
    . estat serial
    
    Arellano-Bond test for autocorrelation of the first-differenced residuals
    H0: no autocorrelation of order 1:     z =   -3.3944   Prob > |z|  =    0.0007
    H0: no autocorrelation of order 2:     z =         .   Prob > |z|  =         .



  • Sebastian Kripfganz
    replied
    Yes, it is standard practice to use differenced instruments for the model in levels; see e.g. slides 30 and following in my 2019 London Stata Conference presentation.

    A conventional specification check for the constant correlation over time would be a difference-in-Hansen test for the validity of the respective instruments for the level model; slides 48 and following in my presentation.



  • Prateek Bedi
    replied
    Ok, Prof. Kripfganz. So how does one determine whether X1 and X2 have a constant correlation over time with the unobserved effects? Moreover, is it standard practice to difference the instruments (i.e. by using the diff option) to obtain valid instruments?



  • Sebastian Kripfganz
    replied
    The diff suboption applies a first-difference transformation to the instruments (not the model). If the untransformed variables X1 X2 have a constant correlation over time with the unobserved effects, their first differences D.X1 D.X2 will be uncorrelated with the unobserved effects. Thus, differencing the instruments would be necessary to obtain valid instruments.
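
    The argument can be written out in one line. As a sketch, let \alpha_i denote the unobserved effect and c the constant covariance (both symbols are notation introduced here for illustration):

    ```latex
    \begin{align*}
    E[x_{it}\,\alpha_i] = c \ \text{ for all } t
    \quad\Longrightarrow\quad
    E[\Delta x_{it}\,\alpha_i]
      = E[x_{it}\,\alpha_i] - E[x_{i,t-1}\,\alpha_i]
      = c - c = 0 .
    \end{align*}
    ```

    The level x_{it} itself remains correlated with \alpha_i whenever c is nonzero, which is why the untransformed variable would not be a valid instrument for the level model.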



  • Prateek Bedi
    replied
    Originally posted by Sebastian Kripfganz
    1. Did you specify the overid option in the xtdpdgmm command line? This is required for running the incremental overidentification tests.
    2. These are the values of the quadratic GMM objective function. In a just-identified model, these values would be zero. In an overidentified model, we cannot satisfy the empirical moment conditions exactly but we minimize their weighted squared deviations. The values differ between step 1 and 2 because of the different weighting matrices. The numbers themselves are not informative.
    3. You can specify X3 either in an iv() or a gmm() option. The former is just a collapsed version of the latter. For strictly exogenous variables, all lags and leads are valid instruments. Thus, in principle, you could specify lag(. .). It is however common practice not to use leads, i.e. lag(0 .). To avoid a too-many-instruments problem, especially when the time dimension is not very small, you can further restrict the maximum lag length, e.g. lag(0 4). This guidance applies to the model(diff) instruments. For model(level), you would typically just specify lag(0 0) for exogenous variables.
    4. This depends on what you want to achieve. If you want to implement a system GMM estimator, you need to specify separate gmm() options for model(diff) and model(level). Given that you would start with different lags for predetermined and endogenous variables, you would typically also specify separate options for the two variables. For example:
      Code:
      gmm(X1, lag(1 .) model(diff)) gmm(X2, lag(2 .) model(diff)) gmm(X1, diff lag(0 0) model(level)) gmm(X2, diff lag(1 1) model(level))
    Dear Prof. Sebastian,

    Thanks a lot once again for your crystal clear answers. I have the following query regarding your response in Point #4.

    In the command mentioned by you (reproduced below), what is the significance of writing 'diff'? What would be the implication if we do not write it?
    gmm(X1, diff lag(0 0) model(level)) gmm(X2, diff lag(1 1) model(level))



  • Nursena Sagir
    replied
    Thank you for your detailed and quick reply.



  • Sebastian Kripfganz
    replied
    1. You do not need to solve it. Just ignore row 6 and the last row. The important row is row 5.

    2. You are adding the model(level) instruments to the model(fodev) instruments. You are not replacing them.

    3. My personal view is that the specification tests can aid your specification search, especially when you are unsure about the classification of variables. If you have strong theoretical reasons to assume that your variable is endogenous, I would stick to that. If you are willing to revise your prior assumption based on the specification tests, the estimates generally become more efficient when you assume that the variable is exogenous, as you can use more and stronger instruments in the latter case.



  • Nursena Sagir
    replied
    Thanks for the reply.

    Originally posted by Sebastian Kripfganz
    The missing test results (dots) tell us that there are insufficient degrees of freedom available to carry out the respective test. Removing all the instruments for the time dummies in your case means that the number of instruments would be smaller than the number of regressors, and therefore the coefficients would no longer be identified. Normally, we are primarily interested in the results from row 5.
    1. How can I solve this missing test results problem?

    2. In row 5, not rejecting the additional instruments used for the system GMM estimator means that I should use the system GMM estimator rather than the model(fodev) specification, right? Or is it more like adding the system instruments to the existing FOD model?

    3. My last question: from a theoretical point of view, I believe I should define self_efficacy as an endogenous variable. However, judging by the m1, m2, Hansen, and underidentification tests, the model improves when it is defined as an exogenous variable. How should I decide on that?

    Best regards,
    Nursena



  • Sebastian Kripfganz
    replied
    Row 5 in the output table provides the test results for the instruments gmm(income income_lag, lag(0 0) diff model(level)). Row 6 provides the results for the time dummy instruments, generated by the teffects option. The last row in the output table provides results for jointly testing the instruments from rows 5 and 6. The missing test results (dots) tell us that there are insufficient degrees of freedom available to carry out the respective test. Removing all the instruments for the time dummies in your case means that the number of instruments would be smaller than the number of regressors, and therefore the coefficients would no longer be identified. Normally, we are primarily interested in the results from row 5.
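
    The negative df entries in the quoted table follow from simple counting. As a sketch using only the reported degrees of freedom (the instrument count k_6 for row 6 is inferred from the table, not stated in the post):

    ```latex
    \begin{align*}
    df_{\text{full}} &= 8 + 2 = 10
      && \text{(row 5: Excluding df + Difference df)},\\
    df_{\text{excl. row 6}} &= 10 - k_6 = -6
      \;\Longrightarrow\; k_6 = 16,\\
    df_{\text{excl. model(level)}} &= 10 - (2 + 16) = -8 .
    \end{align*}
    ```

    A negative value means that fewer instruments than coefficients would remain after dropping the respective set, so the "Excluding" statistic and the corresponding "Difference" test cannot be computed.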

    Nonlinear moment conditions can be very useful to circumvent identification problems and to obtain more efficient estimates. However, when adding Blundell-Bond type instruments for the level model, those nonlinear moment conditions might become redundant. Technically, this redundancy occurs when we do not curtail and/or collapse the instruments. Thus, the nonlinear moment conditions may retain some relevance under such instrument reduction strategies. In practice, it is not clear whether it is beneficial to include nonlinear moment conditions jointly with collapsed Blundell-Bond instruments.

    The Hausman test could be of help to decide between nonlinear moment conditions assuming absence of serial correlation and those that additionally assume homoskedasticity. It is not very helpful to decide whether or not to include any nonlinear moment conditions at all. If there is no evidence of serial correlation, it generally does not harm to include the nl(noserial) option (aside from the potential redundancy mentioned above).



  • Nursena Sagir
    replied
    Hi Sebastian,

    I have a question related to your note below:

    Originally posted by Sebastian Kripfganz
    1. With xtdpdgmm you could use the overid option and then the estat overid, difference postestimation command after the system GMM estimation. The last line in the test output that starts with model(level) can be used to make the desired assessment. If the test in the column headed "Excluding" does not reject the null hypothesis, then the difference GMM estimator is fine and you can use the column headed "Difference" to test the additional instruments used for the system GMM estimator. If the test in the column headed "Excluding" rejects the null hypothesis, then the difference GMM estimator is misspecified and the corresponding "Difference" test becomes useless.

    I add additional level instruments for income (following your advice on p.117 in your London Stata Conference presentation). I use the following command:

    Code:
    xtdpdgmm L(0/1).(depression_score) income income_lag self_efficacy, model(fodev) collapse gmm(depression_score, l(1 3)) gmm(income, l(0 2)) gmm(income_lag, l(0 2)) gmm(self_efficacy, l(0 2) m(mdev)) gmm(income income_lag, lag(0 0) diff model(level)) teffects two vce(r) overid nocons
    Then I look at the postestimation statistics.

    Code:
     estat overid, diff
    
    Sargan-Hansen (difference) test of the overidentifying restrictions
    H0: (additional) overidentifying restrictions are valid
    
    2-step weighting matrix from full model
    
                      | Excluding                   | Difference                  
    Moment conditions |       chi2     df         p |        chi2     df         p
    ------------------+-----------------------------+-----------------------------
      1, model(fodev) |     8.3313      7    0.3043 |      1.8420      3    0.6058
      2, model(fodev) |     7.9503      7    0.3370 |      2.2230      3    0.5274
      3, model(fodev) |     8.2779      7    0.3087 |      1.8954      3    0.5944
       4, model(mdev) |     4.4378      7    0.7282 |      5.7355      3    0.1252
      5, model(level) |     8.1528      8    0.4187 |      2.0205      2    0.3641
      6, model(level) |          .     -6         . |           .      .         .
         model(fodev) |     0.6462      1    0.4215 |      9.5270      9    0.3901
         model(level) |          .     -8         . |           .      .         .
    The last line in the test output that starts with model(level) is missing. How should I interpret this?

    In addition, it is not very clear to me when we should consider adding nonlinear moment conditions. Should we use the Hausman test to decide?

    Best regards,
    Nursena



  • Jains Chacko
    replied
    Thank you for your valuable comments, Sir.

