  • It worked. Thank you very much. I was making a technical mistake. It is fine now.

    Comment


    • Dear Sebastian,

      I have a question for you related to the "Sargan-Hansen test of the overidentifying restrictions". If we get different results for the 2-step and the 3-step weighting matrix, can we rely on one of them? In my case, the Sargan-Hansen result for the 2-step weighting matrix is 15.97 (p=0.314), whilst for the 3-step weighting matrix it is 26.95 (p=0.019). Is this acceptable?

      Thank you in advance.

      Comment


      • This discrepancy could indicate that the weighting matrix is not precisely estimated. This can happen if you have many instruments relative to the number of groups and/or if you have weak instruments. If you have not done this yet, try to reduce the number of instruments by curtailing the lags used as instruments and/or by collapsing the instruments.

        Alternatively, you could also try the iterated GMM estimator, option igmm instead of twostep, but it may not always converge in a reasonable number of steps if there are some underlying problems with the weighting matrix.

        If you only have a small number of groups, then there is sometimes not much that can be done. Estimating the weighting matrix precisely is then hardly possible and overidentification tests have only limited reliability in such a case.
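        A minimal sketch of such an instrument reduction (the variable names y, x1, x2 and the lag bounds are illustrative placeholders):

        ```stata
        * Sketch only: lag() curtails the lags used as instruments, and the
        * collapse suboption reduces the instrument count to one column per lag.
        xtdpdgmm L(0/1).y x1 x2, gmm(y, lag(1 4) collapse model(diff)) ///
            gmm(x1 x2, lag(1 3) collapse model(diff)) twostep vce(robust)
        ```

        Replacing twostep with igmm would request the iterated GMM estimator mentioned above.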
        https://twitter.com/Kripfganz

        Comment


        • Thank you very much for clarification.

          Comment


          • Dear Prof. Sebastian,

            I have the following queries regarding xtdpdgmm.

            1. The post-estimation command estat overid, diff returns the following error when I run it after the main command, i.e. xtdpdgmm. Although I have used this post-estimation command many times before, I do not understand the reason for this error.

            Code:
            . estat overid, diff
            requested action not valid after most recent estimation command
            r(321);

            2. How do we interpret the following numbers we get at the start of the output?

            Code:
            Fitting full model:
            Step 1         f(b) =  .00305113
            Step 2         f(b) =   .9227573


            3. If I have 3 explanatory variables in my model, say X1, X2 and X3 and I believe that X1 is predetermined, X2 is endogenous and X3 is exogenous, do I need to specify instruments for X3 in my command by opening the gmmiv brackets and specifying certain starting and ending lag lengths? If yes, what should these lag lengths be?

            4. For the predetermined and endogenous variables, X1 and X2, do I need to open two gmmiv brackets, one each with model(level) and model(diff) in my command?

            Thanks!

            Comment


              1. Did you specify the overid option in the xtdpdgmm command line? This is required for running the incremental overidentification tests.
              2. These are the values of the quadratic GMM objective function. In a just-identified model, these values would be zero. In an overidentified model, we cannot satisfy the empirical moment conditions exactly but we minimize their weighted squared deviations. The values differ between step 1 and 2 because of the different weighting matrices. The numbers themselves are not informative.
              3. You can specify X3 either in an iv() or a gmm() option. The former is just a collapsed version of the latter. For strictly exogenous variables, all lags and leads are valid instruments. Thus, in principle, you could specify lag(. .). It is however common practice not to use leads, i.e. lag(0 .). To avoid a too-many-instruments problem, especially when the time dimension is not very small, you can further restrict the maximum lag length, e.g. lag(0 4). This guidance applies to the model(diff) instruments. For model(level), you would typically just specify lag(0 0) for exogenous variables.
              4. This depends on what you want to achieve. If you want to implement a system GMM estimator, you need to specify separate gmm() options for model(diff) and model(level). Given that you would start with different lags for predetermined and endogenous variables, you would typically also specify separate options for the two variables. For example:
                Code:
                gmm(X1, lag(1 .) model(diff)) gmm(X2, lag(2 .) model(diff)) gmm(X1, diff lag(0 0) model(level)) gmm(X2, diff lag(1 1) model(level))
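                Putting points 3 and 4 together, a full command could be sketched as follows (Y and the specific lag bounds are illustrative placeholders; the strictly exogenous X3 enters through iv()):

                ```stata
                * Sketch only: separate gmm() options for the predetermined X1 and the
                * endogenous X2 in the differenced and the level model; the strictly
                * exogenous X3 is specified via iv(), a collapsed version of gmm().
                xtdpdgmm L(0/1).Y X1 X2 X3,                  ///
                    gmm(X1, lag(1 4) model(diff))            ///
                    gmm(X2, lag(2 4) model(diff))            ///
                    gmm(X1, diff lag(0 0) model(level))      ///
                    gmm(X2, diff lag(1 1) model(level))      ///
                    iv(X3, model(diff)) iv(X3, model(level)) ///
                    twostep vce(robust) overid
                ```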
              https://twitter.com/Kripfganz

              Comment


              • Hi Sir,

                I am working on predicting the firm-level optimum level of investment. I estimate the model developed by Richardson (2006), which expresses investment as a function of one-year-lagged investment, explanatory variables, and control variables. I tried the following command in Stata.

                Code:
                xtdpdgmm L(0/1).loginvest l1.tobinsq l1.streturn l1.cash l1.logta l1.age l1.lev, noserial gmm(L1.loginvest, collapse model(difference)) iv(l1.tobinsq l1.streturn l1.cash l1.logta l1.age, difference model(difference)) twostep vce(cluster co_id)
                Post-estimation:
                Code:
                estat serial
                estat overid
                Output:
                Code:
                Generalized method of moments estimation
                
                Fitting full model:
                
                Step 1:
                initial:       f(b) =   60.63222
                alternative:   f(b) =  50.099668
                rescale:       f(b) =  4.5306529
                Iteration 0:   f(b) =  4.5306529  
                Iteration 1:   f(b) =  .88650032  
                Iteration 2:   f(b) =  .00834241  
                Iteration 3:   f(b) =  .00348571  
                Iteration 4:   f(b) =  .00347453  
                Iteration 5:   f(b) =  .00347453  
                
                Step 2:
                Iteration 0:   f(b) =  .00273063  
                Iteration 1:   f(b) =    .002638  
                Iteration 2:   f(b) =    .002638  
                
                Group variable: co_id                        Number of obs         =      1717
                Time variable: year                          Number of groups      =       855
                
                Moment conditions:     linear =       9      Obs per group:    min =         1
                                    nonlinear =       1                        avg =  2.008187
                                        total =      10                        max =         3
                
                                                (Std. Err. adjusted for 855 clusters in co_id)
                ------------------------------------------------------------------------------
                             |              WC-Robust
                   loginvest |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                   loginvest |
                         L1. |   .3550268   .1388608     2.56   0.011     .0828647     .627189
                             |
                     tobinsq |
                         L1. |  -.0089324   .0571256    -0.16   0.876    -.1208965    .1030317
                             |
                    streturn |
                         L1. |   .0912456   .1086676     0.84   0.401    -.1217391    .3042302
                             |
                        cash |
                         L1. |  -5.99e-07   9.15e-06    -0.07   0.948    -.0000185    .0000173
                             |
                       logta |
                         L1. |   .4958169   .6708675     0.74   0.460    -.8190593    1.810693
                             |
                         age |
                         L1. |  -.2694499   .0745122    -3.62   0.000    -.4154912   -.1234086
                             |
                         lev |
                         L1. |  -20.92443   9.509304    -2.20   0.028    -39.56233   -2.286541
                             |
                       _cons |   11.72433   4.518156     2.59   0.009      2.86891    20.57976
                ------------------------------------------------------------------------------
                Instruments corresponding to the linear moment conditions:
                 1, model(diff):
                   L1.L.loginvest L2.L.loginvest L3.L.loginvest
                 2, model(diff):
                   D.L.tobinsq D.L.streturn D.L.cash D.L.logta D.L.age
                 3, model(level):
                   _cons
                
                .
                . estat serial
                
                Arellano-Bond test for autocorrelation of the first-differenced residuals
                H0: no autocorrelation of order 1:     z =   -3.2873   Prob > |z|  =    0.0010
                H0: no autocorrelation of order 2:     z =         .   Prob > |z|  =         .
                
                . estat overid
                
                Sargan-Hansen test of the overidentifying restrictions
                H0: overidentifying restrictions are valid
                
                2-step moment functions, 2-step weighting matrix       chi2(2)     =    2.2555
                                                                       Prob > chi2 =    0.3238
                
                2-step moment functions, 3-step weighting matrix       chi2(2)     =    2.2374
                                                                       Prob > chi2 =    0.3267
                I ran this code on a panel with five years of data (strongly balanced). The post-estimation Arellano-Bond test of order 2 is not available. Could you please confirm whether the code used is correct?

                Comment


                • I am afraid your panel is not balanced. The regression output states that you have between 1 and 3 observations per company - on average about 2. With a maximum of 3 observations per group it is not possible to calculate the AR(2) test statistic. You need a minimum of 4 time periods.

                  Note that the xtset command might tell you that your panel is "strongly balanced". However, this only means that there is a row in your data set for every company-year combination. It does not check for missing values. Missing values in some of your variables render the estimation sample unbalanced.
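                  One way to inspect the effective estimation sample, sketched with standard Stata commands run right after xtdpdgmm (insample and Tobs are hypothetical variable names):

                  ```stata
                  * Sketch only: count, for each company, the observations that actually
                  * entered the estimation sample, using the e(sample) function.
                  generate byte insample = e(sample)
                  bysort co_id: egen Tobs = total(insample)
                  tabulate Tobs if insample
                  ```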
                  https://twitter.com/Kripfganz

                  Comment


                  • Thank you for your valuable comments, Sir.

                    Comment


                    • Hi Sebastian,

                      I have a question related to your note below:

                      Originally posted by Sebastian Kripfganz View Post
                      1. With xtdpdgmm you could use the overid option and then the estat overid, difference postestimation command after the system GMM estimation. The last line in the test output that starts with model(level) can be used to make the desired assessment. If the test in the column headed "Excluded" does not reject the null hypothesis, then the difference GMM estimator is fine and you can use the column headed "Difference" to test the additional instruments used for the system GMM estimator. If the test in column headed "Excluded" rejects the null hypothesis, then the difference GMM estimator is misspecified and the corresponding "Difference" test becomes useless.

                      I add additional level instruments for income (following your advice on p.117 in your London Stata Conference presentation). I use following command:

                      Code:
                      xtdpdgmm L(0/1).(depression_score) income income_lag self_efficacy, model(fodev) collapse gmm(depression_score, l(1 3)) gmm(income, l(0 2)) gmm(income_lag, l(0 2)) gmm(self_efficacy, l(0 2) m(mdev)) gmm(income income_lag, lag(0 0) diff model(level)) teffects two vce(r) overid nocons
                      Then I look at the post estimation statistics.

                      Code:
                       estat overid, diff
                      
                      Sargan-Hansen (difference) test of the overidentifying restrictions
                      H0: (additional) overidentifying restrictions are valid
                      
                      2-step weighting matrix from full model
                      
                                        | Excluding                   | Difference                  
                      Moment conditions |       chi2     df         p |        chi2     df         p
                      ------------------+-----------------------------+-----------------------------
                        1, model(fodev) |     8.3313      7    0.3043 |      1.8420      3    0.6058
                        2, model(fodev) |     7.9503      7    0.3370 |      2.2230      3    0.5274
                        3, model(fodev) |     8.2779      7    0.3087 |      1.8954      3    0.5944
                         4, model(mdev) |     4.4378      7    0.7282 |      5.7355      3    0.1252
                        5, model(level) |     8.1528      8    0.4187 |      2.0205      2    0.3641
                        6, model(level) |          .     -6         . |           .      .         .
                           model(fodev) |     0.6462      1    0.4215 |      9.5270      9    0.3901
                           model(level) |          .     -8         . |           .      .         .
                      The last line in the test output that starts with model(level) is missing. How should I interpret this?

                      In addition, it is not very clear to me when we should consider adding nonlinear moment conditions. Should we use a Hausman test to decide?

                      Best regards,
                      Nursena

                      Comment


                      • Row 5 in the output table provides the test results for the instruments gmm(income income_lag, lag(0 0) diff model(level)). Row 6 provides the results for the time dummy instruments generated by the teffects option. The last row provides the results for jointly testing the instruments from rows 5 and 6. The missing test results (dots) indicate that there are insufficient degrees of freedom to carry out the respective test: removing all the instruments for the time dummies would leave fewer instruments than regressors, so the coefficients would no longer be identified. Normally, we are primarily interested in the results from row 5.

                        Nonlinear moment conditions can be very useful to circumvent identification problems and to obtain more efficient estimates. However, when adding Blundell-Bond type instruments for the level model, those nonlinear moment conditions might become redundant. Technically, this redundancy occurs when we do not curtail and/or collapse the instruments. Thus, the nonlinear moment conditions may retain some relevance under such instrument reduction strategies. In practice, it is not clear whether it is beneficial to include nonlinear moment conditions jointly with collapsed Blundell-Bond instruments.

                        The Hausman test could be of help to decide between nonlinear moment conditions assuming absence of serial correlation and those that additionally assume homoskedasticity. It is not very helpful to decide whether or not to include any nonlinear moment conditions at all. If there is no evidence of serial correlation, it generally does not harm to include the nl(noserial) option (aside from the potential redundancy mentioned above).
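                        As a sketch, the nonlinear moment conditions are added with the nl(noserial) option (variable names and lag bounds are illustrative placeholders):

                        ```stata
                        * Sketch only: nl(noserial) adds the nonlinear moment conditions
                        * implied by the absence of serial correlation in the errors.
                        xtdpdgmm L(0/1).y x, gmm(y, lag(1 4) collapse model(diff)) ///
                            gmm(x, lag(0 3) collapse model(diff)) nl(noserial)     ///
                            twostep vce(robust)
                        ```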
                        https://twitter.com/Kripfganz

                        Comment


                        • Thanks for the reply.

                          Originally posted by Sebastian Kripfganz View Post
                          The missing test results (dots) tell us that there are insufficient degrees of freedom available to carry out the respective test. Removing all the instruments for the time dummys in your case means that the number of instruments would be smaller than the number of regressors, and therefore the coefficients would no longer be identified. Normally, we are primarily interested in the results from row 5.
                          1. How can I solve this missing test results problem?

                          2. In row 5, not rejecting the additional instruments used for the system GMM estimator means that I should use system GMM estimator rather than model(fodev) specification, right? Or is it more like adding system instruments to existing FOD model?

                          3. My last question: from a theoretical point of view, I believe I should define self_efficacy as an endogenous variable. However, judging by the m1, m2, Hansen, and underidentification tests, the model improves when it is defined as an exogenous variable. How should I decide?

                          Best regards,
                          Nursena

                          Comment


                          • 1. You do not need to solve it. Just ignore row 6 and the last row. The important row is row 5.

                            2. You are adding the model(level) instruments to the model(fodev) instruments. You are not replacing them.

                            3. My personal view is that the specification tests can aid your specification search, especially when you are unsure about the classification of variables. If you have strong theoretical reasons to assume that your variable is endogenous, I would stick to that. If you are willing to revise your prior assumption based on the specification tests, the estimates generally become more efficient when you assume that the variable is exogenous, as you can use more and stronger instruments in the latter case.
                            https://twitter.com/Kripfganz

                            Comment


                            • Thank you for your detailed and quick reply.

                              Comment


                              • Originally posted by Sebastian Kripfganz View Post
                                4. This depends on what you want to achieve. If you want to implement a system GMM estimator, you need to specify separate gmm() options for model(diff) and model(level). Given that you would start with different lags for predetermined and endogenous variables, you would typically also specify separate options for the two variables. For example:
                                  Code:
                                  gmm(X1, lag(1 .) model(diff)) gmm(X2, lag(2 .) model(diff)) gmm(X1, diff lag(0 0) model(level)) gmm(X2, diff lag(1 1) model(level))
                                Dear Prof. Sebastian,

                                Thanks a lot once again for your crystal clear answers. I have the following query regarding your response in Point #4.

                                In the command mentioned by you (reproduced below), what is the significance of writing 'diff'? What would be the implication if we do not write it?
                                gmm(X1, diff lag(0 0) model(level)) gmm(X2, diff lag(1 1) model(level))

                                Comment
