
  • xtabond2 - is it OK to have a significant AR(2) and an insignificant AR(3) if the gmmstyle lags are limited to third order (3 .)?

    Hello friends,

    I have read Roodman (2009) and some posts here regarding the question, but I am not sure if I understood correctly.

    If, when performing system GMM with the xtabond2 command, I restrict the gmmstyle lags to orders 3 through 6, does an insignificant Arellano-Bond autocorrelation test of order 3 suffice? That is, is it okay if AR(2) rejects the null but AR(3) does not?

    Thanks,
    Gal

    This is the xtabond2 command line:
    Code:
    xtabond2 per_ma3_TFP $finC $nonfinC $convar yr* , gmm($finC $nonfinC per_ma3_gov_exp_GDP per_ma3_cpi_inflation per_logGDP_pc, coll  laglimits(3 6)) iv(yr* per_ma3_trade_GDP  per_ma3_yrs_sch per_ma3_IRCG  per_ma3_trade_GDP) twostep small r artests(3)
    And this is the output:
    Code:
    . xtabond2 per_ma3_TFP $finC $nonfinC $convar yr* , gmm($finC $nonfinC per_ma3_gov_exp_GDP per_ma3_cpi_inflation per_logGDP_pc, coll  laglimits(3 6)) iv(yr* per_ma3_trade_GDP  per_ma3_yrs_sch per_ma3_IRCG  per_ma3_trade_GDP) twostep small r artests(3)
    Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
    yr2013 dropped due to collinearity
    Warning: Two-step estimated covariance matrix of moments is singular.
      Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
      Difference-in-Sargan/Hansen statistics may be negative.
    
    Dynamic panel-data estimation, two-step system GMM
    ------------------------------------------------------------------------------
    Group variable: ID                              Number of obs      =       328
    Time variable : year                            Number of groups   =        58
    Number of instruments = 47                      Obs per group: min =         2
    F(18, 57)     =   2153.89                                      avg =      5.66
    Prob > F      =     0.000                                      max =         9
    ----------------------------------------------------------------------------------------
                           |              Corrected
               per_ma3_TFP |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -----------------------+----------------------------------------------------------------
       per_c_ma3_fin_stock |  -.0000648   .0007612    -0.09   0.932    -.0015892    .0014595
        per_c_ma3_fin_flow |   .0003164   .0042521     0.07   0.941    -.0081983    .0088311
    per_c_ma3_nonfin_stock |   .0009372   .0007619     1.23   0.224    -.0005885    .0024628
     per_c_ma3_nonfin_flow |   .0029903   .0023584     1.27   0.210    -.0017323    .0077129
           per_ma3_yrs_sch |  -.0090489   .0106553    -0.85   0.399    -.0303858     .012288
              per_ma3_IRCG |  -.0007388    .003128    -0.24   0.814    -.0070026     .005525
     per_ma3_cpi_inflation |  -.0008791     .00289    -0.30   0.762    -.0066662     .004908
       per_ma3_gov_exp_GDP |  -.0045448   .0064397    -0.71   0.483    -.0174401    .0083505
         per_ma3_trade_GDP |   -.000299   .0002061    -1.45   0.152    -.0007117    .0001136
             per_logGDP_pc |   .0408159   .0410099     1.00   0.324    -.0413049    .1229368
                    yr1992 |  -.0753124   .0527374    -1.43   0.159    -.1809172    .0302924
                    yr1995 |  -.0561759   .0322474    -1.74   0.087    -.1207503    .0083985
                    yr1998 |   -.045674   .0403163    -1.13   0.262    -.1264061     .035058
                    yr2001 |  -.0322637   .0388948    -0.83   0.410    -.1101493    .0456218
                    yr2004 |   .0057429   .0476757     0.12   0.905    -.0897261    .1012119
                    yr2007 |   .0118822   .0280837     0.42   0.674    -.0443545    .0681189
                    yr2010 |  -.0082215   .0137538    -0.60   0.552     -.035763    .0193199
                    yr2016 |   .0163729   .0068277     2.40   0.020     .0027005    .0300452
                     _cons |   .7881233   .1980452     3.98   0.000     .3915447    1.184702
    ----------------------------------------------------------------------------------------
    Instruments for first differences equation
      Standard
        D.(yr1992 yr1995 yr1998 yr2001 yr2004 yr2007 yr2010 yr2013 yr2016
        per_ma3_trade_GDP per_ma3_yrs_sch per_ma3_IRCG per_ma3_trade_GDP)
      GMM-type (missing=0, separate instruments for each period unless collapsed)
        L(3/6).(per_c_ma3_fin_stock per_c_ma3_fin_flow per_c_ma3_nonfin_stock
        per_c_ma3_nonfin_flow per_ma3_gov_exp_GDP per_ma3_cpi_inflation
        per_logGDP_pc) collapsed
    Instruments for levels equation
      Standard
        yr1992 yr1995 yr1998 yr2001 yr2004 yr2007 yr2010 yr2013 yr2016
        per_ma3_trade_GDP per_ma3_yrs_sch per_ma3_IRCG per_ma3_trade_GDP
        _cons
      GMM-type (missing=0, separate instruments for each period unless collapsed)
        DL2.(per_c_ma3_fin_stock per_c_ma3_fin_flow per_c_ma3_nonfin_stock
        per_c_ma3_nonfin_flow per_ma3_gov_exp_GDP per_ma3_cpi_inflation
        per_logGDP_pc) collapsed
    ------------------------------------------------------------------------------
    Arellano-Bond test for AR(1) in first differences: z =   2.56  Pr > z =  0.010
    Arellano-Bond test for AR(2) in first differences: z =   2.17  Pr > z =  0.030
    Arellano-Bond test for AR(3) in first differences: z =   0.90  Pr > z =  0.368
    ------------------------------------------------------------------------------
    Sargan test of overid. restrictions: chi2(28)   = 112.22  Prob > chi2 =  0.000
      (Not robust, but not weakened by many instruments.)
    Hansen test of overid. restrictions: chi2(28)   =  27.90  Prob > chi2 =  0.470
      (Robust, but weakened by many instruments.)
    
    Difference-in-Hansen tests of exogeneity of instrument subsets:
      GMM instruments for levels
        Hansen test excluding group:     chi2(21)   =  18.78  Prob > chi2 =  0.599
        Difference (null H = exogenous): chi2(7)    =   9.11  Prob > chi2 =  0.245

  • #2
    Taken at face value, yes; if you treat the variables in your gmm() option as endogenous and start only with lag 3, then a significant AR(2) test would not pose a problem.
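A small simulation can make the lag-3 argument concrete. This is a Python sketch (not Stata) under an assumed data-generating process that does not come from this thread: the idiosyncratic error is MA(1), u_it = e_it + theta*e_i,t-1, which implies first- and second-order but no third-order autocorrelation in the first-differenced errors, and x is endogenous because it loads on the contemporaneous u_it. Under these assumptions x_i,t-2 is correlated with the differenced error (population covariance -gamma*theta), while x_i,t-3 is not, so only lags 3 and beyond remain valid instruments. All parameter values are hypothetical.

```python
import random

def simulate_cross_section(n_groups=10000, periods=50, rho=0.5, gamma=0.7,
                           theta=0.8, seed=42):
    """Simulate a hypothetical panel and return, for the last period t of
    each group, x_{t-2}, x_{t-3} and the first-differenced error du_t.

    Assumed DGP (illustrative only):
        u_it = e_it + theta * e_{i,t-1}                      (MA(1) error)
        x_it = mu_i + rho * x_{i,t-1} + gamma * u_it + v_it  (endogenous x)
    """
    rng = random.Random(seed)
    x_lag2, x_lag3, du = [], [], []
    for _ in range(n_groups):
        mu = rng.gauss(0, 0.5)          # group-specific effect
        e_prev, x = 0.0, 0.0
        xs, us = [], []
        for _ in range(periods):
            e = rng.gauss(0, 1)
            u = e + theta * e_prev      # MA(1) idiosyncratic error
            x = mu + rho * x + gamma * u + rng.gauss(0, 1)
            xs.append(x)
            us.append(u)
            e_prev = e
        x_lag2.append(xs[-3])           # x_{t-2}
        x_lag3.append(xs[-4])           # x_{t-3}
        du.append(us[-1] - us[-2])      # u_t - u_{t-1}
    return x_lag2, x_lag3, du

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

x_lag2, x_lag3, du = simulate_cross_section()
print("cov(x_{t-2}, du_t):", round(cov(x_lag2, du), 3))  # population value: -0.56
print("cov(x_{t-3}, du_t):", round(cov(x_lag3, du), 3))  # population value: 0
```

With a lagged dependent variable instead of an endogenous x, the same reasoning applies to y_i,t-2 versus y_i,t-3.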

    As an aside, all the variables specified in your iv() option must be assumed to be uncorrelated with the unobserved group-specific effects. This is essentially a random-effects assumption, which often is hard to justify.
    Furthermore, notice that iv(varlist) without the eq() suboption is not equivalent to the combination iv(varlist, eq(diff)) iv(varlist, eq(level)). If this surprises you, then you probably want the second specification and should explicitly specify the eq() suboptions.
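To see why the two iv() specifications differ, here is a hypothetical Python sketch of the instrument block that one group would contribute for a single iv-style variable z, assuming the behaviour described in Roodman (2009): without eq(), the differenced instrument (for the transformed equation) and the level instrument share one column, so only the sum of the two equations' moments is set to zero; with separate eq(diff) and eq(level) options, each equation gets its own column and its own moment condition. For simplicity both halves use periods 2 to T, and the passthru suboption is ignored; the function names are illustrative, not xtabond2 internals.

```python
def iv_instrument_block(z, layout="shared"):
    """Hypothetical instrument block for one group and one iv-style variable z
    in a stacked system: first-differenced rows on top, level rows below.

    layout="shared":   iv(z) without eq(), one column holding dz then z
    layout="separate": iv(z, eq(diff)) iv(z, eq(level)), two columns
    """
    dz = [b - a for a, b in zip(z[:-1], z[1:])]  # differenced instrument
    lev = list(z[1:])                              # level instrument
    if layout == "shared":
        return [[v] for v in dz + lev]
    if layout == "separate":
        return [[v, 0] for v in dz] + [[0, v] for v in lev]
    raise ValueError("layout must be 'shared' or 'separate'")

def sample_moments(block, resid):
    """Z'e for one group: one moment per instrument column.
    resid stacks the differenced-equation residuals above the level ones."""
    return [sum(row[j] * r for row, r in zip(block, resid))
            for j in range(len(block[0]))]

z = [1, 2, 4, 7]
resid = [1, 0, 0, -0.5, 0, 0]   # arbitrary residuals, chosen for illustration
print(sample_moments(iv_instrument_block(z, "shared"), resid))    # [0.0]
print(sample_moments(iv_instrument_block(z, "separate"), resid))  # [1.0, -1.0]
```

The same residuals satisfy the single shared moment but violate both separate ones, which is why the two specifications are not equivalent.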

    The following presentation might be helpful as well:
    https://twitter.com/Kripfganz



    • #3
      Thanks so much for the helpful comments.

      I will apply the eq() suboption. Regarding the random-effects issue, I want to make sure I understand correctly. Given that I cannot justify such an assumption (I'll have to think about that), should I include these variables in gmm() as predetermined variables? That would also mean that, unless I restrict them to start at lag 3 as well, the AR(2) result matters. Correct? Or should I just drop them?



      • #4
        I assume those variables vary over time. There are different ways to treat them without dropping them (which I would not recommend). If you continue to assume that they are strictly exogenous with regard to the idiosyncratic error component, you could use a gmm() option starting with lag 0, or you could use the iv() option with eq(diff) only. The AR() results are irrelevant for strictly exogenous variables.

        If you instead want to treat them as predetermined, you would normally use the gmm() option starting with lag 1. If the AR(2) test is significant, but not the AR(3) test, you would need to start at lag 2.
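The lag-shifting rule from #2 and #4 can be written down as a small helper. This is a hypothetical Python sketch, not part of xtabond2 or xtdpdgmm. It assumes AR(m) is the highest significant Arellano-Bond test (AR(m+1) and above insignificant), counts lags of the variable itself (so the lagged dependent variable corresponds to gmm(y, lag(2 .)) in the baseline), and extrapolates the one-lag shift per additional significant order, which the thread only states for m = 2.

```python
# Baseline first valid lag for the first-differenced equation when only
# AR(1) is significant (lags counted on the variable itself).
BASELINE_FIRST_LAG = {
    "exogenous": 0,          # strictly exogenous: current value onward
    "predetermined": 1,
    "endogenous": 2,
    "lagged_dependent": 2,   # lags of y itself, e.g. gmm(y, lag(2 .))
}

def first_valid_lag(var_type, highest_significant_ar):
    """Earliest usable instrument lag, assuming AR(highest_significant_ar)
    is the highest significant Arellano-Bond test (hypothetical rule)."""
    if var_type not in BASELINE_FIRST_LAG:
        raise ValueError(f"unknown variable type: {var_type}")
    if highest_significant_ar < 1:
        raise ValueError("highest_significant_ar must be at least 1")
    if var_type == "exogenous":
        return 0  # AR() results are irrelevant for strictly exogenous variables
    # each significant order beyond AR(1) pushes the first valid lag back by one
    return BASELINE_FIRST_LAG[var_type] + (highest_significant_ar - 1)

for vt in BASELINE_FIRST_LAG:
    print(vt, first_valid_lag(vt, 1), first_valid_lag(vt, 2))
```

Under these assumptions, the "AR(2) significant, AR(3) not" case maps to lag 3 for endogenous regressors and the lagged dependent variable, and lag 2 for predetermined ones.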



        • #5
          And by using iv(varlist, eq(diff)), it should resemble a fixed-effects assumption?



          • #6
            Kind of, yes, because the group-specific effects drop out in the first-differenced model.

            If you want to mimic the actual fixed-effects estimator (as in xtreg, fe) for those variables by using instruments in deviations from their within-group means, you can use my xtdpdgmm command:
            Code:
            xtdpdgmm per_ma3_TFP $finC $nonfinC $convar, teffects gmm($finC $nonfinC per_ma3_gov_exp_GDP per_ma3_cpi_inflation per_logGDP_pc, coll lag(3 6) m(diff)) gmm($finC $nonfinC per_ma3_gov_exp_GDP per_ma3_cpi_inflation per_logGDP_pc, coll lag(2 2) diff m(level)) iv(per_ma3_trade_GDP per_ma3_yrs_sch per_ma3_IRCG  per_ma3_trade_GDP, m(mdev)) twostep small vce(robust)
            The two gmm() options of xtdpdgmm replicate the gmm() option of xtabond2.
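The point that the group-specific effects drop out can be checked on a toy series. Both first differences (the basis of eq(diff)) and deviations from within-group means (the idea behind the m(mdev) instruments described above) remove any constant mu_i added to a group's series. This Python sketch uses made-up values, and the helper names are hypothetical, not part of either command.

```python
def first_difference(series):
    """x_t - x_{t-1} for t = 2..T."""
    return [b - a for a, b in zip(series[:-1], series[1:])]

def mean_deviation(series):
    """x_t minus the group's time average (a within transformation)."""
    m = sum(series) / len(series)
    return [v - m for v in series]

z = [1.0, 3.0, 2.0, 6.0]        # one group's instrument, made-up values
mu = 5.0                        # hypothetical group-specific effect
z_with_effect = [v + mu for v in z]

print(first_difference(z), first_difference(z_with_effect))  # identical
print(mean_deviation(z), mean_deviation(z_with_effect))      # identical
```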



            • #7
              Hello Professor Sebastian, I have a similar issue in my GMM estimation: AR(2) is significant, whereas AR(3) and higher are insignificant. If I understand your suggestion above correctly, this can be handled by starting the instrument lags for the lagged dependent variable and the endogenous variables at lag 3, and for the predetermined variables at lag 2. Can we directly show all three AR tests (1, 2, and 3) in the research paper? Further, I have tried to find a paper in the literature that reports them this way, but I couldn't find any.

              Thank you



              • #8
                Hello @Sebastian Kripfganz, I have a similar issue in my GMM estimation: AR(2) is significant, whereas AR(3) and higher are insignificant. If I understand your suggestion above correctly, this can be handled by instrumenting the lagged dependent variable and the endogenous variables starting from the third lag, and the predetermined variables starting from the second lag. In several places, you suggest using higher-order lags of the dependent variable as regressors.

                My questions are as follows:

                1) How do you justify the use of higher-order lags in a research paper? I haven't come across any paper where, in a DPD setting, a second or higher lag is used as a regressor. Can we say something like, "To address second- and higher-order serial correlation in the error term, the second lag of the dependent variable is introduced"?

                2) In my case, adding the second lag makes the AR(2) test insignificant, but the coefficient on the second lag is also insignificant. Is it still okay to use the second lag as a regressor?

                3) Further, can we directly show all three AR tests (1, 2, and 3) in the research paper?

                Thank you



                • #9
                  1. Serial correlation in the error term is often a sign of misspecified dynamics. Adding a second lag of a dependent variable or adding lags of the independent variables as regressors aims to obtain a dynamically complete model, where all the dynamic effects are captured by the right-hand-side variables. (Note: This is different from using higher-order lags as instruments. If there is evidence of second-order serial correlation in the first-differenced errors but no higher-order serial correlation, then the third lag onwards of the dependent variable qualifies as a valid instrument. However, this does not address the potential misspecification of the model dynamics and could lead to weak-instruments problems if those higher-order lags are insufficiently correlated with the regressors.)
                  2. Even though the second lag is statistically insignificant, it could sometimes still help to account for unexplained serial correlation. Note: Statistical insignificance does not imply that the coefficient is equal to zero; there is just not enough statistical evidence to rule this out.
                  3. There is nothing wrong with showing the results from several serial correlation tests if it is helpful for your argumentation.
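Point 1 can be illustrated with a single simulated time series (a simplification: the thread is about panels, and plain OLS without an intercept is used here, not GMM). The hypothetical data-generating process is y_t = 0.5 y_t-1 + 0.3 y_t-2 + e_t. Fitting only one lag leaves the omitted dynamics in the residual, which then shows clear first-order autocorrelation; adding y_t-2 as a regressor makes the model dynamically complete and the residual autocorrelation disappears. All names and parameter values are illustrative assumptions.

```python
import random

def simulate_ar2(n=5000, phi1=0.5, phi2=0.3, burn_in=200, seed=1):
    """Simulate y_t = phi1*y_{t-1} + phi2*y_{t-2} + e_t with e_t ~ N(0, 1)."""
    rng = random.Random(seed)
    y = [0.0, 0.0]
    for _ in range(n + burn_in):
        y.append(phi1 * y[-1] + phi2 * y[-2] + rng.gauss(0, 1))
    return y[-n:]                     # drop the burn-in

def lag1_autocorr(r):
    """First-order sample autocorrelation of a residual series."""
    m = sum(r) / len(r)
    num = sum((a - m) * (b - m) for a, b in zip(r[:-1], r[1:]))
    return num / sum((a - m) ** 2 for a in r)

def residuals_one_lag(y):
    """OLS residuals from regressing y_t on y_{t-1} (no intercept)."""
    lag, cur = y[:-1], y[1:]
    b = sum(a * c for a, c in zip(lag, cur)) / sum(a * a for a in lag)
    return [c - b * a for a, c in zip(lag, cur)]

def residuals_two_lags(y):
    """OLS residuals from regressing y_t on y_{t-1} and y_{t-2} (no intercept)."""
    cur, l1, l2 = y[2:], y[1:-1], y[:-2]
    s11 = sum(a * a for a in l1)
    s22 = sum(a * a for a in l2)
    s12 = sum(a * b for a, b in zip(l1, l2))
    s1y = sum(a * c for a, c in zip(l1, cur))
    s2y = sum(a * c for a, c in zip(l2, cur))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det   # solve the 2x2 normal equations
    b2 = (s11 * s2y - s12 * s1y) / det
    return [c - b1 * a - b2 * d for c, a, d in zip(cur, l1, l2)]

y = simulate_ar2()
print("one lag: ", round(lag1_autocorr(residuals_one_lag(y)), 3))   # population value about -0.21
print("two lags:", round(lag1_autocorr(residuals_two_lags(y)), 3))  # close to 0
```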



                  • #10
                    Thank you so much, @Sebastian Kripfganz sir, for your invaluable help. I have a follow-up question. During my literature search, I encountered a paper where the author justified the inclusion of the second lag of the dependent variable as a regressor to address its higher persistence. Additionally, his results supported this with a statistically significant second-lag coefficient.
                    In my research paper, what justification could I provide for incorporating a second lag? I noticed in your presentation at the London Stata Conference (slide 90) that you mentioned: “higher-order lags of the dependent variable, yi,t−2, yi,t−3, and the other regressors, xi,t−1, xi,t−2, might have predictive power and could help to prevent serial correlation of the error term uit when included as regressors”. Would it be appropriate for me to cite this?



                    • #11
                      Sure, you can use this argument to justify the inclusion of a second lag.



                      • #12
                        Great, thanks a lot!



                        • #13
                          Hello @Sebastian Kripfganz sir, based on your suggestion above, I have used the following model to estimate system GMM with the orthogonal option.

                          1. Could you please help me decide whether it is appropriate?
                          xtabond2 INT L.INT L2.INT ROA LNTA LnTASq DivNIITI MQOETA HHITAFinal Inflation GDP GC CO PV PB c.GC#c.PB c.GC#c.PV c.GC#c.LNTA c.CO#c.PB c.CO#c.PV c.CO#c.LNTA , gmm(INT, lag (1 3) eq (level)) gmm(INT, lag (2 4) collapse eq (diff)) gmm (ROA DivNIITI MQOETA LNTA LnTASq, lag(2 4) collapse ) iv(HHITAFinal Inflation GDP GC CO PV PB c.GC#c.PB c.GC#c.PV c.GC#c.LNTA c.CO#c.PB c.CO#c.PV c.CO#c.LNTA, eq(level)) twostep robust small orthogonal artests(3)


                          Where:

                          INT is the dependent variable (predetermined).
                          ROA LNTA LnTASq DivNIITI MQOETA are firm-specific variables considered endogenous.
                          HHITAFinal Inflation GDP are macroeconomic variables considered exogenous.
                          GC is a dummy variable for two years (it takes the value 1 in the 8th and 9th years and 0 otherwise).
                          CO is a one-year dummy (21st year of data).
                          PV is a dummy for private-sector firms.
                          PB is a dummy for public-sector firms.


                          By the way, my data are an unbalanced panel covering 23 years.

                          2. Sir, what is the logic of starting one lag earlier in the level and FOD equations than in the difference equation, for the lagged dependent variable and for predetermined and endogenous variables?

                          Let's say that the level form of the GMM equation is Yit = Yi,t-1 + Xit + Uit. What will lag(1) of Y look like as an instrument for the level equation? Will it be (Yit - Yi,t-1) or (Yi,t-1 - Yi,t-2)?

                          3. Is it because in the level equation the error term (Uit) is in levels, while the first lag of Yit is in differences (Yit - Yi,t-1)? And since Y is a predetermined variable, none of the terms in (Yit - Yi,t-1) will be correlated with Uit?

                          4. Furthermore, is it okay to have the same lag length for the difference/FOD equation and the level equation? Or is there a criterion to decide this?
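For reference on question 2, the transformation requested by the orthogonal option, forward orthogonal deviations (Arellano and Bover, 1995), can be sketched as below. This is an illustrative Python function, not xtabond2's code, applied to made-up values. Each transformed value uses only the current and future observations of the series, the group-specific effect drops out, and the transformation has orthonormal rows, so serially uncorrelated, homoskedastic errors remain serially uncorrelated after the transformation.

```python
import math

def forward_orthogonal_deviations(series):
    """Forward orthogonal deviations of one group's series:
    x*_t = c_t * (x_t - mean(x_{t+1}, ..., x_T)), c_t = sqrt((T-t)/(T-t+1)).
    Returns T-1 values; the last period has no future observations."""
    out = []
    for t in range(len(series) - 1):
        future = series[t + 1:]
        k = len(future)                  # number of later periods, T - t
        c = math.sqrt(k / (k + 1))
        out.append(c * (series[t] - sum(future) / k))
    return out

y = [1.0, 4.0, 2.0, 8.0, 5.0]            # made-up series for one group
print(forward_orthogonal_deviations(y))
print(forward_orthogonal_deviations([v + 3.0 for v in y]))  # same up to rounding
```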
                          Last edited by Minhaj uddin; 01 May 2024, 02:57.



                          • #14
                            Hello @Sebastian Kripfganz sir, based on your suggestion above, I have used the following model to estimate system GMM with the orthogonal option.

                            1. Could you please help me decide whether it is appropriate?

                            xtabond2 INT L.INT L2.INT ROA LNTA LnTASq DivNIITI MQOETA HHITAFinal Inflation GDP GC CO PV PB c.GC#c.PB c.GC#c.PV c.GC#c.LNTA c.CO#c.PB c.CO#c.PV c.CO#c.LNTA , gmm(INT, lag (1 3) eq (level)) gmm(INT, lag (2 4) collapse eq (diff)) gmm (ROA DivNIITI MQOETA LNTA LnTASq, lag(2 4) collapse ) iv(HHITAFinal Inflation GDP GC CO PV PB c.GC#c.PB c.GC#c.PV c.GC#c.LNTA c.CO#c.PB c.CO#c.PV c.CO#c.LNTA, eq(level)) twostep robust small orthogonal artests(3)


                            Where:

                            INT is the dependent variable (predetermined).
                            ROA LNTA LnTASq DivNIITI MQOETA are firm-specific variables considered endogenous.
                            HHITAFinal Inflation GDP are macroeconomic variables considered exogenous.
                            GC is a dummy variable for two years (it takes the value 1 in the 8th and 9th years and 0 otherwise).
                            CO is a one-year dummy (21st year of data).
                            PV is a dummy for private-sector firms.
                            PB is a dummy for public-sector firms.


                            By the way, my data are an unbalanced panel covering 23 years.

                            2. Sir, what is the logic of starting one lag earlier in the level and FOD equations than in the difference equation, for the lagged dependent variable and for predetermined and endogenous variables?

                            Let's say that the level form of the GMM equation is Yit = Yi,t-1 + Xit + Uit. What will lag(1) of Y look like as an instrument for the level equation? Will it be (Yit - Yi,t-1) or (Yi,t-1 - Yi,t-2)?

                            3. Is it because in the level equation the error term (Uit) is in levels, while the first lag of Yit is in differences (Yit - Yi,t-1)? And since Y is a predetermined variable, none of the terms in (Yit - Yi,t-1) will be correlated with Uit?

                            4. Furthermore, is it okay to have the same lag length for the difference/FOD equation and the level equation? Or is there a criterion to decide this?

                            5. Regarding the first point of post #9, you mentioned that:

                            "Serial correlation in the error term is often a sign of misspecified dynamics. Adding a second lag of a dependent variable or adding lags of the independent variables as regressors aims to obtain a dynamically complete model, where all the dynamic effects are captured by the right-hand side variables. (Note: This is different from using higher-order lags as instruments. If there is evidence of second-order serial correlation in the first-differenced errors but no higher-order serial correlation, then the third lag onwards of the dependent variable qualifies as a valid instrument. However, this does not address the potential misspecification of the model dynamics and could lead to weak-instruments problems if those higher-order lags are insufficiently correlated with the regressors.)"

                            5a) Will adding a second lag of the dependent variable, or lags of the independent variables, take care of the misspecification in the model dynamics?

                            5b) As you mentioned above, if there is second-order serial correlation in the first-differenced errors but no higher-order serial correlation, then the third lag onwards of the dependent variable qualifies as a valid instrument. Does this mean that, for my model, the instruments for the dependent variable should be gmm(INT, lag(3 5) eq(level)) and gmm(INT, lag(3 5) collapse eq(diff))?



                            • #15
                              Please ignore the second-to-last post (#13), as I have added a few more questions in the last post (#14).

                              Sorry for adding more than one post here. I tried to edit and add this to the last post, but it didn't work.


                              1. If the serial-correlation tests at higher orders become inconsistent and vary in significance, as in the following case, what could be the reason for that?

                              Arellano-Bond test for AR(1) in first differences: z = -6.42 Pr > z = 0.000
                              Arellano-Bond test for AR(2) in first differences: z = -0.78 Pr > z = 0.436
                              Arellano-Bond test for AR(3) in first differences: z = -0.60 Pr > z = 0.546
                              Arellano-Bond test for AR(4) in first differences: z = 1.68 Pr > z = 0.093
                              Arellano-Bond test for AR(5) in first differences: z = -2.65 Pr > z = 0.008
                              Arellano-Bond test for AR(6) in first differences: z = 1.61 Pr > z = 0.107
                              Arellano-Bond test for AR(7) in first differences: z = 1.83 Pr > z = 0.067
                              Arellano-Bond test for AR(8) in first differences: z = -2.89 Pr > z = 0.004
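To read the table above at a glance, the "Pr > z" column can be recomputed from the z statistics: it behaves like a two-sided standard-normal p-value, P(|Z| > |z|). The following Python sketch copies the posted numbers; the recomputed values agree with the reported ones to within a few thousandths (the z values are rounded to two decimals), and the tests that reject at the 5% level are AR(1), AR(5) and AR(8).

```python
import math

def two_sided_p(z):
    """P(|Z| > |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))

# order: (z statistic, reported p-value), copied from the output above
ar_tests = {1: (-6.42, 0.000), 2: (-0.78, 0.436), 3: (-0.60, 0.546),
            4: (1.68, 0.093), 5: (-2.65, 0.008), 6: (1.61, 0.107),
            7: (1.83, 0.067), 8: (-2.89, 0.004)}

for order, (z, reported) in ar_tests.items():
    p = two_sided_p(z)
    note = "rejects at 5%" if p < 0.05 else ""
    print(f"AR({order}): z = {z:6.2f}  recomputed p = {p:.3f}  "
          f"reported = {reported:.3f}  {note}")
```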

                              2. In the following tests, do we need both the "excluding group" and the "difference" statistics to be insignificant?

                              Difference-in-Hansen tests of exogeneity of instrument subsets:

                              GMM instruments for levels
                              Hansen test excluding group: chi2(5) = 11.44 Prob > chi2 = 0.043
                              Difference (null H = exogenous): chi2(67) = 72.25 Prob > chi2 = 0.309
                              gmm(INT, eq(level) lag(1 3))
                              Hansen test excluding group: chi2(9) = 29.19 Prob > chi2 = 0.001
                              Difference (null H = exogenous): chi2(63) = 54.50 Prob > chi2 = 0.769
                              gmm(INT, collapse eq(diff) lag(2 4))
                              Hansen test excluding group: chi2(69) = 81.85 Prob > chi2 = 0.138
                              Difference (null H = exogenous): chi2(3) = 1.84 Prob > chi2 = 0.606
                              gmm(ROA DivNIITI MQOETA LNTA, collapse lag(2 4))
                              Hansen test excluding group: chi2(56) = 68.74 Prob > chi2 = 0.118
                              Difference (null H = exogenous): chi2(16) = 14.95 Prob > chi2 = 0.528
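Each Difference-in-Hansen statistic is the Hansen J of the full model minus the J of the model without the instrument subset under test, and the degrees of freedom split the same way. Adding each pair back together is a quick consistency check on the output. This Python sketch uses only the numbers posted above; all four rows imply the same full-model statistic, chi2(72) = 83.69, which the post itself does not display.

```python
# (statistic, df) pairs copied from the Difference-in-Hansen output above:
# first the "Hansen test excluding group" line, then the "Difference" line
dih_rows = {
    "GMM instruments for levels": ((11.44, 5), (72.25, 67)),
    "gmm(INT, eq(level) lag(1 3))": ((29.19, 9), (54.50, 63)),
    "gmm(INT, collapse eq(diff) lag(2 4))": ((81.85, 69), (1.84, 3)),
    "gmm(ROA DivNIITI MQOETA LNTA, collapse lag(2 4))": ((68.74, 56), (14.95, 16)),
}

def implied_full_hansen(excluding, difference):
    """Full-model Hansen J and df implied by J_full = J_excluding + Difference."""
    (j_excl, df_excl), (j_diff, df_diff) = excluding, difference
    return round(j_excl + j_diff, 2), df_excl + df_diff

for name, (excl, diff) in dih_rows.items():
    print(name, implied_full_hansen(excl, diff))
```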


                              Thank you!

