  • xtabond2 and deeper lags

    Hello. I'm new here. (Correct me if I'm missing something!)
    I have questions regarding the use of xtabond2.

    I ran a regression with xtabond2.
    I am using panel data with N = 7543 and T = 9.

    The problem I'm having is that, unless I use deeper lags as instruments, I cannot pass the Hansen test
    (the p-values are very close to 0). The syntax I'm using for the system GMM is

    xtabond2 y l.y x1 x2, gmm(y, lag(6 7)) gmm(x1 x2, lag(6 7)) iv(i.year) robust twostep

    However, even if I can pass the Hansen test, would these deeper lags constitute valid instruments?

    Thank you in advance!


  • #2
    If possible, please show us the output table with the estimation results (using CODE delimiters as explained in the FAQ #12.3).

    For the instruments, you would usually start with the second lag of the dependent variable and the first lag of the independent variables (or contemporaneous terms, depending on whether the variables are predetermined or strictly exogenous) instead of lag 6. Why did you start at lag 6? If your variables are not very persistent, the sixth and seventh lags may not be strong instruments because they are only weakly correlated with the instrumented variables.
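
    For illustration, such a conventional starting point might look like the following sketch (it reuses the placeholder names from #1 and assumes, purely for the sake of the example, that x1 and x2 are predetermined):

    Code:
     * illustrative sketch only: instruments from lag 2 onwards for the lagged dependent
     * variable, and from lag 1 onwards for the (assumed predetermined) regressors x1 and x2
     xtabond2 y l.y x1 x2 i.year, gmm(y, lag(2 .)) gmm(x1 x2, lag(1 .)) iv(i.year) twostep robust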
    https://twitter.com/Kripfganz



    • #3
      Dear Sebastian

      Thank you for your answer.

      I am running the regression below, in which total factor productivity (ltfp) is regressed on outsourcing intensity (routsales) and R&D intensity (rndva).
      I suspect that both regressors are endogenous, so I used

      Code:
       xtabond2 ltfp l.ltfp routsales rndva i.year, iv(i.year) gmm(routsales rndva, lag(2 3)) gmm(ltfp, lag(2 3)) twostep robust artests(3)
      However, if I use the second and third lags as instruments,
      I get the following:

      Code:
      
      Sargan test of overid. restrictions: chi2(64)   = 172.90  Prob > chi2 =  0.000
        (Not robust, but not weakened by many instruments.)
      Hansen test of overid. restrictions: chi2(64)   = 128.38  Prob > chi2 =  0.000
        (Robust, but weakened by many instruments.)
      
      Difference-in-Hansen tests of exogeneity of instrument subsets:
        GMM instruments for levels
          Hansen test excluding group:     chi2(39)   =  82.00  Prob > chi2 =  0.000
          Difference (null H = exogenous): chi2(25)   =  46.37  Prob > chi2 =  0.006
        gmm(routsales rndva, lag(2 3))
          Hansen test excluding group:     chi2(19)   =  62.34  Prob > chi2 =  0.000
          Difference (null H = exogenous): chi2(45)   =  66.04  Prob > chi2 =  0.022
        gmm(ltfp, lag(2 3))
          Hansen test excluding group:     chi2(40)   =  41.81  Prob > chi2 =  0.392
          Difference (null H = exogenous): chi2(24)   =  86.57  Prob > chi2 =  0.000
        iv(2006b.year 2007.year 2008.year 2009.year 2010.year 2011.year 2012.year 2013.year 2014.year
      >  2015.year)
          Hansen test excluding group:     chi2(56)   =  96.96  Prob > chi2 =  0.001
          Difference (null H = exogenous): chi2(8)    =  31.42  Prob > chi2 =  0.000
      which I think means that my instruments are not valid. Only when I use the sixth and seventh lags do I get the following:

      Code:
      Sargan test of overid. restrictions: chi2(28)   =  20.00  Prob > chi2 =  0.865
        (Not robust, but not weakened by many instruments.)
      Hansen test of overid. restrictions: chi2(28)   =  30.62  Prob > chi2 =  0.334
        (Robust, but weakened by many instruments.)
      
      Difference-in-Hansen tests of exogeneity of instrument subsets:
        GMM instruments for levels
          Hansen test excluding group:     chi2(14)   =   8.97  Prob > chi2 =  0.833
          Difference (null H = exogenous): chi2(14)   =  21.66  Prob > chi2 =  0.086
        gmm(routsales rndva, lag(6 7))
          Hansen test excluding group:     chi2(11)   =   9.14  Prob > chi2 =  0.609
          Difference (null H = exogenous): chi2(17)   =  21.48  Prob > chi2 =  0.205
        gmm(ltfp, lag(6 7))
          Hansen test excluding group:     chi2(12)   =  12.73  Prob > chi2 =  0.389
          Difference (null H = exogenous): chi2(16)   =  17.89  Prob > chi2 =  0.330
        iv(2006b.year 2007.year 2008.year 2009.year 2010.year 2011.year 2012.year 2013.year 2014.year
      >  2015.year)
          Hansen test excluding group:     chi2(20)   =  23.97  Prob > chi2 =  0.244
          Difference (null H = exogenous): chi2(8)    =   6.65  Prob > chi2 =  0.575
      As I'm new to this command and have very little experience in panel data analysis, any advice would be really helpful.
      Thank you in advance!



      • #4
        There might be remaining serial correlation of the error term. Is the Arellano-Bond AR(2) test rejecting the null hypothesis of no second-order serial correlation of the first-differenced error term? In that case, it is probably a better idea to directly include further lags of the dependent (and/or independent) variable(s) as regressors.

        In addition, I highly recommend using the suboption equation(level) for the time dummy instruments, that is, iv(i.year, eq(level)). See the following topic for a discussion of this matter: System GMM - Time dummies
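
        Applied to the command from #3, this would amount to something like the following sketch (only the iv() option changes):

        Code:
         * sketch: same specification as in #3, but the time dummies now instrument only the level equation
         xtabond2 ltfp l.ltfp routsales rndva i.year, iv(i.year, eq(level)) gmm(routsales rndva, lag(2 3)) gmm(ltfp, lag(2 3)) twostep robust artests(3)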

        Moreover, the degrees of freedom for the Hansen test might be incorrect if there are omitted variables (in particular, omitted categories of the time dummies). This is a bug in xtabond2, which can only be avoided if the time dummies are specified explicitly, without the factor notation. See my command xtseqreg with its option teffects as an alternative to xtabond2: XTSEQREG: new Stata command for sequential / two-stage (GMM) estimation of linear panel models
        Last edited by Sebastian Kripfganz; 12 Jul 2017, 07:39. Reason: last paragraph added
        https://twitter.com/Kripfganz



        • #5
          Thank you for your suggestion, especially for pointing me to xtseqreg (I'll have a look at it once I get the hang of xtabond2)!

          Following your advice, I ran a regression with the following syntax

          Code:
          xtabond2 ltfp l.ltfp routsales rndva yr1-yr10, iv(yr1-yr10, eq(level)) gmm(routsales rndva, lag(2 3)) gmm(ltfp, lag(2 3)) twostep robust artests(3)
          I'm also adding the results of the AR tests.

          Code:
          ------------------------------------------------------------------------------
          Arellano-Bond test for AR(1) in first differences: z = -42.15  Pr > z =  0.000
          Arellano-Bond test for AR(2) in first differences: z =   4.08  Pr > z =  0.000
          Arellano-Bond test for AR(3) in first differences: z =  -0.37  Pr > z =  0.715
          ------------------------------------------------------------------------------
          Sargan test of overid. restrictions: chi2(64)   = 171.66  Prob > chi2 =  0.000
            (Not robust, but not weakened by many instruments.)
          Hansen test of overid. restrictions: chi2(64)   = 115.93  Prob > chi2 =  0.000
            (Robust, but weakened by many instruments.)
          The null is rejected for AR(2). This may mean that I need to use deeper lags, such as the fourth or fifth, as instruments.
          However, as I mentioned above, including the fourth and fifth lags (as below) does not fix the problem.

          Code:
          xtabond2 ltfp l.ltfp routsales rndva yr1-yr10, iv(yr1-yr10, eq(level)) gmm(routsales rndva, lag(4 5)) gmm(ltfp, lag(4 5)) twostep robust artests(3)
          Code:
          ------------------------------------------------------------------------------
          Arellano-Bond test for AR(1) in first differences: z =  -7.80  Pr > z =  0.000
          Arellano-Bond test for AR(2) in first differences: z =   4.15  Pr > z =  0.000
          Arellano-Bond test for AR(3) in first differences: z =  -0.13  Pr > z =  0.898
          ------------------------------------------------------------------------------
          Sargan test of overid. restrictions: chi2(46)   =  62.42  Prob > chi2 =  0.054
            (Not robust, but not weakened by many instruments.)
          Hansen test of overid. restrictions: chi2(46)   =  68.16  Prob > chi2 =  0.019
            (Robust, but weakened by many instruments.)
          Only from the sixth lag onwards! :P
          However, the sixth and seventh lags would make poor instruments in terms of validity, wouldn't they?
          And since the time dummies are specified explicitly, there should be no problem with the calculation of the degrees of freedom that you noted above, correct?

          Thank you in advance!



          • #6
            What about adding further lags of the dependent variable, e.g.,
            Code:
            xtabond2 ltfp L.ltfp L2.ltfp routsales rndva yr2-yr10, iv(yr2-yr10, eq(level)) gmm(routsales rndva, lag(2 3)) gmm(ltfp, lag(2 3)) twostep robust artests(3)
            (Notice that this implies that you have to remove one of the time dummies.)

            If none of the coefficients in the regression output is labelled as "omitted" or "empty", then the degrees of freedom should be fine in your case.

            At a later stage, it might be worth using not just two lags (2 and 3) but a couple more, or even all of them, given that your T is really small relative to N, so that instrument proliferation is less of an issue.
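
            For example, a specification using all available lags from the second onwards might look like this sketch (the exact lag choice is just an illustration, to be checked against the specification tests):

            Code:
             * sketch: all available lags from lag 2 onwards as GMM-style instruments
             xtabond2 ltfp L.ltfp L2.ltfp routsales rndva yr2-yr10, iv(yr2-yr10, eq(level)) gmm(routsales rndva, lag(2 .)) gmm(ltfp, lag(2 .)) twostep robust artests(3)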
            https://twitter.com/Kripfganz



            • #7
              Thank you so much for your kind explanation!

              It has been very helpful! :D



              • #8
                Dear Sebastian

                Can I ask you one more question?
                I did what you suggested (including the second lag of the dependent variable), and I also thought that, with a large number of observations N, I could use all the deeper lags as instruments. Thus, I used

                Code:
                xtabond2 ltfp l.ltfp l2.ltfp routsales rndva yr2-yr10, iv(yr2-yr10, eq(level)) gmm(routsales rndva, lag(3 .)) gmm(ltfp, lag(3 .)) twostep robust artests(3)
                However, the result does not look good as it shows

                Code:
                ------------------------------------------------------------------------------
                Arellano-Bond test for AR(1) in first differences: z =  -8.78  Pr > z =  0.000
                Arellano-Bond test for AR(2) in first differences: z =   3.50  Pr > z =  0.000
                Arellano-Bond test for AR(3) in first differences: z =   0.79  Pr > z =  0.430
                ------------------------------------------------------------------------------
                Sargan test of overid. restrictions: chi2(99)   = 127.68  Prob > chi2 =  0.028
                  (Not robust, but not weakened by many instruments.)
                Hansen test of overid. restrictions: chi2(99)   = 156.03  Prob > chi2 =  0.000
                  (Robust, but weakened by many instruments.)
                According to the Hansen test, this may suggest that using all the lags as instruments is not a good idea, as they do not constitute valid instruments.
                Am I correct in this reasoning?

                Thank you in advance! :D



                • #9
                  It is indeed probably still a good idea to restrict the number of lags used as instruments. Some solution between two lags and all lags is probably most reasonable. The collapse suboption might also be a good idea; at the very least, it will not do much harm.
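
                  For instance, a middle-ground specification with collapsed instrument sets could look like the following sketch (the particular lag range, 3 to 6, is an illustrative assumption, to be judged against the test results):

                  Code:
                   * sketch: a lag range between "two lags" and "all lags", with collapsed instruments
                   xtabond2 ltfp L.ltfp L2.ltfp routsales rndva yr2-yr10, iv(yr2-yr10, eq(level)) gmm(routsales rndva, lag(3 6) collapse) gmm(ltfp, lag(3 6) collapse) twostep robust artests(3)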

                  Yet, this probably will not help much in getting rid of the serial error correlation. If it is possible for you, it might be good to think about adding further variables that could matter; such omitted variables can easily be the source of serial error correlation. The Hansen test is just a general test for model misspecification, and there can be many sources of such misspecification.
                  https://twitter.com/Kripfganz



                  • #10
                    Hello,

                    I'm working on a dynamic model (via xtabond2) of bank risk as a function of interest rates (with controls at the bank and country level). To check robustness, I compare models that use different rates. Is it a mistake to use different lags (for the interest-rate variable and/or the control variables) when changing the type of rate?
                    My goal is to improve the Arellano-Bond AR(1) and AR(2) tests as well as the Sargan and Hansen tests from one model to the next, while keeping the same dependent and control variables.

                    Thank you
