
  • Dynamic Panel data using xtabond2 and/or xtdpdgmm

    Dear Statalists,
    I am using panel data on 120 countries from 1990 to 2020; some countries have missing data on some variables. I want to estimate a system GMM model with either of the user-written Stata commands xtabond2 or xtdpdgmm. My model is:
    $Y_{it} = a Y_{i,t-1} + b R_{i,t-1} + c X_{i,t-1} + d Z_{it} + V_i + S_t + E_{it}$
    In this model, $Y_{i,t-1}$, $R_{i,t-1}$ and $X_{i,t-1}$ should be treated as endogenous in the command.
    I want to use two-period lags as instruments for the differenced equation and one-period lags for the level equation.
    Questions:
    (1) Could you please help me with the exact syntax for system GMM?
    (2) As a further check, I would like to conduct the same analysis on five-year non-overlapping data, either taking the values every five years or taking five-year averages of all my variables. How should the code in (1) change? My confusion is that the level and the lag of a variable could be the same when working with five-year averages.
    Please help. I have tried to read the help files, but I still do not understand the different terminology involved.

    Thank you in advance,

  • #2
    1. Please have a look at the examples listed in the xtdpdgmm help file and the presentation referenced below. Also note that the latest version of the xtdpdgmm package contains the new xtdpdgmmfe command, which has a simpler syntax for specifying such models: It allows you to simply specify which variables are endogenous etc. Again, see the help file for details as well as the following Statalist topic: https://www.statalist.org/forums/for...84#post1675484
    2. A lagged variable for data with five-year averages obviously has a different implication than a lagged variable with annual data. Whether one or the other is appropriate is something you might find in the related literature of your field. Personally, I am not a big fan of taking five-year averages.
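    For question (2), here is a minimal Stata sketch of the two ways of building five-year non-overlapping data. It uses a hypothetical two-country toy panel; the variable names country, year, period and y are placeholders. Grouping the years as 1990-94, 1995-99, ..., 2015-19 and dropping the lone year 2020 is an assumption, not a rule. The sketch also shows what a one-period lag means after each transformation:
    - With values sampled every five years, L. refers to the value five years earlier.
    - With five-year averages, L. refers to the previous period's average.

```stata
* Hypothetical toy panel: 2 countries observed annually from 1990 to 2020
clear
set obs 62
gen country = ceil(_n/31)
bysort country: gen year = 1989 + _n
gen y = year - 1990            // placeholder for Y, R, X or Z

* Non-overlapping five-year periods: 1990-94 = 0, 1995-99 = 1, ..., 2015-19 = 5, 2020 = 6
gen period = floor((year - 1990)/5)

* Option A: keep the value every five years (1990, 1995, ..., 2020)
preserve
keep if mod(year - 1990, 5) == 0
xtset country year, delta(5)
gen lag_y = L.y                // value five years earlier
quietly summarize lag_y if country == 1 & year == 1995
scalar lagA_1995 = r(mean)     // equals y in 1990
restore

* Option B: five-year averages (collapse ignores missing values when averaging)
collapse (mean) y, by(country period)
drop if period == 6            // 2020 alone is an incomplete period; dropping it is a choice
xtset country period
gen lag_y = L.y                // average of the previous five-year period
list country period y lag_y if country == 1
```

    With averages, y and lag_y are built from different, non-overlapping years, so they are not the same value. Assuming the data are then xtset on period, the xtdpdgmm syntax from (1) need not change. What changes is what L. means and how many periods (here 6) are available to form instrument lags.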

    More on GMM estimation of linear dynamic panel data models:
    https://www.kripfganz.de/stata/



    • #3
      Hello Sebastian,
      Thank you very much. I have looked at your materials and at your answers to others in this forum. My understanding is still quite poor, especially because I have seen answers that specify the model in different ways. When lagged variables are treated as endogenous, I am unsure how they should be specified in the gmm() option. Could you tell me whether the two examples below are equivalent?

      To perform system GMM:

      xtabond2 L(0/1).Yit L.(Rit Xit) Zit year*, ///
      gmm(Yit Rit Xit, lag(2 2) collapse eq(diff)) ///
      gmm(Yit Rit Xit, lag(1 1) collapse eq(level)) ///
      iv(Zit year*, eq(level)) twostep robust nodiffsargan


      xtdpdgmm L(0/1).Yit L.(Rit Xit) Zit year*, ///
      model(diff) gmm(Yit Rit Xit, lag(2 2) collapse) iv(Zit year*, diff) small noconstant vce(robust) twostep

      Are the two equivalent?

      I really appreciate your feedback!

      BW



      • #4
        The following two specifications should be equivalent:
        Code:
        xtabond2 L(0/1).Y L.(R X) Z year*, ///
           gmm(Y R X, lag(2 2) collapse eq(diff)) ///
           gmm(Y R X, lag(1 1) collapse eq(level)) ///
           iv(Z year*, eq(level)) twostep robust nodiffsargan
        
        xtdpdgmm L(0/1).Y L.(R X) Z year*, model(diff) ///
           gmm(Y R X, lag(2 2) collapse) ///
           gmm(Y R X, diff lag(1 1) collapse model(level)) ///
           iv(Z year*, model(level)) twostep vce(robust)
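        This sketch makes the gmm() syntax concrete for the question in #3. The variables inside gmm() are listed unlagged (Y R X), even though the regressors are L.Y, L.R and L.X. The lag() suboption then selects which lags of those variables serve as instruments.

        The sketch uses a hypothetical one-country series. It shows the values the two gmm() options in the xtdpdgmm call above would draw on for Y, under two assumptions:
        - lag(2 2) means the second lag.
        - diff first-differences the variable before lagging.

        R and X are handled the same way. With collapse, each lag contributes a single instrument column rather than one column per time period.

```stata
* Hypothetical one-country series, only to show which values the instruments take
clear
input country year Y
1 2000 10
1 2001 11
1 2002 13
1 2003 16
1 2004 20
1 2005 25
end
xtset country year

* gmm(Y ..., lag(2 2) collapse) for model(diff): the level of Y two periods back
gen z_diff = L2.Y
* gmm(Y ..., diff lag(1 1) collapse model(level)): the first difference of Y, lagged once
gen z_level = LD.Y
list year Y z_diff z_level
```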



        • #5
          Thank you so much for this very helpful syntax. Both commands give similar results. I picked xtdpdgmm and noticed that the AR(2) test rejects the null hypothesis of no second-order serial correlation.
          So I added the second lag of the dependent variable to the model. I am uncertain whether I should also add the second lags of the other endogenous variables.
          Does it make sense to add second lags of the endogenous variables too?
          Is the following modification of the code above correct for this change?
          New code:
          xtdpdgmm L(0/2).Y L.(R X) Z year*, model(diff) ///
          gmm(Y R X, lag(3 3) collapse) ///
          gmm(Y R X, diff lag(2 2) collapse model(level)) ///
          iv(Z year*, model(level)) twostep vce(robust)



          • #6
            It can make sense to also add further lags of the other regressors.

            With the extra lags of the lagged dependent variable (and other regressors), you might already get satisfying AR(2) test results. In that case, you would not need to change the lag orders for the instruments.

            Furthermore, I would recommend using further lags of the instruments, e.g., lag(2 3) or lag(2 4). This is especially important when you add further lags as regressors, because you might otherwise have insufficient instruments for all of the regressors.
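            The counting (order) condition gives a rough check of the point about insufficient instruments. It rests on two assumptions:
            - With collapsed instruments, each variable listed in gmm(..., lag(a b) collapse) contributes b - a + 1 instruments.
            - The exogenous regressors in iv() (Z and the year dummies) and the constant act as their own instruments.

            Under these assumptions, the GMM-type instruments only need to cover the endogenous regressors. For the code in #5, those are L.Y, L2.Y, L.R and L.X:

```latex
\begin{aligned}
\text{instruments from } \texttt{gmm}(v_1,\dots,v_m,\ \texttt{lag}(a\ b)\ \texttt{collapse}) &= m\,(b-a+1)\\[4pt]
\text{code in \#5:}\quad 3\,(3-3+1) + 3\,(2-2+1) &= 6 \ \text{ vs. } 4 \text{ endogenous regressors}\\
\text{also adding } L2.R,\ L2.X:\quad 6 &\ \text{ vs. } 6 \ \Rightarrow\ \text{exactly identified}\\
\texttt{lag(2 4)} \text{ (diff) and } \texttt{lag(1 1)} \text{ (level)}:\quad 3\,(4-2+1) + 3\,(1-1+1) &= 12 \ \text{ vs. } 6
\end{aligned}
```

            The gap between the two counts is the number of overidentifying restrictions the Hansen test checks; at exact identification there is nothing left to test. This is only a necessary condition: the instruments must also be sufficiently correlated with the regressors.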



            • #7
              Thank you so much. Surprisingly, the problem persists.

              To be clear, this is what I run (120 countries, 1990-2020):

              xtdpdgmm L(0/2).MORTALITY L.(ihs_AID ihs_GDP fertility ihs_population) ihs_population_density Co2 age_dependency yr*, model(diff) ///
              gmm(MORTALITY ihs_AID ihs_GDP fertility ihs_population, lag(2 4) collapse) ///
              gmm(MORTALITY ihs_AID ihs_GDP fertility ihs_population, diff lag(1 1) collapse model(level)) ///
              iv(ihs_population_density Co2 age_dependency yr*, model(level)) twostep vce(robust)

              IHS: inverse hyperbolic sine transformation
              Does this specification look all right? Would you suggest any improvements?
              Thank you in advance for your time.
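              IHS stands for the inverse hyperbolic sine, asinh(x) = ln(x + sqrt(x^2 + 1)). Unlike the log, it is defined at zero and for negative values. Below is a minimal sketch of how such variables could be created with Stata's built-in asinh() function. The raw variable name AID and its values are hypothetical.

```stata
* Hypothetical raw values, including zero and a negative value
clear
input double AID
-50
0
1
1000
end

* Inverse hyperbolic sine transformation
gen double ihs_AID = asinh(AID)

* The same transformation written out, as a check
gen double ihs_check = ln(AID + sqrt(AID^2 + 1))
list AID ihs_AID ihs_check
```

              For large values, asinh(x) is close to ln(2x), so coefficients on ihs_ variables are often read much like those on logged variables. Near zero, that reading breaks down.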



              • #8
                I am afraid it is difficult to give a general answer as to why you might still get unsatisfying results for your serial correlation test. Every data set is different.

                There could still be further omitted variables. You might try adding interaction terms between some of the variables, if that makes economic sense. You can of course also add further lags, but this approach would at some point run into the risk of overparameterization.
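                This is a minimal sketch of the interaction-term idea. It uses two of the regressors from #7 purely as an example; whether this particular interaction makes economic sense is an assumption, and the toy values are hypothetical. Since its components are treated as endogenous, it seems natural to lag the interaction and instrument it in the same way.

```stata
* Hypothetical toy values standing in for the real data
clear
input country year ihs_AID ihs_GDP
1 2000 1.0 8.0
1 2001 1.5 8.2
1 2002 2.0 8.1
end
xtset country year

* Product of the two regressors
gen AIDxGDP = ihs_AID * ihs_GDP

* Hypothetical extension of the specification in #7 (one command shown across lines, not run here):
*   xtdpdgmm L(0/2).MORTALITY L.(ihs_AID ihs_GDP AIDxGDP fertility ihs_population)
*     ihs_population_density Co2 age_dependency yr*, model(diff)
*     gmm(MORTALITY ihs_AID ihs_GDP AIDxGDP fertility ihs_population, lag(2 4) collapse)
*     gmm(MORTALITY ihs_AID ihs_GDP AIDxGDP fertility ihs_population, diff lag(1 1) collapse model(level))
*     iv(ihs_population_density Co2 age_dependency yr*, model(level)) twostep vce(robust)
```

                Each such interaction adds one endogenous regressor. With lag(2 4) collapse, it also adds three instruments in the differenced model and one in the level model. This ties in with the overparameterization risk mentioned above.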



                • #9
                  Thanks a lot. Is it right to use lag(1 1) in the level equation when the second lag of the dependent variable is added? Also, do the variables in iv() look right, or could they cause any problems?
                  As a side question: why do the Sargan test of overidentifying restrictions (which rejects H0) and the Hansen test (which does not reject H0) lead me to different conclusions most of the time?

                  Kind regards
                  tg
                  Last edited by Tariku Getaneh; 16 Oct 2023, 06:07.



                  • #10
                    lag(1 1) for the level model is fine if there is no serial correlation. If there is evidence of serial correlation, then you would potentially need to use the second (or even third) lag instead, similar to starting with higher lags for the differenced model. Given that you are not using all of the lags for the differenced model, you could also try lag(1 2) [or lag(2 3) in case of serial correlation].
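                    The reasoning behind shifting the lags can be written as moment conditions for the model in the first post. This sketch covers Y only; R and X are analogous if they are endogenous. It uses u_it = V_i + E_it and rests on three assumptions:
                    - E_it is either serially uncorrelated or follows an MA(1) process.
                    - The differenced-model conditions use the usual predetermined initial observations.
                    - The level-model conditions use the standard additional system-GMM assumption that the first differences of Y are uncorrelated with V_i.

```latex
% u_{it} = V_i + E_{it}; the differenced model contains \Delta E_{it} = E_{it} - E_{i,t-1}
\begin{aligned}
&\text{No serial correlation in } E_{it}: &&
E[Y_{i,t-2}\,\Delta E_{it}] = 0, \qquad E[\Delta Y_{i,t-1}\,u_{it}] = 0\\
&E_{it} = \varepsilon_{it} + \theta\,\varepsilon_{i,t-1}\ \text{(MA(1))}: &&
E[Y_{i,t-3}\,\Delta E_{it}] = 0, \qquad E[\Delta Y_{i,t-2}\,u_{it}] = 0
\end{aligned}
```

                    Under MA(1) errors, the first-row conditions fail:
                    - Y_{i,t-2} contains E_{i,t-2}, which is correlated with the E_{i,t-1} in the differenced error.
                    - The first difference of Y at t-1 contains E_{i,t-1}, which is correlated with E_it.

                    The first row corresponds to lag(2 2) and diff lag(1 1) as in #4. The second row corresponds to starting one lag later, lag(3 3) and diff lag(2 2), as in #5.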

                    For two-step system GMM estimation, the Sargan test is asymptotically invalid because it uses an incorrect weighting matrix. You should just ignore it.



                    • #11
                      Thank you so much Sebastian.

                      BW

