  • System GMM Replication using, e.g., xtdpdgmm

    I am trying to replicate a system GMM model from a paper (free version), e.g., using xtdpdgmm. Since I am new to this kind of model, I want to make sure that my implementation and my understanding of the syntax are correct. The following additional information is given:
    • Panel B (below) reports the results of the System GMM estimates [...]. The reported number of observations refers to the level equation
    • The independent variables are not lagged, as it is assumed that the dataset has already been merged with the lagged independent variables
    • They use time effects, thus I have specified teffects
    • They set the lag order based on the BIC for the dependent and independent variables
    • For the independent variables, it is determined that they should be included with a one-year lag (i.e., I set the lag to (1 1))
    • For the dependent variable, I keep it with all lags (i.e., lag(2 .)) as this would depend on the BIC
    Code:
    xtset id time
    xtdpdgmm L(0/1).nplratio imr irb roae c_i e_ta gl_ta gdp un hpi, ///
    collapse ///
    gmm(nplratio, lag(2 .) model(diff)) ///
    gmm(nplratio, lag(2 .) model(level)) ///
    gmm(imr irb roae c_i e_ta gl_ta gdp un hpi, lag(1 1) model(diff)) ///
    gmm(imr irb roae c_i e_ta gl_ta gdp un hpi, lag(1 1) model(level)) ///
    teffects two vce(robust)
    The model looks in principle like this:

    [Image: model.JPG]

    Is my understanding of this procedure and the corresponding syntax correct?

  • #2
    The referenced paper is another unfortunate example of an article which does not provide sufficient information to replicate their analysis. They do not mention anything about the lags used for the instruments, and do not report the total number of instruments. They also do not mention whether the regressors are assumed to be strictly exogenous or predetermined, and whether any collapsing was applied. The BIC is only used to select the lag order for the regressors, not the instruments. You would need to reach out to the authors and ask for their replication code.
    https://www.kripfganz.de/stata/



    • #3
      Dear Sebastian Kripfganz, thank you very much for your insights, this is very useful. Let's assume that via the BIC it was determined that both the dependent and the independent variables enter with one lag, that the regressors are predetermined, and that collapsing is applied. That leaves two questions:
      Is the code that I provided fine in the sense that I specify both a 'diff' and a 'level' model inside gmm()? The remaining issue would then be how many lags of the instruments to use: is there any guiding principle for deciding on an adequate lag range if including all instruments is not an option because of the sample size?



      • #4
        For the level model, you want to use first-differenced instruments by adding the difference suboption, i.e.
        Code:
        gmm(nplratio, lag(2 .) model(level) difference)
        gmm(imr irb roae c_i e_ta gl_ta gdp un hpi, lag(1 .) model(level) difference)
        There is only limited guidance about how to restrict the lag length for the instruments. You do not want to use too few because you would lose identification strength. You do not want to use too many because of overfitting concerns and because long lags become weak instruments. I would usually try going back to lag 4 or 5, or so. If you can show that the results are fairly stable for different choices, that would be ideal.
        https://www.kripfganz.de/stata/
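        As a rough sketch of the stability check suggested above, one could loop over the maximum instrument lag for the differenced model and compare the stored estimates. The range 3/5, the stored-estimate names, and leaving the level-model instruments exactly as written above are assumptions for illustration, not something taken from the paper. The block is meant to be run from a do-file.
        Code:
        * Sketch only: vary the maximum instrument lag and compare the results.
        xtset id time
        local xvars imr irb roae c_i e_ta gl_ta gdp un hpi
        forvalues maxlag = 3/5 {
            xtdpdgmm L(0/1).nplratio `xvars', ///
                collapse ///
                gmm(nplratio, lag(2 `maxlag') model(diff)) ///
                gmm(`xvars', lag(1 `maxlag') model(diff)) ///
                gmm(nplratio, lag(2 .) model(level) difference) ///
                gmm(`xvars', lag(1 .) model(level) difference) ///
                teffects twostep vce(robust)
            estimates store maxlag`maxlag'   // hypothetical name for later comparison
            estat overid                     // overidentification test
            estat serial                     // serial-correlation test
        }
        * Side-by-side coefficients and standard errors for the three lag choices
        estimates table maxlag3 maxlag4 maxlag5, b(%9.4f) se(%9.4f)
        If the coefficients of interest move little across the three columns, that would support the chosen lag range; large swings would suggest the results are sensitive to the instrument choice.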



        • #5
          Thank you very much, Sebastian Kripfganz, for pointing out this suboption to correctly specify the model. One last thought on the instruments: am I right in assuming that the lag structure should generally be the same for the dependent variable and the independent variables? If so, I would specify lag(2 5) for the DV and lag(1 4) for the IV to get 4 lags each. If this is not the case, I would assume that the choice would need to be guided by some clear theoretical/economic reasoning.



          • #6
            Using the same number of lags is a reasonable approach. Similarly, it would be reasonable to use the same maximum lag; i.e. lag(2 4) for the DV and lag(1 4) for the IV. Any other approach would smell like p-hacking and require more careful justification.
            https://www.kripfganz.de/stata/
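            A minimal sketch of how the two choices discussed here (same number of lags versus same maximum lag) might be compared side by side. The macro names ivA/ivB and the stored-estimate names are hypothetical, the level-model instruments follow #4, and applying the restricted lag ranges only to the differenced model is an assumption. It assumes the data are already xtset and that the block is run from a do-file.
            Code:
            local xvars imr irb roae c_i e_ta gl_ta gdp un hpi
            * A: same number of lags (four each) for the DV and the IVs
            local ivA gmm(nplratio, lag(2 5) model(diff)) gmm(`xvars', lag(1 4) model(diff))
            * B: same maximum lag (lag 4) for the DV and the IVs
            local ivB gmm(nplratio, lag(2 4) model(diff)) gmm(`xvars', lag(1 4) model(diff))
            foreach s in A B {
                xtdpdgmm L(0/1).nplratio `xvars', collapse `iv`s'' ///
                    gmm(nplratio, lag(2 .) model(level) difference) ///
                    gmm(`xvars', lag(1 .) model(level) difference) ///
                    teffects twostep vce(robust)
                estimates store spec`s'
            }
            * Compare coefficients and standard errors across the two specifications
            estimates table specA specB, b(%9.4f) se(%9.4f)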
