  • Why do xtabond and xtdpdsys produce the same results (both coefficient and standard error) for certain variables?

    Dear all,

    I am running xtabond and xtdpdsys on a panel data set, and the results for certain variables are exactly the same. xtdpdsys uses level equations in addition to the differenced equations, so the results should not be identical. I could not figure out why I get the same results. Any help would be much appreciated.


    Jian

  • #2
    We would need to have more information about your setup to give you a helpful answer. Could you please provide the exact command syntax as you typed it in Stata for both commands, and also provide some information about the variables in your model?
    Last edited by Sebastian Kripfganz; 14 Nov 2014, 18:58.
    https://twitter.com/Kripfganz


    • #3
      Thanks. This is confusing; I am not sure whether other people have run into the same problem.

      The xtabond syntax I used:

      xtabond z_score age gender majority tempxxx, nocons lag(1)

      The xtdpdsys syntax I used:

      xtdpdsys z_score age gender majority tempxxx

      The data are an unbalanced student-level panel of scores, with at most five semesters (periods) per student. Age is the student's age. Gender, majority, and tempxxx are time-invariant student-specific variables (gender, ethnicity, etc.). I tried to post a small subset of the data so that you could see the problem directly, but the forum does not allow me to do so.

      When I run the two commands, the coefficient and standard error for age are EXACTLY the same in both, even though xtdpdsys uses the extra moment conditions for the level equation. My guess is that this has something to do with the time-invariant student-specific variables gender, majority, and tempxxx, which are dropped from the differenced equations, but I do not know the reason. Any insights would be much appreciated.

      Best,
      Jian


      • #4
        Also, if I drop the majority and tempxxx variables from both commands, the estimated coefficient and standard error of age differ between them.


        • #5
          I do not see any econometric reason why the two coefficients should be identical. However, in empirical applications it can happen just by chance that the coefficients and standard errors coincide up to the number of displayed digits. Do you get the same effect if you restrict your sample to a subsample of your data (drop one period or drop a couple of individuals)? I do not think that there is anything to worry about.
          https://twitter.com/Kripfganz


          • #6
            Yes. The coefficient and standard error are still identical when I restrict the data to a small subsample, so this is not an artifact of the number of displayed digits.


            • #7
              I think this may have something to do with the instrument matrix and the covariate matrix used to calculate beta. If xtabond and xtdpdsys produce the same results for certain variables, the matrices used to calculate beta must be partially identical.


              • #8
                But I could not work it out mathematically.


                • #9
                  The coefficient and standard error for the lagged score, which is the key explanatory variable, are also the same. This means that xtdpdsys does not improve the estimation at all, even with the additional moment conditions for the level equations. This is weird and confusing.


                  • #10
                    Are you able to replicate this with a publicly available data set?

                    The following example has similar characteristics to yours. exp is linearly growing, as is age, and fem and blk are time-invariant.
                    Code:
                    webuse psidextract
                    xtabond lwage exp fem blk, nocons lag(1)
                    xtdpdsys lwage exp fem blk
                    The coefficients and standard errors are different in both cases.

                    Could you please describe the properties of your dependent variable a bit further?
                    https://twitter.com/Kripfganz


                    • #11
                      Thanks, Sebastian! I would like to post the data here, but the forum does not allow me to do so. If you don't mind, I could send the data set, the command syntax, and the output to your email address (I have looked it up on your website). Thanks!


                      • #12
                        Feel free to do so. I will have a look at the data and see if I can figure out what is happening.
                        https://twitter.com/Kripfganz


                        • #13
                          Regarding the data set you used: each person has seven periods (t = 1 to 7). I arbitrarily generated another individual-level time-invariant variable, ms1:

                          bysort id: egen tempxxx=sum(ms)
                          gen ms1=tempxxx!=0

                          Then I ran the following two commands:

                          xtabond lwage exp fem blk ms1, nocons lag(1)

                          xtdpdsys lwage exp fem blk ms1

                          The results are different. But when I restrict the sample to t = 1 to 5 (so that each individual has only five periods, the same panel length as my data set) and rerun the two commands above, I get exactly the same coefficient estimates for lagged lwage and exp.
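
                          For anyone who wants to reproduce this, the steps above can be collected into one short sketch. It assumes that the time variable in psidextract is named t and that the data come already declared as a panel; it uses egen's total() function in place of the older sum(). No output is shown here.

                          ```stata
                          * Sketch: replicate the comparison on the public psidextract data
                          webuse psidextract, clear

                          * Time-invariant indicator: 1 if ms is ever nonzero for the person
                          bysort id: egen tempxxx = total(ms)
                          generate ms1 = tempxxx != 0

                          * Keep five periods per person (assumes the time variable is t)
                          keep if t <= 5

                          * Difference GMM and system GMM with the same regressors
                          xtabond lwage exp fem blk ms1, nocons lag(1)
                          xtdpdsys lwage exp fem blk ms1
                          ```

                          Dropping the "keep if" line should give the seven-period comparison, where the results were reported to differ.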





                          • #14
                            Thanks for sending me the data. I figured out what the problem is, and it reminded me why I never use the command xtdpdsys.

                            The punchlines up front:
                            1) Never use xtdpdsys with time-invariant regressors! (Coefficient estimates for time-invariant regressors are "spurious"!)
                            2) Never use xtabond with time-invariant regressors! (Coefficient estimates are fine but finite-sample adjustments for standard errors are based on the wrong number of regressors => bug!)

                            (Instead use xtdpd or xtabond2, as explained below.)

                            The command xtdpdsys uses first differences of predetermined or endogenous variables (not of strictly exogenous variables such as age in your example) as instruments in the level equation, based on the assumption that all regressors are correlated with the unobserved individual-specific effects. (You can check the instruments used below the regression output.) But the first differences of the time-invariant regressors are just zero and drop out. Therefore, the "identification" of the time-invariant regressors technically relies exclusively on the first differences of the lagged dependent variable used as instruments for the level equation. These level instruments can then no longer help to identify the coefficients of the time-varying regressors if their number is less than or equal to the number of time-invariant regressors (exact or under-identification of the latter). Hence, the coefficients of the time-varying regressors are identified exclusively from the first-differenced equation.
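
                            To see why the time-invariant regressors drop out of the differenced equation, it may help to write the model down. The notation is only an illustrative assumption and is not taken from the paper: y_it is the outcome, x_it collects the time-varying regressors (such as age), f_i the time-invariant regressors (such as gender), and u_i is the unobserved individual-specific effect.

                            ```latex
                            \begin{align*}
                            y_{it} &= \alpha\, y_{i,t-1} + x_{it}'\beta + f_i'\gamma + u_i + \varepsilon_{it}
                              && \text{(level equation)} \\
                            \Delta y_{it} &= \alpha\, \Delta y_{i,t-1} + \Delta x_{it}'\beta + \Delta\varepsilon_{it}
                              && \text{(first differences: } \Delta f_i = 0,\ \Delta u_i = 0 \text{)}
                            \end{align*}
                            ```

                            In this notation, gamma appears only in the level equation, so only the level instruments can identify it.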

                            This is essentially a consequence of Proposition 2 in my paper with Claudia Schwarz on time-invariant regressors that you can find on my website:
                            "Estimation of Linear Dynamic Panel Data Models with Time-Invariant Regressors"

                            And this is also a reason why I do not like the "black box" command xtdpdsys. While you still get coefficient estimates for your time-invariant regressors (as long as you have sufficiently many instruments for the level equation), these estimates result from a "spurious" finite-sample correlation between the first differences of time-varying regressors and the time-invariant regressors. This correlation is at best very weak because it is usually hard to justify that there is such a correlation in the population.

                            If you are willing to assume that your time-invariant regressors are uncorrelated with the unobserved individual-specific effects, you can use them as instruments for themselves. In Stata, you would need to use the xtdpd command to do this. In fact, both xtabond and xtdpdsys are just interfaces that generate instruments in a prespecified way and pass the input on to xtdpd to perform the calculations. I therefore recommend using only xtdpd for full syntax control (or xtabond2 by David Roodman).

                            The following syntax is equivalent for difference-GMM estimation:
                            Code:
                            xtabond z_score age, nocons
                            xtdpd z_score L.z_score age, dgmmiv(z_score) div(age) nocons
                            Note: Do NOT specify time-invariant regressors in the syntax for xtabond. Even though they are omitted in the calculation of the coefficients, Stata still counts them for the finite-sample adjustment of the standard errors (which I would call a bug!).
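
                            One way to check the note above on public data is to run xtabond with and without the time-invariant regressors and put the results side by side. This is only a sketch: it uses psidextract with fem and blk as the time-invariant regressors, and the stored-estimate names without_ti and with_ti are arbitrary labels. No output is shown.

                            ```stata
                            * Sketch: does listing time-invariant regressors change xtabond's standard errors?
                            webuse psidextract, clear

                            * Difference GMM without time-invariant regressors
                            xtabond lwage exp, nocons lag(1)
                            estimates store without_ti

                            * Same model with time-invariant regressors listed (they drop out in differences)
                            xtabond lwage exp fem blk, nocons lag(1)
                            estimates store with_ti

                            * Side-by-side coefficients and standard errors
                            estimates table without_ti with_ti, se
                            ```

                            If the note holds, the coefficients in the two columns should agree while the standard errors differ.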

                            The following syntax is equivalent for system-GMM estimation:
                            Code:
                            xtdpdsys z_score age
                            xtdpd z_score L.z_score age, dgmmiv(z_score) div(age) lgmmiv(z_score)
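
                            The same equivalence can be checked on public data. This is only a sketch: it substitutes lwage and exp from psidextract for z_score and age, and the stored-estimate names sys and dpd are arbitrary labels. No output is shown.

                            ```stata
                            * Sketch: compare xtdpdsys with the equivalent xtdpd specification
                            webuse psidextract, clear

                            * System GMM through the xtdpdsys interface
                            xtdpdsys lwage exp
                            estimates store sys

                            * Same instruments written out explicitly for xtdpd
                            xtdpd lwage L.lwage exp, dgmmiv(lwage) div(exp) lgmmiv(lwage)
                            estimates store dpd

                            * Side-by-side coefficients and standard errors
                            estimates table sys dpd, se
                            ```
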
                            With time-invariant regressors you can use something like this if you are willing to assume that gender is uncorrelated with the unobserved effects:
                            Code:
                            xtdpd z_score L.z_score age gender, dgmmiv(z_score) div(age) lgmmiv(z_score) liv(gender) hascons
                            Note: The coefficients now differ from those of the previous specifications, and there is no equivalent specification of this model with the xtdpdsys command. Also, you have to specify the option hascons when you include time-invariant regressors.

                            In my paper with Claudia Schwarz you will find more general arguments on how to treat time-invariant regressors in dynamic panels. Also see my comments in the following Statalist thread:
                            http://www.statalist.org/forums/foru...-data-analysis
                            https://twitter.com/Kripfganz


                            • #15
                              Thanks, Sebastian! Your paper really explains it!

                              A quick question: when using xtdpd, why should one include hascons when the regression has time-invariant regressors? I checked the Stata manual; it says that "hascons specifies that xtdpd check for collinearity only among levels of independent variables; by default checks occur among levels and differences". I do not understand how this relates to the advice to include hascons when time-invariant regressors are included.
