  • xtdpdgmm command - @SebastianKripfganz

    Dear Dr. Kripfganz,

    Following your suggestions in previous posts, I decided to use the xtdpdgmm command. Unlike the other GMM commands, it lets me understand the meaning of each option I specify on the command line. I constructed my model with reference to [...]. I would like to ask whether I am using the command appropriately, which may also help other researchers implement it.

    I have panel data with n=769, T=12. The data contain monthly information on individuals' depression level (dep_score), average income (avg_inc), health status (health_st), and injuries (injury), plus time dummies (w_*) for each month. The variable individualid identifies individuals.

    I want to understand how income, health status, and injuries affect the depression level. Since depression status is roughly stable, I include the first lag of the depression score as an independent variable. I also use a dynamic model because depression could affect average income and health status in the subsequent period. Based on your previous explanations, I assume that average income and health status are predetermined variables. Injuries and time dummies are exogenous variables in my model.

    I ran the regression with xtdpdgmm. I used the vce() option to obtain Windmeijer-corrected standard errors. The small option reports t statistics instead of z statistics, the two option requests the two-step GMM estimator, and collapse reduces the number of instruments.

    The command I ran was:
    Code:
    xtdpdgmm L(0/1).dep_score avg_inc health_st injury w_*, model(diff) collapse gmm(dep_score, l(2 4)) gmm(avg_inc health_st, l(1 3))  gmm(injury w_*, l(0 2)) nocons  two small vce(cl individualid)
    1. How can I test whether average income and health status are endogenous or predetermined? When I define them as endogenous rather than predetermined, the p-value of AR(1) is 0 in both models, the p-value of AR(2) increases from 0.20 to 0.81, and the p-value of the Hansen test stays about the same (0.353 vs. 0.352). The step 1 value under "Fitting full model" decreases from 0.15 to 0.13.
    Which statistics should I take into account when deciding on the correct specification of the variables?

    2. The exogenous injury and time variables are control variables in my model. Should I specify them with gmm() or iv()?

    3. Do I need to specify m(diff) or m(level) inside every gmm() option? I do not understand what the level equation and the difference equation mean. When I use m(l) in gmm(injury w_*, l(0 2) m(l)), the step 1 value under "Fitting full model" increases from 0.15 to 0.35. When should I use instruments for the level equation rather than the differenced equation, and how can I decide?

    4. What is the role of the nocons option?

    5. When I run the same model with model(fodev), the lag of the dependent variable becomes insignificant: its t value decreases from 2.80 to 0.84 while the number of observations stays the same. On the other hand, the time dummies become significant. What could be the reason? The p-value of AR(1) is 0 in both models, the p-value of AR(2) increases from 0.20 to 0.83, and the p-value of the Hansen test increases from 0.352 to 0.70. The step 1 value under "Fitting full model" increases from 0.15 to 0.23.

    6. When I run the same model with model(fodev), the lag of the dependent variable becomes insignificant. Its t value changes from 2.80 to 38.6 while the number of observations stays the same. On the other hand, the time dummies become significant. What could be the reason? The p-value of AR(1) is 0 in both models, the p-value of AR(2) decreases from 0.20 to 0.0004, and the p-value of the Hansen test decreases from 0.352 to 0.009. The step 1 value under "Fitting full model" increases from 0.15 to 1.13. How can I decide between the fodev, level, and diff models?

    7. Why does Stata drop two time dummies in the diff model? It drops only one when I use system GMM. Can I specify which dummies are dropped if I need the estimate for a particular time dummy?

    Thank you for your great help to all researchers in this forum.

    Best regards,
    John

  • #2
    1. You can use incremental Hansen tests to decide whether the variables should be treated as strictly exogenous or predetermined. Modify your code as follows:
    Code:
    xtdpdgmm L(0/1).dep_score avg_inc health_st injury w_*, model(diff) collapse gmm(dep_score, l(2 4)) gmm(avg_inc health_st, l(2 3)) gmm(avg_inc, l(1 1)) gmm(health_st, l(1 1)) gmm(injury w_*, l(0 2)) overid nocons  two small vce(cl individualid)
    estat overid, difference
    By specifying separate options for the first lag of the variables you want to test, together with the overid option, you can subsequently obtain the difference-in-Hansen test just for those instruments. If the test rejects their validity, you should treat them as endogenous and remove those instruments from the model. (You might want to apply a sequential procedure, first including the first lag for one of the two variables only, and then the first lag for the other variable in the next step. Please see the section on "Model Selection" in my presentation slides.)
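
    A minimal sketch of that sequential procedure, reusing the variables from your command. The order in which the two variables are tested is an arbitrary assumption for illustration, and the second step is only meaningful if the test in the first step does not reject:
    Code:
    * Step 1: test the first lag of avg_inc only; health_st starts at lag 2
    xtdpdgmm L(0/1).dep_score avg_inc health_st injury w_*, model(diff) collapse gmm(dep_score, l(2 4)) gmm(avg_inc health_st, l(2 3)) gmm(avg_inc, l(1 1)) gmm(injury w_*, l(0 2)) overid nocons two small vce(cl individualid)
    estat overid, difference
    * Step 2 (only if step 1 does not reject): keep lag 1 of avg_inc, test the first lag of health_st
    xtdpdgmm L(0/1).dep_score avg_inc health_st injury w_*, model(diff) collapse gmm(dep_score, l(2 4)) gmm(avg_inc, l(1 3)) gmm(health_st, l(2 3)) gmm(health_st, l(1 1)) gmm(injury w_*, l(0 2)) overid nocons two small vce(cl individualid)
    estat overid, difference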

    2. It does not matter much, in particular given that you used the collapse option. iv() is just a collapsed version of gmm(). For the time dummies, I would suggest simply specifying them as iv(w_*, diff), which instruments the time dummies in the first-differenced model by themselves.
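
    For illustration, your original command with the time dummies moved into an iv() option might look as follows. This is only a sketch; everything else is kept as in your specification:
    Code:
    xtdpdgmm L(0/1).dep_score avg_inc health_st injury w_*, model(diff) collapse gmm(dep_score, l(2 4)) gmm(avg_inc health_st, l(1 3)) gmm(injury, l(0 2)) iv(w_*, diff) nocons two small vce(cl individualid)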

    3. The option model(diff) outside of the gmm() options sets the default that then applies to all gmm() and iv() options. You could override this default by specifying model(level) inside some of these options. Instruments for the level model need to satisfy stronger assumptions, i.e. they need to be uncorrelated with the "fixed effects" that are present in the levels model (but drop out in the first-differenced model). Instruments for the level model, if valid, can make the estimator more efficient and could help in situations when there are identification problems for the difference-GMM estimator (e.g. when the dependent variable is highly persistent). See the section on "System GMM" in my presentation slides.
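
    As a rough sketch, a system-GMM version of your command could add instruments for the level model like this. The particular level instruments (the first-differenced lag 1 of dep_score and the first-differenced current values of avg_inc and health_st) are a common choice that is assumed here for illustration, not something that follows from your data. Their validity should be checked, for example with estat overid:
    Code:
    * Instruments for the differenced model as before, plus differenced instruments for the level model
    xtdpdgmm L(0/1).dep_score avg_inc health_st injury w_*, model(diff) collapse gmm(dep_score, l(2 4)) gmm(avg_inc health_st, l(1 3)) gmm(injury w_*, l(0 2)) gmm(dep_score, l(1 1) diff model(level)) gmm(avg_inc health_st, l(0 0) diff model(level)) two small vce(cl individualid)
    estat overid
    The nocons option is left out in this sketch because the model now contains instruments for the level equation.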

    4. If you only consider instruments for the first-differenced model, then a regression intercept does not affect the estimation of the other coefficients. So, you could simply suppress it with the nocons option. Whenever you add instruments for the levels model, you should essentially always include an intercept, and therefore not specify this option.

    5. Such a question is generally hard to answer. I do not know.

    6. model(fodev) is an alternative to model(diff). The former has some advantage with unbalanced panel data. Note that you need to modify the lag structure if you switch between the two: https://www.statalist.org/forums/for...53#post1589753

    7. Due to first differencing, you lose one additional observation from the effective estimation sample. (Note that this is not shown in the estimation header, which always displays the number of observations corresponding to the levels model.) You can choose a different base level. Instead of w_*, just specify explicitly those dummies that you want to include.
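    For example, assuming the dummies are named w_1 to w_10 and that you want w_1 and w_2 to serve as the base periods (both are assumptions made only for illustration), you could list the remaining dummies explicitly. Stata may still omit a dummy if the ones you list are collinear in the estimation sample:
    Code:
    * Hypothetical choice of base periods: w_1 and w_2 are left out
    local tdum w_3 w_4 w_5 w_6 w_7 w_8 w_9 w_10
    xtdpdgmm L(0/1).dep_score avg_inc health_st injury `tdum', model(diff) collapse gmm(dep_score, l(2 4)) gmm(avg_inc health_st, l(1 3)) gmm(injury `tdum', l(0 2)) nocons two small vce(cl individualid)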
    https://www.kripfganz.de/stata/



    • #3
      Dear Sebastian,

      Thanks for your help. I have unbalanced data, but the fodev and diff options give me similar results. How should I decide which one to use?

      Here is the diff option:
      Code:
      xtdpdgmm L(0/1).dep_score avg_inc health_st injury w_*, model(diff) collapse gmm(dep_score, l(2 .)) gmm(avg_inc health_st, l(1 .))  gmm(injury, l(0 .)) gmm(w_*, l(0 .) diff) nocons  two small vce(cl individualid)
      Code:
      Generalized method of moments estimation
      
      Fitting full model:
      Step 1         f(b) =  1.1911407
      Step 2         f(b) =  .10071562
      
      Group variable: individual~s                 Number of obs         =      3775
      Time variable: wave                          Number of groups      =       634
      
      Moment conditions:     linear =      72      Obs per group:    min =         1
                          nonlinear =       0                        avg =  5.954259
                              total =      72                        max =         9
      
                            (Std. Err. adjusted for 634 clusters in individualidsys)
      ------------------------------------------------------------------------------
                   |              WC-Robust
         dep_score |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
         dep_score |
               L1. |   .0820051   .0298021     2.75   0.006     .0234822     .140528
                   |
           avg_inc |  -.0000252   .0000778    -0.32   0.746     -.000178    .0001275
         health_st |   .4386173   .2056473     2.13   0.033     .0347839    .8424507
            injury |   1.357471   .4646371     2.92   0.004     .4450544    2.269888
               w_1 |          0  (omitted)
               w_2 |  -.0376841   .3953373    -0.10   0.924    -.8140154    .7386473
               w_3 |  -.3196453   .3448559    -0.93   0.354    -.9968453    .3575548
               w_4 |   .2120989     .32973     0.64   0.520    -.4353979    .8595958
               w_5 |   .4634128   .2979409     1.56   0.120    -.1216594    1.048485
               w_6 |  -.0477281   .2384947    -0.20   0.841    -.5160645    .4206083
               w_7 |  -.0431505   .2473132    -0.17   0.862    -.5288041     .442503
               w_8 |   .0924813   .2205632     0.42   0.675    -.3406428    .5256053
               w_9 |          0  (omitted)
              w_10 |  -.1450261   .2158403    -0.67   0.502    -.5688758    .2788236
      ------------------------------------------------------------------------------
      Instruments corresponding to the linear moment conditions:
       1, model(diff):
         L2.dep_score L3.dep_score L4.dep_score L5.dep_score L6.dep_score
         L7.dep_score L8.dep_score L9.dep_score
       2, model(diff):
         L1.avg_inc L2.avg_inc L3.avg_inc L4.avg_inc L5.avg_inc L6.avg_inc
         L7.avg_inc L8.avg_inc L9.avg_inc L1.health_st L2.health_st L3.health_st
         L4.health_st L5.health_st L6.health_st L7.health_st L8.health_st
         L9.health_st
       3, model(diff):
         injury L1.injury L2.injury L3.injury L4.injury L5.injury L6.injury
         L7.injury L8.injury L9.injury
       4, model(diff):
         L1.D.w_1 L2.D.w_1 L3.D.w_1 L5.D.w_1 L7.D.w_1 L1.D.w_2 L2.D.w_2 L3.D.w_2
         L4.D.w_2 L5.D.w_2 L6.D.w_2 L7.D.w_2 L8.D.w_2 L1.D.w_3 L2.D.w_3 L3.D.w_3
         L4.D.w_3 L5.D.w_3 L6.D.w_3 L1.D.w_4 L2.D.w_4 L3.D.w_4 L4.D.w_4 L5.D.w_4
         L6.D.w_4 L1.D.w_5 L2.D.w_5 L3.D.w_5 L4.D.w_5 L1.D.w_6 L2.D.w_6 L3.D.w_6
         L4.D.w_6 L1.D.w_7 L2.D.w_7 L1.D.w_8
      Here is the fodev option:
      Code:
      xtdpdgmm L(0/1).dep_score avg_inc health_st injury w_*, model(fodev) collapse gmm(dep_score, l(1 .)) gmm(avg_inc health_st, l(0 .))  gmm(injury, l(0 .) m(mdev)) gmm(w_*, l(0 .) m(mdev)) nocons  two small vce(cl individualid)
      Code:
      Generalized method of moments estimation
      
      Fitting full model:
      Step 1         f(b) =  1.2596658
      Step 2         f(b) =  .10356775
      
      Group variable: individual~s                 Number of obs         =      3775
      Time variable: wave                          Number of groups      =       634
      
      Moment conditions:     linear =      78      Obs per group:    min =         1
                          nonlinear =       0                        avg =  5.954259
                              total =      78                        max =         9
      
                            (Std. Err. adjusted for 634 clusters in individualidsys)
      ------------------------------------------------------------------------------
                   |              WC-Robust
         dep_score |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
         dep_score |
               L1. |   .1022946   .0291661     3.51   0.000     .0450205    .1595686
                   |
           avg_inc |  -.0000177   .0000637    -0.28   0.781    -.0001428    .0001073
         health_st |    .415779   .1748861     2.38   0.018     .0723519    .7592062
            injury |    .986515   .3486896     2.83   0.005     .3017868    1.671243
               w_1 |          0  (omitted)
               w_2 |   .1883661   .3205027     0.59   0.557    -.4410111    .8177433
               w_3 |          0  (omitted)
               w_4 |   .3283641   .3292352     1.00   0.319    -.3181613    .9748895
               w_5 |   .6175765   .3111424     1.98   0.048     .0065804    1.228573
               w_6 |   .0695444   .2886622     0.24   0.810    -.4973069    .6363958
               w_7 |   .1344026   .3027474     0.44   0.657    -.4601081    .7289134
               w_8 |   .1617975   .2810819     0.58   0.565    -.3901683    .7137633
               w_9 |   .0127941   .2983247     0.04   0.966    -.5730318      .59862
              w_10 |  -.0759849   .2790805    -0.27   0.786    -.6240205    .4720506
      ------------------------------------------------------------------------------
      Instruments corresponding to the linear moment conditions:
       1, model(fodev):
         L1.dep_score L2.dep_score L3.dep_score L4.dep_score L5.dep_score
         L6.dep_score L7.dep_score L8.dep_score
       2, model(fodev):
         avg_inc L1.avg_inc L2.avg_inc L3.avg_inc L4.avg_inc L5.avg_inc L6.avg_inc
         L7.avg_inc L8.avg_inc health_st L1.health_st L2.health_st L3.health_st
         L4.health_st L5.health_st L6.health_st L7.health_st L8.health_st
       3, model(mdev):
         injury L1.injury L2.injury L3.injury L4.injury L5.injury L6.injury
         L7.injury L8.injury L9.injury
       4, model(mdev):
         L1.w_1 L3.w_1 L4.w_1 L5.w_1 L6.w_1 L7.w_1 L8.w_1 L9.w_1 L1.w_2 L3.w_2
         L4.w_2 L5.w_2 L6.w_2 L7.w_2 L8.w_2 L1.w_3 L2.w_3 L3.w_3 L4.w_3 L5.w_3
         L6.w_3 L7.w_3 L1.w_4 L2.w_4 L3.w_4 L4.w_4 L5.w_4 L6.w_4 L1.w_5 L2.w_5
         L3.w_5 L4.w_5 L5.w_5 L1.w_6 L2.w_6 L3.w_6 L4.w_6 L1.w_7 L2.w_7 L3.w_7
         L1.w_8 L2.w_8



      • #4
        The FOD estimator retains more information than the DIFF estimator when the data has gaps. Other than that, there is not much of a difference.



        • #5
          Perfect! Sincere thanks for your help.
