xtdpdgmm command - @SebastianKripfganz

John Sgr

Join Date: Sep 2020

Posts: 28
#1


17 Jan 2021, 03:34

Dear Dr. Kripfganz,

Following your suggestions in previous posts, I decided to use the xtdpdgmm command. Unlike the other GMM commands, it lets me understand the meaning of each option I specify on the command line. I built my model with reference to
Kripfganz, S. (2019). Generalized method of moments estimation of linear dynamic panel data models. Proceedings of the 2019 London Stata Conference.

I would like to ask whether I am using the command appropriately. The answer may also help other researchers implement it.

I have panel data with n=769 and T=12. The data contain monthly information on individuals' depression level (dep_score), average income (avg_inc), health status (health_st), and injuries (injury), plus time dummies (w_*) for each month. The variable individualid identifies individuals.

I want to understand how income, health status, and injuries affect the depression level. Since depression status is roughly stable, I include the first lag of the depression score as an independent variable. I also use a dynamic model because depression could affect average income and health status in the subsequent period. Based on your previous explanations, I assume that average income and health status are predetermined variables. Injuries and time dummies are exogenous variables in my model.

I ran a regression with xtdpdgmm. I used the vce() option for Windmeijer's correction. The small option reports t statistics instead of z statistics. The two option requests the two-step GMM estimator. The collapse option reduces the number of instruments.

The command I ran was:
Code:

xtdpdgmm L(0/1).dep_score avg_inc health_st injury w_*, model(diff) collapse gmm(dep_score, l(2 4)) gmm(avg_inc health_st, l(1 3)) gmm(injury w_*, l(0 2)) nocons two small vce(cl individualid)

1. How can I test whether average income and health status are endogenous or predetermined? When I define them as endogenous rather than predetermined, the AR(1) p-value is 0 in both models, the AR(2) p-value increases from 0.20 to 0.81, and the Hansen test p-value stays about the same (0.353 vs. 0.352). The step 1 value under "Fitting full model" decreases from 0.15 to 0.13.
Which statistics should I take into account when deciding on the correct specification of the variables?

2. The exogenous injury and time variables are control variables in my model. Should I specify them with gmm() or iv()?

3. Do I need to specify m(diff) or m(level) inside all gmm() options? I do not understand the meaning of the level equation and the difference equation. When I use m(l) in gmm(injury w_*, l(0 2) m(l)), the step 1 value under "Fitting full model" increases from 0.15 to 0.35. When do I need instruments for the level model rather than the differenced model, and how can I decide?

4. What is the role of the nocons option?

5. When I run the same model with model(fodev), the lag of the dependent variable becomes insignificant: its t value decreases from 2.80 to 0.84 while the number of observations stays the same. On the other hand, the time dummies become significant. What could be the reason? The AR(1) p-value is 0 in both models, the AR(2) p-value increases from 0.20 to 0.83, and the Hansen test p-value increases from 0.352 to 0.70. The step 1 value under "Fitting full model" increases from 0.15 to 0.23.

6. When I run the same model with model(fodev), the lag of the dependent variable becomes insignificant. Its t value changes from 2.80 to 38.6 while the number of observations stays the same. On the other hand, the time dummies become significant. What could be the reason? The AR(1) p-value is 0 in both models, the AR(2) p-value decreases from 0.20 to 0.0004, and the Hansen test p-value decreases from 0.352 to 0.009. The step 1 value under "Fitting full model" increases from 0.15 to 1.13. How can I decide between the fodev, level, and diff models?

7. Why does Stata drop two time dummies in the diff model? It drops only one with system GMM. Can I specify which dummies are dropped if I need the estimate for a particular time dummy?

Thank you for your great help to all researchers in this forum.

Best regards,
John
Sebastian Kripfganz

Join Date: May 2014

Posts: 2601
#2

17 Jan 2021, 09:02

1. You can use incremental Hansen tests to decide whether the variables should be treated as endogenous or predetermined. Modify your code as follows:

Code:

xtdpdgmm L(0/1).dep_score avg_inc health_st injury w_*, model(diff) collapse gmm(dep_score, l(2 4)) gmm(avg_inc health_st, l(2 3)) gmm(avg_inc, l(1 1)) gmm(health_st, l(1 1)) gmm(injury w_*, l(0 2)) overid nocons two small vce(cl individualid)
estat overid, difference

By specifying separate options for the first lag of the variables you want to test, together with the overid option, you can subsequently obtain the difference-in-Hansen test just for those instruments. If the test rejects their validity, you should treat them as endogenous and remove those instruments from the model. (You might want to apply a sequential procedure, first including the first lag for one of the two variables only, and then the first lag for the other variable in the next step. Please see the section on "Model Selection" in my presentation slides.)

2. It does not matter much, in particular given that you used the collapse option. iv() is just a collapsed version of gmm(). For the time dummies, I would suggest simply specifying them as iv(w_*, diff), which instruments the time dummies in the first-differenced model by themselves.

3. The option model(diff) outside of the gmm() options sets the default that then applies to all gmm() and iv() options. You could override this default by specifying model(level) inside some of these options. Instruments for the level model need to satisfy stronger assumptions, i.e. they need to be uncorrelated with the "fixed effects" that are present in the levels model (but drop out in the first-differenced model). Instruments for the level model, if valid, can make the estimator more efficient and could help in situations when there are identification problems for the difference-GMM estimator (e.g. when the dependent variable is highly persistent). See the section on "System GMM" in my presentation slides.

4. If you only consider instruments for the first-differenced model, then a regression intercept does not affect the estimation of the other coefficients. So, you could simply suppress it with the nocons option. Whenever you add instruments for the levels model, you should essentially always include an intercept, and therefore not specify this option.

5. Such a question is generally hard to answer. I do not know.

6. model(fodev) is an alternative to model(diff). The former has some advantage with unbalanced panel data. Note that you need to modify the lag structure if you switch between the two: https://www.statalist.org/forums/for...53#post1589753

7. Due to first differencing, you lose one additional observation from the effective estimation sample. (Note that this is not shown in the estimation header, which always displays the number of observations corresponding to the levels model.) You can choose a different base level. Instead of w_*, just specify explicitly those dummies that you want to include.
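
Points 2 to 4 can be sketched as follows. This is only an illustration under stated assumptions, not a recommended final specification: the first command keeps the lag ranges from the original post and moves the time dummies into iv(w_*, diff); the second adds instruments for the level model and therefore drops nocons. The lag choices for the level-model instruments (lag(1 1) and lag(0 0) with the diff suboption) are assumptions made for the example, and whether such instruments are valid has to be checked with the overidentification tests.

```stata
* Run from a do-file so that the /// line continuations work.

* Difference GMM with the time dummies as standard instruments (point 2).
* The intercept is suppressed because only the differenced model is used (point 4).
xtdpdgmm L(0/1).dep_score avg_inc health_st injury w_*, model(diff) collapse ///
    gmm(dep_score, lag(2 4)) gmm(avg_inc health_st, lag(1 3)) ///
    gmm(injury, lag(0 2)) iv(w_*, diff) ///
    nocons twostep small vce(cluster individualid)

* System GMM sketch (point 3): model(level) inside an option overrides the
* model(diff) default for that option only. The intercept is kept (no nocons).
* The level-model lag choices below are assumptions for illustration.
xtdpdgmm L(0/1).dep_score avg_inc health_st injury w_*, model(diff) collapse ///
    gmm(dep_score, lag(2 4)) gmm(avg_inc health_st, lag(1 3)) ///
    gmm(injury, lag(0 2)) ///
    gmm(dep_score, lag(1 1) diff model(level)) ///
    gmm(avg_inc health_st, lag(0 0) diff model(level)) ///
    iv(w_*, model(level)) ///
    twostep small vce(cluster individualid)
```

Adding the overid option to the second command and running estat overid, difference afterwards would give incremental Hansen tests for the level-model instruments, in the same way as in point 1.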

https://www.kripfganz.de/stata/

John Sgr

Join Date: Sep 2020
Posts: 28

18 Jan 2021, 07:08

Dear Sebastian,

Thanks for your help. I have unbalanced data, but I think the fodev and diff options give me similar results. How can I decide which one to use?

Here is the diff option:

Code:

xtdpdgmm L(0/1).dep_score avg_inc health_st injury w_*, model(diff) collapse gmm(dep_score, l(2 .)) gmm(avg_inc health_st, l(1 .)) gmm(injury, l(0 .)) gmm(w_*, l(0 .) diff) nocons two small vce(cl individualid)

Code:

Generalized method of moments estimation

Fitting full model:
Step 1         f(b) =  1.1911407
Step 2         f(b) =  .10071562

Group variable: individual~s                 Number of obs         =      3775
Time variable: wave                          Number of groups      =       634

Moment conditions:     linear =      72      Obs per group:    min =         1
                    nonlinear =       0                        avg =  5.954259
                        total =      72                        max =         9

                      (Std. Err. adjusted for 634 clusters in individualidsys)
------------------------------------------------------------------------------
             |              WC-Robust
   dep_score |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   dep_score |
         L1. |   .0820051   .0298021     2.75   0.006     .0234822     .140528
             |
     avg_inc |  -.0000252   .0000778    -0.32   0.746     -.000178    .0001275
   health_st |   .4386173   .2056473     2.13   0.033     .0347839    .8424507
      injury |   1.357471   .4646371     2.92   0.004     .4450544    2.269888
         w_1 |          0  (omitted)
         w_2 |  -.0376841   .3953373    -0.10   0.924    -.8140154    .7386473
         w_3 |  -.3196453   .3448559    -0.93   0.354    -.9968453    .3575548
         w_4 |   .2120989     .32973     0.64   0.520    -.4353979    .8595958
         w_5 |   .4634128   .2979409     1.56   0.120    -.1216594    1.048485
         w_6 |  -.0477281   .2384947    -0.20   0.841    -.5160645    .4206083
         w_7 |  -.0431505   .2473132    -0.17   0.862    -.5288041     .442503
         w_8 |   .0924813   .2205632     0.42   0.675    -.3406428    .5256053
         w_9 |          0  (omitted)
        w_10 |  -.1450261   .2158403    -0.67   0.502    -.5688758    .2788236
------------------------------------------------------------------------------
Instruments corresponding to the linear moment conditions:
 1, model(diff):
   L2.dep_score L3.dep_score L4.dep_score L5.dep_score L6.dep_score
   L7.dep_score L8.dep_score L9.dep_score
 2, model(diff):
   L1.avg_inc L2.avg_inc L3.avg_inc L4.avg_inc L5.avg_inc L6.avg_inc
   L7.avg_inc L8.avg_inc L9.avg_inc L1.health_st L2.health_st L3.health_st
   L4.health_st L5.health_st L6.health_st L7.health_st L8.health_st
   L9.health_st
 3, model(diff):
   injury L1.injury L2.injury L3.injury L4.injury L5.injury L6.injury
   L7.injury L8.injury L9.injury
 4, model(diff):
   L1.D.w_1 L2.D.w_1 L3.D.w_1 L5.D.w_1 L7.D.w_1 L1.D.w_2 L2.D.w_2 L3.D.w_2
   L4.D.w_2 L5.D.w_2 L6.D.w_2 L7.D.w_2 L8.D.w_2 L1.D.w_3 L2.D.w_3 L3.D.w_3
   L4.D.w_3 L5.D.w_3 L6.D.w_3 L1.D.w_4 L2.D.w_4 L3.D.w_4 L4.D.w_4 L5.D.w_4
   L6.D.w_4 L1.D.w_5 L2.D.w_5 L3.D.w_5 L4.D.w_5 L1.D.w_6 L2.D.w_6 L3.D.w_6
   L4.D.w_6 L1.D.w_7 L2.D.w_7 L1.D.w_8

Here is the fodev option:

Code:

xtdpdgmm L(0/1).dep_score avg_inc health_st injury w_*, model(fodev) collapse gmm(dep_score, l(1 .)) gmm(avg_inc health_st, l(0 .))  gmm(injury, l(0 .) m(mdev)) gmm(w_*, l(0 .) m(mdev)) nocons  two small vce(cl individualid)

Code:

Generalized method of moments estimation

Fitting full model:
Step 1         f(b) =  1.2596658
Step 2         f(b) =  .10356775

Group variable: individual~s                 Number of obs         =      3775
Time variable: wave                          Number of groups      =       634

Moment conditions:     linear =      78      Obs per group:    min =         1
                    nonlinear =       0                        avg =  5.954259
                        total =      78                        max =         9

                      (Std. Err. adjusted for 634 clusters in individualidsys)
------------------------------------------------------------------------------
             |              WC-Robust
   dep_score |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   dep_score |
         L1. |   .1022946   .0291661     3.51   0.000     .0450205    .1595686
             |
     avg_inc |  -.0000177   .0000637    -0.28   0.781    -.0001428    .0001073
   health_st |    .415779   .1748861     2.38   0.018     .0723519    .7592062
      injury |    .986515   .3486896     2.83   0.005     .3017868    1.671243
         w_1 |          0  (omitted)
         w_2 |   .1883661   .3205027     0.59   0.557    -.4410111    .8177433
         w_3 |          0  (omitted)
         w_4 |   .3283641   .3292352     1.00   0.319    -.3181613    .9748895
         w_5 |   .6175765   .3111424     1.98   0.048     .0065804    1.228573
         w_6 |   .0695444   .2886622     0.24   0.810    -.4973069    .6363958
         w_7 |   .1344026   .3027474     0.44   0.657    -.4601081    .7289134
         w_8 |   .1617975   .2810819     0.58   0.565    -.3901683    .7137633
         w_9 |   .0127941   .2983247     0.04   0.966    -.5730318      .59862
        w_10 |  -.0759849   .2790805    -0.27   0.786    -.6240205    .4720506
------------------------------------------------------------------------------
Instruments corresponding to the linear moment conditions:
 1, model(fodev):
   L1.dep_score L2.dep_score L3.dep_score L4.dep_score L5.dep_score
   L6.dep_score L7.dep_score L8.dep_score
 2, model(fodev):
   avg_inc L1.avg_inc L2.avg_inc L3.avg_inc L4.avg_inc L5.avg_inc L6.avg_inc
   L7.avg_inc L8.avg_inc health_st L1.health_st L2.health_st L3.health_st
   L4.health_st L5.health_st L6.health_st L7.health_st L8.health_st
 3, model(mdev):
   injury L1.injury L2.injury L3.injury L4.injury L5.injury L6.injury
   L7.injury L8.injury L9.injury
 4, model(mdev):
   L1.w_1 L3.w_1 L4.w_1 L5.w_1 L6.w_1 L7.w_1 L8.w_1 L9.w_1 L1.w_2 L3.w_2
   L4.w_2 L5.w_2 L6.w_2 L7.w_2 L8.w_2 L1.w_3 L2.w_3 L3.w_3 L4.w_3 L5.w_3
   L6.w_3 L7.w_3 L1.w_4 L2.w_4 L3.w_4 L4.w_4 L5.w_4 L6.w_4 L1.w_5 L2.w_5
   L3.w_5 L4.w_5 L5.w_5 L1.w_6 L2.w_6 L3.w_6 L4.w_6 L1.w_7 L2.w_7 L3.w_7
   L1.w_8 L2.w_8


Sebastian Kripfganz

Join Date: May 2014

Posts: 2601
#4

19 Jan 2021, 04:42

The FOD estimator retains more information than the DIFF estimator when the data has gaps. Other than that, there is not much of a difference.
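
A rough way to see this in the data is to count how many rows each transformation can use. The sketch below looks at dep_score alone and ignores the regressors and instruments, so it is only an illustration. It assumes the data are already xtset on the individual identifier and wave and that rows with a missing dep_score have been dropped. The variable names ok_diff and ok_fod are hypothetical.

```stata
* A first difference needs the immediately preceding wave to be observed,
* so a gap before wave t makes D.dep_score missing at t.
generate byte ok_diff = !missing(D.dep_score)

* A forward orthogonal deviation needs at least one later observation for
* the same individual, so every row except the individual's last qualifies.
bysort individualid (wave): generate byte ok_fod = (_n < _N)

* Compare how many rows survive under each transformation.
count if ok_diff
count if ok_fod
```

Without gaps, both counts drop exactly one row per individual (the first row under differencing, the last under forward deviations), which is in line with the two estimators otherwise behaving similarly.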

John Sgr

Join Date: Sep 2020

Posts: 28
#5

19 Jan 2021, 04:56

Perfect! Sincere thanks for your help.