System GMM - to address reverse causality

Emma Kemp

Join Date: Jun 2024
Posts: 6

02 Jul 2024, 12:12

Hi everyone,

I am examining the relationship between perceived neighbourhood cohesion (NSC_index) and life satisfaction (lfsato). In my paper I have run an OLS model (as a benchmark) and an FE model, and I now want to run a dynamic panel model using system GMM.

I have three key questions about system GMM, which I outline below, and would greatly appreciate any guidance.
I am running the system GMM model (using Stata 18) as follows:

Code:

xtset pidp wave

xtabond2 lfsato lag_lfsato NSC_index income i.age_group_destr age2 jbstat_simple edu_simple marriage_status tenure_dummy addrmov_dummy aidhh_dummy hhsize_simple nchild_simple physical_health mental_health i.wave i.gor_dv [pweight=l_indscus_lw], gmm (lag_lfsato income marriage_status physical_health mental_health, collapse) iv( NSC_index i.age_group_destr age2 jbstat_simple edu_simple tenure_dummy addrmov_dummy aidhh_dummy hhsize_simple nchild_simple i.wave i.gor_dv) nodiffsargan robust small

On the endogenous variables specified in gmm(): I have included lagged life satisfaction and, because Piper (2023) states that marriage status, income and health are endogenous with life satisfaction, I have included these as well. I also ran a pairwise correlation matrix and VIFs for all of my explanatory variables and life satisfaction, and found that mental health was also quite highly correlated with life satisfaction, so I have included this variable too.

I have then included all of the other explanatory variables from my OLS and FE regressions as exogenous instruments in iv().

Q1. Is this the correct/valid way to decide which variables are endogenous/exogenous?

Running the above code in Stata generates the following output:

Code:

. xtabond2 lfsato laglfsato3 NSC_index fihhmngrs1_dv i.age_group_destr age2 jbstat_simple edu_si
> mple mastat_simple tenure_dummy addrmov_dummy aidhh_dummy hhsize_simple nchild_simple scsf1_co
> mbined_r sf12mcs_dv i.wave i.gor_dv [pweight=l_indscus_lw], gmm (laglfsato3 fihhmngrs1_dv mast
> at_simple scsf1_combined_r sf12mcs_dv, collapse) iv( NSC_index i.age_group_destr age2 jbstat_s
> imple edu_simple tenure_dummy addrmov_dummy aidhh_dummy hhsize_simple nchild_simple i.wave i.g
> or_dv) nodiffsargan robust small     
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
1b.age_group_destr dropped due to collinearity
7.age_group_destr dropped due to collinearity
1b.wave dropped due to collinearity
3.wave dropped due to collinearity
1b.gor_dv dropped due to collinearity
(sum of weights is 22647.4695)
Warning: Two-step estimated covariance matrix of moments is singular.
  Using a generalized inverse to calculate robust weighting matrix for Hansen test.

Dynamic panel-data estimation, one-step system GMM
------------------------------------------------------------------------------
Group variable: pidp                            Number of obs      =     23378
Time variable : wave                            Number of groups   =      6334
Number of instruments = 53                      Obs per group: min =         1
F(., 6333)    =         .                                      avg =      3.69
Prob > F      =         .                                      max =         4
-----------------------------------------------------------------------------------------------
                              |               Robust
                       lfsato | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
------------------------------+----------------------------------------------------------------
                   laglfsato3 |    .083551   .0159013     5.25   0.000     .0523791    .1147229
                   NSC_index_ |   .1831389   .0231104     7.92   0.000     .1378347    .2284431
                fihhmngrs1_dv |   7.19e-06   5.31e-06     1.35   0.176    -3.22e-06    .0000176
                              |
              age_group_destr |
                       18-24  |   .1436355   .1315885     1.09   0.275    -.1143224    .4015935
                       25-34  |  -.0245395   .1061021    -0.23   0.817    -.2325355    .1834565
                       35-44  |  -.1553759   .0900446    -1.73   0.084    -.3318939     .021142
                       45-54  |  -.2298868   .0702192    -3.27   0.001    -.3675402   -.0922334
                       55-64  |  -.1802464   .0472028    -3.82   0.000    -.2727798    -.087713
                              |
                         age2 |   .0000259   .0000257     1.01   0.314    -.0000245    .0000763
                jbstat_simple |  -.0034113   .0083322    -0.41   0.682    -.0197452    .0129225
                   edu_simple |  -.0472176    .012247    -3.86   0.000    -.0712258   -.0232093
                mastat_simple |  -.0096751   .0375167    -0.26   0.797    -.0832206    .0638704
                 tenure_dummy |   .0607223    .028808     2.11   0.035     .0042489    .1171958
                addrmov_dummy |   .1015179   .0492164     2.06   0.039     .0050371    .1979986
                  aidhh_dummy |  -.1121506   .0495233    -2.26   0.024     -.209233   -.0150682
                hhsize_simple |   -.034727     .02944    -1.18   0.238    -.0924393    .0229853
                nchild_simple |    .017693   .0200045     0.88   0.376    -.0215226    .0569086
             scsf1_combined_r |   .1988844   .0238085     8.35   0.000     .1522117    .2455571
                   sf12mcs_dv |   .0488631   .0020779    23.52   0.000     .0447897    .0529364
                              |
                         wave |
                           2  |  -.0387835   .0257759    -1.50   0.132     -.089313    .0117459
                           4  |   .0262173   .0260069     1.01   0.313    -.0247652    .0771997
                           5  |   .1596027   .0265361     6.01   0.000      .107583    .2116224
                              |
                       gor_dv |
             north west       |   -.011139   .0671737    -0.17   0.868    -.1428223    .1205443
yorkshire and the humber  ..  |   .0114451   .0709075     0.16   0.872    -.1275576    .1504478
             east midlands    |   .0307968   .0670911     0.46   0.646    -.1007246    .1623181
             west midlands    |   .0510337   .0690321     0.74   0.460    -.0842926      .18636
             east of england  |   .0066081   .0640761     0.10   0.918    -.1190027     .132219
                     london   |  -.1073267   .0741767    -1.45   0.148    -.2527381    .0380847
             south east       |   .0033076   .0631424     0.05   0.958     -.120473    .1270882
             south west       |   .0132228   .0638898     0.21   0.836    -.1120228    .1384685
                     wales    |   .0324638   .0703395     0.46   0.644    -.1054255    .1703531
             scotland         |  -.0823215   .0720415    -1.14   0.253    -.2235473    .0589042
     northern ireland         |   .0762113   .0860584     0.89   0.376    -.0924923    .2449148
                              |
                        _cons |   1.236654   .2329817     5.31   0.000     .7799315    1.693377
-----------------------------------------------------------------------------------------------
Instruments for first differences equation
  Standard
    D.(NSC_index_ 1b.age_group_destr 2.age_group_destr 3.age_group_destr
    4.age_group_destr 5.age_group_destr 6.age_group_destr 7.age_group_destr
    age2 jbstat_simple edu_simple tenure_dummy addrmov_dummy aidhh_dummy
    hhsize_simple nchild_simple 1b.wave 2.wave 3.wave 4.wave 5.wave 1b.gor_dv
    2.gor_dv 3.gor_dv 4.gor_dv 5.gor_dv 6.gor_dv 7.gor_dv 8.gor_dv 9.gor_dv
    10.gor_dv 11.gor_dv 12.gor_dv)
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(1/4).(laglfsato3 fihhmngrs1_dv mastat_simple scsf1_combined_r
    sf12mcs_dv) collapsed
Instruments for levels equation
  Standard
    NSC_index_ 1b.age_group_destr 2.age_group_destr 3.age_group_destr
    4.age_group_destr 5.age_group_destr 6.age_group_destr 7.age_group_destr
    age2 jbstat_simple edu_simple tenure_dummy addrmov_dummy aidhh_dummy
    hhsize_simple nchild_simple 1b.wave 2.wave 3.wave 4.wave 5.wave 1b.gor_dv
    2.gor_dv 3.gor_dv 4.gor_dv 5.gor_dv 6.gor_dv 7.gor_dv 8.gor_dv 9.gor_dv
    10.gor_dv 11.gor_dv 12.gor_dv
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    D.(laglfsato3 fihhmngrs1_dv mastat_simple scsf1_combined_r sf12mcs_dv)
    collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z = -24.57  Pr > z =  0.000
Arellano-Bond test for AR(2) in first differences: z =   0.69  Pr > z =  0.490
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(19)   =  41.38  Prob > chi2 =  0.002
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(19)   =  24.70  Prob > chi2 =  0.171
  (Robust, but weakened by many instruments.)

Q2. Does this seem correctly specified?

Q3. I’m a bit concerned that the Sargan test is still significant. Should I try to reduce the number of exogenous instruments or lags in my model? Or is there an alternative way to address this issue?

Thank you in advance for any advice or guidance you may be able to provide. I am very new to statistics and have spent a lot of time reading the Stata documentation and the empirical literature on how best to use GMM, but would love some clarification on the above.

Paper cited: Piper, A. (2023). What does dynamic panel analysis tell us about life satisfaction? Review of Income and Wealth. doi:10.1111/roiw.12567.

Many thanks,
Emma

Tags: systemGMM, xtabond2

Sebastian Kripfganz

Join Date: May 2014

Posts: 2594
#2

03 Jul 2024, 02:30

Q1: Looking at correlation matrices and VIFs might help with model building, but it cannot really answer the question whether your variables are endogenous or exogenous, unless you rule out by assumption that your variables might be correlated with anything that is unobserved. A better approach might be to look at incremental overidentification (difference-in-Hansen) tests; see slides 48 and following as well as slides 90 and following in my 2019 London Stata Conference presentation:
Kripfganz, S. (2019). Generalized method of moments estimation of linear dynamic panel data models. Proceedings of the 2019 London Stata Conference.
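
As a rough illustration of the incremental-test idea (a sketch, not a recommended final specification): xtabond2 reports a difference-in-Hansen test for each gmm() and iv() group unless nodiffsargan is specified, so a variable whose exogeneity is in doubt can be placed in its own group. The version below reuses the variable names from post #1, moves NSC_index into a separate iv() group, and drops nodiffsargan. Adding twostep here is an assumption of this sketch, and the /// continuations assume the command is run from a do-file.

```stata
* Sketch only: command from post #1 with NSC_index in its own iv() group,
* so that a separate difference-in-Hansen test is reported for it.
* nodiffsargan is removed because it suppresses those tests.
xtset pidp wave

xtabond2 lfsato lag_lfsato NSC_index income i.age_group_destr age2 ///
    jbstat_simple edu_simple marriage_status tenure_dummy addrmov_dummy ///
    aidhh_dummy hhsize_simple nchild_simple physical_health mental_health ///
    i.wave i.gor_dv [pweight=l_indscus_lw], ///
    gmm(lag_lfsato income marriage_status physical_health mental_health, collapse) ///
    iv(NSC_index) ///
    iv(i.age_group_destr age2 jbstat_simple edu_simple tenure_dummy ///
       addrmov_dummy aidhh_dummy hhsize_simple nchild_simple i.wave i.gor_dv) ///
    twostep robust small
```

If the test for the iv(NSC_index) group rejects, one option is to treat NSC_index as endogenous and move it into a gmm() group, then re-check the tests.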

Q2: There is no indication of misspecification in this output.

Q3: The Sargan test is invalid for system GMM. You should only consider the Hansen test. Moreover, you should actually use the two-step instead of the one-step estimator to get efficient estimates.

https://www.kripfganz.de/stata/

Emma Kemp

Join Date: Jun 2024
Posts: 6

10 Jul 2024, 10:54

Hi Sebastian,

Thank you very much for your response - I really appreciate it!

I have spent some time going through your conference slides and working through the sequential model selection process.
I have landed on the following model. The Hansen overidentification test has a p-value of 0.228, which falls just within Roodman's (2009) common-sense range of above 0.1 and below 0.25. However, I am having some trouble interpreting the difference-in-Hansen tests of exogeneity.

Code:

 . xtabond2 lfsato lag_lfsato NSC_index log_hhincome age_dv age2 jbstat_simple edu_simple mastat_si
> mple tenure_dummy addrmov_dummy aidhh_dummy hhsize_simple scsf1_combined_r sf12mcs_dv i.wave i.g
> or_dv [pweight=l_indscus_lw], gmm (lag_lfsato) gmm (NSC_index log_hhincome mastat_simple scsf1_c
> ombined_r sf12mcs_dv i.gor_dv, lag(1 2) collapse) iv(age_dv age2 jbstat_simple edu_simple tenure
> _dummy addrmov_dummy aidhh_dummy hhsize_simple i.wave) robust small two  
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
1b.wave dropped due to collinearity
3.wave dropped due to collinearity
1b.gor_dv dropped due to collinearity
(sum of weights is 21629.783)
Warning: Two-step estimated covariance matrix of moments is singular.
  Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
  Difference-in-Sargan/Hansen statistics may be negative.

Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: pidp                            Number of obs      =     20253
Time variable : wave                            Number of groups   =      6587
Number of instruments = 69                      Obs per group: min =         1
F(., 6586)    =         .                                      avg =      3.07
Prob > F      =         .                                      max =         4
-------------------------------------------------------------------------------------------------
                                |              Corrected
                         lfsato | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
--------------------------------+----------------------------------------------------------------
                     lag_lfsato |   .0929353    .016181     5.74   0.000     .0612153    .1246554
                     NSC_index_ |   .1941958   .0372533     5.21   0.000     .1211673    .2672244
                   log_hhincome |   .0324155   .0396003     0.82   0.413    -.0452138    .1100449
                         age_dv |  -.0413708   .0080018    -5.17   0.000     -.057057   -.0256847
                           age2 |   .0004406   .0000892     4.94   0.000     .0002657    .0006154
                  jbstat_simple |  -.0029496   .0082201    -0.36   0.720    -.0190636    .0131644
                     edu_simple |  -.0830624   .0238215    -3.49   0.000    -.1297602   -.0363646
                  mastat_simple |  -.0128715   .0355886    -0.36   0.718    -.0826367    .0568937
                   tenure_dummy |   .0079694   .0373529     0.21   0.831    -.0652544    .0811931
                  addrmov_dummy |    .084803   .0490902     1.73   0.084    -.0114297    .1810356
                    aidhh_dummy |  -.1575745   .0561456    -2.81   0.005     -.267638   -.0475109
                  hhsize_simple |   -.015033   .0251979    -0.60   0.551    -.0644291    .0343631
               scsf1_combined_r |   .2103753   .0240397     8.75   0.000     .1632497    .2575009
                     sf12mcs_dv |   .0506123   .0021266    23.80   0.000     .0464435    .0547811
                                |
                           wave |
                             2  |  -.0459699   .0279494    -1.64   0.100    -.1007597    .0088199
                             4  |   .0413835   .0280656     1.47   0.140    -.0136341    .0964011
                             5  |   .1611448   .0283927     5.68   0.000     .1054858    .2168038
                                |
                         gor_dv |
               north west       |   .0190738   .4778575     0.04   0.968    -.9176819    .9558295
yorkshire and the humber    ..  |   .1159949   .3975749     0.29   0.770    -.6633808    .8953705
               east midlands    |   .0291419   .4029905     0.07   0.942    -.7608502     .819134
               west midlands    |  -.0906197   .3808375    -0.24   0.812    -.8371847    .6559452
               east of england  |  -.0896637   .3827004    -0.23   0.815    -.8398805    .6605531
                       london   |  -.6188752   .5061444    -1.22   0.221    -1.611082    .3733318
               south east       |  -.3028067    .400984    -0.76   0.450    -1.088865     .483252
               south west       |  -.1844545   .3859682    -0.48   0.633    -.9410773    .5721683
                       wales    |  -.2461455   .5957432    -0.41   0.679    -1.413995    .9217043
               scotland         |  -.0251545   .6380725    -0.04   0.969    -1.275983    1.225674
       northern ireland         |  -.4968346   .4622027    -1.07   0.282    -1.402902    .4092325
                                |
                          _cons |    1.84457   .5243582     3.52   0.000     .8166582    2.872482
-------------------------------------------------------------------------------------------------
Instruments for first differences equation
  Standard
    D.(age_dv age2 jbstat_simple edu_simple tenure_dummy addrmov_dummy
    aidhh_dummy hhsize_simple 1b.wave 2.wave 3.wave 4.wave 5.wave)
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(1/2).(NSC_index_ log_hhincome mastat_simple scsf1_combined_r sf12mcs_dv
    1b.gor_dv 2.gor_dv 3.gor_dv 4.gor_dv 5.gor_dv 6.gor_dv 7.gor_dv 8.gor_dv
    9.gor_dv 10.gor_dv 11.gor_dv 12.gor_dv) collapsed
    L(1/4).lag_lfsato
Instruments for levels equation
  Standard
    age_dv age2 jbstat_simple edu_simple tenure_dummy addrmov_dummy
    aidhh_dummy hhsize_simple 1b.wave 2.wave 3.wave 4.wave 5.wave
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    D.(NSC_index_ log_hhincome mastat_simple scsf1_combined_r sf12mcs_dv
    1b.gor_dv 2.gor_dv 3.gor_dv 4.gor_dv 5.gor_dv 6.gor_dv 7.gor_dv 8.gor_dv
    9.gor_dv 10.gor_dv 11.gor_dv 12.gor_dv) collapsed
    D.lag_lfsato
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z = -20.16  Pr > z =  0.000
Arellano-Bond test for AR(2) in first differences: z =   0.06  Pr > z =  0.956
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(40)   =  89.60  Prob > chi2 =  0.000
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(40)   =  46.31  Prob > chi2 =  0.228
  (Robust, but weakened by many instruments.)

Code:

Difference-in-Hansen tests of exogeneity of instrument subsets:
  GMM instruments for levels
    Hansen test excluding group:     chi2(21)   =  13.68  Prob > chi2 =  0.883
    Difference (null H = exogenous): chi2(19)   =  32.63  Prob > chi2 =  0.026
  gmm(lag_lfsato, lag(1 .))
    Hansen test excluding group:     chi2(31)   =  36.75  Prob > chi2 =  0.220
    Difference (null H = exogenous): chi2(9)    =   9.56  Prob > chi2 =  0.387
  iv(age_dv age2 jbstat_simple edu_simple tenure_dummy addrmov_dummy aidhh_dummy hhsize_simple 1b.
> wave 2.wave 3.wave 4.wave 5.wave)
    Hansen test excluding group:     chi2(28)   =  31.17  Prob > chi2 =  0.310
    Difference (null H = exogenous): chi2(12)   =  15.14  Prob > chi2 =  0.234

Am I correctly interpreting the above?
1. The difference-in-Hansen tests for gmm(lag_lfsato) and the iv() instruments fail to reject the null of exogeneity, which I believe means these instruments are appropriately specified and the model is valid. Is that correct?
2. The test for the GMM instruments for levels rejects the null at the 5% level (p = 0.026), suggesting they are endogenous. I am a bit confused, as I thought it was OK for the GMM-type instruments to be endogenous. Do I need to be concerned about this test statistic?
3. In this model I have 69 instruments for 6587 groups. Is that an acceptable ratio?
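
For reference, each "Difference" line in the output above is the full-model Hansen statistic minus the Hansen statistic that excludes the instrument group, with the degrees of freedom subtracted in the same way. A minimal check with Stata's chi2tail() function, using only the rounded figures printed above (so the results should match the reported values only up to rounding):

```stata
* GMM instruments for levels:
* full model J = 46.31 (40 df); excluding the group J = 13.68 (21 df).
display 46.31 - 13.68                            // difference statistic
display 40 - 21                                  // degrees of freedom
display %5.3f chi2tail(40 - 21, 46.31 - 13.68)   // upper-tail p-value

* The same arithmetic for the gmm(lag_lfsato) group (31 df when excluded).
display %5.3f chi2tail(40 - 31, 46.31 - 36.75)
```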

Thank you in advance for your guidance.
Emma


Sebastian Kripfganz

Join Date: May 2014

Posts: 2594
#4

11 Jul 2024, 09:19

1. That's generally correct.
2. These are instruments for endogenous regressors. The instruments themselves, however, should not be endogenous. Hence this rejection of the difference-in-Hansen test is indeed reason for concern. A system GMM estimator might not be appropriate. A difference GMM estimator is probably preferable.
3. That should be okay.
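
A hedged sketch of what point 2 could look like in practice: xtabond2's noleveleq option drops the levels equation, which turns the system GMM command from the previous post into a difference GMM estimator. Everything else below is copied from that command; whether the remaining instruments are strong enough for this dataset is a separate question that the sketch does not address. The /// continuations assume a do-file.

```stata
* Sketch only: previous command with noleveleq added, so that only the
* first-differenced equation and its instruments are used (difference GMM).
xtabond2 lfsato lag_lfsato NSC_index log_hhincome age_dv age2 jbstat_simple ///
    edu_simple mastat_simple tenure_dummy addrmov_dummy aidhh_dummy ///
    hhsize_simple scsf1_combined_r sf12mcs_dv i.wave i.gor_dv ///
    [pweight=l_indscus_lw], ///
    gmm(lag_lfsato) ///
    gmm(NSC_index log_hhincome mastat_simple scsf1_combined_r sf12mcs_dv ///
        i.gor_dv, lag(1 2) collapse) ///
    iv(age_dv age2 jbstat_simple edu_simple tenure_dummy addrmov_dummy ///
       aidhh_dummy hhsize_simple i.wave) ///
    noleveleq twostep robust small
```

Because the model is then estimated in first differences only, regressors with little within-person variation over the waves may be estimated imprecisely.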

https://www.kripfganz.de/stata/

Emma Kemp

Join Date: Jun 2024

Posts: 6
#5

12 Jul 2024, 07:26

Thank you very much Sebastian - I really appreciate your help!