  • System GMM - to address reverse causality

    Hi everyone,

    I am looking to examine the relationship between perceived neighbourhood cohesion (NSC_index) and life satisfaction (lfsato). In my paper I have estimated an OLS model (as a benchmark) and an FE model, and I now want to estimate a dynamic panel model using system GMM.

    I have three key questions about system GMM, outlined below, and would greatly appreciate any guidance.
    I am running the system GMM model (using Stata 18) as follows:

    Code:
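    xtset pidp wave
    
    xtabond2 lfsato lag_lfsato NSC_index income i.age_group_destr age2 jbstat_simple   ///
        edu_simple marriage_status tenure_dummy addrmov_dummy aidhh_dummy              ///
        hhsize_simple nchild_simple physical_health mental_health i.wave i.gor_dv      ///
        [pweight=l_indscus_lw],                                                         ///
        gmm(lag_lfsato income marriage_status physical_health mental_health, collapse) ///
        iv(NSC_index i.age_group_destr age2 jbstat_simple edu_simple tenure_dummy      ///
           addrmov_dummy aidhh_dummy hhsize_simple nchild_simple i.wave i.gor_dv)      ///
        nodiffsargan robust small
    For the endogenous variables specified in gmm(), I have included lagged life satisfaction and, following Piper (2023), who states that marital status, income and health are endogenous with respect to life satisfaction, these variables as well. I also computed a pairwise correlation matrix and VIFs for all of my explanatory variables and life satisfaction, and found that mental health was also quite highly correlated with life satisfaction, so I have included this variable too.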

    I have then included all of the other explanatory variables from my OLS and FE regressions as exogenous instruments in iv().

    Q1. Is this the correct/valid way to decide which variables are endogenous/exogenous?

    Running the above code in Stata generates the following output:

    Code:
    . xtabond2 lfsato laglfsato3 NSC_index fihhmngrs1_dv i.age_group_destr age2 jbstat_simple edu_si
    > mple mastat_simple tenure_dummy addrmov_dummy aidhh_dummy hhsize_simple nchild_simple scsf1_co
    > mbined_r sf12mcs_dv i.wave i.gor_dv [pweight=l_indscus_lw], gmm (laglfsato3 fihhmngrs1_dv mast
    > at_simple scsf1_combined_r sf12mcs_dv, collapse) iv( NSC_index i.age_group_destr age2 jbstat_s
    > imple edu_simple tenure_dummy addrmov_dummy aidhh_dummy hhsize_simple nchild_simple i.wave i.g
    > or_dv) nodiffsargan robust small     
    Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
    1b.age_group_destr dropped due to collinearity
    7.age_group_destr dropped due to collinearity
    1b.wave dropped due to collinearity
    3.wave dropped due to collinearity
    1b.gor_dv dropped due to collinearity
    (sum of weights is 22647.4695)
    Warning: Two-step estimated covariance matrix of moments is singular.
      Using a generalized inverse to calculate robust weighting matrix for Hansen test.
    
    Dynamic panel-data estimation, one-step system GMM
    ------------------------------------------------------------------------------
    Group variable: pidp                            Number of obs      =     23378
    Time variable : wave                            Number of groups   =      6334
    Number of instruments = 53                      Obs per group: min =         1
    F(., 6333)    =         .                                      avg =      3.69
    Prob > F      =         .                                      max =         4
    -----------------------------------------------------------------------------------------------
                                  |               Robust
                           lfsato | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    ------------------------------+----------------------------------------------------------------
                       laglfsato3 |    .083551   .0159013     5.25   0.000     .0523791    .1147229
                       NSC_index_ |   .1831389   .0231104     7.92   0.000     .1378347    .2284431
                    fihhmngrs1_dv |   7.19e-06   5.31e-06     1.35   0.176    -3.22e-06    .0000176
                                  |
                  age_group_destr |
                           18-24  |   .1436355   .1315885     1.09   0.275    -.1143224    .4015935
                           25-34  |  -.0245395   .1061021    -0.23   0.817    -.2325355    .1834565
                           35-44  |  -.1553759   .0900446    -1.73   0.084    -.3318939     .021142
                           45-54  |  -.2298868   .0702192    -3.27   0.001    -.3675402   -.0922334
                           55-64  |  -.1802464   .0472028    -3.82   0.000    -.2727798    -.087713
                                  |
                             age2 |   .0000259   .0000257     1.01   0.314    -.0000245    .0000763
                    jbstat_simple |  -.0034113   .0083322    -0.41   0.682    -.0197452    .0129225
                       edu_simple |  -.0472176    .012247    -3.86   0.000    -.0712258   -.0232093
                    mastat_simple |  -.0096751   .0375167    -0.26   0.797    -.0832206    .0638704
                     tenure_dummy |   .0607223    .028808     2.11   0.035     .0042489    .1171958
                    addrmov_dummy |   .1015179   .0492164     2.06   0.039     .0050371    .1979986
                      aidhh_dummy |  -.1121506   .0495233    -2.26   0.024     -.209233   -.0150682
                    hhsize_simple |   -.034727     .02944    -1.18   0.238    -.0924393    .0229853
                    nchild_simple |    .017693   .0200045     0.88   0.376    -.0215226    .0569086
                 scsf1_combined_r |   .1988844   .0238085     8.35   0.000     .1522117    .2455571
                       sf12mcs_dv |   .0488631   .0020779    23.52   0.000     .0447897    .0529364
                                  |
                             wave |
                               2  |  -.0387835   .0257759    -1.50   0.132     -.089313    .0117459
                               4  |   .0262173   .0260069     1.01   0.313    -.0247652    .0771997
                               5  |   .1596027   .0265361     6.01   0.000      .107583    .2116224
                                  |
                           gor_dv |
                 north west       |   -.011139   .0671737    -0.17   0.868    -.1428223    .1205443
    yorkshire and the humber  ..  |   .0114451   .0709075     0.16   0.872    -.1275576    .1504478
                 east midlands    |   .0307968   .0670911     0.46   0.646    -.1007246    .1623181
                 west midlands    |   .0510337   .0690321     0.74   0.460    -.0842926      .18636
                 east of england  |   .0066081   .0640761     0.10   0.918    -.1190027     .132219
                         london   |  -.1073267   .0741767    -1.45   0.148    -.2527381    .0380847
                 south east       |   .0033076   .0631424     0.05   0.958     -.120473    .1270882
                 south west       |   .0132228   .0638898     0.21   0.836    -.1120228    .1384685
                         wales    |   .0324638   .0703395     0.46   0.644    -.1054255    .1703531
                 scotland         |  -.0823215   .0720415    -1.14   0.253    -.2235473    .0589042
         northern ireland         |   .0762113   .0860584     0.89   0.376    -.0924923    .2449148
                                  |
                            _cons |   1.236654   .2329817     5.31   0.000     .7799315    1.693377
    -----------------------------------------------------------------------------------------------
    Instruments for first differences equation
      Standard
        D.(NSC_index_ 1b.age_group_destr 2.age_group_destr 3.age_group_destr
        4.age_group_destr 5.age_group_destr 6.age_group_destr 7.age_group_destr
        age2 jbstat_simple edu_simple tenure_dummy addrmov_dummy aidhh_dummy
        hhsize_simple nchild_simple 1b.wave 2.wave 3.wave 4.wave 5.wave 1b.gor_dv
        2.gor_dv 3.gor_dv 4.gor_dv 5.gor_dv 6.gor_dv 7.gor_dv 8.gor_dv 9.gor_dv
        10.gor_dv 11.gor_dv 12.gor_dv)
      GMM-type (missing=0, separate instruments for each period unless collapsed)
        L(1/4).(laglfsato3 fihhmngrs1_dv mastat_simple scsf1_combined_r
        sf12mcs_dv) collapsed
    Instruments for levels equation
      Standard
        NSC_index_ 1b.age_group_destr 2.age_group_destr 3.age_group_destr
        4.age_group_destr 5.age_group_destr 6.age_group_destr 7.age_group_destr
        age2 jbstat_simple edu_simple tenure_dummy addrmov_dummy aidhh_dummy
        hhsize_simple nchild_simple 1b.wave 2.wave 3.wave 4.wave 5.wave 1b.gor_dv
        2.gor_dv 3.gor_dv 4.gor_dv 5.gor_dv 6.gor_dv 7.gor_dv 8.gor_dv 9.gor_dv
        10.gor_dv 11.gor_dv 12.gor_dv
        _cons
      GMM-type (missing=0, separate instruments for each period unless collapsed)
        D.(laglfsato3 fihhmngrs1_dv mastat_simple scsf1_combined_r sf12mcs_dv)
        collapsed
    ------------------------------------------------------------------------------
    Arellano-Bond test for AR(1) in first differences: z = -24.57  Pr > z =  0.000
    Arellano-Bond test for AR(2) in first differences: z =   0.69  Pr > z =  0.490
    ------------------------------------------------------------------------------
    Sargan test of overid. restrictions: chi2(19)   =  41.38  Prob > chi2 =  0.002
      (Not robust, but not weakened by many instruments.)
    Hansen test of overid. restrictions: chi2(19)   =  24.70  Prob > chi2 =  0.171
      (Robust, but weakened by many instruments.)

    Q2. Does this seem correctly specified?

    Q3. I am a bit concerned that the Sargan test is still significant. Should I try to reduce the number of exogenous instruments or lags in my model? Or is there an alternative way to address this issue?

    Thank you in advance for any advice or guidance you may be able to provide. I am very new to statistics and have spent a lot of time reading the Stata documentation and the empirical literature on how best to use GMM, but would love some clarification on the above.

    Paper cited: Piper, Alan (2023). What does dynamic panel analysis tell us about life satisfaction? Review of Income and Wealth. doi:10.1111/roiw.12567

    Many thanks,
    Emma

  • #2
    Q1: Looking at correlation matrices and VIFs might help with model building, but it cannot really answer whether your variables are endogenous or exogenous, unless you rule out by assumption that they are correlated with anything unobserved. A better approach might be to look at incremental overidentification (difference-in-Hansen) tests; see slides 48 and following as well as slides 90 and following in my 2019 London Stata Conference presentation.

    Q2: There is no indication of misspecification in this output.

    Q3: The Sargan test is invalid for system GMM. You should only consider the Hansen test. Moreover, you should actually use the two-step instead of the one-step estimator to get efficient estimates.
    https://www.kripfganz.de/stata/
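A minimal sketch of the two changes suggested in this reply (two-step instead of one-step estimation, and difference-in-Hansen tests for individual instrument subsets instead of `nodiffsargan`). It runs on Stata's Arellano-Bond example dataset `abdata`, not on the data in this thread, and assumes xtabond2 is installed (`ssc install xtabond2`); the model and the split into separate `iv()` groups are illustrative choices, not a recommended specification.

```stata
* Sketch only: Stata's Arellano-Bond example data, not the data in this thread
* Assumes xtabond2 is installed: ssc install xtabond2
webuse abdata, clear
xtset id year

* One-step system GMM (as in the first post): per the reply above, the
* Sargan statistic is not valid here, so only the Hansen test is of interest
xtabond2 n L.n w k yr*, gmm(L.n, collapse) iv(w) iv(k) iv(yr*) robust small

* Two-step system GMM with Windmeijer-corrected robust standard errors.
* Without nodiffsargan, xtabond2 also reports a difference-in-Hansen test
* for each gmm()/iv() group; listing w and k in separate iv() options
* gives each of them its own incremental test of exogeneity
xtabond2 n L.n w k yr*, gmm(L.n, collapse) iv(w) iv(k) iv(yr*) twostep robust small
```

With the thread's own data, the analogous change would be to add `twostep` and drop `nodiffsargan` from the command in the first post, optionally splitting the `iv()` list into smaller groups whose exogeneity is in doubt.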



    • #3
      Hi Sebastian,

      Thank you very much for your response - I really appreciate it!

      I have spent some time going through your conference slides and working through the sequential model selection process.
      I have landed on the following model. Its Hansen test of overidentifying restrictions has a p-value of 0.228, which falls just within Roodman's (2009) common-sense range of above 0.1 and below 0.25. However, I am having some trouble interpreting the difference-in-Hansen tests for exogeneity.

      Code:
       . xtabond2 lfsato lag_lfsato NSC_index log_hhincome age_dv age2 jbstat_simple edu_simple mastat_si
      > mple tenure_dummy addrmov_dummy aidhh_dummy hhsize_simple scsf1_combined_r sf12mcs_dv i.wave i.g
      > or_dv [pweight=l_indscus_lw], gmm (lag_lfsato) gmm (NSC_index log_hhincome mastat_simple scsf1_c
      > ombined_r sf12mcs_dv i.gor_dv, lag(1 2) collapse) iv(age_dv age2 jbstat_simple edu_simple tenure
      > _dummy addrmov_dummy aidhh_dummy hhsize_simple i.wave) robust small two  
      Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
      1b.wave dropped due to collinearity
      3.wave dropped due to collinearity
      1b.gor_dv dropped due to collinearity
      (sum of weights is 21629.783)
      Warning: Two-step estimated covariance matrix of moments is singular.
        Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
        Difference-in-Sargan/Hansen statistics may be negative.
      
      Dynamic panel-data estimation, two-step system GMM
      ------------------------------------------------------------------------------
      Group variable: pidp                            Number of obs      =     20253
      Time variable : wave                            Number of groups   =      6587
      Number of instruments = 69                      Obs per group: min =         1
      F(., 6586)    =         .                                      avg =      3.07
      Prob > F      =         .                                      max =         4
      -------------------------------------------------------------------------------------------------
                                      |              Corrected
                               lfsato | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
      --------------------------------+----------------------------------------------------------------
                           lag_lfsato |   .0929353    .016181     5.74   0.000     .0612153    .1246554
                           NSC_index_ |   .1941958   .0372533     5.21   0.000     .1211673    .2672244
                         log_hhincome |   .0324155   .0396003     0.82   0.413    -.0452138    .1100449
                               age_dv |  -.0413708   .0080018    -5.17   0.000     -.057057   -.0256847
                                 age2 |   .0004406   .0000892     4.94   0.000     .0002657    .0006154
                        jbstat_simple |  -.0029496   .0082201    -0.36   0.720    -.0190636    .0131644
                           edu_simple |  -.0830624   .0238215    -3.49   0.000    -.1297602   -.0363646
                        mastat_simple |  -.0128715   .0355886    -0.36   0.718    -.0826367    .0568937
                         tenure_dummy |   .0079694   .0373529     0.21   0.831    -.0652544    .0811931
                        addrmov_dummy |    .084803   .0490902     1.73   0.084    -.0114297    .1810356
                          aidhh_dummy |  -.1575745   .0561456    -2.81   0.005     -.267638   -.0475109
                        hhsize_simple |   -.015033   .0251979    -0.60   0.551    -.0644291    .0343631
                     scsf1_combined_r |   .2103753   .0240397     8.75   0.000     .1632497    .2575009
                           sf12mcs_dv |   .0506123   .0021266    23.80   0.000     .0464435    .0547811
                                      |
                                 wave |
                                   2  |  -.0459699   .0279494    -1.64   0.100    -.1007597    .0088199
                                   4  |   .0413835   .0280656     1.47   0.140    -.0136341    .0964011
                                   5  |   .1611448   .0283927     5.68   0.000     .1054858    .2168038
                                      |
                               gor_dv |
                     north west       |   .0190738   .4778575     0.04   0.968    -.9176819    .9558295
      yorkshire and the humber    ..  |   .1159949   .3975749     0.29   0.770    -.6633808    .8953705
                     east midlands    |   .0291419   .4029905     0.07   0.942    -.7608502     .819134
                     west midlands    |  -.0906197   .3808375    -0.24   0.812    -.8371847    .6559452
                     east of england  |  -.0896637   .3827004    -0.23   0.815    -.8398805    .6605531
                             london   |  -.6188752   .5061444    -1.22   0.221    -1.611082    .3733318
                     south east       |  -.3028067    .400984    -0.76   0.450    -1.088865     .483252
                     south west       |  -.1844545   .3859682    -0.48   0.633    -.9410773    .5721683
                             wales    |  -.2461455   .5957432    -0.41   0.679    -1.413995    .9217043
                     scotland         |  -.0251545   .6380725    -0.04   0.969    -1.275983    1.225674
             northern ireland         |  -.4968346   .4622027    -1.07   0.282    -1.402902    .4092325
                                      |
                                _cons |    1.84457   .5243582     3.52   0.000     .8166582    2.872482
      -------------------------------------------------------------------------------------------------
      Instruments for first differences equation
        Standard
          D.(age_dv age2 jbstat_simple edu_simple tenure_dummy addrmov_dummy
          aidhh_dummy hhsize_simple 1b.wave 2.wave 3.wave 4.wave 5.wave)
        GMM-type (missing=0, separate instruments for each period unless collapsed)
          L(1/2).(NSC_index_ log_hhincome mastat_simple scsf1_combined_r sf12mcs_dv
          1b.gor_dv 2.gor_dv 3.gor_dv 4.gor_dv 5.gor_dv 6.gor_dv 7.gor_dv 8.gor_dv
          9.gor_dv 10.gor_dv 11.gor_dv 12.gor_dv) collapsed
          L(1/4).lag_lfsato
      Instruments for levels equation
        Standard
          age_dv age2 jbstat_simple edu_simple tenure_dummy addrmov_dummy
          aidhh_dummy hhsize_simple 1b.wave 2.wave 3.wave 4.wave 5.wave
          _cons
        GMM-type (missing=0, separate instruments for each period unless collapsed)
          D.(NSC_index_ log_hhincome mastat_simple scsf1_combined_r sf12mcs_dv
          1b.gor_dv 2.gor_dv 3.gor_dv 4.gor_dv 5.gor_dv 6.gor_dv 7.gor_dv 8.gor_dv
          9.gor_dv 10.gor_dv 11.gor_dv 12.gor_dv) collapsed
          D.lag_lfsato
      ------------------------------------------------------------------------------
      Arellano-Bond test for AR(1) in first differences: z = -20.16  Pr > z =  0.000
      Arellano-Bond test for AR(2) in first differences: z =   0.06  Pr > z =  0.956
      ------------------------------------------------------------------------------
      Sargan test of overid. restrictions: chi2(40)   =  89.60  Prob > chi2 =  0.000
        (Not robust, but not weakened by many instruments.)
      Hansen test of overid. restrictions: chi2(40)   =  46.31  Prob > chi2 =  0.228
        (Robust, but weakened by many instruments.)
      Code:
      Difference-in-Hansen tests of exogeneity of instrument subsets:
        GMM instruments for levels
          Hansen test excluding group:     chi2(21)   =  13.68  Prob > chi2 =  0.883
          Difference (null H = exogenous): chi2(19)   =  32.63  Prob > chi2 =  0.026
        gmm(lag_lfsato, lag(1 .))
          Hansen test excluding group:     chi2(31)   =  36.75  Prob > chi2 =  0.220
          Difference (null H = exogenous): chi2(9)    =   9.56  Prob > chi2 =  0.387
        iv(age_dv age2 jbstat_simple edu_simple tenure_dummy addrmov_dummy aidhh_dummy hhsize_simple 1b.
      > wave 2.wave 3.wave 4.wave 5.wave)
          Hansen test excluding group:     chi2(28)   =  31.17  Prob > chi2 =  0.310
          Difference (null H = exogenous): chi2(12)   =  15.14  Prob > chi2 =  0.234

      Am I correctly interpreting the above?
      1. The difference-in-Hansen tests for the gmm(lag_lfsato) and iv() instruments fail to reject the null of exogeneity, which I believe means these instruments are appropriately specified and the model is valid. Is that correct?
      2. The test for the GMM instruments for levels rejects the null at the 5% level (p = 0.026), suggesting that they are endogenous. I am a bit confused, as I thought it was acceptable for the GMM-type instruments to be endogenous. Do I need to be concerned about this test statistic?
      3. In this model I have 69 instruments for 6,587 groups. Is that an acceptable ratio?

      Thank you in advance for your guidance.
      Emma
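Each difference-in-Hansen statistic is the full-model Hansen J statistic minus the J statistic of the model that excludes the tested instrument subset, with degrees of freedom equal to the difference in the number of overidentifying restrictions. As a check on the interpretation, a short sketch that recomputes the three tests from the rounded values printed in the two-step output (full model: chi2(40) = 46.31) using Stata's `chi2tail()`. The scalar names are hypothetical, chosen only for this illustration.

```stata
* Difference-in-Hansen by hand: J(full) - J(excluding subset),
* df = df(full) - df(excluding subset). Inputs are the rounded values
* printed in the two-step output, so results match only up to rounding.
scalar J_full  = 46.31
scalar df_full = 40

* GMM instruments for levels (Hansen excluding group: chi2(21) = 13.68)
scalar d_lev  = J_full - 13.68
scalar df_lev = df_full - 21
display "levels:     chi2(" %2.0f df_lev ") = " %5.2f d_lev "  p = " %5.3f chi2tail(df_lev, d_lev)

* gmm(lag_lfsato, lag(1 .)) (Hansen excluding group: chi2(31) = 36.75)
scalar d_ldv  = J_full - 36.75
scalar df_ldv = df_full - 31
display "lag_lfsato: chi2(" %2.0f df_ldv ") = " %5.2f d_ldv "  p = " %5.3f chi2tail(df_ldv, d_ldv)

* iv() instruments (Hansen excluding group: chi2(28) = 31.17)
scalar d_iv  = J_full - 31.17
scalar df_iv = df_full - 28
display "iv():       chi2(" %2.0f df_iv ") = " %5.2f d_iv "  p = " %5.3f chi2tail(df_iv, d_iv)
```

These reproduce the reported 32.63 on 19 df, 9.56 on 9 df and 15.14 on 12 df, with p-values matching the printed 0.026, 0.387 and 0.234 up to rounding. Only the levels subset falls below 0.05.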



      • #4
        1. That's generally correct.
        2. These are instruments for endogenous regressors. The instruments themselves, however, should not be endogenous. Hence this rejection of the difference-in-Hansen test is indeed reason for concern. A system GMM estimator might not be appropriate. A difference GMM estimator is probably preferable.
        3. That should be okay.
        https://www.kripfganz.de/stata/
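In xtabond2, moving from system GMM to the difference GMM estimator suggested here amounts to adding the `noleveleq` option, which drops the levels equation and with it the "GMM instruments for levels" group whose exogeneity was rejected. A minimal sketch on Stata's `abdata` example dataset, assuming xtabond2 is installed; the specification is illustrative only and is not a recommendation for the model in this thread.

```stata
* Sketch only: Stata's Arellano-Bond example data; assumes xtabond2 is installed
webuse abdata, clear
xtset id year

* Difference GMM: noleveleq drops the levels equation, so only the
* first-differenced equation (instrumented by lagged levels) is estimated
xtabond2 n L.n w k yr*, gmm(L.n, collapse) iv(w k yr*) noleveleq twostep robust small
```

Without the levels equation no constant is estimated, and time-invariant regressors such as region dummies are differenced away.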



        • #5
          Thank you very much Sebastian - I really appreciate your help!
