  • IVREG2H - Bug when combining generated and external instruments

    Dear All,

    As the title suggests, I found that the output from ivreg2h is incorrect when both generated and external instruments are used.
    More specifically, when generating the Lewbel (2012) instruments, ivreg2h does not use all the available exogenous variables in the auxiliary regression of the endogenous variable; it uses only the constant.
    As a temporary fix, comment out line 664 and uncomment line 663 in ivreg2h.ado.

    The code below illustrates this error:

    Code:
    *    Setup
    webuse abdata, clear
    qui {
    keep if year>1978 & year<1983
    
    *    The two results below are identical, which reproduces the bug.
    *         (i) Result of ivreg2h
    ivreg2h n w ys ( k = kL1), gen(iv_, replace)
    estimates store result1
    
    *         (ii) Result of ivreg2 with generated IVs manual calculation
    qui {
        tempvar e_
        reg k             // Bug reproduced here: k is regressed on the constant only
        predict double `e_', res
        foreach var of varlist w ys {
        sum `var', mean
        gen double k_`var'_g = (`var'-r(mean))*`e_'
        }
    }
    
    ivreg2 n w ys (k = kL1 k_w_g k_ys_g), orthog(kL1)
    estimates store result2
    }
    
    esttab result1 result2, b(6) se(6) ///
        stat(N r2_a sargan sarganp idstat idp widstat cstat cstatp, ///
            l("N" "Adj R-sq" "Sargan" "Sargan p" "Underid" "Underid p" ///
                "Cragg-Donald F" "C stat" "C p") ///
            fmt(%9.0g %9.6f)) ///
        mtit("IVREG2H" "Manual calc.")

    Code:
    --------------------------------------------
                          (1)             (2)   
                      IVREG2H    Manual calc.   
    --------------------------------------------
    k                0.820610***     0.820610***
                   (0.014848)      (0.014848)   
    
    w               -0.405216***    -0.405216***
                   (0.090302)      (0.090302)   
    
    ys               0.037689        0.037689   
                   (0.271396)      (0.271396)   
    
    _cons            2.502956        2.502956   
                   (1.309752)      (1.309752)   
    --------------------------------------------
    N                     560             560   
    Adj R-sq         0.843894        0.843894   
    Sargan           0.873115        0.873115   
    Sargan p         0.646257        0.646257   
    Underid          5.54e+02        5.54e+02   
    Underid p        0.000000        0.000000   
    Cragg-Dona~F     1.76e+04        1.76e+04   
    C stat           0.670040        0.670040   
    C p              0.413038        0.413038   
    --------------------------------------------
    Standard errors in parentheses
    * p<0.05, ** p<0.01, *** p<0.001
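    For comparison, here is a minimal sketch of what the intended construction presumably looks like: the auxiliary regression for k includes all exogenous regressors (w and ys) rather than the constant only. The names e_fix and kfix_*_g are hypothetical, chosen only for this illustration.

    ```stata
    * Sketch: Lewbel-type instruments built from residuals of k on ALL exogenous regressors
    webuse abdata, clear
    keep if year>1978 & year<1983

    * Auxiliary regression of the endogenous variable on w and ys (plus the constant)
    regress k w ys
    predict double e_fix, residuals

    * Generated instruments: (Z - mean(Z)) * residual, one per exogenous regressor
    foreach var of varlist w ys {
        summarize `var', meanonly
        generate double kfix_`var'_g = (`var' - r(mean)) * e_fix
    }

    * 2SLS with the external instrument kL1 plus the generated instruments
    ivreg2 n w ys (k = kL1 kfix_w_g kfix_ys_g), orthog(kL1)
    ```

    Once the bug is fixed, ivreg2h with the same specification would be expected to match these estimates rather than the ones in the table above.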
    Manh Hoang-Ba,
    Facebook,
    Eureka! Uni - YouTube,
    ManhHB94 (Manh Hoang Ba),
    Hoàng Bá Mạnh – Kinh tế lượng: Lý thuyết và ứng dụng

  • #2
    Excuse me, I have a problem as well. I am using the command:

    ivreg2h income $covlist (depression = death)

    and it generates three sets of results:

    – With IV
    – With generated instruments
    – With both combined

    However, outreg2 only shows the last table.

    How can I export or display all three results/tables from my regression, since the command outputs them at the same time?

    Thank you for your help!



    • #3
      Originally posted by Vinbamba Boris Laurence
      However, outreg2 only shows the last table. How can I export or display all three results/tables from my regression, since the command outputs them at the same time?

      As I understand it, only the final result is stored in e().
      To store the first two results as well, estimate them separately with ivreg2 and ivreg2h:
      Code:
      * First result: only use external IV
      ivreg2 income $covlist (depression = death)
      estimates store result1
      * Second result: only use generated IV
      ivreg2h income $covlist (depression = )
      estimates store result2
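      A possible continuation, as a sketch only (not tested on your data): run the combined specification last, store it too, and tabulate the three stored results side by side. The name result3 is hypothetical; esttab comes from the estout package on SSC.

      ```stata
      * Third result: external and generated IVs combined
      * (this is the result ivreg2h leaves in e() when an external IV is supplied)
      ivreg2h income $covlist (depression = death)
      estimates store result3

      * All three sets of estimates side by side, with the overidentification test
      esttab result1 result2 result3, b(4) se(4) ///
          stats(N j jp, labels("N" "J statistic" "J p-value")) ///
          mtitles("External IV" "Generated IV" "Both")
      ```

      The J statistic is reported as 0 for the first column if that equation is exactly identified.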
      Manh Hoang-Ba


      • #4
        Thank you so much, Manh Hoang Ba. It was very useful. I have another question, please.



        • #5
          I conducted a heterogeneity test with the telework variable, but I found almost the same coefficients before and after using internal instruments. I have made all the necessary corrections and verification checks. How can this result be explained?



          • #6
            Table 14: Heterogeneity test - Telework
            Dependent variable: Performance

                                        Without instruments                 With instruments
                                        Full sample Women       Men         Full sample Women       Men
            Variables                   (1)         (2)         (3)         (4)         (5)         (6)
            ----------------------------------------------------------------------------------------------------
            Psychological distress      -0.620***   -0.583***   -0.720***   -0.619***   -0.581***   -0.721***
                                        (0.0212)    (0.0251)    (0.0399)    (0.0212)    (0.0252)    (0.0400)
            Permanent staff             -0.00906    0.0308      -0.110**    -0.00903    0.0308      -0.110**
                                        (0.0251)    (0.0288)    (0.0502)    (0.0251)    (0.0288)    (0.0502)
            Dependents                  0.00958     0.0117      0.00650     0.00947     0.0115      0.00648
                                        (0.00917)   (0.0109)    (0.0171)    (0.00916)   (0.0109)    (0.0171)
            Age                         0.0149***   0.0146**    0.0149      0.0147***   0.0144**    0.0149
                                        (0.00541)   (0.00632)   (0.0105)    (0.00541)   (0.00633)   (0.0105)
            Visible minority            0.221***    0.162***    0.331***    0.221***    0.163***    0.331***
                                        (0.0259)    (0.0309)    (0.0477)    (0.0259)    (0.0309)    (0.0477)
            Telework                    -0.157***   -0.199***   -0.0673     -0.157***   -0.199***   -0.0673
                                        (0.0247)    (0.0292)    (0.0465)    (0.0247)    (0.0291)    (0.0464)
            Disability status           -0.0757**   -0.0691*    -0.0717     -0.0745**   -0.0677*    -0.0714
                                        (0.0342)    (0.0402)    (0.0659)    (0.0343)    (0.0403)    (0.0659)
            LGBTQ+ status               0.0220      -0.0249     0.135*      0.0226      -0.0244     0.135**
                                        (0.0392)    (0.0490)    (0.0689)    (0.0392)    (0.0490)    (0.0689)
            Female                      0.156***                            0.1558***
                                        (0.0235)                            (0.0241)
            Interaction: PD x Telework  -0.103***   -0.127***   -0.0253     -0.108***   -0.134***   -0.0245
                                        (0.0243)    (0.0288)    (0.0458)    (0.0253)    (0.0300)    (0.0471)
            Constant                    7.627***    7.791***    7.590***    7.628***    7.792***    7.590***
                                        (0.0430)    (0.0469)    (0.0738)    (0.0430)    (0.0469)    (0.0738)
            ----------------------------------------------------------------------------------------------------
            N                           16,802      11,818      4,929       16,802      11,818      4,929
            F-value                                                         5957.97     4362.26     2739.26
            LM Chi2                                                         3037.14     2114.49     923.74
              [p-value]                                                     [0.000]     [0.000]     [0.000]
            J Chi2                                                          49.18       31.71       25.002
              [p-value]                                                     [0.000]     [0.0015]    [0.0148]



            • #7
              Hi Vinbamba Boris Laurence,
              I understand that you are doing a heterogeneity analysis based on clusters of the telework variable, performing 2SLS with internal instruments on each cluster. Is that correct? If so:
              • The regression parameters appear to be quite consistent across the different telework clusters. If they are also consistent with the parameters estimated for the overall sample, that would be a good sign. It would imply that you can use the results of the model for the overall sample without worrying about the heterogeneity caused by telework.
              • However, I am concerned that the instruments do not pass the J test. It is possible that the second-moment assumptions are not satisfied within each telework cluster. If the model for the entire sample passes the J test, that would be evidence to support the above statement.
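              A sketch of how such a comparison could be laid out, under two assumptions that I cannot verify from the thread: telework is a 0/1 indicator, and $covlist does not itself contain telework (otherwise it would be collinear within each cluster). The variable names performance and Distress and the stored-estimate names are placeholders.

              ```stata
              * Full sample, generated (internal) instruments only
              ivreg2h performance $covlist (Distress = )
              estimates store full

              * Same specification within each telework cluster
              ivreg2h performance $covlist (Distress = ) if telework == 1
              estimates store tele1
              ivreg2h performance $covlist (Distress = ) if telework == 0
              estimates store tele0

              * Coefficients and the J test (statistic, df, p-value) side by side
              esttab full tele1 tele0, b(4) se(4) stats(N j jdf jp) ///
                  mtitles("Full sample" "Telework" "No telework")
              ```

              Comparing the jp row across columns shows at a glance whether the overidentifying restrictions are rejected in the full sample, in the clusters, or in both.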
              Manh Hoang-Ba


              • #8
                Hi Manh Hoang Ba
                Yes, that’s more or less it. I don’t have any external instrumental variable for this paper, so I’m only using internal instruments with the command:
                ivreg2h performance $covlist (Distress = ).

                I have a question, if you don’t mind:
                With the Lewbel method, is the Sargan–Hansen test the most relevant one to determine whether endogeneity has been properly corrected?

                Please take a look at the results I obtained from one of my regressions using another dataset.



                • #9
                  ivreg2h current_work_status $covlist (SDdep = chronic) if sex == 1

                  Standard IV Results


                  IV (2SLS) estimation
                  --------------------

                  Estimates efficient for homoskedasticity only
                  Statistics consistent for homoskedasticity only

                  Number of obs = 13183
                  F( 12, 13170) = 27.19
                  Prob > F = 0.0000
                  Total (centered) SS = 3044.072669 Centered R2 = -3.4957
                  Total (uncentered) SS = 4770 Uncentered R2 = -1.8690
                  Residual SS = 13685.30625 Root MSE = 1.019

                  ------------------------------------------------------------------------------
                  current_wo~s | Coef. Std. Err. z P>|z| [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                  SDdep | .8261517 .2714369 3.04 0.002 .2941452 1.358158
                  age | .0093758 .001326 7.07 0.000 .0067769 .0119747
                  livChild | .0099277 .0082117 1.21 0.227 -.0061669 .0260223
                  single | -.0434256 .0677992 -0.64 0.522 -.1763096 .0894584
                  unionStatus | -.0684259 .0552477 -1.24 0.216 -.1767094 .0398576
                  separated | -.0210247 .0582951 -0.36 0.718 -.1352811 .0932316
                  primary | .150198 .0361088 4.16 0.000 .079426 .22097
                  secondary | .1110898 .0361113 3.08 0.002 .040313 .1818667
                  higher | .2977971 .059962 4.97 0.000 .1802738 .4153205
                  media | .112286 .026248 4.28 0.000 .0608409 .1637312
                  internet | .2069246 .0342478 6.04 0.000 .1398001 .2740491
                  religion | -.1883852 .0573443 -3.29 0.001 -.300778 -.0759924
                  _cons | .0232787 .0791549 0.29 0.769 -.1318622 .1784195
                  ------------------------------------------------------------------------------
                  Underidentification test (Anderson canon. corr. LM statistic): 11.915
                  Chi-sq(1) P-val = 0.0006
                  ------------------------------------------------------------------------------
                  Weak identification test (Cragg-Donald Wald F statistic): 11.914
                  Stock-Yogo weak ID test critical values: 10% maximal IV size 16.38
                  15% maximal IV size 8.96
                  20% maximal IV size 6.66
                  25% maximal IV size 5.53
                  Source: Stock-Yogo (2005). Reproduced by permission.
                  ------------------------------------------------------------------------------
                  Sargan statistic (overidentification test of all instruments): 0.000
                  (equation exactly identified)
                  ------------------------------------------------------------------------------
                  Instrumented: SDdep
                  Included instruments: age livChild single unionStatus separated primary
                  secondary higher media internet religion
                  Excluded instruments: chronic
                  ------------------------------------------------------------------------------

                  IV with Generated Instruments only

                  Instruments created from Z:
                  age livChild single unionStatus separated primary secondary higher media internet religion

                  IV (2SLS) estimation
                  --------------------

                  Estimates efficient for homoskedasticity only
                  Statistics consistent for homoskedasticity only

                  Number of obs = 13183
                  F( 12, 13170) = 133.32
                  Prob > F = 0.0000
                  Total (centered) SS = 3044.072669 Centered R2 = 0.1032
                  Total (uncentered) SS = 4770 Uncentered R2 = 0.4277
                  Residual SS = 2730.01018 Root MSE = .4551

                  ------------------------------------------------------------------------------
                  current_wo~s | Coef. Std. Err. z P>|z| [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                  SDdep | -.0467523 .0142918 -3.27 0.001 -.0747638 -.0187408
                  age | .0111305 .0005405 20.59 0.000 .0100711 .0121899
                  livChild | -.0063492 .0029004 -2.19 0.029 -.0120339 -.0006646
                  single | -.157409 .0258815 -6.08 0.000 -.2081358 -.1066822
                  unionStatus | -.1287071 .0232329 -5.54 0.000 -.1742428 -.0831714
                  separated | .0165448 .025516 0.65 0.517 -.0334657 .0665552
                  primary | .0632179 .0107792 5.86 0.000 .0420909 .0843448
                  secondary | .0460957 .0134083 3.44 0.001 .0198159 .0723756
                  higher | .2335162 .0252713 9.24 0.000 .1839854 .2830471
                  media | .0598434 .0092263 6.49 0.000 .0417602 .0779265
                  internet | .1304806 .0110826 11.77 0.000 .108759 .1522022
                  religion | -.0392075 .015254 -2.57 0.010 -.0691047 -.0093102
                  _cons | .0971004 .0338558 2.87 0.004 .0307443 .1634565
                  ------------------------------------------------------------------------------
                  Underidentification test (Anderson canon. corr. LM statistic): 857.386
                  Chi-sq(11) P-val = 0.0000
                  ------------------------------------------------------------------------------
                  Weak identification test (Cragg-Donald Wald F statistic): 83.221
                  Stock-Yogo weak ID test critical values: 5% maximal IV relative bias 20.90
                  10% maximal IV relative bias 11.51
                  20% maximal IV relative bias 6.56
                  30% maximal IV relative bias 4.80
                  10% maximal IV size 40.90
                  15% maximal IV size 22.06
                  20% maximal IV size 15.56
                  25% maximal IV size 12.23
                  Source: Stock-Yogo (2005). Reproduced by permission.
                  ------------------------------------------------------------------------------
                  Sargan statistic (overidentification test of all instruments): 23.115
                  Chi-sq(10) P-val = 0.0103
                  ------------------------------------------------------------------------------
                  Instrumented: SDdep
                  Included instruments: age livChild single unionStatus separated primary
                  secondary higher media internet religion
                  Excluded instruments: SDdep_age_g SDdep_livChild_g SDdep_single_g
                  SDdep_unionStatus_g SDdep_separated_g SDdep_primary_g
                  SDdep_secondary_g SDdep_higher_g SDdep_media_g
                  SDdep_internet_g SDdep_religion_g
                  ------------------------------------------------------------------------------

                  IV with Generated Instruments and External Instruments

                  Testing Orthogonality of Instruments created from Z:
                  age livChild single unionStatus separated primary secondary higher media internet religion

                  IV (2SLS) estimation
                  --------------------

                  Estimates efficient for homoskedasticity only
                  Statistics consistent for homoskedasticity only

                  Number of obs = 13183
                  F( 12, 13170) = 133.40
                  Prob > F = 0.0000
                  Total (centered) SS = 3044.072669 Centered R2 = 0.1052
                  Total (uncentered) SS = 4770 Uncentered R2 = 0.4290
                  Residual SS = 2723.819193 Root MSE = .4546

                  ------------------------------------------------------------------------------
                  current_wo~s | Coef. Std. Err. z P>|z| [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                  SDdep | -.0402776 .0141801 -2.84 0.005 -.0680702 -.0124851
                  age | .0111175 .0005399 20.59 0.000 .0100593 .0121757
                  livChild | -.0062285 .0028969 -2.15 0.032 -.0119064 -.0005506
                  single | -.1565635 .0258512 -6.06 0.000 -.207231 -.105896
                  unionStatus | -.12826 .0232063 -5.53 0.000 -.1737435 -.0827764
                  separated | .0162661 .0254869 0.64 0.523 -.0336874 .0662196
                  primary | .063863 .0107658 5.93 0.000 .0427625 .0849636
                  secondary | .0465778 .0133926 3.48 0.001 .0203289 .0728267
                  higher | .233993 .0252424 9.27 0.000 .1845189 .2834672
                  media | .0602324 .0092153 6.54 0.000 .0421708 .078294
                  internet | .1310476 .0110691 11.84 0.000 .1093525 .1527427
                  religion | -.040314 .0152341 -2.65 0.008 -.0701722 -.0104558
                  _cons | .0965529 .0338171 2.86 0.004 .0302726 .1628331
                  ------------------------------------------------------------------------------
                  Underidentification test (Anderson canon. corr. LM statistic): 868.972
                  Chi-sq(12) P-val = 0.0000
                  ------------------------------------------------------------------------------
                  Weak identification test (Cragg-Donald Wald F statistic): 77.383
                  Stock-Yogo weak ID test critical values: 5% maximal IV relative bias 21.01
                  10% maximal IV relative bias 11.52
                  20% maximal IV relative bias 6.53
                  30% maximal IV relative bias 4.75
                  10% maximal IV size 43.27
                  15% maximal IV size 23.24
                  20% maximal IV size 16.35
                  25% maximal IV size 12.82
                  Source: Stock-Yogo (2005). Reproduced by permission.
                  ------------------------------------------------------------------------------
                  Sargan statistic (overidentification test of all instruments): 77.545
                  Chi-sq(11) P-val = 0.0000
                  -orthog- option:
                  Sargan statistic (eqn. excluding suspect orthogonality conditions): 26.324
                  Chi-sq(10) P-val = 0.0033
                  C statistic (exogeneity/orthogonality of suspect instruments): 51.221
                  Chi-sq(1) P-val = 0.0000
                  Instruments tested: chronic
                  ------------------------------------------------------------------------------
                  Instrumented: SDdep
                  Included instruments: age livChild single unionStatus separated primary
                  secondary higher media internet religion
                  Excluded instruments: chronic SDdep_age_g SDdep_livChild_g SDdep_single_g
                  SDdep_unionStatus_g SDdep_separated_g SDdep_primary_g
                  SDdep_secondary_g SDdep_higher_g SDdep_media_g
                  SDdep_internet_g SDdep_religion_g
                  ------------------------------------------------------------------------------

                  --------------------------------------------------
                      Variable |       StdIV     GenInst  GenExtInst
                  -------------+------------------------------------
                         SDdep |       .8262     -.04675     -.04028
                               |        .271       .0143       .0142
                           age |     .009376      .01113      .01112
                               |      .00133      .00054      .00054
                      livChild |     .009928    -.006349    -.006229
                               |      .00821       .0029       .0029
                        single |     -.04343      -.1574      -.1566
                               |       .0678       .0259       .0259
                   unionStatus |     -.06843      -.1287      -.1283
                               |       .0552       .0232       .0232
                     separated |     -.02102      .01654      .01627
                               |       .0583       .0255       .0255
                       primary |       .1502      .06322      .06386
                               |       .0361       .0108       .0108
                     secondary |       .1111       .0461      .04658
                               |       .0361       .0134       .0134
                        higher |       .2978       .2335        .234
                               |         .06       .0253       .0252
                         media |       .1123      .05984      .06023
                               |       .0262      .00923      .00922
                      internet |       .2069       .1305        .131
                               |       .0342       .0111       .0111
                      religion |      -.1884     -.03921     -.04031
                               |       .0573       .0153       .0152
                         _cons |      .02328       .0971      .09655
                               |       .0792       .0339       .0338
                  -------------+------------------------------------
                             N |       13183       13183       13183
                          rmse |        1.02        .455        .455
                             j |           0        23.1        77.5
                           jdf |           0          10          11
                            jp |                   .0103     4.4e-12
                  --------------------------------------------------
                  legend: b/



                  • #10
                    Thanks to Manh Hoang Ba for bringing this to my attention. This bug has now been squashed, as has another which gave an erroneous list of excluded instruments when a hyphenated varlist was used. That second bug did not affect the computation, only the display. The z() option should now work properly to allow the specification of a subset of X variables in the Z matrix.

                    --Kit

                      • #12
                        Thanks to Manh Hoang Ba for pointing this out. This bug has been squashed, as well as another issue that prevented the excluded instruments from being listed properly when a hyphenated varlist was used. This did not affect the computations. The ivreg2h package has been updated on SSC; use ado update if it is already installed.

                        --Kit
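
                        For readers following along, the update step might look like the sketch below. It assumes ivreg2h was originally installed from SSC.

                        ```stata
                        * Update the installed copy of ivreg2h from SSC
                        ado update ivreg2h, update

                        * Alternative: reinstall, overwriting the existing files
                        * ssc install ivreg2h, replace

                        * Show which ivreg2h.ado Stata will run, with its version line
                        which ivreg2h
                        ```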



                        • #13
                          Thanks Kit.



                          • #14
                            Thanks Prof. KitBaum
                            Manh Hoang-Ba


                            • #15
                              Hi Vinbamba Boris Laurence ,

                              The Sargan-Hansen test helps to detect whether there is a problem with the set of instruments by checking the validity of the overidentifying restrictions. A small p-value suggests a problem, in which case the bias from the endogenous variable may not yet be resolved.

                              If you have an external instrument whose exogeneity has been verified in advance, or is supported by strong theory, then the Sargan-Hansen conclusion may be less important. That is not the case when you rely purely on constructed instruments. Their validity depends heavily on the second-moment assumptions about the residuals of the endogenous-variable equation, and on the exogeneity of the independent variables already in the model.
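
                              One check that is often run alongside the J test is a sketch like the one below. As I understand Lewbel (2012), the constructed instruments only have power if the errors of the endogenous-variable equation are heteroskedastic with respect to the exogenous regressors, so a Breusch-Pagan test on that equation is informative. The variable names are taken from your output, and the sample restriction is assumed to match your regression.

                              ```stata
                              * Regression of the endogenous variable on the exogenous regressors
                              regress SDdep $covlist if sex == 1

                              * Breusch-Pagan test against the right-hand-side variables;
                              * iid relaxes the normality assumption of the default version
                              estat hettest, rhs iid
                              ```

                              Rejecting homoskedasticity here supports instrument relevance only; it says nothing about whether the overidentifying restrictions hold, which is what the Sargan-Hansen test addresses.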
                              Manh Hoang-Ba