  • 2SLS with binary endogenous variable

    Hi Statalist,

    To address endogeneity arising from omitted variables and reverse causality, I am using two-stage least squares (2SLS) for my analysis. With the help of previous advice given here, this is my Stata code so far:

    Dependent variable (binary): emotional
    Endogenous variable (binary): currwork
    Instrumental variable (binary): cheb

    I estimate the model with the following two sets of commands, which give identical results:


    Code:
    probit currwork  i.husjob attitude i.prevdv i.educgap i.educlvl agegap age agefrstmar i.religion i.urban i.geo_eg1988_2014 nsons i.hhkidlt5 i.wealthq i.year i.cheb
    predict workhat, xb
    ivregress 2sls emotional i.husjob attitude i.prevdv  i.educgap i.educlvl agegap age agefrstmar i.religion i.urban i.geo_eg1988_2014 nsons i.hhkidlt5 i.wealthq i.year (i.currwork=i.cheb workhat)
    and

    Code:
    ivreg2 emotional (i.currwork=i.cheb) i.husjob attitude i.prevdv i.educgap i.educlvl agegap age agefrstmar i.religion i.urban i.geo_eg1988_2014 nsons i.hhkidlt5 i.wealthq i.year, first
    I am slightly confused because I have read comments on Statalist suggesting that the first stage should be estimated by linear regression rather than by logit/probit, because “in 2SLS the consistency of the estimates in the second stage are not dependent upon specifying the correct functional form in the first stage”.

    My question is: am I estimating these models correctly, or is this the “forbidden regression”?

    Regression output:

    Code:
    First-stage regressions
    -----------------------
    
    
    First-stage regression of 1.currwork:
    
    Statistics consistent for homoskedasticity only
    Number of obs =                   9691
    ----------------------------------------------------------------------------------------
                1.currwork |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -----------------------+----------------------------------------------------------------
                    1.cheb |   .0623816   .0188212     3.31   0.001     .0254881     .099275
                           |
                    husjob |
         White collar job  |   .0794522    .026904     2.95   0.003     .0267147    .1321898
          Blue collar job  |   .0482838   .0264973     1.82   0.068    -.0036565    .1002242
                           |
                  attitude |  -.0011089   .0025856    -0.43   0.668    -.0061773    .0039595
                  1.prevdv |    .000505   .0101112     0.05   0.960     -.019315     .020325
                           |
                   educgap |
     Wife better educated  |  -.0717708   .0193309    -3.71   0.000    -.1096633   -.0338782
    Both equally educated  |  -.0217123   .0160946    -1.35   0.177    -.0532611    .0098365
                           |
                   educlvl |
                  primary  |  -.0486817   .0132463    -3.68   0.000    -.0746473   -.0227161
                secondary  |   .0835031    .018246     4.58   0.000     .0477372     .119269
                   higher  |   .3439955   .0241975    14.22   0.000     .2965633    .3914276
                           |
                    agegap |  -.0017554   .0008271    -2.12   0.034    -.0033767   -.0001341
                       age |   .0081049   .0007404    10.95   0.000     .0066536    .0095562
                agefrstmar |   .0056594   .0011651     4.86   0.000     .0033755    .0079433
                           |
                  religion |
                christian  |  -.0074209     .01782    -0.42   0.677    -.0423518    .0275101
                           |
                     urban |
                    rural  |   .0310775   .0113583     2.74   0.006      .008813    .0533421
                           |
           geo_eg1988_2014 |
              lower egypt  |   .0509753   .0125685     4.06   0.000     .0263384    .0756121
              upper egypt  |   .0183381   .0125682     1.46   0.145    -.0062982    .0429743
    frontier governorates  |   .0390965   .0182698     2.14   0.032     .0032839    .0749091
                           |
                     nsons |  -.0105289   .0040027    -2.63   0.009    -.0183749   -.0026828
                           |
                  hhkidlt5 |
                        1  |  -.0249247   .0110358    -2.26   0.024    -.0465571   -.0032923
                        2  |   -.039848   .0131426    -3.03   0.002    -.0656102   -.0140859
                       3+  |  -.0166016   .0198564    -0.84   0.403    -.0555242    .0223211
                           |
                   wealthq |
                   poorer  |  -.0595622    .012753    -4.67   0.000    -.0845607   -.0345637
                   middle  |  -.0659929   .0132747    -4.97   0.000     -.092014   -.0399717
                   richer  |  -.0514859   .0147724    -3.49   0.000     -.080443   -.0225289
                  richest  |  -.0389741   .0173162    -2.25   0.024    -.0729173   -.0050308
                           |
                      year |
                     2014  |  -.1019067    .007726   -13.19   0.000    -.1170512   -.0867622
                           |
                     _cons |  -.2760092   .0455071    -6.07   0.000    -.3652125   -.1868058
    ----------------------------------------------------------------------------------------
    F test of excluded instruments:
      F(  1,  9663) =    10.99
      Prob > F      =   0.0009
    Sanderson-Windmeijer multivariate F test of excluded instruments:
      F(  1,  9663) =    10.99
      Prob > F      =   0.0009
    
    
    
    Summary results for first-stage regressions
    -------------------------------------------
    
                                               (Underid)            (Weak id)
    Variable     | F(  1,  9663)  P-val | SW Chi-sq(  1) P-val | SW F(  1,  9663)
    1.currwork   |      10.99    0.0009 |       11.02   0.0009 |       10.99
    
    Stock-Yogo weak ID F test critical values for single endogenous regressor:
                                       10% maximal IV size             16.38
                                       15% maximal IV size              8.96
                                       20% maximal IV size              6.66
                                       25% maximal IV size              5.53
    Source: Stock-Yogo (2005).  Reproduced by permission.
    NB: Critical values are for Sanderson-Windmeijer F statistic.
    
    Underidentification test
    Ho: matrix of reduced form coefficients has rank=K1-1 (underidentified)
    Ha: matrix has rank=K1 (identified)
    Anderson canon. corr. LM statistic       Chi-sq(1)=11.00    P-val=0.0009
    
    Weak identification test
    Ho: equation is weakly identified
    Cragg-Donald Wald F statistic                                      10.99
    
    Stock-Yogo weak ID test critical values for K1=1 and L1=1:
                                       10% maximal IV size             16.38
                                       15% maximal IV size              8.96
                                       20% maximal IV size              6.66
                                       25% maximal IV size              5.53
    Source: Stock-Yogo (2005).  Reproduced by permission.
    
    Weak-instrument-robust inference
    Tests of joint significance of endogenous regressors B1 in main equation
    Ho: B1=0 and orthogonality conditions are valid
    Anderson-Rubin Wald test           F(1,9663)=     12.88     P-val=0.0003
    Anderson-Rubin Wald test           Chi-sq(1)=     12.92     P-val=0.0003
    Stock-Wright LM S statistic        Chi-sq(1)=     12.90     P-val=0.0003
    
    Number of observations               N  =       9691
    Number of regressors                 K  =         28
    Number of endogenous regressors      K1 =          1
    Number of instruments                L  =         28
    Number of excluded instruments       L1 =          1
    
    IV (2SLS) estimation
    --------------------
    
    Estimates efficient for homoskedasticity only
    Statistics consistent for homoskedasticity only
    
                                                          Number of obs =     9691
                                                          F( 27,  9663) =     7.87
                                                          Prob > F      =   0.0000
    Total (centered) SS     =  1341.189145                Centered R2   =  -0.9841
    Total (uncentered) SS   =         1608                Uncentered R2 =  -0.6549
    Residual SS             =  2661.085206                Root MSE      =     .524
    
    ----------------------------------------------------------------------------------------
                 emotional |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -----------------------+----------------------------------------------------------------
                  currwork |
                      yes  |   1.082058    .433683     2.50   0.013     .2320545    1.932061
                           |
                    husjob |
         White collar job  |  -.0319561   .0526654    -0.61   0.544    -.1351783    .0712662
          Blue collar job  |   .0248605    .044106     0.56   0.573    -.0615856    .1113067
                           |
                  attitude |   .0163529   .0037506     4.36   0.000     .0090018     .023704
                  1.prevdv |   .1211017   .0145366     8.33   0.000     .0926104     .149593
                           |
                   educgap |
     Wife better educated  |    .149971   .0418604     3.58   0.000     .0679261    .2320159
    Both equally educated  |   .0727311   .0249662     2.91   0.004     .0237984    .1216639
                           |
                   educlvl |
                  primary  |   .0611606   .0281502     2.17   0.030     .0059872    .1163339
                secondary  |  -.1788748   .0454536    -3.94   0.000    -.2679623   -.0897873
                   higher  |   -.504343   .1539387    -3.28   0.001    -.8060573   -.2026286
                           |
                    agegap |   .0007305    .001412     0.52   0.605     -.002037     .003498
                       age |  -.0077304    .004027    -1.92   0.055    -.0156232    .0001624
                agefrstmar |  -.0084541   .0026322    -3.21   0.001    -.0136132   -.0032951
                           |
                  religion |
                christian  |  -.0278122   .0257252    -1.08   0.280    -.0782327    .0226083
                           |
                     urban |
                    rural  |  -.0739228   .0208361    -3.55   0.000    -.1147608   -.0330847
                           |
           geo_eg1988_2014 |
              lower egypt  |  -.0476695   .0286722    -1.66   0.096    -.1038659    .0085269
              upper egypt  |  -.0334369   .0193555    -1.73   0.084     -.071373    .0044993
    frontier governorates  |  -.0753515   .0308496    -2.44   0.015    -.1358156   -.0148873
                           |
                     nsons |   .0068156   .0068349     1.00   0.319    -.0065805    .0202117
                           |
                  hhkidlt5 |
                        1  |   .0111388   .0145799     0.76   0.445    -.0174373    .0397149
                        2  |   .0507056   .0192942     2.63   0.009     .0128897    .0885215
                       3+  |   .0227837   .0273771     0.83   0.405    -.0308745    .0764419
                           |
                   wealthq |
                   poorer  |   .0596156   .0315742     1.89   0.059    -.0022687       .1215
                   middle  |    .073244   .0343015     2.14   0.033     .0060144    .1404736
                   richer  |   .0450257   .0307738     1.46   0.143    -.0152898    .1053412
                  richest  |   .0089513   .0301276     0.30   0.766    -.0500976    .0680003
                           |
                      year |
                     2014  |   .1408985   .0452166     3.12   0.002     .0522756    .2295214
                           |
                     _cons |   .3966987   .1248004     3.18   0.001     .1520943     .641303
    ----------------------------------------------------------------------------------------
    Underidentification test (Anderson canon. corr. LM statistic):          11.005
                                                       Chi-sq(1) P-val =    0.0009
    ------------------------------------------------------------------------------
    Weak identification test (Cragg-Donald Wald F statistic):               10.985
    Stock-Yogo weak ID test critical values: 10% maximal IV size             16.38
                                             15% maximal IV size              8.96
                                             20% maximal IV size              6.66
                                             25% maximal IV size              5.53
    Source: Stock-Yogo (2005).  Reproduced by permission.
    ------------------------------------------------------------------------------
    Sargan statistic (overidentification test of all instruments):           0.000
                                                     (equation exactly identified)
    ------------------------------------------------------------------------------
    Instrumented:         1.currwork
    Included instruments: 1.husjob 2.husjob attitude 1.prevdv 2.educgap 3.educgap
                          1.educlvl 2.educlvl 3.educlvl agegap age agefrstmar
                          1.religion 2.urban 2.geo_eg1988_2014 3.geo_eg1988_2014
                          4.geo_eg1988_2014 nsons 1.hhkidlt5 2.hhkidlt5 3.hhkidlt5
                          2.wealthq 3.wealthq 4.wealthq 5.wealthq 2014.year
    Excluded instruments: 1.cheb
    ------------------------------------------------------------------------------

    Last edited by Sherine Maui; 31 Jul 2019, 12:03.

  • #2
    You'll increase your chances of a useful answer by following the FAQ on asking questions. A shorter post would also help.

    Because 2SLS remains consistent with a binary endogenous variable, you can stay with ivregress or ivreg2. As long as you don't hand-roll your own instrumentation and predictions, ivreg2 won't steer you wrong.

    There may be more efficient estimators that take the binary nature of the endogenous variable into account: look at the extended regression (ERM) commands, the SEM/GSEM commands, and the user-written cmp.
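
    As a rough sketch of what those options could look like with the variables from post #1 (untested on the poster's data, so treat it as an illustration under assumptions rather than a recommended specification): the local macro name controls is hypothetical, vce(robust) is an added choice that was not in the original commands, and eprobit assumes Stata 15 or later. The user-written cmp is left out because its setup depends on the installed version.

```stata
* Sketch only; run as a do-file so the local macro and /// continuations work.
* Variable names come from post #1; the macro name "controls" is hypothetical.
local controls i.husjob attitude i.prevdv i.educgap i.educlvl agegap age ///
    agefrstmar i.religion i.urban i.geo_eg1988_2014 nsons i.hhkidlt5     ///
    i.wealthq i.year

* Linear 2SLS: ivregress runs the (linear) first stage itself, so no
* hand-made probit predictions are needed. vce(robust) is an assumption here.
ivregress 2sls emotional `controls' (currwork = i.cheb), vce(robust)
estat firststage

* Recursive bivariate probit: outcome and endogenous regressor are both
* modelled as probits with correlated errors.
biprobit (emotional = i.currwork `controls') (currwork = i.cheb `controls')

* Extended regression model: probit outcome with a binary endogenous
* covariate; currwork enters the main equation automatically.
eprobit emotional `controls', endogenous(currwork = i.cheb `controls', probit)
```

    The three commands rest on different assumptions and report coefficients on different scales (linear probability versus probit index), so the estimates are not directly comparable without computing marginal effects.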
