SUR due to weak instrument?

Kerstin Schmidt

Join Date: Apr 2017
Posts: 125

29 Sep 2022, 16:17

Dear Statalist community,

I estimate the determinants of insurance type adoption (categorical variable) and my main variable of interest (x1) is endogenous.
Therefore, I am first applying an IV model:

Code:

ivreg2 insurance_type $x10_L1 (x1 = z1 z2 z3), endog(x1) cluster(HHID)

HTML Code:

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity and clustering on HHID

Number of clusters (HHID) =        129                Number of obs =      589
                                                      F( 42,   128) =    11.06
                                                      Prob > F      =   0.0000
Total (centered) SS     =  690.0203735                Centered R2   =   0.1274
Total (uncentered) SS   =         2722                Uncentered R2 =   0.7788
Residual SS             =  602.1361466                Root MSE      =    1.011

------------------------------------------------------------------------------
             |               Robust
insurance_~e |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.270311   .4350326     2.92   0.003     .4176624    2.122959
          x2 |   1.556313   .6290618     2.47   0.013     .3233748    2.789252
             |
       round |
          2  |  -.1550322   .1480736    -1.05   0.295    -.4452511    .1351867
          3  |    .209987   .1524959     1.38   0.169    -.0888995    .5088734
          4  |  -.0040926   .1402074    -0.03   0.977    -.2788941    .2707089
          5  |   .2497302   .2458898     1.02   0.310     -.232205    .7316653
             |
        1.x3 |  -.0609238   .1253715    -0.49   0.627    -.3066474    .1847998
          x4 |  -.0000157   .0000518    -0.30   0.762    -.0001172    .0000859
          x5 |    .000725   .0054283     0.13   0.894    -.0099142    .0113643
          x6 |   .0016521   .0005774     2.86   0.004     .0005203    .0027838
        1.x7 |   .1747743   .1917179     0.91   0.362     -.200986    .5505345
        1.x8 |   .3159903   .1993367     1.59   0.113    -.0747025     .706683
             |
       x7#x8 |
        1 1  |  -.3296806   .2209396    -1.49   0.136    -.7627143    .1033531
             |
          E1 |
        yes  |   .3046483   .1316344     2.31   0.021     .0466497     .562647
        1.x9 |  -.2372301   .1924474    -1.23   0.218    -.6144201    .1399599
       1.x10 |  -.3569485   .2012014    -1.77   0.076     -.751296     .037399
             |
      x9#x10 |
        1 1  |   .4888955   .2670217     1.83   0.067    -.0344575    1.012248
             |
       1.x11 |   .2163405   .1425013     1.52   0.129    -.0629569    .4956379
       1.x12 |     .26867   .1299768     2.07   0.039     .0139201    .5234199
         x13 |   6.33e-06   8.83e-06     0.72   0.473     -.000011    .0000236
         x14 |   -.003812   .0024489    -1.56   0.120    -.0086117    .0009877
         x15 |   .1037938   .0552468     1.88   0.060     -.004488    .2120757
       _cons |  -18.08196   7.160276    -2.53   0.012    -32.11584   -4.048077
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):             13.826
                                                   Chi-sq(3) P-val =    0.0032
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):                6.773
                         (Kleibergen-Paap rk Wald F statistic):          5.509
Stock-Yogo weak ID test critical values:  5% maximal IV relative bias    13.91
                                         10% maximal IV relative bias     9.08
                                         20% maximal IV relative bias     6.46
                                         30% maximal IV relative bias     5.39
                                         10% maximal IV size             22.30
                                         15% maximal IV size             12.83
                                         20% maximal IV size              9.54
                                         25% maximal IV size              7.80
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments):         0.958
                                                   Chi-sq(2) P-val =    0.6194
-endog- option:
Endogeneity test of endogenous regressors:                               5.312
                                                   Chi-sq(1) P-val =    0.0212
Regressors tested:    x1

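The IV diagnostics indicate that the instruments are weak: the Kleibergen-Paap rk Wald F statistic (5.509) falls between the Stock-Yogo critical values for 30% (5.39) and 20% (6.46) maximal IV relative bias.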
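For a closer look at instrument strength, a minimal sketch under the same specification: the ffirst option of ivreg2 adds first-stage summary statistics and weak-instrument-robust tests (such as the Anderson-Rubin test) for the coefficient on x1. This is an illustrative call only; no output from it is shown here.

```stata
* Sketch only: same variables and globals as the ivreg2 call above.
* ffirst requests first-stage diagnostic statistics together with
* weak-instrument-robust inference on the endogenous regressor.
ivreg2 insurance_type $x10_L1 (x1 = z1 z2 z3), ffirst cluster(HHID)
```

Unlike the Stock-Yogo comparison, the weak-instrument-robust tests remain valid when the instruments are weak, so they can serve as a check on the significance of x1.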

I could not find better instruments and thus thought of an alternative model: the seemingly unrelated regression. If I have information on what determines the endogenous variable, I should be able to model both equations simultaneously, right?

Code:

cmp setup
cmp (insurance_type = x1 $x10_L1) (x1 = x16 x17 x18 x19 i.round i.G_num), cluster(HHID) ind($cmp_oprobit $cmp_cont)

HTML Code:

Mixed-process regression                        Number of obs     =        589
                                                Wald chi2(70)     =   29571.55
Log pseudolikelihood = -1112.1264               Prob > chi2       =     0.0000

                                   (Std. Err. adjusted for 129 clusters in HHID)
--------------------------------------------------------------------------------
               |               Robust
               |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
insurance_type |
            x1 |   1.659372   1.568443     1.06   0.290    -1.414719    4.733463
            x2 |   1.136268   1.529741     0.74   0.458     -1.86197    4.134506
               |
         round |
            2  |   -.069397   .1928497    -0.36   0.719    -.4473755    .3085816
            3  |   .2372032   .1573779     1.51   0.132    -.0712519    .5456583
            4  |  -.0611497   .4463906    -0.14   0.891    -.9360591    .8137597
            5  |   .4501269   .7982792     0.56   0.573    -1.114472    2.014725
               |
          1.x3 |  -.0658192   .1158058    -0.57   0.570    -.2927944     .161156
            x4 |  -2.56e-06   .0000466    -0.05   0.956    -.0000939    .0000888
            x5 |   .0027894    .004861     0.57   0.566     -.006738    .0123169
            x6 |   .0030656   .0037066     0.83   0.408    -.0041991    .0103304
          1.x7 |   .1123516   .2617524     0.43   0.668    -.4006737    .6253768
          1.x8 |   .1835242   .3323619     0.55   0.581    -.4678931    .8349415
               |
         x7#x8 |
          1 1  |  -.1432853   .3310573    -0.43   0.665    -.7921458    .5055751
               |
            E1 |
          yes  |   .3274691   .1456256     2.25   0.025     .0420482      .61289
          1.x9 |   -.118286     .27248    -0.43   0.664     -.652337    .4157649
         1.x10 |  -.2526056   .4006427    -0.63   0.528    -1.037851    .5326397
               |
        x9#x10 |
          1 1  |   .3152279   .5108601     0.62   0.537    -.6860395    1.316495
               |
         1.x11 |  -.0125016   .3352803    -0.04   0.970     -.669639    .6446358
         1.x12 |   .2116942   .2585586     0.82   0.413    -.2950714    .7184597
           x13 |   6.48e-06   9.72e-06     0.67   0.505    -.0000126    .0000255
           x14 |  -.0029787   .0038345    -0.78   0.437    -.0104942    .0045369
           x15 |   .0884911   .1577662     0.56   0.575    -.2207249    .3977072
---------------+----------------------------------------------------------------
x1             |
           x16 |   .3456028   .3272107     1.06   0.291    -.2957185     .986924
           x17 |  -.6661752   .4255005    -1.57   0.117    -1.500141    .1677904
           x18 |   .0055224   .0077664     0.71   0.477    -.0096995    .0207443
           x19 |   .0214949   .2505939     0.09   0.932      -.46966    .5126498
               |
         round |
            2  |  -.0007937    .071285    -0.01   0.991    -.1405096    .1389223
            3  |  -.0940055   .0818426    -1.15   0.251     -.254414     .066403
            4  |   .1362392   .0771827     1.77   0.078     -.015036    .2875145
            5  |  -.3879953   .0909713    -4.27   0.000    -.5662958   -.2096949
               |
---------------+----------------------------------------------------------------
      /cut_1_1 |   15.27931   11.73498     1.30   0.193     -7.72083    38.27946
      /cut_1_2 |   15.70747   12.15701     1.29   0.196    -8.119827    39.53476
      /cut_1_3 |   16.51199    12.9449     1.28   0.202    -8.859535    41.88352
      /lnsig_2 |  -.6817145   .0314066   -21.71   0.000    -.7432702   -.6201587
  /atanhrho_12 |   -.755771   1.741556    -0.43   0.664    -4.169158    2.657616
---------------+----------------------------------------------------------------
         sig_2 |   .5057492   .0158838                      .4755562    .5378591
        rho_12 |  -.6385792   1.031378                     -.9995218    .9902158
--------------------------------------------------------------------------------

The average marginal effects yield the expected results. However, I am not sure whether I should worry about a non-significant /atanhrho_12 when /lnsig_2 is highly significant.

What do you think?
Any complaints about this overall procedure?

Thanks!


Jeff Wooldridge

Join Date: Apr 2014

Posts: 2289
#2

29 Sep 2022, 22:17

That’s really a simultaneous equations model where you not only have several outside instruments—everything except round in the second equation—but you’ve also made numerous exclusion restrictions in what would normally be the unrestricted reduced form for x1. Altogether you’ve imposed around 20 restrictions. This would be very suspect in economic applications.
Kerstin Schmidt

Join Date: Apr 2017

Posts: 125
#3

29 Sep 2022, 23:39

Thanks Jeff!
I am not sure whether I truly understand. Let's say I am interested in estimating an individual adoption equation, and particularly the effect of peer adoption on individual decision-making, but I assume peer adoption to be endogenous. The literature suggests either using the lagged peer variable (which does not make sense in my application due to shocks) or IV. IV suffers from weak instruments (20-30% maximal IV relative bias, see #1), and other instruments are not available. If I have data on what determines the endogenous variable (but which does not determine y_1), can I not just estimate both equations simultaneously? The R2 of my second equation is 44%.

It is true that everything except round and the village fixed effects (20 estimates that I did not display in the regression table) differs between the two simultaneous equations, but this is for substantive reasons.

What would you recommend doing instead that would be less suspect in economic applications?
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2289
#4

30 Sep 2022, 00:15

What you’re doing with the joint estimation is essentially what 2SLS does except you’re putting lots of restrictions on the first stage (the equation for x1). Effectively, you’re making the IVs x16 through x19 stronger by omitting all of the other controls from the first stage. Imposing restrictions on what is almost always viewed as a reduced form is typically frowned upon.
Kerstin Schmidt

Join Date: Apr 2017

Posts: 125
#5

30 Sep 2022, 00:36

Thanks Jeff! How could I improve my model? By, for example, including the same controls in both equations?
Kerstin Schmidt

Join Date: Apr 2017

Posts: 125
#6

30 Sep 2022, 02:11

On second thought, from an applied perspective it would not make sense to include all exogenous variables in both equations. However, also including all controls of the second equation (the first stage, in 2SLS terms) in the first equation would be justifiable. This should be a reasonable exclusion restriction.

In your book "Econometric Analysis of Cross Section and Panel Data" (2010), chapter 9, you write "when an equation in an SEM has economic meaning in isolation from the other equations in the system, we say that the equation is autonomous". From that I conclude that my second equation is autonomous, whereas my first equation is not. Wouldn't that justify the proposed exclusion restriction of including all controls of the second equation in the first one, but not vice versa?

The code would then be:

Code:

cmp (insurance_type = x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16 x17 x18 x19 i.round i.G_num) (x1 = x16 x17 x18 x19 i.round i.G_num), cluster(HHID) ind($cmp_oprobit $cmp_cont)

Does this make sense?
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2289
#7

30 Sep 2022, 12:41

I shouldn't have said that the equations estimated using cmp constitute a simultaneous system. In fact, I can't know that without knowing the definition of x1. Even then, the equation for x1 does not include y1. And you cannot know whether an equation is autonomous by looking at exclusion restrictions. The autonomy requirement can only be judged based on the economics of the problem.

Using a simplified notation, what you're doing is writing

y1 = f1(x1,x2,...,x15,u1)
x1 = f2(x16,x17,x18,x19,u2)

In the second equation, x2 through x15 have been omitted. With 2SLS, these variables all would be included in f2(.). That is, 2SLS uses an unrestricted reduced form for x1. That's preferred unless you have a good story about why x2 ... x15 can be omitted from f2(.). You very well might, but, statistically, this imposes strong assumptions on the system. If the assumptions are true, estimating the above by (effectively) 3SLS is more efficient than estimating the first equation by 2SLS. But if the restrictions are wrong, the joint procedure is inconsistent for the parameters in equation 1.
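As a rough sketch of what an unrestricted reduced form would look like in cmp: the x1 equation carries every exogenous regressor from the outcome equation, and x16 through x19 are the only variables excluded from the outcome equation. This assumes that the global $x10_L1 from #1 holds all exogenous controls of the outcome equation (x2 through x15 and, it appears, the round and village dummies) and that cmp setup has already been run so that the $cmp_* globals exist.

```stata
* Sketch only, under the assumptions stated above.
* Outcome equation as in #1; the x1 equation is the unrestricted reduced
* form: all exogenous controls in $x10_L1 plus the excluded variables
* x16-x19, which appear only in the x1 equation.
cmp (insurance_type = x1 $x10_L1) (x1 = $x10_L1 x16 x17 x18 x19), cluster(HHID) ind($cmp_oprobit $cmp_cont)
```

If the estimates for x1 change substantially or become much less precise relative to the restricted system, that would suggest the omitted-control restrictions in the x1 equation were contributing to identification.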