  • IV-PROBIT with binary endogenous variable and binary instrument

    Hello There,

    I am exploring the causal effect of maternal education on child health. My dependent variable is binary: it indicates whether children under 6 months of age are exclusively breastfed (1 = exclusively breastfed; 0 = otherwise).
    To account for the endogeneity of maternal education, I use a compulsory education law. My IV, reform exposure, is also binary: it indicates whether the mother was exposed to the law. My education variable is also a dummy: it shows whether the mother received 8 years of education or not (1 = 8 years or more; 0 = less than 8 years). I am using two rounds of the Turkish DHS, which I pooled. I am using Stata 14.

    My code is as follows:

    ivprobit exclusive_BF (completionof8years=reformexpsosure) X1 X2 X3 ... where X1 X2 and X3 are my explanatory variables.

    However, Stata says that ivprobit assumes that the endogenous covariates are continuous and so is not appropriate for use with discrete endogenous covariates.

    As a result, my code becomes useless (I guess?).

    Can anybody please help me? How can I run an ivprobit model and interpret the results?

    Thank you so much

  • #2
    Stata tells you correctly that -ivprobit- (which is not an IV estimator at all; it is a control function or maximum likelihood estimator) is not appropriate for your situation. -ivprobit- is appropriate for a continuous endogenous regressor.

    Use either -biprobit- or linear IV regression with -ivregress-.

    • #3
      Dear Joro Kolev, thank you so much for your reply. However, I am a bit confused about biprobit. How can I use my instrument with biprobit?

      • #4
        Like this. I load data and I create dummies for both the dependent variable and the endogenous regressor:

        Code:
        .  webuse hsng2, clear
        (1980 Census housing data)
        
        . summ hsngval
        
            Variable |        Obs        Mean    Std. Dev.       Min        Max
        -------------+---------------------------------------------------------
             hsngval |         50       48484    15770.24      31100     119400
        
        . gen hsngvaldummy = hsngval>r(mean)
        
        . summ rent
        
            Variable |        Obs        Mean    Std. Dev.       Min        Max
        -------------+---------------------------------------------------------
                rent |         50      234.76    35.35335        180        368
        
        . gen rentdummy = rent>r(mean)

        Then linear regression:

        Code:
        . ivregress 2sls rentdummy pcturban (hsngvaldummy = faminc i.region)
        
        Instrumental variables (2SLS) regression          Number of obs   =         50
                                                          Wald chi2(2)    =      62.78
                                                          Prob > chi2     =     0.0000
                                                          R-squared       =     0.5450
                                                          Root MSE        =     .33619
        
        ------------------------------------------------------------------------------
           rentdummy |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
        hsngvaldummy |   .7432537   .1833852     4.05   0.000     .3838253    1.102682
            pcturban |   .0089092   .0047618     1.87   0.061    -.0004239    .0182422
               _cons |  -.3891662   .2817598    -1.38   0.167    -.9414053    .1630729
        ------------------------------------------------------------------------------
        Instrumented:  hsngvaldummy
        Instruments:   pcturban faminc 2.region 3.region 4.region

        And then biprobit:

        Code:
        . biprobit (rentdummy hsngvaldummy pcturban) (hsngvaldummy pcturban faminc i.region), nolog
        
        Seemingly unrelated bivariate probit            Number of obs     =         50
                                                        Wald chi2(7)      =      28.46
        Log likelihood = -19.478371                     Prob > chi2       =     0.0002
        
        ------------------------------------------------------------------------------
                     |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
        rentdummy    |
        hsngvaldummy |   2.269605   .5418049     4.19   0.000     1.207687    3.331523
            pcturban |   .0600618   .0238045     2.52   0.012     .0134058    .1067178
               _cons |  -4.825022   1.634003    -2.95   0.003    -8.027609   -1.622435
        -------------+----------------------------------------------------------------
        hsngvaldummy |
            pcturban |   .0475946    .059457     0.80   0.423    -.0689391    .1641283
              faminc |   .0014891   .0006342     2.35   0.019     .0002461    .0027321
                     |
              region |
            N Cntrl  |  -.6791572    1.58226    -0.43   0.668     -3.78033    2.422015
              South  |    .561397   29.04296     0.02   0.985    -56.36176    57.48455
               West  |   3.720021    1.70954     2.18   0.030     .3693845    7.070657
                     |
               _cons |   -34.8199    12.0947    -2.88   0.004    -58.52507   -11.11473
        -------------+----------------------------------------------------------------
             /athrho |  -14.15105    2426.15    -0.01   0.995    -4769.318    4741.016
        -------------+----------------------------------------------------------------
                 rho |         -1   4.96e-09                            -1           1
        ------------------------------------------------------------------------------
        Wald test of rho=0: chi2(1) = .000034                     Prob > chi2 = 0.9953
        
        .

        • #5
          I really appreciate the syntax and the logical explanation. However, do I need to interpret marginal effects in order to talk about an "odds ratio", or can I simply interpret the causal effect using the biprobit output? Thank you so much.

          • #6
            You might want to check out -margins- for marginal effects and interpretations. Odds ratios are only for logistic regression.

            If I had to evaluate the effect of the endogenous dummy manually, and my estimated structural equation is F(a*dummy + b'x), where F() is the standard normal CDF, I would set the control variables x at some values, say their means. Let's call the vector of x predictors set at their means x*. I would then evaluate

            F(a + b'x*) - F(b'x*).
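
            A minimal sketch of this manual calculation, assuming the hsng2 example from #4, where pcturban is the only control in the outcome equation. The scalar name effect and the local names are illustrative, not part of any command.

```stata
* Recreate the example data and dummies from #4
webuse hsng2, clear
summarize hsngval, meanonly
generate byte hsngvaldummy = hsngval > r(mean)
summarize rent, meanonly
generate byte rentdummy = rent > r(mean)

biprobit (rentdummy hsngvaldummy pcturban) (hsngvaldummy pcturban faminc i.region), nolog

* x*: the single control, set at its sample mean
summarize pcturban, meanonly
local xbar = r(mean)

* b'x*: outcome-equation index with the endogenous dummy switched off
local xb0 = _b[rentdummy:_cons] + _b[rentdummy:pcturban] * `xbar'

* F(a + b'x*) - F(b'x*), with F() the standard normal CDF
scalar effect = normal(`xb0' + _b[rentdummy:hsngvaldummy]) - normal(`xb0')
display "Effect of the dummy at mean pcturban: " effect
```

            Replacing the last two lines with -nlcom- applied to the same expression should also give a delta-method standard error, provided the mean of pcturban is treated as fixed.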




            • #7
              I am very thankful for your prompt and explanatory replies! I guess I got it!

              • #8
                Hello,

                I tried fitting a biprobit model with a binary independent variable and a binary dependent variable. My IV is also binary, and I am running the regression with an 'if' qualifier. However, after the nth iteration, Stata shows the message "Hessian is not negative semidefinite". Can anybody help, please?

                With -ivregress 2sls-, the model runs, but to interpret the estimates I need to look at margins, right? The coefficient from the -ivregress 2sls- output and the margins both come out as 31.92, and I don't know how to interpret this. Any input is highly appreciated.

                • #9
                  Hi,

                  There is a new package, rbiprobit, that handles recursive bivariate probit models:

                  Code:
                  ssc install rbiprobit

                  So, if your outcome variable is binary and your endogenous treatment variable is also binary, you can use rbiprobit to get coefficient estimates and to compute marginal effects and treatment effects.

                  Here is a short example that computes the average treatment effect and the marginal effects of x2 on the joint probability p11:
                  Code:
                  rbiprobit (y1 = x1 x2), endog(y2 = x2 x3)
                  rbiprobit tmeffects, tmeff(ate)
                  rbiprobit margdec, dydx(x2) predict(p11) effect(total)

                  While biprobit will give you the correct coefficient estimates if you fit a recursive model, it won't give you the correct predictions, treatment effects, and marginal effects.

                  • #10
                    Hi Mustafa,

                    Thanks much for sharing the new package rbiprobit. I can apply it for my IV regression with binary outcome and endogenous variables.

                    Would you recommend any command to test for the weakness of the IV following the rbiprobit?

                    Best,
                    Trang

                    • #11
                      Hi Trang Thu Vu,

                      To the best of my knowledge, there is currently no way of testing for weak IVs in nonlinear settings like rbiprobit. This might be due to the fact that such models are identified even if there is no IV at all. In that case, identification is based solely on the distributional assumption of the model. I'm currently working on version 1.2 of the package, which will allow users to test the normality assumption of the model.

                      • #12
                        It is acceptable to examine the weakness of the instruments in the linear setting, even though you might end up using these instruments in a nonlinear model.

                        Mustafa is right that this is not exactly correct, because the nonlinear model might be identified even without any instruments. However, that is known as identification through functional form, and it is frowned upon. Therefore one can still examine the weakness of the instruments in the linear first stage with the methods available for linear models; although not exactly correct, this is informative.
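
                        A sketch of such a linear first-stage check, assuming the hsng2 example from #4, with faminc and the region dummies as the excluded instruments. It shows the built-in postestimation report and the equivalent hand-run first stage.

```stata
* Recreate the example data and dummies from #4
webuse hsng2, clear
summarize hsngval, meanonly
generate byte hsngvaldummy = hsngval > r(mean)
summarize rent, meanonly
generate byte rentdummy = rent > r(mean)

* Linear IV fit, then the first-stage strength statistics
ivregress 2sls rentdummy pcturban (hsngvaldummy = faminc i.region)
estat firststage

* The same idea by hand: regress the endogenous dummy on all exogenous
* variables, then jointly test the excluded instruments
regress hsngvaldummy pcturban faminc i.region
testparm faminc i.region
```

                        The F statistic reported by -testparm- is the usual first-stage F for the excluded instruments. It refers to the linear first stage only, not to the probit first stage that biprobit or rbiprobit would fit.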

                        • #13
                          I agree with Joro. It is better to have some test statistic, even one based on linear models, than to have none. It's true that identification through functional form is not the best way to get published, but once you have an appropriate IV, rbiprobit may perform better than, e.g., ivprobit.

                          • #14
                            Thank you, everyone, for the helpful guidance.
                            I have a similar source of confusion and need help.

                            I am exploring the causal effect of household water quality on child health indicators (ARI, diarrhea, and malnutrition, where diarrhea and infant mortality are binary outcomes and malnutrition is continuous). Water quality is the main regressor; it is a categorical variable based on bacterial contamination (low, medium, and high risk). I have two instruments: the first is continuous, district-level elevation, and the second is an interaction of maximum average temperature with interview month.

                            My code is as follows:

                            ivreg2 diarrhea (WQ= mean_elevation c.max_tem#i.Interview_month) X2 X3 dist_sanitation_coverage i.province i.year, cluster (village)

                            I use similar code for the other outcomes. Is this model correctly specified as a 2SLS model?

                            • #15
                              I think this is wrong. If I understand correctly, WQ is a categorical variable for bacterial contamination (low, medium, and high risk). You cannot put a categorical variable into the regression like this; you need to expand it into dummy variables.

                              So your command should be something like

                              ivreg2 diarrhea (i.WQ= mean_elevation c.max_tem#i.Interview_month) X2 X3 dist_sanitation_coverage i.province i.year, cluster (village)

                              or, if Stata does not accept this syntax, first generate the dummies manually and then include them. E.g.,

                              Code:
                              gen WQhigh = WQ=="high"
                              gen WQmedium = WQ=="medium"
                              
                              ivreg2 diarrhea (WQhigh WQmedium= mean_elevation c.max_tem#i.Interview_month) X2 X3 dist_sanitation_coverage i.province i.year, cluster (village)

                              What I have written above is pseudocode, because you do not say or show how WQ is encoded. I assumed that it is a string variable with the level names as strings.
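
                              If WQ is instead stored as a numeric variable, the dummies could be built as in the sketch below. The coding 1 = low, 2 = medium, 3 = high and the toy data are assumptions made purely for illustration; check the actual coding with -tabulate WQ- or -codebook WQ- first.

```stata
* Toy data; the coding 1 = low, 2 = medium, 3 = high is hypothetical
clear
input byte WQ
1
2
3
2
.
end

* "if !missing(WQ)" leaves the dummies missing where WQ is missing,
* instead of silently coding those observations as 0
generate byte WQmedium = (WQ == 2) if !missing(WQ)
generate byte WQhigh   = (WQ == 3) if !missing(WQ)

list
```

                              Low risk is the omitted base category here, so the coefficients on WQmedium and WQhigh are read relative to low-risk water.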
