  • Control function approach with logit and squared term

    Hi,

    Consider the following model:

    yvar = a + b1*x1var + b2*x1var^2 + b3*x2var + b4'*controls + error (eq 1)

    As yvar is binary, I estimate (eq 1) by logit.

    x1var and x2var are continuous and endogenous independent variables.

    I also have two instrumental variables, z1var and z2var: z1var is an instrument for x1var and z2var is an instrument for x2var.

    Therefore, I would like to implement a control function approach to account for the endogeneity of both x1var (and x1var squared) and x2var.

    I have read Wooldridge's textbook and multiple posts here, but I am still having some difficulties.

    Basically, I have tried two different control functions to estimate (eq 1) and, unexpectedly, I get very different results.

    First, I estimate a plain and simple control function (CF1) as follows:

    Code:
    *first stage
    reg x1var z1var z2var ${controls}
    predict resid_1, res
    reg x2var z1var z2var ${controls}
    predict resid_2, res
    
    *second stage
    logit yvar c.x1var##c.x1var x2var ${controls} resid_1 resid_2
    
    *both stages are bootstrapped to get correct standard errors
    Then, I try a more flexible control function (CF2) as follows:

    Code:
    *first stage
    gen x1var_2=x1var^2
    reg x1var c.z1var##c.z1var z2var ${controls}
    predict resid_1, res
    reg x1var_2 c.z1var##c.z1var z2var ${controls}
    predict resid_2, res
    reg x2var c.z1var##c.z1var z2var ${controls}
    predict resid_3, res
    
    *second stage
    logit yvar c.x1var##c.x1var x2var ${controls} resid_1 resid_2 resid_3
    
    *both stages are bootstrapped to get correct standard errors
    However, the results that I obtain when using CF1 or CF2 are completely different in terms of sign, magnitude and statistical significance.

    In principle, I would prefer CF2 as the control function is more flexible.

    However, I am uncertain whether there is something wrong with CF2.

    Do you see any obvious reason why the two control functions CF1 and CF2 produce completely different results? Which control function would you prefer?

    Thanks,

    Lukas
    Last edited by Lukas Lang; 09 Jun 2022, 09:35. Reason: fixing typos
    ------
    I use Stata 17
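
For reference, one way to bootstrap both stages of CF1 together is to wrap them in a small program and hand that program to bootstrap. This is only a sketch under stated assumptions: the data are simulated, and the single control c1, the program name cf_logit, the coefficients in the data-generating lines, and reps(50) are all made up for illustration and do not come from the thread.

```stata
* Hypothetical sketch: simulate data, then bootstrap both CF1 stages together
clear all
set seed 12345
set obs 2000

* Simulated instruments, one control, and a shared unobservable (all assumed)
gen z1var = rnormal()
gen z2var = rnormal()
gen c1    = rnormal()
gen u     = rnormal()
gen x1var = 1 + 0.8*z1var + 0.2*z2var + 0.3*c1 + 0.5*u + rnormal()
gen x2var = 1 + 0.2*z1var + 0.8*z2var + 0.3*c1 + 0.5*u + rnormal()
gen yvar  = runiform() < invlogit(0.5*x1var - 0.1*x1var^2 + 0.3*x2var + 0.2*c1 + 0.5*u)
global controls c1

capture program drop cf_logit
program define cf_logit, eclass
    * drop residuals left over from an earlier call or replication
    capture drop resid_1 resid_2
    * first stage: one regression per endogenous variable
    quietly regress x1var z1var z2var ${controls}
    predict double resid_1, residuals
    quietly regress x2var z1var z2var ${controls}
    predict double resid_2, residuals
    * second stage: logit with the first-stage residuals as control functions
    logit yvar c.x1var##c.x1var x2var ${controls} resid_1 resid_2
end

* each replication resamples the data and reruns both stages
bootstrap _b, reps(50) seed(2022) nodots: cf_logit
```

Because the first-stage regressions sit inside the program, every bootstrap replication re-estimates the residuals on the resampled data before the logit is fitted.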

  • #2
    Hi Lukas,
    What exactly are you using to compare the marginal effects of the two models? Can you, for example, post the marginal effects of x1var in both models?
    The coefficients will certainly change, but what matters here are the marginal effects.
    F


    • #3
      Thank you FernandoRios. When I say that results are different I refer exactly to the average marginal effects of x1var. This is how I compute these average marginal effects.

      Code:
      margins, dydx(x1var) at(x1var=(`min' `mean' `max'))
      Min, mean and max are my values of interest for the average marginal effects. Hope it makes sense.
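
As a sketch of that computation, assuming simulated data in place of the real variables (the data-generating lines and their coefficient values are hypothetical), the three evaluation points can be taken from summarize and passed to margins as a single numlist:

```stata
* Hypothetical sketch: marginal effects of x1var at its min, mean and max
clear all
set seed 2022
set obs 1000

* Simulated regressor and binary outcome (assumed values, not the thread's data)
gen x1var = rnormal(25, 5)
gen yvar  = runiform() < invlogit(-1 + 0.2*x1var - 0.004*x1var^2)

* factor-variable notation lets margins account for the squared term
logit yvar c.x1var##c.x1var

* store the evaluation points in local macros
summarize x1var, meanonly
local min  = r(min)
local mean = r(mean)
local max  = r(max)

* one at() with a numlist produces one row of results per value
margins, dydx(x1var) at(x1var=(`min' `mean' `max'))
```

Writing the squared term as c.x1var##c.x1var rather than as a separately generated variable is what allows margins to differentiate through both terms.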


      • #4
        And can you show the exact numbers you get?


        • #5
          Thanks FernandoRios

          When using CF1 I obtain these results:

          Code:
          margins, dydx(la_exp) at(la_exp=0 la_exp=`min' la_exp=`mean' la_exp=`max') post
          
          Average marginal effects                                Number of obs = 39,115
          Model VCE: Robust
          
          Expression: Pr(houtcome), predict()
          dy/dx wrt:  la_exp
          1._at: la_exp = 16.51067
          2._at: la_exp = 26.61623
          3._at: la_exp = 47.27143
          
          -----------------------------------------------------------------------------------
                            |            Delta-method
                            |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
          ------------------+----------------------------------------------------------------
          la_exp            |
                        _at |
                         1  |  -.0112055   .0032717    -3.42   0.001     -.017618   -.0047931
                         2  |  -.0194655   .0053244    -3.66   0.000    -.0299011   -.0090299
                         3  |   .0001215   .0075195     0.02   0.987    -.0146164    .0148594
          -----------------------------------------------------------------------------------
          When I use CF2 I get:

          Code:
          margins, dydx(la_exp) at(la_exp=0 la_exp=`min' la_exp=`mean' la_exp=`max') post
          
          Average marginal effects                                Number of obs = 39,115
          Model VCE: Robust
          
          Expression: Pr(houtcome), predict()
          dy/dx wrt:  la_exp
          1._at: la_exp = 16.51067
          2._at: la_exp = 26.61623
          3._at: la_exp = 47.27143
          
          -----------------------------------------------------------------------------------
                            |            Delta-method
                            |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
          ------------------+----------------------------------------------------------------
          la_exp            |
                        _at |
                         1  |  -.0008915   .0011282    -0.79   0.429    -.0031027    .0013197
                         2  |  -.0372457   .0043453    -8.57   0.000    -.0457623   -.0287292
                         3  |  -.0162769   .0040293    -4.04   0.000    -.0241742   -.0083795
          -----------------------------------------------------------------------------------
          So, as you can see, results change quite a lot especially when thinking about the interpretation.

          While I do not have any prior on the magnitude of the effect, as this is the effect of health care insurance on a measure of health outcome, the negative sign is what I would expect.

          What confuses me is that, at the minimum and maximum values, the results differ but are still plausible under both CF1 and CF2.

          So, my conclusions about which could be the best model are highly uncertain.

          Any thoughts about how else I can assess the validity of these two models?
          Last edited by Lukas Lang; 13 Jun 2022, 07:34.


          • #6
            Well, a few thoughts on your results:
            1) When using quadratic terms (as you did), you do not need to estimate a separate first stage for the quadratic term. I believe that is common practice for the two-step substitution approach, but not for the residual inclusion approach.
            2) You could add more flexibility, for example by adding functional forms of the IMR.
            3) It seems to me that your results are rather consistent at the mean. The magnitude is almost double (about -0.037 under CF2 versus -0.019 under CF1), yes, but that could simply be due to specification.
            4) Very few models perform well around the limits of the variables of interest, so I wouldn't put much weight on evaluating the effect at x=max and x=min.
            HTH
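
Points 1) and 2) could be sketched roughly as follows. Everything here is an assumption for illustration: the data are simulated, the coefficients are made up, and treating "functional forms" as squared first-stage residuals is only one possible reading of the suggestion, not necessarily what was meant. The standard errors printed by this sketch are not corrected for the first stage.

```stata
* Hypothetical sketch: one residual per endogenous variable, entered flexibly
clear all
set seed 2022
set obs 2000

* Simulated instruments and endogenous regressors (assumed values)
gen z1var = rnormal()
gen z2var = rnormal()
gen u     = rnormal()
gen x1var = 0.8*z1var + 0.2*z2var + 0.5*u + rnormal()
gen x2var = 0.2*z1var + 0.8*z2var + 0.5*u + rnormal()
gen yvar  = runiform() < invlogit(0.5*x1var - 0.1*x1var^2 + 0.3*x2var + 0.5*u)

* first stage: no separate regression for x1var squared
quietly regress x1var z1var z2var
predict double resid_1, residuals
quietly regress x2var z1var z2var
predict double resid_2, residuals

* second stage: residuals and their squares as the control function
logit yvar c.x1var##c.x1var x2var c.resid_1##c.resid_1 c.resid_2##c.resid_2

* average marginal effect of x1var, holding the residual terms at observed values
margins, dydx(x1var)
```

Compared with CF2 in the first post, this keeps a single residual for x1var and moves the extra flexibility into how that residual enters the second stage.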


            • #7
              Thank you, your suggestions are helpful!
