  • How to keep controls in 2nd stage but exclude them from 1st stage in IV/2SLS

    Hello,

    I am working with patient-level hospital data and estimating an IV/2SLS model where my endogenous variable is repeat_mr_same_day instrumented by low_quality. While doing this, in the second stage I want to keep the full set of controls like hospital type, MR utilization per machine, etc., but in the first stage I would like to exclude some of these controls like MR utilization, hospital type, because they have role in predicting the endogenous regressor.

    When I use ivreg2 with the partial() option, the variables are only “partialled out” from the output but they still enter the first stage. So I want to ask:
    1. Is there any way in Stata to estimate a standard IV/2SLS where certain controls are kept only in the second stage but not used in the first stage?
    2. If not, is the recommended approach to implement manual 2SLS (run the first stage with a reduced set of regressors, save the fitted values, then run the second stage with all controls), combined with a cluster bootstrap to get valid standard errors?
    3. Would sem or reg3 be a better framework if I want two equations with different right-hand-side variables, while still using cluster-robust SEs?
    Any guidance or examples would be much appreciated.

    Best Regards
    Tuğba

  • #2
    Originally posted by Tugba Kaynak

    I am working with patient-level hospital data and estimating an IV/2SLS model where my endogenous variable is repeat_mr_same_day instrumented by low_quality. While doing this, in the second stage I want to keep the full set of controls like hospital type, MR utilization per machine, etc., but in the first stage I would like to exclude some of these controls like MR utilization, hospital type, because they have role in predicting the endogenous regressor.
    If those variables truly have no role in predicting the endogenous regressor, then their coefficients in the first stage will be close to zero, and they won’t meaningfully affect the fitted values. But rather than assuming this, it's better to let the data determine whether they are relevant. Assuming your sample size is reasonably large, including a few additional exogenous variables in the first stage won’t substantially reduce your degrees of freedom. More importantly, excluding second-stage controls from the first stage generally causes inconsistency of the estimates. Therefore, there is no valid reason to exclude exogenous controls from the first stage of an IV/2SLS model.


    Is there any way in Stata to estimate a standard IV/2SLS where certain controls are kept only in the second stage but not used in the first stage?
    In the standard approach, all exogenous variables included in the second stage should also appear in the first stage. While it is possible to estimate a model manually — for example, by running the first and second stages separately and enclosing both within a bootstrap program — the resulting estimates will generally be inconsistent if you exclude exogenous variables from the first stage. If someone were to include exogenous variables in the second stage but omit them from the first without clear theoretical or statistical justification, I would be skeptical of the results.
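
    As a rough sketch of "letting the data determine" relevance, one could fit the first stage with every second-stage control and test the controls jointly, then run the standard 2SLS. The names repeat_mr_same_day and low_quality come from the question; y (the outcome), hospital_type, mr_util, and hospital_id (the cluster variable) are hypothetical placeholders, and the sketch assumes the patient-level data are already in memory.

    ```stata
    * Hypothetical names: y, hospital_type, mr_util, hospital_id
    * First stage with all second-stage controls included
    regress repeat_mr_same_day low_quality i.hospital_type mr_util, ///
        vce(cluster hospital_id)

    * Joint test: do the controls help predict the endogenous regressor?
    testparm i.hospital_type mr_util

    * Standard 2SLS: the same controls enter both stages automatically
    ivregress 2sls y i.hospital_type mr_util ///
        (repeat_mr_same_day = low_quality), vce(cluster hospital_id) first

    * First-stage diagnostics for the excluded instrument
    estat firststage
    ```

    Whatever the test shows, keeping the controls in both stages costs little in a reasonably large sample, as noted above.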

    • #3
      Putting restrictions on the first stage is generally frowned upon, because then the estimator is not consistent if those restrictions fail. Of course, you can test whether the controls can be omitted, but then you have to think about the pre-testing problem.

      If you have a good reason to do it, there are two choices. First, you can run the first stage with the restrictions imposed and obtain the fitted values. Then use those fitted values as an instrument -- not a regressor! -- in ivregress 2sls. When you restrict the first stage, using the fitted values ŵ as a regressor is no longer the same as IV. When you use them as an IV, you don't have to adjust the standard errors for the first-step estimation. If you include all of the controls in the first stage, you will reproduce 2SLS.

      The first method is inefficient. The second choice is to impose the restrictions on the reduced form for y by using the gmm command to estimate the two equations together, leaving the controls out of the first-stage equation.

      Code:
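      * Restricted first stage: regress the endogenous variable w on z1 ... zm only
      reg w z1 ... zm
      * Save the first-stage fitted values (linear prediction)
      predict what
      * Use the fitted values as the instrument for w; all controls x1 ... xk stay in
      ivregress 2sls y (w = what) x1 ... xk, vce(robust)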
      That code is the first method. For the second, I'd have to look up the gmm command because it's been a while. Remember, I'm not sanctioning this approach, but if you want to do it ....
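
      The statement that an unrestricted first stage reproduces 2SLS can be checked on simulated data. The following is only a minimal sketch: every variable name, coefficient, and the seed are made up for illustration, and two instruments (z1, z2) are used so that the model is overidentified.

      ```stata
      * Simulated data; all names and coefficients are hypothetical
      clear
      set seed 12345
      set obs 2000
      generate z1 = rnormal()
      generate z2 = rnormal()
      generate x1 = 0.5*z1 + rnormal()   // control correlated with an instrument
      generate x2 = rnormal()
      generate u  = rnormal()
      * w is endogenous because it shares the error u with y
      generate w  = 0.5*z1 + 0.5*z2 + 0.3*x1 + 0.3*x2 + 0.5*u + rnormal()
      generate y  = 1 + w + 0.5*x1 - 0.5*x2 + u

      * (a) Standard 2SLS: controls x1 x2 enter both stages
      ivregress 2sls y x1 x2 (w = z1 z2), vce(robust)
      scalar b_2sls = _b[w]

      * (b) Unrestricted first stage by hand, fitted values used as the instrument
      regress w z1 z2 x1 x2
      predict double what_full, xb
      ivregress 2sls y x1 x2 (w = what_full), vce(robust)
      scalar b_full = _b[w]

      * The two coefficients on w should agree up to rounding error
      display "2SLS: " b_2sls "   fitted values as IV: " b_full
      assert reldif(b_2sls, b_full) < 1e-8
      ```

      Storing the prediction as a double keeps rounding in the fitted values from showing up in the comparison. Dropping x1 and x2 from the regress line in (b) gives the restricted version described above, which in general no longer matches (a) when there is more than one instrument.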
