  • Piecewise Linear Instrumental Variable Estimation

    Does anyone know a way to estimate a piecewise linear regression with instrumental variable(s)?

    I have a model below,

    y = a + b*Xabovezero + c*Xbelowzero + control variables + error term.

    where Xabovezero = X if X>0, and zero otherwise, and Xbelowzero=X if X<0, and zero otherwise.
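
The construction of the two piecewise terms can be sketched as follows. This is a minimal illustration in Python rather than Stata, and the function name `split_at_zero` and the sample values are hypothetical, not part of the original post.

```python
def split_at_zero(x):
    """Return (x_above, x_below) as defined in the post.

    x_above is x when x > 0 and 0 otherwise;
    x_below is x when x < 0 and 0 otherwise.
    """
    x_above = x if x > 0 else 0
    x_below = x if x < 0 else 0
    return x_above, x_below


# Hypothetical sample values; X = 0 contributes 0 to both terms.
values = [-3, 0, 4]
pieces = [split_at_zero(v) for v in values]
print(pieces)
```

By construction the two terms always sum back to X, which is why the model is a single continuous line with a kink at zero.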

    In the event that X is endogenous, I want to use an instrumental variable K. I wonder how to do this properly in Stata. I found a paper discussing it, https://www.dbmi.pitt.edu/sites/defa...s/Scheines.pdf, but have not found any Stata ado-files.

    Any help is appreciated!

  • #2
    Hi Faye
    Two points on your problem. Why restrict yourself to piecewise linear? You can be more flexible using, for example, a partial linear model. https://www.sciencedirect.com/scienc...65176514001608

    In my own research I have found that the control function approach (in which residuals from the first stage are included in the model) also works well.
    Fernando


    • #3
      Faye: Since you know the threshold you want is zero, that paper you attached is too hard. Is X a (roughly) continuous variable? If it is, I have some suggestions -- one of which is an implementation of Fernando's excellent suggestion to use a control function approach.


      • #4
        Thanks for your valuable input, Fernando and Jeff. X is a continuous variable (sort of: integers ranging from -10 to 10). I could regress X on K (the instrumental variable) and K*K (to capture non-linearity between K and X) to get the residual (R1), then include it in the original model, i.e.
        y = a + b*Xabovezero + c*Xbelowzero + control variables + R1 + error term. Please let me know if the model is inappropriate.
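
The two-step procedure described above can be sketched on simulated data. This is a minimal Python illustration under stated assumptions, not a Stata implementation: the data-generating process, the helper names `solve` and `ols`, and all numeric values are hypothetical, X is simulated as continuous rather than integer-valued, and one control variable W is included in both stages.

```python
import random


def solve(a, b):
    """Solve the linear system a * coef = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) coef = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical DGP: true slopes are b = 2.0 above zero and c = 0.5 below zero.
random.seed(0)
data = []
for _ in range(5000):
    k = random.gauss(0, 1)                # instrument K
    w = random.gauss(0, 1)                # exogenous control
    u = random.gauss(0, 1)                # first-stage error
    x = k + 0.5 * w + u                   # X is endogenous through u
    e = 0.8 * u + random.gauss(0, 0.5)    # structural error correlated with u
    y = 1 + 2.0 * max(x, 0) + 0.5 * min(x, 0) + 0.3 * w + e
    data.append((k, w, x, y))

X = [d[2] for d in data]
Y = [d[3] for d in data]

# First stage: regress X on K, K*K and the control; keep the residual R1.
first_rows = [[1, k, k * k, w] for k, w, _, _ in data]
g = ols(first_rows, X)
R1 = [x - sum(gi * ri for gi, ri in zip(g, row)) for x, row in zip(X, first_rows)]

# Second stage: the piecewise model with R1 added as a regressor.
cf_rows = [[1, max(x, 0), min(x, 0), w, r] for (_, w, x, _), r in zip(data, R1)]
cf = ols(cf_rows, Y)

# Naive OLS without R1, for comparison.
naive = ols([row[:4] for row in cf_rows], Y)

print("control function b, c:", round(cf[1], 3), round(cf[2], 3))
print("naive OLS        b, c:", round(naive[1], 3), round(naive[2], 3))
```

The second-stage standard errors from a plain regression ignore the fact that R1 is estimated; bootstrapping both steps together is the usual remedy.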

        I am curious about how to implement partial linear models in Stata. Reading the paper that Fernando referred to is difficult for me at this stage.

        Faye
        Last edited by Faye Gao; 06 Aug 2019, 11:14.


        • #5
          The problem with the CF approach in your case is that linearity of the first stage is suspect due to the integer nature of X. But you might use it as an approximation. In that case, I'd include more functions of R1, including R1^2 and even interact R1 with a dummy indicating that X is above zero. Once you go the CF route, you can make it pretty flexible.

          A more robust approach (in the sense of consistency) is to estimate, say, a binomial regression separately for X >= 0 and X < 0 and obtain the fitted values. The binomial model need not be correct if you then use the fitted values as instruments -- not regressors.
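
The idea of using fitted values as instruments rather than regressors can be sketched as follows. This Python illustration makes several assumptions that are not in the post: a quadratic OLS fit on the full sample stands in for the separate binomial regressions, X is simulated as continuous, and the data-generating process and the helper names `solve` and `iv` are hypothetical. The point carried over is only that the first-step model need not be correct, because its fitted values enter as instruments.

```python
import random


def solve(a, b):
    """Solve the linear system a * coef = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def iv(z_rows, x_rows, y):
    """Just-identified IV estimator: solve (Z'X) coef = Z'y.

    Passing the same rows as instruments and regressors gives OLS.
    """
    k = len(x_rows[0])
    zx = [[sum(z[i] * x[j] for z, x in zip(z_rows, x_rows)) for j in range(k)]
          for i in range(k)]
    zy = [sum(z[i] * yi for z, yi in zip(z_rows, y)) for i in range(k)]
    return solve(zx, zy)


# Hypothetical DGP: true slopes are 2.0 above zero and 0.5 below zero.
random.seed(1)
data = []
for _ in range(10000):
    k = random.gauss(0, 1)                # instrument K
    w = random.gauss(0, 1)                # exogenous control
    u = random.gauss(0, 1)                # first-stage error
    x = k + 0.5 * w + u                   # X is endogenous through u
    e = 0.8 * u + random.gauss(0, 0.5)    # structural error correlated with u
    y = 1 + 2.0 * max(x, 0) + 0.5 * min(x, 0) + 0.3 * w + e
    data.append((k, w, x, y))

xa = [max(d[2], 0) for d in data]
xb = [min(d[2], 0) for d in data]
Y = [d[3] for d in data]

# Step 1: fit each piecewise term on K, K*K and the control (stand-in model).
basis = [[1, k, k * k, w] for k, w, _, _ in data]
ga = iv(basis, basis, xa)
gb = iv(basis, basis, xb)
fa = [sum(c * r for c, r in zip(ga, row)) for row in basis]
fb = [sum(c * r for c, r in zip(gb, row)) for row in basis]

# Step 2: use the fitted values as instruments for the actual piecewise terms.
z_rows = [[1, a, b, d[1]] for a, b, d in zip(fa, fb, data)]
x_rows = [[1, a, b, d[1]] for a, b, d in zip(xa, xb, data)]
beta = iv(z_rows, x_rows, Y)

print("IV estimates of the slopes above and below zero:",
      round(beta[1], 3), round(beta[2], 3))
```

In this setup the two slopes are separately identified only because the fitted values depend non-linearly on K; with a purely linear first step the two generated instruments would be collinear.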


          • #6
            By the way, whether you use the CF approach or generate the IVs as I suggested, the control variables must be included along with K in the first stage (whether it's linear or binomial).


            • #7
              Hi Jeff Wooldridge
              If I may ask a follow-up question:
              For the example Faye proposed, my first instinct was to use a CF approach with the following process:
              Code:
              y=depvar
              x1,x2=exogenous controls
              x3 endogenous.
                 x3a = x3 if x3>0 and zero otherwise
                 x3b = x3 if x3<0 and zero otherwise
              z instrument.
              
              Econometric model:
              y=a0+a1*x1+a2*x2+b1*x3a+b2*x3b+e
              
              CF implementation. First stage:
              x3=g0+g1*x1+g2*x2+g3*z+u
              
              Model estimation:
              y=a0+a1*x1+a2*x2+b1*x3a+b2*x3b+d1*u+e
              However, as you indicated, it is common to use interactions with the residuals,
              so I thought about the following alternatives:
              1) y=a0+a1*x1+a2*x2+b1*x3a+b2*x3b+d1*u*x3+e
              2) y=a0+a1*x1+a2*x2+b1*x3a+b2*x3b+d1*u*x3a+d2*u*x3b+e
              3) y=a0+a1*x1+a2*x2+b1*x3a+b2*x3b+d1*u*(x3>0)+d2*u*(x3<0)+e
              
              In a simple simulation, however, Option 2 does not produce consistent estimates.
              So my question is: while CF can be flexible, is there a risk of "misspecification" of the control function part that is of general concern? Or is there any specific guidance on what to do or not to do in these cases?
              Thank you
              Fernando


              • #8
                Fernando:

                Your CF must be such that x3 is a function of the exogenous variables and the CF. That's true in (1) and (3), but not (2). It's obvious for (1). For (3), if you add u*(x3 > 0) and u*(x3 < 0) you get u. But there is no way to write x3 as a function of those interactions in (2). The bottom line is, you should always include u by itself because you're assuming x3 is a linear function of u. After that, you can put in flexible functions.

                BTW, in Faye's application, she has to decide where to put the X = 0 because x3 is discrete. I'm guessing in your simulation x3 is continuous and so P(x3 = 0) = 0.
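
The statement about option (3) can be written out as a short identity. The notation below follows Fernando's variables; the indicator symbol and the rearrangement are added here for illustration and hold whenever x3 is not exactly zero.

```latex
u\,\mathbf{1}[x_3 > 0] + u\,\mathbf{1}[x_3 < 0]
  = u\,\bigl(1 - \mathbf{1}[x_3 = 0]\bigr)
  = u \quad \text{for } x_3 \neq 0,
\qquad\text{so}\qquad
d_1\,u\,\mathbf{1}[x_3 > 0] + d_2\,u\,\mathbf{1}[x_3 < 0]
  = d_1\,u + (d_2 - d_1)\,u\,\mathbf{1}[x_3 < 0].
```

Read this way, option (3) is the model with u by itself plus one extra interaction, and it collapses to the basic CF specification when d1 = d2.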


                • #9
                  Thank you!
                  That was insightful.
                  Also, you are correct for the simulation. X3 is continuous.
                  Best regards
                  Fernando


                  • #10
                    Thank you both Jeff and Fernando! I read more on the control function and have followed your advice. I am wondering whether my interpretation is correct. Below, I borrow Fernando's notation.
                    Prior to the use of CF, I get b1>0 and b2<0, based on the original model:
                    y=a0+a1*x1+a2*x2+b1*x3a+b2*x3b+e.

                    After the use of CF, I get b1>0, b2=0, and d1<0, from estimating the second stage model:
                    y=a0+a1*x1+a2*x2+b1*x3a+b2*x3b+d1*u+e,

                    In my mind, the results suggest that the estimate prior to CF (b2<0) is driven by the unexplained part of X (whose coefficient is d1). But they do not contradict the inference from the original model: Y has a negative relation to the overall movement in X3b (including both exogenous and unexplained). Can you please let me know if any of this is wrong?

                    Thanks!
                    Faye
                    Last edited by Faye Gao; 02 Oct 2019, 10:02.
