  • Constrained Regression with Instrumental Variables

    Hi everyone,

    Is there a way to estimate a constrained regression with instrumental variables? In particular, suppose that I have data on the following variables: w, L, C, S, and the instrument Z. I want to run:

    log(w) = (psi -1)*log(L) + (1 - phi)*log(C) + (psi - phi)*log(S) + epsilon

    where Z would instrument for log(S).

    I have searched extensively online, and it seems that cnsreg does not support instrumental variables. (Even without instrumenting, I am not sure how cnsreg would handle the equation above.)

    Any suggestions would be greatly appreciated -- thank you for your time!

  • #2
    You can either use a control function approach or use reg3:

    Code:
    clear
    set seed 2568
    * Generate some data
    set obs 200
    * Generate some exogenous regressors
    gen l=rnormal()
    gen c=rnormal()
    * Generate endogenous regressors and instrument
    gen u=rnormal()
    gen v=0.5*rnormal()
    gen z=rnormal()
    gen s=z+u
    * Generate y
    glo psi=2
    glo phi=4
    gen y=($psi-1)*l+(1-$phi)*c+($psi-$phi)*s+u+v
    
    * OLS is biased
    reg y l c s
    * Constrained OLS is also biased
    * _b[l]=psi-1
    * _b[c]=1-phi
    * So _b[s]=psi-phi = _b[l]+_b[c]
    constraint 1 s= l + c
    cnsreg y l c s, constraints(1)
    * IV
    ivregress 2sls y l c (s=z)
    * IV with reg3
    reg3 y l c s, inst(l c z) 2sls
    * Constrained case
    * Control function
    reg s z l c
    predict res, resid
    cnsreg y l c s res, constraints(1)
    * Standard errors need adjusting here, because res is a generated regressor
    * reg3
    reg3 y l c s, inst(l c z) 2sls constraints(1)
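
    * Sketch added for illustration (not part of the original post): back out
    * psi and phi from the constrained reg3 fit, using _b[l]=psi-1 and _b[c]=1-phi.
    * Assumes reg3 labels the equation with the dependent variable's name, y.
    nlcom (psi: [y]_b[l] + 1) (phi: 1 - [y]_b[c])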
    Jorge Eduardo Pérez Pérez
    www.jorgeperezperez.com


    • #3
      Great suggestion, Jorge, and thank you for your time. I just implemented it and it works! Could you clarify how the standard errors should be adjusted? I am not familiar with the theory of clustered standard errors in the context of 3SLS, but I imagine robust standard errors should be possible; I could not find much online, though.

      My code (in general terms) is now:

      constraint define 1 L+ C = S
      global stage1 "(first: w L C S controls)"
      global stage2 "(second: S IV controls)"
      reg3 $stage1 $stage2, endog(S) constr(1)


      • #4
        My comment on standard error adjustment was for the control function approach, where you have to account for the first-stage residuals being a generated regressor. reg3 with the 2sls option (which is missing in your code) gives the same standard errors as ivregress 2sls with a small-sample adjustment:

        Code:
        ivregress 2sls y l c (s=z), small
        reg3 y l c s, inst(l c z) 2sls


        which are just the regular IV standard errors. These are non-robust though, and reg3 does not allow robust standard errors. A quick and dirty solution would be to bootstrap the whole thing:

        Code:
        bs, rep(100) : reg3 y l c s, inst(l c z) 2sls constraints(1)
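
        * Sketch added for illustration (not from the original post): the same
        * bootstrap idea applied to the control function approach, so that the
        * first-stage residual is re-estimated in every replication.
        * Assumes the data and constraint 1 from post #2 are in memory;
        * cfboot is a hypothetical program name.
        capture program drop cfboot
        program define cfboot, eclass
            capture drop res
            regress s z l c
            predict double res, resid
            cnsreg y l c s res, constraints(1)
        end
        bootstrap _b, reps(100) seed(2568): cfboot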

        Jorge Eduardo Pérez Pérez
        www.jorgeperezperez.com


        • #5
          Thank you again for checking on the thread and clarifying! I misunderstood the full extent of your initial post, but I have now tried the suggestions. While I grasp the concept, I think I will stick with the reg3 code I wrote (based on your comments) and the bootstrap suggestion. I will work more on understanding whether the bootstrap is appropriate here.


          • #6
            Here's another question on this broader topic: what's the difference in Stata between writing inst() and writing out the instruments "by hand"? By "by hand", I mean writing the equations explicitly in their 2SLS form. What does the 2sls option do differently from 3sls? An earlier comment suggested that it affects the computation of standard errors, but what exactly does the 3sls option in Stata do differently?

            Here is some example code. I am trying to instrument leisure, nondurables, electricity consumption, and the cubic in air quality in a regression of wages on those regressors plus controls and fixed effects.

            constraint define 2 -1*(lleisure + lcons_nondur) = laqi + laqi2 + laqi3
            global stage1 "(first: lwage_hourly lleisure lcons_nondur lcons_elect laqi laqi2 laqi3 $X $stecon i.year i.industry i.county)"
            global stage2 "(second: laqi $ivwindsemi1 $X $stecon i.year i.industry i.county)"
            global stage3 "(third: laqi2 $ivwindsemi2 $X $stecon i.year i.industry i.county)"
            global stage4 "(fourth: laqi3 $ivwindsemi3 $X $stecon i.year i.industry i.county)"
            global stage5 "(fifth: lleisure $ivweather $X $stecon i.year i.industry i.county)"
            global stage6 "(sixth: lcons_nondur $ivcons $X $stecon i.year i.industry i.county)"
            global stage7 "(seventh: lcons_elect $ivcons $X $stecon i.year i.industry i.county)"
            quietly reg3 $stage1 $stage2 $stage3 $stage4 $stage5 $stage6 $stage7 [weight=pweight_count], constr(2) 3sls

            The only role that electricity consumption plays in stage7 is as a control since the $ivcons instruments are interactions of electricity and some other fixed effects -- so by including it as a control I am exploiting electricity consumption variation within the categories of the interacted dummy variables.

            On a side note, if there's a more efficient way to write this, I'd love to know. These regressions take a while to run.


            • #7
              I've realized that reg3 with lots of FEs is quite slow. One of the more efficient alternatives seems to be the control function approach that Jorge noted in an earlier post. Since S, L, and C are all endogenous, does the control function approach go something like this?

              reg S iv_S controls
              predict res1, resid
              reg L iv_L controls
              predict res2, resid
              reg C iv_C controls
              predict res3, resid
              reg w S L C res1 res2 res3 controls


              • #8
                No, you have to regress each endogenous variable on the full set of instruments.

                Code:
                * Generate example data
                clear
                set obs 1000
                set seed 98135136
                * Common unobserved factor
                gen u=rnormal()
                * Instruments
                gen z1=rnormal()
                gen z2=rnormal()
                * Endogenous variables
                gen x1=z1+8*u+rnormal()
                gen x2=z2+8*u+rnormal()
                * Dependent variable
                gen y= 3*x1 + 5*x2 + 5*u + rnormal()
                *  End of data generation
                
                * OLS is biased
                reg y x1 x2
                * IV is consistent
                ivregress 2sls y (x1 x2 = z1 z2)
                * Control function approach running separate first stages doesn't give IV results
                reg x1 z1
                predict e1, resid
                reg x2 z2
                predict e2, resid
                reg y x1 x2 e1 e2
                
                * Control function approach with the full set of instruments in each first stage yields IV estimates
                drop e1 e2
                reg x1 z1 z2
                predict e1, resid
                reg x2 z1 z2
                predict e2, resid
                reg y x1 x2 e1 e2
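
                * Sketch added for illustration (not from the original post): compare the
                * control function coefficients with 2SLS side by side.
                * The scalar names cf_x1 and cf_x2 are hypothetical.
                scalar cf_x1 = _b[x1]
                scalar cf_x2 = _b[x2]
                ivregress 2sls y (x1 x2 = z1 z2)
                display "x1: CF = " cf_x1 "  2SLS = " _b[x1]
                display "x2: CF = " cf_x2 "  2SLS = " _b[x2]
                * Only the point estimates are comparable: the OLS standard errors from
                * the control function regression ignore the generated regressors (see #4).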
                Jorge Eduardo Pérez Pérez
                www.jorgeperezperez.com


                • #9
                  Thanks Jorge!
