Two endogenous variables but only interested in the coefficient of one...sufficient to instrument only for one variable?

Bobby Wood

Join Date: Jul 2017

Posts: 39
#1

10 Nov 2017, 05:49

Hi,

I want to estimate a model:

y = a0 + a1*x1 + a2*x2 + a3*x3

x1 and x2 are endogenous.

I'm interested in the coefficient a1, but not really in the coefficient a2 and only include x2 as a control variable.

Will a1 be unbiased if I only instrument for x1 but not for x2?

Many thanks,
Bobby
Bobby Wood

Join Date: Jul 2017

Posts: 39
#2

10 Nov 2017, 08:01

Or would it be better to just drop x2 from the model in that case?
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#3

13 Nov 2017, 17:22

Yes, the coefficient on x1 will be inconsistent if x2 is endogenous and not instrumented. Dropping x2 results in omitted variables bias. So, instrument x2.
Bobby Wood

Join Date: Jul 2017

Posts: 39
#4

21 Nov 2017, 07:50

Originally posted by Phil Bromiley

Yes, the coefficient on x1 will be inconsistent if x2 is endogenous and not instrumented. Dropping x2 results in omitted variables bias. So, instrument x2.

Can you explain why this is the case? My understanding was that one of the main applications of IV is precisely to deal with omitted variable bias. If the instruments are not correlated with the error term, why should there still be an omitted variable bias?
FernandoRios

Join Date: Apr 2014
Posts: 2470

22 Nov 2017, 07:47

Dear Bobby,
As Phil already said, you will have a problem if x2 is not instrumented or if it is omitted. Perhaps an example will convince you that this is the case.

Code:

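** This first part of the code simulates the data that are used later to create a system with endogeneity
* Drop any earlier definition so the example can be rerun in the same session
capture program drop data_simulation
program data_simulation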
    clear
    * This sets the sample size to 5000
    set obs 5000 
    * The following matrices establish the variance-covariance matrix for the 
    * errors, explanatory variables and exogenous instruments
    matrix cu=[ 1 , .7 \ ///
               0.7, 1   ]
    matrix cx=[ 1 , .3 , .3  ,-.1  \ ///
               .3 , 1  , .35 , .1  \ ///
               .3 , .35, 1   ,-.15 \ ///
               -.1, .1 ,-.15 , 1   ]
    matrix cz=[ 1 , .5 ,-.5   \ ///
               .5 , 1  ,-.8   \ ///
              -.5 ,-.8 , 1   ]
    drawnorm u1 u2, corr(cu)
    drawnorm x1 x2 x3 x4 , corr(cx)
    drawnorm z1 z2 z3, corr(cz)    
end

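capture program drop data_ivregress
program data_ivregress
    data_simulation 
    * x3s and x4s are endogenous: both contain u2, which is correlated with u1
    gen x3s=x3+0.4*z1+0.6*z2+u2
    gen x4s=x4-0.5*z1+0.5*z3+u2
    * True coefficients: 0.5 (x1), -0.5 (x2), 0.5 (x3s), -0.5 (x4s)
    gen y1=1+0.5*x1-0.5*x2+0.5*x3s-0.5*x4s+u1
end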

** As you can see, I created an example with two endogenous variables (x3s x4s) and three possible instruments (z1 z2 z3)

** Here we will use ivregress with the different "assumptions"
* Set the seed before generating the data so the results are reproducible
set seed 102
data_ivregress
* Ignoring Endogeneity
ivregress 2sls y1 x1 x2 x3s x4s
est sto m0
* Instrumenting both endogenous variables
ivregress 2sls y1 x1 x2 (x3s x4s=z1 z2 z3)
est sto m1
* Instrumenting only one variable (x4s), using only z3, which enters x4s but not x3s
ivregress 2sls y1 x1 x2 x3s (x4s =z3 )
est sto m2
* Instrumenting only x4s, using instruments z1 z3
ivregress 2sls y1 x1 x2 x3s (x4s =z1 z3)
est sto m3
* Instrumenting only x4s, using all instruments
ivregress 2sls y1 x1 x2 x3s (x4s =z1 z2 z3)
est sto m4
* Omitting the uninstrumented variable (x3s)
ivregress 2sls y1 x1 x2 (x4s =z1 z2 z3)
est sto m5

esttab m0 m1 m2 m3 m4 m5, compress se


----------------------------------------------------------------------------------------
                 (1)          (2)          (3)          (4)          (5)          (6)   
                  y1           y1           y1           y1           y1           y1   
----------------------------------------------------------------------------------------
x1             0.488***     0.506***     0.487***     0.488***     0.491***     0.534***
            (0.0121)     (0.0160)     (0.0122)     (0.0121)     (0.0121)     (0.0286)   

x2            -0.598***    -0.481***    -0.598***    -0.598***    -0.600***    -0.298***
            (0.0121)     (0.0247)     (0.0121)     (0.0121)     (0.0121)     (0.0284)   

x3s            0.746***     0.454***     0.746***     0.746***     0.745***             
           (0.00717)     (0.0474)    (0.00721)    (0.00719)    (0.00719)                

x4s           -0.259***    -0.545***    -0.266***    -0.260***    -0.244***    -0.992***
           (0.00708)     (0.0501)     (0.0138)     (0.0113)     (0.0111)     (0.0327)   

_cons          1.019***     1.020***     1.019***     1.019***     1.019***     1.022***
            (0.0114)     (0.0149)     (0.0114)     (0.0114)     (0.0114)     (0.0271)   
----------------------------------------------------------------------------------------
N               5000         5000         5000         5000         5000         5000   
----------------------------------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001


** So, as Phil indicated, omitting x3s affects the estimate of the x4s coefficient, and in this example it also affects the estimated coefficients of x1 and x2.
** Not instrumenting x3s also affects the estimate for x4s.

HTH
Fernando

Bobby Wood

Join Date: Jul 2017
Posts: 39

22 Nov 2017, 09:48

Originally posted by FernandoRios

Dear Bobby,
As Phil already said, you will have a problem if x2 is not instrumented or if it is omitted. Perhaps an example will convince you that this is the case.

Code:

...

HTH
Fernando

Thanks very much for your explanation. But isn't the problem here created by construction, since both endogenous variables are functions of z1?
What if this is not the case, as in the example below? There are still some marginal deviations, but are they not simply due to the efficiency loss?

Code:

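** This first part of the code simulates the data that are used later to create a system with endogeneity
* Drop any earlier definition so the example can be rerun in the same session
capture program drop data_simulation
program data_simulation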
    clear
    * This sets the sample size to 5000
    set obs 5000
    * The following matrices establish the variance-covariance matrix for the
    * errors, explanatory variables and exogenous instruments
     matrix cu=[1 , .3 ,-.2   \ ///
               .3 , 1  ,-.6   \ ///
              -.2 ,-.6 , 1   ]
    matrix cx=[ 1 , .3 , .3  ,-.1  \ ///
               .3 , 1  , .35 , .1  \ ///
               .3 , .35, 1   ,-.15 \ ///
               -.1, .1 ,-.15 , 1   ]
    matrix cz=[ 1 , .5 ,-.5   \ ///
               .5 , 1  ,-.8   \ ///
              -.5 ,-.8 , 1   ]
    drawnorm u1 u2 u3, corr(cu)
    drawnorm x1 x2 x3 x4 , corr(cx)
    drawnorm z1 z2 z3, corr(cz)    
end

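capture program drop data_ivregress
program data_ivregress
    data_simulation
    * Each endogenous regressor now has its own error (u1, u2), both correlated with the outcome error u3
    gen x3s=x3+0.4*z1+0.6*z2+u1
    gen x4s=x4-0.5*z1+0.5*z3+u2
    * True coefficients: 0.5 (x1), -0.5 (x2), 0.5 (x3s), -0.5 (x4s)
    gen y1=1+0.5*x1-0.5*x2+0.5*x3s-0.5*x4s+u3
end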

** As you can see, I created an example with two endogenous variables (x3s x4s) and three possible instruments (z1 z2 z3)

** Here we will use ivregress with the different "assumptions"
* Set the seed before generating the data so the results are reproducible
set seed 102
data_ivregress
* Ignoring Endogeneity
ivregress 2sls y1 x1 x2 x3s x4s
est sto m0
* Instrumenting both endogenous variables
ivregress 2sls y1 x1 x2 (x3s x4s=z1 z2)
est sto m1
* Instrumenting only one variable (x4s), using instrument z2
ivregress 2sls y1 x1 x2 x3s (x4s =z2)
est sto m2

* Omitting the uninstrumented variable
ivregress 2sls y1 x1 x2 (x4s =z2)
est sto m3


esttab m0 m1 m2 m3, compress se

program drop data_simulation
program drop data_ivregress


--------------------------------------------------------------
                 (1)          (2)          (3)          (4)  
                  y1           y1           y1           y1  
--------------------------------------------------------------
x1             0.512***     0.522***     0.523***     0.529***
            (0.0137)     (0.0147)     (0.0141)     (0.0180)  

x2            -0.441***    -0.488***    -0.464***    -0.274***
            (0.0140)     (0.0367)     (0.0150)     (0.0182)  

x3s            0.383***     0.461***     0.409***            
           (0.00832)     (0.0760)    (0.00989)                

x4s           -0.752***    -0.572***    -0.634***    -1.119***
           (0.00796)     (0.0789)     (0.0243)     (0.0261)  

_cons          1.023***     1.020***     1.021***     1.030***
            (0.0130)     (0.0138)     (0.0133)     (0.0170)  
--------------------------------------------------------------
N               5000         5000         5000         5000  
--------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001

Bobby Wood

Join Date: Jul 2017

Posts: 39
#7

22 Nov 2017, 10:12

I'm really confused by this general argument that if you omit a variable from a model you'll have inconsistent estimates.

1) If the omitted variable does not affect the variable of interest, then I don't understand why this should be problematic.
2) If the omitted variable affects the variable of interest, then the standard approach is to use an instrument.

As I mentioned before, one of the main reasons to use IV is that you have omitted variables, so this can't be generally right. Or am I missing something here?
Bobby Wood

Join Date: Jul 2017

Posts: 39
#8

27 Nov 2017, 02:04

No further opinions?

To give some additional information: x1 affects x2 but x2 doesn't affect x1 in my case.
Sebastian Kripfganz

Join Date: May 2014

Posts: 2594
#9

27 Nov 2017, 06:12

Your omitted variable must be uncorrelated with all exogenous regressors and all instruments.

In addition, the excluded variable might introduce heteroskedasticity, serial correlation etc. into the error term, requiring robust inference.
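
A sketch of where this condition comes from, under assumptions that are not spelled out in the thread: the true model is linear, w denotes the omitted variable with coefficient γ, X collects the included regressors, and Z collects all exogenous regressors together with the excluded instruments. The symbols w, γ, P_Z and the Q matrices are introduced here for illustration only.

```latex
% Assumed true model, with w omitted from the estimated equation:
%   y = X\beta + w\gamma + \varepsilon, \qquad P_Z = Z (Z'Z)^{-1} Z'
\hat{\beta}_{2SLS}
  = (X' P_Z X)^{-1} X' P_Z y
  = \beta + (X' P_Z X)^{-1} X' P_Z \,(w\gamma + \varepsilon)

% With valid instruments, E[z_i \varepsilon_i] = 0, so
\operatorname{plim} \hat{\beta}_{2SLS}
  = \beta + \gamma \,
    \left( Q_{XZ} Q_{ZZ}^{-1} Q_{ZX} \right)^{-1}
    Q_{XZ} Q_{ZZ}^{-1} \, E[z_i w_i],
\qquad
Q_{XZ} = E[x_i z_i'], \quad Q_{ZZ} = E[z_i z_i'], \quad Q_{ZX} = Q_{XZ}'
```

The extra term vanishes when γ = 0 or when E[z_i w_i] = 0, that is, when the omitted variable is uncorrelated with every exogenous regressor and every instrument contained in Z.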

https://www.kripfganz.de/stata/
Bobby Wood

Join Date: Jul 2017

Posts: 39
#10

28 Nov 2017, 05:08

Originally posted by Sebastian Kripfganz

Your omitted variable must be uncorrelated with all exogenous regressors and all instruments.

In addition, the excluded variable might introduce heteroskedasticity, serial correlation etc. into the error term, requiring robust inference.

Thanks for your reply. Do you have a reference on this?
So you are saying that even if x2 has no impact on x1 but x1 has an impact on x2, I will still get an inconsistent coefficient for x1?
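
One way to work through this question on paper, as a sketch only: assume x2 depends linearly on x1 through a hypothetical coefficient d plus a disturbance e, write the intercept as a_0, and let z be an instrument for x1. None of d, e, z or the error ε appears in the posts above; they are illustrative assumptions.

```latex
% Assumed structure: x1 affects x2, but x2 does not affect x1
y   = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 + \varepsilon,
\qquad
x_2 = d\, x_1 + e

% Dropping x2 is the same as substituting it out:
y = a_0 + (a_1 + a_2 d)\, x_1 + a_3 x_3 + (a_2 e + \varepsilon)

% If z and x3 are uncorrelated with both e and \varepsilon,
% 2SLS of y on x1 and x3 with instrument z satisfies
\operatorname{plim} \hat{a}_1 = a_1 + a_2 d
```

Under these assumptions, dropping x2 yields a consistent estimate of the total effect a_1 + a_2 d rather than of a_1 itself; the two coincide only if a_2 = 0 or d = 0. Keeping x2 as an uninstrumented control does not avoid the issue, because x2 then inherits the endogeneity of x1 and is correlated with ε.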