IV with control function method - controls and interactions with endogenous regressor

Pia Andres

#1

21 Jul 2025, 07:22

Hi all,

Apologies if this is a double post, but I have struggled to find clear guidance, or a post directly addressing the IV-related questions below. If someone could point me in the right direction, I would greatly appreciate it.

I am implementing an IV estimation using the control function (CF) method, with several different outcome variables Y. Some models are Poisson, others OLS, but the endogenous regressor and the instrument are the same throughout. I have always thought that
- in all IV estimations, the first and second stages need to include the same control variables;
- if the second stage includes an interaction term between the endogenous regressor and some other variable, I need two first stages: one for the endogenous regressor itself and one for the interaction term, in which the instrument is interacted with the same variable.

I would like to confirm whether both of these always hold. When using the control function method, I include the first-stage residual in the second stage, so wouldn't this control for the endogenous part of the endogenous regressor as well as of its interaction? I understand that each additional endogenous regressor requires an additional instrument, but is an interaction of the same variable really an additional endogenous regressor? Moreover, if the control variables used in the second stage are not required for the validity of the instrument, do they need to be in the first stage?

So, if the second stage is

Y = b0 + b1 X1 + b2 X1*X2 + b3 X2 + b4 X3 + b5 u + e

where u is the residual from a regression

X1 = a0 + a1 Z + u

would I need to also run

X1*X2 = c0 + c1 Z*X2 + u_int

and include u_int in the second stage? And would the first-stage regressions necessarily have to include X2 and X3?
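In the notation above, the logic behind using a single first stage can be sketched as follows. This rests on an assumption that is not in the original post: the structural error ε depends on the first-stage error u only through a linear term b5 u, and the remainder e is mean-independent of Z, X2, X3 and u. X2 and X3 are also added to the first stage here, as the later replies recommend.

```latex
\begin{aligned}
X_1 &= a_0 + a_1 Z + a_2 X_2 + a_3 X_3 + u \\
Y   &= b_0 + b_1 X_1 + b_2 X_1 X_2 + b_3 X_2 + b_4 X_3 + \varepsilon,
      \qquad \varepsilon = b_5 u + e, \quad \mathrm{E}[e \mid Z, X_2, X_3, u] = 0 \\
\mathrm{E}[Y \mid Z, X_2, X_3, u]
    &= b_0 + b_1 X_1 + b_2 X_1 X_2 + b_3 X_2 + b_4 X_3 + b_5 u
\end{aligned}
```

X1 and X1*X2 are both functions of (Z, X2, X3, u), so conditioning on u handles the endogeneity of both terms at once. Replacing u with the first-stage residual therefore adds one extra regressor, and X1*X2 needs no separate first stage. The cost is the extra assumption on ε. 2SLS with Z*X2 as an additional instrument does not require it.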

Last edited by Pia Andres; 21 Jul 2025, 07:30.
George Ford

#2

21 Jul 2025, 08:09

I think you only need one first stage. That's a nice thing about the CF approach when you have interactions.
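A small simulation can check this claim in the linear case. Everything in the sketch below is hypothetical: the variable names, the parameter values and the data-generating process are made up for illustration. It assumes the structural error eps depends on the first-stage error u only through 0.5*u, with the rest independent of z, x2 and x3. That is the kind of assumption under which one residual is enough. 2SLS, with z*x2 as a second instrument, is run for comparison. Both estimators should recover b1 = 1 and b2 = 0.5 up to sampling noise.

```stata
* Hypothetical simulated data: x1 is endogenous, z is the instrument,
* x2 and x3 are exogenous controls (all names and values made up)
clear
set seed 20250721
set obs 10000
generate z    = rnormal()
generate x2   = rnormal()
generate x3   = rnormal()
generate u    = rnormal()
* Assumed structural error: depends on u only through 0.5*u
generate eps  = 0.5*u + rnormal()
generate x1   = 1 + z + 0.5*x2 + 0.5*x3 + u
generate x1x2 = x1*x2
generate y    = 1 + x1 + 0.5*x1x2 + 0.5*x2 + 0.5*x3 + eps

* Control function: ONE first stage, its residual added to the second stage
regress x1 z x2 x3
predict uhat, residuals
regress y x1 x1x2 x2 x3 uhat
scalar b1_cf = _b[x1]
scalar b2_cf = _b[x1x2]

* 2SLS for comparison: x1*x2 is instrumented by z*x2 (two first stages)
generate zx2 = z*x2
ivregress 2sls y x2 x3 (x1 x1x2 = z zx2)
scalar b1_iv = _b[x1]
scalar b2_iv = _b[x1x2]

* True values are b1 = 1 and b2 = 0.5
display "CF:   b1 = " %6.3f scalar(b1_cf) "   b2 = " %6.3f scalar(b2_cf)
display "2SLS: b1 = " %6.3f scalar(b1_iv) "   b2 = " %6.3f scalar(b2_iv)
```

If the assumption on eps is doubtful, the CF and 2SLS estimates of the interaction model can diverge, because 2SLS only needs the instruments to be uncorrelated with eps. The CF second-stage standard errors also still need the bootstrap used later in the thread.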
Pia Andres

#3

21 Jul 2025, 10:54

Thank you, that's very helpful! And would the first stage need to include all the same control variables as the second stage, or could it use a different set, depending on what is needed for the instrument to be valid?

Andrew Musau


21 Jul 2025, 12:08

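In the first stage, you regress the endogenous variable on the instruments plus the exogenous variables. In the second stage, you regress the outcome on the endogenous regressor, the exogenous variables, and the first-stage residual. So the exogenous variables have to be in both stages. In a linear model, 2SLS and the CF approach give identical point estimates. The conventional second-stage standard errors, however, ignore that the residual is itself estimated, which is why both stages are bootstrapped together below. Here is an example: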

Code:

webuse hsng2, clear
*IV2SLS
ivregress 2sls rent pcturban (hsngval = faminc i.region), vce(bootstrap, seed(07212025))

*CF
cap prog drop mybootstrap_prog
prog mybootstrap_prog
regress hsngval faminc i.region pcturban
predict vhat, resid
regress rent hsngval pcturban vhat
drop vhat
end
bootstrap _b , reps(50) nowarn nodots nodrop seed(07212025): mybootstrap_prog

Results:

Code:

. ivregress 2sls rent pcturban (hsngval = faminc i.region), vce(bootstrap, seed(07212025))
(running ivregress on estimation sample)

Bootstrap replications (50): .........10.........20.........30.........40.........50 done

Instrumental-variables 2SLS regression            Number of obs   =         50
                                                  Wald chi2(2)    =      36.58
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.5989
                                                  Root MSE        =     22.166

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
        rent | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     hsngval |   .0022398   .0005969     3.75   0.000     .0010699    .0034098
    pcturban |    .081516    .437207     0.19   0.852    -.7753941     .938426
       _cons |   120.7065   17.98488     6.71   0.000      85.4568    155.9562
------------------------------------------------------------------------------
Endogenous: hsngval
Exogenous:  pcturban faminc 2.region 3.region 4.region

. 
. 
. 
. *CF

. 
. cap prog drop mybootstrap_prog

. 
. prog mybootstrap_prog
  1. 
. regress hsngval faminc i.region pcturban
  2. 
. predict vhat, resid
  3. 
. regress rent hsngval pcturban vhat
  4. 
. drop vhat
  5. 
. end

. 
. bootstrap _b , reps(50) nowarn nodots nodrop seed(07212025): mybootstrap_prog

Linear regression                                      Number of obs =      50
                                                       Replications  =      50
                                                       Wald chi2(3)  =   51.76
                                                       Prob > chi2   =  0.0000
                                                       R-squared     =  0.7542
                                                       Adj R-squared =  0.7382
                                                       Root MSE      = 18.0903

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
        rent | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     hsngval |   .0022398   .0005969     3.75   0.000     .0010699    .0034098
    pcturban |    .081516    .437207     0.19   0.852    -.7753941     .938426
        vhat |  -.0015889     .00073    -2.18   0.030    -.0030196   -.0001582
       _cons |   120.7065   17.98488     6.71   0.000      85.4568    155.9562
------------------------------------------------------------------------------

.
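As a follow-up sketch on the same hsng2 data (not part of the original reply), running the two stages once, outside the bootstrap, shows two things. First, the point estimates match the output above. Second, the coefficient on vhat gives a regression-based endogeneity test. Under the null that hsngval is exogenous, the estimated residual does not distort the second-stage inference, so a robust t test on vhat is valid. Away from the null, these standard errors are not valid in general.

```stata
webuse hsng2, clear
* First stage: excluded instruments plus the exogenous control pcturban
regress hsngval faminc i.region pcturban
predict vhat, residuals
* Second stage run once: point estimates equal the 2SLS ones, but these
* standard errors ignore that vhat is estimated (hence the bootstrap above)
regress rent hsngval pcturban vhat, vce(robust)
* Under H0 (hsngval exogenous) the robust test on vhat is valid,
* so it doubles as a regression-based endogeneity test
test vhat
```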


George Ford

#5

21 Jul 2025, 14:08

Theoretically, I don't think they have to be exactly the same (you can imagine a control whose first-stage coefficient is zero), but including them all is the usual practice and is what Stata's built-in IV commands do. Leaving some out may lead to bias, so I'd stick with the standard format.
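The sketch below uses the same hsng2 data as the example earlier in the thread, and the names vhat_full and vhat_short are made up. It shows what changes when a control is dropped from the first stage. With pcturban in the first stage, the CF estimate reproduces 2SLS exactly. With pcturban dropped, the residual is generally no longer orthogonal to pcturban, and the equivalence breaks. The sketch only shows that the estimates differ. Whether the shorter version is badly biased in a given application depends on the model.

```stata
webuse hsng2, clear
* Reference: 2SLS treats pcturban as exogenous, so it enters the first stage
ivregress 2sls rent pcturban (hsngval = faminc i.region)
scalar b_iv = _b[hsngval]

* CF with pcturban in the first stage: reproduces the 2SLS estimate
regress hsngval faminc i.region pcturban
predict vhat_full, residuals
regress rent hsngval pcturban vhat_full
scalar b_cf_full = _b[hsngval]

* CF with pcturban dropped from the first stage (hypothetical variant)
regress hsngval faminc i.region
predict vhat_short, residuals
regress rent hsngval pcturban vhat_short
scalar b_cf_short = _b[hsngval]

display "2SLS:                " %10.7f scalar(b_iv)
display "CF, full 1st stage:  " %10.7f scalar(b_cf_full)
display "CF, short 1st stage: " %10.7f scalar(b_cf_short)
```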