Addressing Endogeneity in Panel Data Using Control Function Approach

Nick Baradar

Join Date: May 2020

Posts: 93
#1

14 May 2025, 03:08

Hello all,

I am working with an unbalanced panel dataset comprising 350 firms over a 20-year period (yearly observations), and I am facing potential endogeneity issues with three of my explanatory variables. Unfortunately, the instruments I have considered so far appear to be weak, as indicated by Cragg-Donald Wald F-statistics below the conventional threshold of 10. As a result, I am exploring the use of the control function approach to address these endogeneity concerns.

I would greatly appreciate your input on the following two questions:

1. Can the control function approach be used to address endogeneity in multiple endogenous regressors simultaneously in a panel data context with fixed effects? If so, how should the first-stage regressions be specified? Should each endogenous regressor be regressed on the full set of exogenous variables and instruments, or is there a more efficient specification?
2. What is the correct procedure to obtain residuals from the first-stage regressions (i.e., the control functions) when using fixed effects estimators in panel data (e.g., with xtreg, fe in Stata)? I tried using predict res_first, resid, but this doesn’t seem to work properly in the fixed effects context. Is there a preferred command or method to extract the residuals needed for the second-stage regression?

Thank you very much in advance!

Best regards,
Nick
Maxence Morlet

Join Date: Mar 2021

Posts: 653
#2

17 May 2025, 02:29

That is a good question: is the relevance of the instruments as important under the control function approach? My guess would be yes. While waiting for a better answer: I think the new version of Stata (Stata 19) has a built-in command for this.

Are your endogenous regressors continuous? If so, you can use the Gaussian copula to tackle endogeneity (Park and Gupta, 2012).
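
A minimal sketch of the extra regressor used in the Gaussian copula approach, written in Python purely for illustration. It computes Phi^{-1}(H(x)), where H is the empirical CDF of the endogenous regressor. The function name `copula_term`, the average-rank handling of ties, and the rescaling by n + 1 are assumptions made for this sketch, not details taken from the thread.

```python
from statistics import NormalDist


def copula_term(x):
    """Map each value to Phi^{-1}(rank / (n + 1)); tied values share an average rank."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of observations tied with position i.
        while j + 1 < n and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    # Assumption: dividing by n + 1 keeps the largest value away from
    # Phi^{-1}(1), which would be +infinity.
    std_normal = NormalDist()
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


# Hypothetical values of one endogenous regressor.
terms = copula_term([3.0, 1.0, 2.0])
```

The resulting term would be added as an extra regressor alongside the endogenous variable. The approach relies on the endogenous regressor being non-normally distributed, so it is worth checking that before using it.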
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2204
#3

17 May 2025, 20:58

Nick: The CF approach can be used, and my paper with Riju Joshi shows how to do it in the unbalanced case. The problem is that the first-stage CF regression is identical to that for the fixed effects 2SLS estimator, and so, as Maxence said, the CF approach cannot generally help with weak IVs. One case where it can help is when nonlinear functions of an endogenous explanatory variable (EEV) appear in the equation; then a single control function can handle the endogeneity of all the nonlinear terms. But for the basic model where the first and second stages are linear, CF = 2SLS, and so the same weak IV considerations still hold.
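
A minimal numerical sketch of the CF = 2SLS point for the linear case, written in Python purely for illustration. The simulated unbalanced panel, the variable names, and the single-instrument setup are all assumptions and do not come from the thread. Fixed effects are removed by demeaning within firm, the first-stage residual serves as the control function, and the coefficient on x from the CF second stage is compared with the just-identified FE-2SLS estimate.

```python
import random
from collections import defaultdict


def demean_by_group(values, groups):
    """Within transformation: subtract each firm's own mean."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


# Simulated unbalanced panel; every number here is hypothetical.
random.seed(0)
firm, y, x, z = [], [], [], []
for i in range(50):
    alpha = random.gauss(0, 1)               # firm fixed effect
    for _ in range(random.randint(3, 8)):    # 3 to 8 years per firm
        zi = random.gauss(0, 1)              # instrument
        u = random.gauss(0, 1)               # structural error
        xi = 0.8 * zi + alpha + 0.5 * u + random.gauss(0, 1)  # endogenous through u
        firm.append(i)
        z.append(zi)
        x.append(xi)
        y.append(1.5 * xi + alpha + u)

yd, xd, zd = (demean_by_group(s, firm) for s in (y, x, z))

# First stage on the within-transformed data; v is the control function.
pi_hat = dot(zd, xd) / dot(zd, zd)
v = [xi - pi_hat * zi for xi, zi in zip(xd, zd)]

# CF second stage: OLS of yd on xd and v, solved with Cramer's rule.
sxx, sxv, svv = dot(xd, xd), dot(xd, v), dot(v, v)
sxy, svy = dot(xd, yd), dot(v, yd)
det = sxx * svv - sxv * sxv
beta_cf = (sxy * svv - sxv * svy) / det

# Just-identified FE-2SLS estimate for comparison.
beta_2sls = dot(zd, yd) / dot(zd, xd)
```

In this sketch `beta_cf` and `beta_2sls` agree up to floating-point error, so a weak z weakens both estimators equally. On the residual question: after xtreg, fe in Stata, predict's e option (rather than resid) returns the idiosyncratic within residual, which plays the role of v here. Second-stage standard errors would still need to account for v being estimated, for example by bootstrapping over firms.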
Nick Baradar

Join Date: May 2020

Posts: 93
#4

21 May 2025, 06:32

Thank you, Dr. Jeff Wooldridge, thank you, Maxence Morlet,

As Maxence correctly guessed, instrument relevance is still very important for the control function approach too. In practice, a reviewer wants to see a first-stage F statistic above 10. My model is linear, and for the same IVs the CF and 2SLS estimators are indeed identical. I will not be able to use either approach, as my IVs are weak (according to the xtivreg2 command). As Maxence advised, I will try the Gaussian copula approach to address the endogeneity of my three independent variables and their interactions. Thank you, Maxence!

On another note, out of curiosity—if anyone knows of any built-in Stata functions that display the F-statistic (for instrument relevance) and J-statistic (for instrument exogeneity) in the context of a panel data fixed effects specification, I would greatly appreciate it.

Thank you again and best regards,
Nick
Maxence Morlet

Join Date: Mar 2021

Posts: 653
#5

21 May 2025, 12:45

You're welcome! Regarding the F statistic, in ssc install invreg2, just specify the option "first"
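
A minimal sketch of what a first-stage F statistic measures in a fixed effects setting, written in Python purely for illustration. It assumes one endogenous regressor, one excluded instrument, no other covariates, and homoskedastic errors. The function names and the simulated data are hypothetical, and the sketch is not a reproduction of any Stata command's output.

```python
import random
from collections import defaultdict


def demean_by_group(values, groups):
    """Within transformation: subtract each firm's own mean."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def first_stage_f(x, z, groups):
    """F statistic for the single excluded instrument z in the FE first stage."""
    xd = demean_by_group(x, groups)
    zd = demean_by_group(z, groups)
    szz = sum(zi * zi for zi in zd)
    pi_hat = sum(zi * xi for zi, xi in zip(zd, xd)) / szz
    rss = sum((xi - pi_hat * zi) ** 2 for xi, zi in zip(xd, zd))
    # Degrees of freedom: observations minus firm effects minus one slope.
    dof = len(x) - len(set(groups)) - 1
    sigma2 = rss / dof
    # With a single instrument, F equals the squared t statistic.
    return pi_hat ** 2 * szz / sigma2


def simulate(pi, seed):
    """Hypothetical unbalanced panel with first-stage slope pi."""
    rng = random.Random(seed)
    firm, x, z = [], [], []
    for i in range(40):
        alpha = rng.gauss(0, 1)
        for _ in range(rng.randint(3, 7)):
            zi = rng.gauss(0, 1)
            firm.append(i)
            z.append(zi)
            x.append(pi * zi + alpha + rng.gauss(0, 1))
    return x, z, firm


f_strong = first_stage_f(*simulate(1.0, seed=1))
f_weak = first_stage_f(*simulate(0.02, seed=1))
```

Here `f_strong` comes out far above the rule-of-thumb value of 10, while `f_weak` is much smaller, which is the pattern the first-stage output is meant to flag.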
Nick Baradar

Join Date: May 2020

Posts: 93
#6

22 May 2025, 02:47

Originally posted by Maxence Morlet

You're welcome! Regarding the F statistic, in ssc install invreg2, just specify the option "first"

Do you mean "ssc install ivreg2" ?
Maxence Morlet

Join Date: Mar 2021

Posts: 653
#7

23 May 2025, 00:30

Yes, typo...
Zhixiao Yao

Join Date: Sep 2021

Posts: 10
#8

13 Aug 2025, 08:55

Dear all,
My questions are also related to Addressing Endogeneity in Panel Data Using Control Function Approach, but from a slightly different angle, so I hope it’s fine to post them under this thread.

I use a control function (CF) approach in a staggered DiD gravity model, with the first stage being a cohort-specific probit:

[equation missing in the source]

where
Zijc: excluded instruments (distance, contiguity, language, WTO, common legal origins, colonial ties, religious proximity index, etc.)
[symbol missing in the source]: Mundlak means of GDP, GDPpc, polity

My question is about instrument selection in the first stage:
If I add or drop one or two Zijc variables in the probit, the predicted probabilities (and thus the CF residuals) change slightly. How sensitive should I expect my final second-stage CF estimates to be to such changes?

Is there a recommended way to convince readers that my choice of the Zijc set is valid and that the CF is working as intended? For example, should I show robustness checks with different instrument sets, or is there a more formal diagnostic in the CF context?

Any advice on best practice here would be very helpful!

Best,
Zhixiao

Last edited by Zhixiao Yao; 13 Aug 2025, 08:59.