
  • Identification Issues with two instruments in the two-endogenous-variables case with xtivreg/xtivreg2

    Hi, I am using 2SLS for my research with xtivreg in Stata. However, I have one doubt regarding identification when we have two endogenous variables and want to use two instruments.

    Suppose, in this hypothetical model, y = a + b1x1 + b2x2 + b3x3 + u, x1 and x2 are endogenous, and we want to use z1 as an instrument for x1 and z2 as an instrument for x2. If we use the ivreg command:
    ivreg y (x1 x2 = z1 z2) x3, vce(robust)
    then we get two first-stage results: the first for x1, where z1, z2, and x3 are all included as regressors, and another for x2, where z1, z2, and x3 are again all included as regressors.

    Now my question is: even though ivreg reports results, is this model really identified? Theoretically, when I tried to solve the model by substituting the two first-stage equations (x1 = constant + c1z1 + c2z2 + c3x3 + error and x2 = constant + d1z1 + d2z2 + d3x3 + error) into my main (structural) model, I could not identify the coefficients.

    Am I wrong here or is the model really unidentified? I need your suggestions.
    Last edited by lakhi narayan; 01 Aug 2025, 06:37.

  • #2
    The rule is that you need at least as many valid instruments as endogenous regressors. Here, you have 2 endogenous regressors and 2 instruments, so the model is just-identified. Now, if your question is whether the instruments are valid and strong enough for identification, that is a different issue. You would ideally want \(z1\) and \(z2\) to be sufficiently distinct in what they explain, with \(z1\) strongly correlated with \(x1\) but not with \(x2\), and \(z2\) strongly correlated with \(x2\) but not with \(x1\).
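    To put some numbers on the just-identified case, here is a minimal Python/numpy sketch (the coefficients, sample size, and data-generating process are all invented for illustration, not taken from the thread). With exactly as many instruments as endogenous regressors, 2SLS reduces to the IV estimator \((Z'X)^{-1}Z'y\):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Two instruments and one included exogenous regressor (all made up)
z1, z2, x3 = rng.standard_normal((3, n))

# The structural error u also feeds both endogenous regressors (endogeneity)
u = rng.standard_normal(n)
x1 = 0.8 * z1 + 0.2 * z2 + 0.3 * x3 + u + rng.standard_normal(n)
x2 = 0.1 * z1 + 0.9 * z2 - 0.2 * x3 + u + rng.standard_normal(n)

# Structural equation: y = 1 + 2*x1 - 1.5*x2 + 0.5*x3 + u
y = 1.0 + 2.0 * x1 - 1.5 * x2 + 0.5 * x3 + u

X = np.column_stack([np.ones(n), x1, x2, x3])   # structural regressors
Z = np.column_stack([np.ones(n), z1, z2, x3])   # instruments (x3 instruments itself)

beta_2sls = np.linalg.solve(Z.T @ X, Z.T @ y)    # just-identified 2SLS = (Z'X)^{-1} Z'y
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS for comparison (inconsistent here)
print(beta_2sls)  # close to [1, 2, -1.5, 0.5]
```

    OLS is inconsistent here because u enters both x1 and x2, while 2SLS recovers the structural coefficients because z1 and z2 shift the endogenous regressors but are unrelated to u.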



    • #3
      Thank you, Prof. Andrew, for your reply. Actually, I was referring to the following identification issue:
      [Image attachment: Screenshot 2025-08-02 at 2.17.06 AM.png]



      • #4
        I've tried to explain in #2 why it doesn't matter that Stata includes both instruments in each first-stage regression, provided that both instruments are relevant, but let me give it one more try. We have the model:

        \[
        P = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 m + u
        \]

        where \(x_1\), \(x_2\) are endogenous variables, \(z_1\) is an instrument for \(x_1\) and \(z_2\) is an instrument for \(x_2\) and \(m\) is an exogenous regressor. You assume the following exclusion restrictions:


        1. \(z_1\) affects \(x_1\) only, not \(x_2\)
        2. \(z_2\) affects \(x_2\) only, not \(x_1\)

        Stata’s default behavior in ivreg includes both \(z_1\) and \(z_2\) in both first-stage regressions. So let's work this out:

        The first-stage equations are:

        \[
        x_1 = \tau + \phi_1 z_1 + \phi_2 z_2 + \phi_3 m + \varepsilon_1
        \]
        \[
        x_2 = \gamma + \eta_1 z_1 + \eta_2 z_2 + \eta_3 m + \varepsilon_2
        \]

        Plugging these into the structural equation:

        \[
        P = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 m + u
        \]

        Substitute \(x_1\) and \(x_2\):

        \[
        \begin{aligned}
        P &= \alpha + \beta_1 (\tau + \phi_1 z_1 + \phi_2 z_2 + \phi_3 m + \varepsilon_1) + \beta_2 (\gamma + \eta_1 z_1 + \eta_2 z_2 + \eta_3 m + \varepsilon_2) + \beta_3 m + u \\
        &= \alpha + \beta_1 \tau + \beta_2 \gamma + (\beta_1 \phi_1 + \beta_2 \eta_1) z_1 + (\beta_1 \phi_2 + \beta_2 \eta_2) z_2 + (\beta_1 \phi_3 + \beta_2 \eta_3 + \beta_3) m \\
        &\quad + \beta_1 \varepsilon_1 + \beta_2 \varepsilon_2 + u
        \end{aligned}
        \]

        Define:

        \[
        \pi_0 = \alpha + \beta_1 \tau + \beta_2 \gamma
        \]
        \[
        \pi_2 = \beta_1 \phi_1 + \beta_2 \eta_1, \quad \pi_3 = \beta_1 \phi_2 + \beta_2 \eta_2
        \]
        \[
        \pi_4 = \beta_1 \phi_3 + \beta_2 \eta_3 + \beta_3
        \]

        Then the reduced form becomes:

        \[
        P = \pi_0 + \pi_2 z_1 + \pi_3 z_2 + \pi_4 m + \beta_1 \varepsilon_1 + \beta_2 \varepsilon_2 + u
        \]


        But under our assumptions:

        i) \(z_1\) is excluded from \(x_2\) \(\Rightarrow \eta_1 = 0\)
        ii) \(z_2\) is excluded from \(x_1\) \(\Rightarrow \phi_2 = 0\)


        So:

        \[
        \pi_2 = \beta_1 \phi_1, \quad \pi_3 = \beta_2 \eta_2
        \]

        Now the system becomes:

        \[
        \beta_1 = \frac{\pi_2}{\phi_1}, \quad \beta_2 = \frac{\pi_3}{\eta_2}
        \]

        To summarize, the system is identified only if you enforce the correct exclusion restrictions in your first-stage regressions.
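        Putting illustrative numbers on the algebra above (the coefficient values below are made up): under the exclusion restrictions \(\eta_1 = \phi_2 = 0\), the reduced-form coefficients factor cleanly and the structural coefficients can be backed out exactly as in the last display:

```python
# Illustrative structural and first-stage coefficients (invented for this check)
beta1, beta2 = 2.0, -1.5   # structural effects of x1 and x2
phi1, eta2 = 0.8, 0.9      # first-stage effects of z1 on x1 and of z2 on x2
eta1, phi2 = 0.0, 0.0      # exclusion restrictions: z1 -> x2 and z2 -> x1 are zero

# Reduced-form coefficients on z1 and z2 (general formulas from the derivation)
pi2 = beta1 * phi1 + beta2 * eta1   # collapses to beta1 * phi1 under the restrictions
pi3 = beta1 * phi2 + beta2 * eta2   # collapses to beta2 * eta2 under the restrictions

# Back out the structural coefficients from reduced-form and first-stage quantities
print(pi2 / phi1, pi3 / eta2)  # recovers beta1 = 2 and beta2 = -1.5
```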



        • #5
          Thank you so much, Prof. Andrew, for explaining the answer. I am really grateful.



          • #6
            Rather than doing hard math -- which tends to mask the issue -- it's helpful to think in terms of when perfect collinearity is ruled out. Let x1* be the linear function of z1, z2, and m given by the linear projection of x1 (that is, without the error term). Similarly, x2* is the linear projection of x2 onto (z1, z2, m). The 2SLS estimator is the sample version of OLS regression of p on x1*, x2*, and m. So the key is that these variables are not perfectly collinear (and we would like them not to be very highly correlated, but we sometimes can't control that).

            Clearly if, say, both phi2 and eta2 are zero, then (x1*, x2*, m) form a perfectly collinear set. Andrew's case is useful because it shows that phi1 != 0, phi2 = 0, eta1 = 0, eta2 != 0 leads to identification. But identification will usually hold if all four coefficients are nonzero. We'd like z1 to "mostly" predict x1 and z2 to "mostly" predict x2, but both IVs can predict both endogenous variables. What we need to rule out is that (phi1, phi2) is a multiple of (eta1, eta2). Otherwise, perfect collinearity occurs in (x1*, x2*, m), and we lose identification.
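            This rank condition is easy to see numerically. In the following Python/numpy sketch (with invented first-stage coefficients), the projections are built exactly from their first-stage coefficients; when (phi1, phi2) is proportional to (eta1, eta2), the set (constant, x1*, x2*, m) drops rank:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000
z1, z2, m = rng.standard_normal((3, n))

def proj_rank(phi1, phi2, eta1, eta2):
    """Rank of (constant, x1*, x2*, m), where the x*'s are the exact
    linear projections implied by the given first-stage coefficients."""
    x1_star = phi1 * z1 + phi2 * z2 + 0.3 * m
    x2_star = eta1 * z1 + eta2 * z2 - 0.2 * m
    M = np.column_stack([np.ones(n), x1_star, x2_star, m])
    return np.linalg.matrix_rank(M)

# (phi1, phi2) = 2 * (eta1, eta2): projections collinear -> rank 3, no identification
print(proj_rank(1.0, 0.5, 0.5, 0.25))  # 3

# Proportionality broken: full rank 4 -> the rank condition holds
print(proj_rank(1.0, 0.5, 0.2, 0.9))   # 4
```

    Note that all four z-coefficients are nonzero in both calls; what matters is not which coefficients are zero but whether the two instrument-coefficient vectors are linearly independent.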



            • #7
              Thank you so much, Professor Wooldridge, for the explanation. I am really grateful.
