Dear Statalist community,
I am currently working on a project where I want to estimate a fixed-effects instrumental-variables (FEIV) model with regional fixed effects, using individual-level data. The outcome variable is an individual-level categorical variable and the explanatory variable is an individual-level dummy variable. Because these variables are likely simultaneously determined, I aim to use a continuous regional-level instrumental variable. In summary: Y is categorical and individual-level, X is a dummy and individual-level, Z is continuous and regional-level. I also want to control for year dummies and regional dummies capturing regional fixed effects.
While the instrument initially appears to be strong (with a z-value of about 4), and even the adjusted R-squared looks relatively good (~0.35), the first stage of a simple ivregress 2sls Y (X = Z) controls i.regional i.year, vce(cluster regional) yields predicted probabilities (X_hat) below 0 for approximately 20% of the observations. In addition, the second-stage coefficient is statistically significant but unreasonably large in magnitude.
When I run a logit model to predict X_hat and then perform ivregress 2sls Y (X = X_hat) controls i.regional i.year, vce(cluster regional), I obtain a significant second stage coefficient that is only about 10% of the magnitude of the previous one. The marginal effect in the logit regression is quite close to the first stage in the linear model.
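For concreteness, the two approaches look roughly like the sketch below. The variable names and the $controls macro are placeholders rather than my actual names, and the listed controls are purely hypothetical.

```stata
* Placeholder control list (hypothetical variable names; replace with your own)
global controls "age female"

* Approach 1: standard 2SLS with a linear first stage;
* the -first- option also displays the first-stage regression
ivregress 2sls Y (X = Z) $controls i.regional i.year, vce(cluster regional) first

* Approach 2: logit for X, then use the fitted probability as the instrument
logit X Z $controls i.regional i.year, vce(cluster regional)
predict double X_hat, pr
ivregress 2sls Y (X = X_hat) $controls i.regional i.year, vce(cluster regional)

* First-stage diagnostics for the most recent ivregress fit
estat firststage
```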
Now, onto my questions:
- Is it plausible that the observations with negative predicted probabilities in the first stage are driving the large difference between the two second-stage coefficients?
- Is it possible to combine a first-stage logit with a second-stage ordered logit model in this situation?
Niklas