Hi,
Consider the following model:
yvar = a + b1*x1var + b2*x1var^2 + b3*x2var + b4'*controls + error (eq 1)
As yvar is binary, I estimate (eq 1) by logit.
x1var and x2var are continuous and endogenous independent variables.
I also have two instrumental variables, z1var and z2var: z1var is an instrument for x1var and z2var is an instrument for x2var.
Therefore, I would like to implement a control function approach to account for the endogeneity of both x1var (and x1var squared) and x2var.
I have read Wooldridge's textbook and multiple posts here, but I am still having some difficulties.
I have tried two different control functions to estimate (eq 1) and, unexpectedly, I get very different results.
First, I estimate a plain and simple control function (CF1) as follows:
Code:
    * first stage
    reg x1var z1var z2var ${controls}
    predict resid_1, residuals
    reg x2var z1var z2var ${controls}
    predict resid_2, residuals
    * second stage
    logit yvar c.x1var##c.x1var x2var ${controls} resid_1 resid_2
    * both stages are bootstrapped to get correct standard errors
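For concreteness, the bootstrap mentioned in the last comment could look roughly like the sketch below. The program name cf1_boot, the number of replications and the seed are placeholders chosen for illustration and are not part of my actual do-file. The sketch also assumes independent observations; with clustered or panel data the cluster() option of bootstrap would be needed.

```stata
* Hypothetical wrapper: reruns BOTH stages on every bootstrap sample,
* so the sampling error of the first-stage residuals is reflected
* in the second-stage standard errors.
capture program drop cf1_boot
program define cf1_boot, eclass
    * remove residuals left over from a previous replication
    capture drop resid_1 resid_2
    * first stage
    quietly regress x1var z1var z2var ${controls}
    quietly predict double resid_1, residuals
    quietly regress x2var z1var z2var ${controls}
    quietly predict double resid_2, residuals
    * second stage: logit leaves its coefficients in e(b)
    logit yvar c.x1var##c.x1var x2var ${controls} resid_1 resid_2
end

* reps() and seed() values are arbitrary placeholders
bootstrap _b, reps(500) seed(12345): cf1_boot
```

The same wrapper would apply to CF2 by adding the third first-stage regression and resid_3.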
Then, I try a more flexible control function (CF2) as follows:
Code:
    * first stage
    gen x1var_2 = x1var^2
    reg x1var c.z1var##c.z1var z2var ${controls}
    predict resid_1, residuals
    reg x1var_2 c.z1var##c.z1var z2var ${controls}
    predict resid_2, residuals
    reg x2var c.z1var##c.z1var z2var ${controls}
    predict resid_3, residuals
    * second stage
    logit yvar c.x1var##c.x1var x2var ${controls} resid_1 resid_2 resid_3
    * both stages are bootstrapped to get correct standard errors
However, the results I obtain with CF1 and CF2 are completely different in sign, magnitude and statistical significance.
In principle, I would prefer CF2 because its control function is more flexible, but I am uncertain whether there is something wrong with it.
Do you see any obvious reason why CF1 and CF2 produce completely different results? Which control function would you prefer?
Thanks,
Lukas