Dear all,
I am using the two-stage residual inclusion (2SRI) method to control for endogeneity in the relationship between depression and two variables of interest: the number of hours worked and being a caregiver. Since there are two endogenous variables, there are also two instruments.
The first regressor (number of hours worked) is continuous. First, I regress this endogenous regressor on its instrument using a linear model. In that case the residuals are the usual ones, and I include them in the second-stage regression.
The second regressor (being a caregiver) is a discrete ordered variable scored from 0 to 4. Here too, I regress the endogenous regressor on its instrument, this time using an ordered probit. I want to calculate the generalized residuals from this first stage for the "being a caregiver" variable in order to include them in the second stage as a regressor. Stata does not provide residuals after xtoprobit.
I tried to calculate the generalized residuals myself, but I am not sure my calculations are correct.
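For reference, this is the formula the code below is meant to implement, as I understand the generalized residual for an ordered probit. The notation is mine and is only a sketch: z_i\hat\beta is the linear index (zb in the code), \hat\kappa_1, ..., \hat\kappa_4 are the estimated cutpoints (_b[/cut1] to _b[/cut4]), and I set \hat\kappa_0 = -\infty and \hat\kappa_5 = +\infty so that the corresponding density terms vanish.

```latex
\[
\hat r_i \;=\;
\frac{\phi(\hat\kappa_{j} - z_i\hat\beta) \;-\; \phi(\hat\kappa_{j+1} - z_i\hat\beta)}
     {\Phi(\hat\kappa_{j+1} - z_i\hat\beta) \;-\; \Phi(\hat\kappa_{j} - z_i\hat\beta)}
\qquad \text{if } y_i = j, \quad j = 0, \dots, 4,
\]
% Boundary cases, using \phi(\pm\infty) = 0, \Phi(-\infty) = 0, \Phi(+\infty) = 1:
\[
\hat r_i = -\,\frac{\phi(\hat\kappa_1 - z_i\hat\beta)}{\Phi(\hat\kappa_1 - z_i\hat\beta)}
\quad (y_i = 0),
\qquad
\hat r_i = \frac{\phi(\hat\kappa_4 - z_i\hat\beta)}{1 - \Phi(\hat\kappa_4 - z_i\hat\beta)}
\quad (y_i = 4).
\]
```

Here \phi and \Phi are the standard normal density and distribution function, i.e. normalden() and normal() in Stata.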
Here is a part of my code:
***control function
**1st stage
*discrete ordered regressor
xtoprobit caregivingp distancep sister brother Female grandchild Austria Germany Sweden Spain Italy Denmark Switzerland Belgium Slovenia Estonia Portugal The_Netherlands Luxembourg Poland if age>=50
predict zb, xb
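* start from missing so that observations with a missing caregivingp do not get a residual of 0
gen double res1 = .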
replace res1 = (-normalden(_b[/cut1] - zb))/(normal(_b[/cut1] - zb)) if (caregivingp == 0)
replace res1 = (normalden(_b[/cut1] - zb) - normalden(_b[/cut2] - zb))/(normal(_b[/cut2] - zb)-normal(_b[/cut1] - zb)) if (caregivingp == 1)
replace res1 = (normalden(_b[/cut2] - zb) - normalden(_b[/cut3] - zb))/(normal(_b[/cut3] - zb)-normal(_b[/cut2] - zb)) if (caregivingp == 2)
replace res1 = (normalden(_b[/cut3] - zb) - normalden(_b[/cut4] - zb))/(normal(_b[/cut4] - zb)-normal(_b[/cut3] - zb)) if (caregivingp == 3)
replace res1 = (normalden(_b[/cut4] - zb))/(1 - normal(_b[/cut4] - zb)) if (caregivingp == 4)
*continuous regressor
xtreg hours_worked hours_worked_lag sister brother Female grandchild Austria Germany Sweden Spain Italy Denmark Switzerland Belgium Slovenia Estonia Portugal The_Netherlands Luxembourg Poland if age>=50
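* predict after xtreg has no residuals option; ue is the combined residual (u_i + e_it), e is the idiosyncratic part only
predict res2, ue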
*2nd stage
xtoprobit Depression caregivingp hours_worked res1 res2 Female sister brother grandchild Austria Germany Sweden Spain Italy Denmark Switzerland Belgium Slovenia Estonia Portugal The_Netherlands Luxembourg Poland if age>=50
Thanks!
Marie Blaise