Dear Forum members, your help will be highly appreciated.
I have a dependent variable (y) which is a dollar value. To reduce its skewness, I log-transform it (y_ln).
I have a continuous endogenous independent variable (x) that is between 0 and 1. I hypothesize an inverted U relationship between y and x.
I have an exogenous variable z that I can use as an IV for x. Their correlation is around 0.2, and the theoretical case for the exclusion restriction is strong. Since I have a single instrumental variable but need to include the squared term of the endogenous variable, I follow the approach outlined here: https://www.statalist.org/forums/for...quadratic-term
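In notation, the setup described above can be sketched roughly as follows. The symbols are assumptions introduced for illustration only and do not come from the linked thread: w_i stands for the other controls, \mu_1 to \mu_3 for the three absorbed fixed effects, and x^{*} for the turning point.

```latex
\begin{align*}
\text{First stage (OLS):}\quad
  & x_i = \pi_0 + \pi_1 z_i + w_i'\pi_2 + v_i,
  \qquad \hat{x}_i = \hat{\pi}_0 + \hat{\pi}_1 z_i + w_i'\hat{\pi}_2 \\
\text{Instruments:}\quad
  & \hat{x}_i \ \text{and}\ \hat{x}_i^{2}
  \ \text{for}\ x_i \ \text{and}\ x_i^{2} \\
\text{Outcome equation (2SLS):}\quad
  & \ln y_i = \beta_1 x_i + \beta_2 x_i^{2} + w_i'\gamma
    + \mu_1 + \mu_2 + \mu_3 + \varepsilon_i \\
\text{Inverted U on } (0,1)\text{:}\quad
  & \beta_2 < 0, \qquad x^{*} = -\frac{\beta_1}{2\beta_2} \in (0,1)
\end{align*}
```

Under this sketch, the hypothesis amounts to a negative coefficient on the squared term together with a turning point that lies inside the observed range of x.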
The results indicate that the instrument strongly identifies the endogenous variable: the Cragg-Donald Wald F statistic is 386.066 and the Kleibergen-Paap rk Wald F statistic is 55.354.
I also find the inverted U relationship as per my hypothesis.
The issue
When I run "margins" or compute predicted values, they are much higher than y_ln. The code I use for the predictions is shown further down in this post.
While y_ln has a mean of 0.75 and a median of 1.2, predicted_y_ln has a mean of 6.43 and a median of 6.69, and its other quartiles are very high as well. The predictions look systematically too high.
Is this expected behavior with 2SLS and a log-transformed dependent variable? What could be going wrong? If you can point me to reading that would help me understand the problem and fix it, I would highly appreciate it.
Thanks!
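Code (estimation):
    gen y_ln = log(y)
    gen x2 = x^2
    * First stage by hand: fitted values of x from the instrument z and the controls
    reg x z other_controls
    predict xhat
    gen xhat2 = xhat^2
    * xhat and xhat2 serve as instruments for x and x2; ivreghdfe estimates by 2SLS by default
    ivreghdfe y_ln (x x2 = xhat xhat2) other_controls, robust absorb(fe1 fe2 fe3)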
Code (predictions):
    predict predicted_y_ln
    sum predicted_y_ln, det
    sum y_ln, det
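One way to rule out a sample mismatch is to restrict both summaries to the estimation sample and look at the observation-level gap. The lines below are a minimal sketch of that check, not a diagnosis. They assume they are run immediately after the ivreghdfe command, so that e(sample) is still in memory, and the names pred_s and gap_s are hypothetical. The sketch does not establish whether the default prediction includes the absorbed fixed effects, which would need to be checked in the ivreghdfe documentation.

```stata
* Sketch only: same comparison as above, restricted to the estimation sample.
* pred_s and gap_s are hypothetical new variable names, not part of the original code.
predict pred_s if e(sample)
summarize pred_s, detail
summarize y_ln if e(sample), detail

* Observation-level gap between the prediction and the observed outcome
generate double gap_s = pred_s - y_ln if e(sample)
summarize gap_s, detail
```

If the gap is roughly constant across observations, that would point to a level shift in the predictions rather than to a problem with the estimated shape in x.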