2SLS with log dependent variable - predicted values are systematically very high

ns sn

Join Date: Nov 2023

Posts: 8
#1

11 Nov 2023, 20:49

Dear Forum members, your help will be highly appreciated.

I have a dependent variable (y) which is a dollar value. To reduce the skewness in this variable I am log transforming this variable.

I have a continuous endogenous independent variable (x) that is between 0 and 1. I hypothesize an inverted U relationship between y and x.

I have an exogenous variable z that I can use as an instrument for x. Their correlation is around 0.2, and the theoretical case for the exclusion restriction is strong. Since I have a single instrument but need to include the square of the endogenous variable, I follow the approach outlined here: https://www.statalist.org/forums/for...quadratic-term

Code:

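* Log-transform the outcome and build the squared endogenous term
gen y_ln = log(y)
gen x2 = x^2
* First stage by hand: the fitted values and their square serve as instruments
reg x z other_controls
predict xhat
gen xhat2 = xhat^2
ivreghdfe y_ln (x x2 = xhat xhat2) other_controls, robust absorb(fe1 fe2 fe3)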

The results indicate that the instruments are strong: the Cragg-Donald Wald F statistic is 386.066 and the Kleibergen-Paap rk Wald F statistic is 55.354.

I also find the inverted U relationship as per my hypothesis.

The issue

When I run "margins" or compute predicted values, they are much higher than y_ln. I run the following code:

Code:

predict predicted_y_ln
sum predicted_y_ln, det
sum y_ln, det

While y_ln has a mean of 0.75 and a median of 1.2, predicted_y_ln has a mean of 6.43 and a median of 6.69, and the other quartiles are very high as well. The predictions look systematically too high.

Is this by any chance expected behavior with 2SLS and a log-transformed dependent variable? What could be going wrong? If you can point me to material that would help me understand and fix the issue, I would highly appreciate it.

Thanks!

Last edited by ns sn; 11 Nov 2023, 20:55.
Tags: regression
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2189
#2

12 Nov 2023, 11:11

I don’t know how predict works after ivreghdfe. It might not include the estimates of the fixed effects. Still, I don’t know if that would make them systematically too large.
ns sn

Join Date: Nov 2023

Posts: 8
#3

12 Nov 2023, 20:55

Dear Prof. Wooldridge,
Thank you so much for your reply! It indeed looks like an ivreghdfe issue.

I tried running the same specification with ivreg2.

Code:

ivreg2 y_ln (x x2 = xhat xhat2) other_controls i.fe1 i.fe2 i.fe3, robust

The standard errors differ only very slightly from those of ivreghdfe (in the fourth decimal place). When I predict now, the values are as one would expect: the median of the predicted values is 1.33 versus the sample median of 1.20.

The main issue then is the slowness of ivreg2: it takes 2 hours to run on my data, whereas ivreghdfe runs in less than a minute.

Is there anything that I can do in ivreghdfe during estimation or prediction to get the correct values? BTW, the wrong values appear in the margins command after ivreghdfe as well.
Joao Santos Silva

Join Date: Apr 2014

Posts: 3018
#4

12 Nov 2023, 23:34

Dear ns sn,

According to this page, you will need the option d when you estimate, and the option xbd when you predict, for the fixed effects to be included in the prediction (which, as Jeff suggested, is the problem). Check the help file of reghdfe for more details.

Best wishes,

Joao
ns sn

Join Date: Nov 2023

Posts: 8
#5

13 Nov 2023, 20:42

Thanks, Professor Silva. I went through what you suggested. With a minor modification (need "resid" at the time of estimation), it works as expected.

However, now I am stuck with another issue. Apparently, the margins command doesn't accept "predict(xbd)".

The following code produces an error:

Code:

margins, at(x = (0(0.05)1)) expression(predict(xbd))

The error message is "prediction is a function of possibly stochastic quantities other than e(b)".
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2189
#6

13 Nov 2023, 22:36

It looks like you simply want the function plotted from x = 0 to x = 1, in increments of 0.05. I'm not really sure why. The fixed effects imply a different intercept for each i, so Stata can't know which ones to use. Why not obtain the overall intercept as the average and then plot that function? The shape of the function is the same if you drop the fixed effects.
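
To spell out why the shape does not depend on the fixed effects, here is a sketch that assumes the estimated equation is additive in the fixed effects. The symbols (alpha_i for the combined absorbed fixed effects, w_i for other_controls, c-bar for the averaged intercept) are illustrative notation, not something taken from the thread.

```latex
% Assumed model: \alpha_i collects the absorbed fixed effects for observation i,
% and w_i stands for other_controls.
\[
  \log y_i = \alpha_i + \beta_1 x_i + \beta_2 x_i^2 + w_i'\gamma + u_i
\]
% Averaging the estimated intercept-type terms over the sample gives a single curve in x:
\[
  f(x) = \bar{c} + \hat{\beta}_1 x + \hat{\beta}_2 x^2,
  \qquad
  \bar{c} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{\alpha}_i + w_i'\hat{\gamma}\right)
\]
% A different choice of intercept only shifts f(x) up or down; the curvature
% and the location of the peak are governed by \hat{\beta}_1 and \hat{\beta}_2 alone.
```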
ns sn

Join Date: Nov 2023

Posts: 8
#7

14 Nov 2023, 07:30

Dear Prof. Wooldridge,
I am looking to replicate the margins command after ivreg2 when I specify the fixed effects in the form of "i.fe1", "i.fe2", etc. I am basically looking to replicate the following code using ivreghdfe:

Code:

ivreg2 y_ln (x x2 = xhat xhat2) other_controls i.fe1 i.fe2 i.fe3, robust
margins, at(x = (0(0.05)1))

Would the average-of-fixed-effects approach you mentioned give the same result?

Thanks.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2189
#8

14 Nov 2023, 07:57

I don't think you're computing what is most interesting. First, your margins command ignores the fact that x2 = x^2, so you'll be getting the wrong predicted values. Plus, I think you want the marginal effects:

Code:

ivregress 2sls y_ln (c.x c.x#c.x = xhat xhat2) other_controls i.fe1 i.fe2 i.fe3, robust
margins, dydx(x) at(x = (0(0.05)1))
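
For reference, the quantity that margins, dydx(x) evaluates at each value of x in a quadratic specification can be sketched as follows. Here beta_1 and beta_2 are assumed notation for the coefficients on x and on c.x#c.x, and x* is a label introduced only for this illustration.

```latex
% Marginal effect of x on log y when x enters as x and x^2
% (the factor notation c.x#c.x is what lets margins apply the chain rule):
\[
  \frac{\partial \log y}{\partial x} = \beta_1 + 2\beta_2 x
\]
% The curve is an inverted U when \beta_2 < 0, with its peak where the
% marginal effect is zero:
\[
  x^{*} = -\frac{\beta_1}{2\beta_2}
\]
% Neither expression involves the fixed effects or the other controls.
```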
ns sn

Join Date: Nov 2023

Posts: 8
#9

14 Nov 2023, 08:14

My apologies for making the mistake in typing the code here. I actually am using the factor notation you mentioned for x. The exact spec I used is as follows:

Code:

ivreg2 y_ln (c.x##c.x = c.xhat##c.xhat) other_controls i.fe1 i.fe2 i.fe3, robust
margins, at(x = (0(0.05)1))

I was looking for the predicted values instead of dydx because I wanted to make a point like the following in the paper:
"When x increases from the current sample mean to the inflection point (which I computed), y_ln (or exp(y_ln)) increases by a certain percentage (or a certain value)."

Is this reasonable?

Thanks

Last edited by ns sn; 14 Nov 2023, 08:14. Reason: Fixed the code alignment for better readability.
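
The comparison described in this post can be written out as below. This is only a sketch under the assumption that everything other than x (controls, fixed effects, error term) is held fixed; x-bar, x* and Delta are notation introduced for the illustration, with beta_1 and beta_2 the coefficients on x and its square.

```latex
% Change in predicted log y when x moves from the sample mean \bar{x}
% to the turning point x^{*}; intercept-type terms cancel in the difference:
\[
  \Delta = \beta_1\left(x^{*} - \bar{x}\right)
         + \beta_2\left(x^{*2} - \bar{x}^{2}\right)
\]
% Implied percentage change in y itself, holding everything else fixed:
\[
  \%\Delta y = 100\left(e^{\Delta} - 1\right)
\]
% For small \Delta this is close to 100\,\Delta, which is why a change in
% log y is often read directly as an approximate percentage change.
```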
ns sn

Join Date: Nov 2023

Posts: 8
#10

14 Nov 2023, 11:33

Prof. Wooldridge, I missed your point earlier. I now understand the value of "margins, dydx(x)" since this negates the need for thinking about all other model parameters, fixed effects, keeping them at means etc.

However, I have more conceptual follow-up questions which I thought should better be in a different thread. So, I wrote that down here: https://www.statalist.org/forums/for...-change-notion

I hope that you and other experts get a couple of minutes to provide your comments on that. Thanks for your time.