  • 2SLS with log dependent variable - predicted values are systematically very high

    Dear Forum members, your help will be highly appreciated.

    I have a dependent variable (y) which is a dollar value. To reduce its skewness, I log-transform it.

    I have a continuous endogenous independent variable (x) that is between 0 and 1. I hypothesize an inverted U relationship between y and x.

    I have an exogenous variable z that I can use as an IV for x. Their correlation is around 0.2, and the theoretical case for the exclusion restriction is strong. Since I have a single instrumental variable but need to include the square of the endogenous variable, I follow the approach outlined here: https://www.statalist.org/forums/for...quadratic-term
    Code:
    gen y_ln = log(y)
    gen x2 = x^2
    reg x z other_controls

    predict xhat
    gen xhat2 = xhat^2
    ivreghdfe y_ln (x x2 = xhat xhat2) other_controls, robust absorb(fe1 fe2 fe3)
    The results indicate that the instruments strongly identify the endogenous variables: the Cragg-Donald Wald F statistic is 386.066 and the Kleibergen-Paap rk Wald F statistic is 55.354.

    I also find the inverted U relationship as per my hypothesis.

    The issue

    When I run "margins" or compute predicted values, they are much higher than y_ln. I run the following code:
    Code:
    predict predicted_y_ln
    sum predicted_y_ln, det
    
    sum y_ln, det
    While y_ln has a mean of 0.75 and a median of 1.2, predicted_y_ln has a mean of 6.43 and a median of 6.69, and the other quartiles are very high as well. It looks like the predictions are systematically higher.

    Is this by any chance expected behavior (2SLS with a log-transformed dependent variable)? What could be going wrong? If you can point me to material that I can read to understand things better and fix the issue, I will highly appreciate it.

    Thanks!
    Last edited by ns sn; 11 Nov 2023, 20:55.

  • #2
    I don’t know how predict works after ivreghdfe. It might not include the estimates of the fixed effects. Still, I don’t know if that would make them systematically too large.

    • #3
      Dear Prof. Wooldridge,
      Thank you so much for your reply! It indeed looks like an ivreghdfe issue.

      I tried running the same specification with ivreg2.

      Code:
       ivreg2 y_ln (x x2 = xhat xhat2) other_controls i.fe1 i.fe2 i.fe3, robust
      The standard errors differ only very slightly from those of ivreghdfe (at the fourth decimal place). When I predict now, the values are as one would expect: the median of the predicted values is 1.33, versus the sample median of 1.20.

      The main issue then is the slowness of ivreg2: it takes 2 hours to run on my data, whereas ivreghdfe finishes in less than a minute.

      Is there anything that I can do in ivreghdfe during estimation or prediction to get the correct values? BTW, the wrong values appear in the margins command after ivreghdfe as well.



      • #4
        Dear ns sn,

        According to this page, you will need the option d when you estimate, and the option xbd when you predict, for the fixed effects to be included in the prediction (which, as Jeff suggested, is the problem). Check the help file of reghdfe for more details.

        Best wishes,

        Joao

        • #5
          Thanks, Professor Silva. I went through what you suggested. With a minor modification (need "resid" at the time of estimation), it works as expected.
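
          For readers following along, a minimal sketch of what this looks like, assuming the variable names from post #1. The resid option at estimation and the xbd option of predict are the ones reported in this thread; the names yhat_xb and yhat_xbd are made up for illustration.

          ```stata
          * Sketch only: variable names follow post #1; yhat_xb and yhat_xbd are illustrative.
          * resid at estimation is what lets predict recover the absorbed fixed effects.
          ivreghdfe y_ln (x x2 = xhat xhat2) other_controls, absorb(fe1 fe2 fe3) robust resid

          * Default prediction: linear index without the absorbed fixed effects
          predict double yhat_xb
          * xbd: linear index plus the absorbed fixed effects
          predict double yhat_xbd, xbd

          * Compare both predictions with the observed outcome
          summarize y_ln yhat_xb yhat_xbd, detail
          ```

          If the fixed effects are what drives the gap, yhat_xbd should sit close to y_ln while yhat_xb stays shifted.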

          However, I am now stuck on another issue. Apparently, the margins command does not accept "predict(xbd)".

          The following code produces an error:
          Code:
          margins, at(x = (0(0.05)1)) expression(predict(xbd))
          The error message is "prediction is a function of possibly stochastic quantities other than e(b)".

          • #6
            It looks like you simply want the function plotted from x = 0 to x = 1, in increments of 0.05. I'm not really sure why. The fixed effects imply a different intercept for each i, so Stata can't know which ones to use. Why not obtain the overall intercept as the average and then plot that function? The shape of the function is the same if you drop the fixed effects.
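
            One way to read this suggestion, sketched under assumptions: fit ivreghdfe with the resid option as reported earlier in the thread, strip the two x terms from the xbd prediction, and average what is left (controls, constant, and fixed effects) over the estimation sample. That average then serves as the intercept of the plotted curve. The names yhat_fe, rest, a, b1, and b2 are illustrative, and x2 = x^2 is assumed to enter as its own variable as in post #1.

            ```stata
            * Sketch only: assumes ivreghdfe was run with the resid option,
            * with x and x2 as separate regressors.
            predict double yhat_fe, xbd

            * Everything in the prediction except the two x terms
            gen double rest = yhat_fe - _b[x]*x - _b[x2]*x2 if e(sample)
            summarize rest, meanonly

            * Averaged intercept and the two slope coefficients
            local a  = r(mean)
            local b1 = _b[x]
            local b2 = _b[x2]

            * Predicted y_ln as a function of x, using the averaged intercept
            twoway function y = (`a') + (`b1')*x + (`b2')*x^2, range(0 1)
            ```

            Averaging over the estimation sample in this way should coincide with the average predictions that margins, at(x = ...) reports when the squared term is tied to x, since both hold the other covariates at their observed values.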

            • #7
              Dear Prof. Wooldridge,
              I am looking to replicate the margins command after ivreg2 when I specify the fixed effects in the form of "i.fe1", "i.fe2", etc. I am basically looking to replicate the following code using ivreghdfe:

              Code:
               ivreg2 y_ln (x x2 = xhat xhat2) other_controls i.fe1 i.fe2 i.fe3, robust
               margins, at(x = (0(0.05)1))
              Will the average-of-fixed-effects idea you mentioned give the same result?

              Thanks.

              • #8
                I don't think you're computing what is most interesting. First, your margins command ignores the fact that x2 = x^2, so you'll be getting the wrong predicted values. Plus, I think you want the marginal effects:

                Code:
                ivregress 2sls y_ln (c.x c.x#c.x = xhat xhat2) other_controls i.fe1 i.fe2 i.fe3, robust
                margins, dydx(x) at(x = (0(0.05)1))

                • #9
                  My apologies for making the mistake in typing the code here. I actually am using the factor notation you mentioned for x. The exact spec I used is as follows:
                  Code:
                  ivreg2 y_ln (c.x##c.x = c.xhat##c.xhat) other_controls i.fe1 i.fe2 i.fe3, robust
                  margins, at(x = (0(0.05)1))
                  I was looking for the predicted values instead of dydx because I wanted to make a point like the following in the paper:
                  "When x increases from the current sample mean to the turning point (which I computed), y_ln (or exp(y_ln)) increases by a certain percentage (or a certain value)."

                  Is this reasonable?

                  Thanks
                  Last edited by ns sn; 14 Nov 2023, 08:14. Reason: Fixed the code alignment for better readability.
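
                  A rough sketch of how the quantity in that sentence could be computed with nlcom, under stated assumptions: x and x2 = x^2 enter the model as separate variables as in post #1 (with the c.x##c.x form the second coefficient would be _b[c.x#c.x] instead), and the sample mean of x is treated as a fixed number. Because the statement compares two predictions that differ only in x, the controls and fixed effects drop out of the difference. The labels turn, dlny, and pct are illustrative.

                  ```stata
                  * Sketch only: run after the IV estimation, with coefficients _b[x] and _b[x2].
                  * The sample mean of x is treated as known, not estimated.
                  summarize x if e(sample), meanonly
                  local xbar = r(mean)

                  * Turning point of b1*x + b2*x^2, i.e. where b1 + 2*b2*x = 0
                  nlcom (turn: -_b[x] / (2*_b[x2]))

                  * Change in predicted y_ln when x moves from its mean to the turning point
                  nlcom (dlny: _b[x]*(-_b[x]/(2*_b[x2]) - `xbar') ///
                      + _b[x2]*((-_b[x]/(2*_b[x2]))^2 - (`xbar')^2))

                  * The same change expressed in percent: 100*(exp(dlny) - 1)
                  nlcom (pct: 100*(exp(_b[x]*(-_b[x]/(2*_b[x2]) - `xbar') ///
                      + _b[x2]*((-_b[x]/(2*_b[x2]))^2 - (`xbar')^2)) - 1))
                  ```

                  The percentage line describes the change in exp(predicted y_ln); reading it as the change in the mean of y itself requires further assumptions about the error term.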

                  • #10
                    Prof. Wooldridge, I missed your point earlier. I now understand the value of "margins, dydx(x)", since it removes the need to think about all the other model parameters, the fixed effects, holding them at their means, etc.

                    However, I have more conceptual follow-up questions, which I thought belonged in a separate thread, so I posted them here: https://www.statalist.org/forums/for...-change-notion

                    I hope that you and other experts get a couple of minutes to provide your comments on that. Thanks for your time.
