
  • Difference between two predictions after a log-linear regression

    Hello everyone,

    I am working with cross-sectional data and trying to simulate the impact of climate change on farmers' incomes under different climate scenarios, but I am not sure how to proceed. I am using a log specification and have accounted for the logarithmic transformation when predicting, but I have some trouble determining the predicted change.

    Here is the code I used :


    Code:
    * Model with the observed climate variables
    regress logrevpast plumoymensV sqplumoymensV plumoyssecV sqplumoyssecV tempmoy sqtempmoy age ib0.sexehead taillemen densite distville distforage patexpss ib3.agrozone, vce(robust)
    quietly predict lyhat0
    * Back-transform to the income scale with the log-normal correction
    generate yhatnormal0 = exp(lyhat0)*exp(0.5*e(rmse)^2)
    
    * Model with the precsmp1/tempsmp1 climate variables
    regress logrevpast precsmp1 sqprecsmp1 plumoyssecV sqplumoyssecV tempsmp1 sqtemsmp1 age ib0.sexehead taillemen densite distville distforage patexpss ib3.agrozone, vce(robust)
    quietly predict lyhat1
    generate yhatnormal1 = exp(lyhat1)*exp(0.5*e(rmse)^2)
    
    * Model with the precspp1/tempspp1 climate variables
    regress logrevpast precspp1 sqprecspp1 plumoyssecV sqplumoyssecV tempspp1 sqtemspp1 age ib0.sexehead taillemen densite distville distforage patexpss ib3.agrozone, vce(robust)
    quietly predict lyhat2
    generate yhatnormal2 = exp(lyhat2)*exp(0.5*e(rmse)^2)
    Code:
    sum yhatnormal0 yhatnormal1 yhatnormal2
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
     yhatnormal0 |        918     1858089     1107744   373023.8    4974372
     yhatnormal1 |        918     1842156     1299330     246173    7328710
     yhatnormal2 |        918     1842156     1299330     246173    7328710
    What I don't understand is why I get the same results for the predicted values yhatnormal1 and yhatnormal2 even though the climate variables in these two models are completely different. Is there an option I need to specify in the Stata command?

    Thank you in advance for your help
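
    For reference, the back-transformation in the code above is the usual log-normal mean correction. It is valid under the assumption that the errors of the log model are normal with constant variance; the symbols below are notation introduced here for illustration, not taken from the post.

    ```latex
    \[
    \mathbb{E}[y_i \mid x_i] = \exp\!\left(x_i'\beta + \tfrac{\sigma^2}{2}\right)
    \quad\Longrightarrow\quad
    \hat{y}_i = \exp\!\left(\widehat{\ln y_i}\right)\exp\!\left(\tfrac{\hat{\sigma}^2}{2}\right),
    \qquad \hat{\sigma} = \texttt{e(rmse)}.
    \]
    ```

    The correction factor is a single constant per fitted model, so it rescales every prediction from that model by the same amount.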














  • #2
    You do not include enough detail for us to see the difference between these two models. Instead of using regress with the dependent variable in logs and then exponentiating the predicted log values, use poisson with the -vce(robust)- option. See an argument for this in https://blog.stata.com/2011/08/22/us...tell-a-friend/. Otherwise, show your exact commands and output from Stata, or better yet, provide a sample of your data if you want specific comments on #1.
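
    A minimal sketch of what this suggestion could look like, using the auto dataset shipped with Stata as a stand-in because the original data are not shown. The covariate list and the variable name yhat_pois are assumptions made for illustration only.

    ```stata
    sysuse auto, clear

    * Poisson regression on the level of price (not its log);
    * vce(robust) gives standard errors that do not rely on the Poisson variance assumption
    poisson price mpg gear_ratio displacement length, vce(robust)

    * After poisson, the n option of predict returns exp(xb),
    * so the prediction is already on the original scale of price
    predict double yhat_pois, n

    summarize price yhat_pois
    ```

    With this approach no separate retransformation factor is needed, because the model is specified directly for the mean of the untransformed outcome.
    
    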



    • #3
      Originally posted by Andrew Musau
      You do not include enough detail for us to see the difference between these two models. Instead of using regress with the dependent variable in logs and then exponentiating the predicted log values, use poisson with the -vce(robust)- option. See an argument for this in https://blog.stata.com/2011/08/22/us...tell-a-friend/. Otherwise, show your exact commands and output from Stata, or better yet, provide a sample of your data if you want specific comments on #1.
      Hi Andrew, thanks for your answer. I tried what you suggested and used poisson regression, but I ran into the same issue. Please find an example below:

      Code:
      sysuse auto.dta
      generate ln_price=ln(price)
      
      generate mpg1 = runiform(0,50)
      generate gear_ratio1 = runiform(1,3)
      generate mpg2 = runiform(0,100)
      generate gear_ratio2 = runiform(1,5)
      
      
      
      Code:
      sum mpg*
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
               mpg |         74     21.2973    5.785503         12         41
              mpg1 |         74    26.56594    14.73477   1.427843   49.02557
              mpg2 |         74    52.86307    28.62948   .1363096   96.71743
      Code:
      sum gear_ratio*
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
        gear_ratio |         74    3.014865    .4562871       2.19       3.89
       gear_ratio1 |         74     1.94509    .6035447   1.006104   2.988914
       gear_ratio2 |         74    3.108425    1.186182    1.06802   4.964527
      Code:
      regress ln_price mpg gear_ratio displacement length, vce(robust)
      quietly predict lyhat0
      generate yhatnormal0 = exp(lyhat0)*exp(0.5*e(rmse)^2)
      
      regress ln_price mpg1 gear_ratio1 displacement length, vce(robust)
      quietly predict lyhat1
      generate yhatnormal1 = exp(lyhat1)*exp(0.5*e(rmse)^2)
      
      regress ln_price mpg2 gear_ratio2 displacement length, vce(robust)
      quietly predict lyhat2
      generate yhatnormal2 = exp(lyhat2)*exp(0.5*e(rmse)^2)
      
      sum yhatnormal0 yhatnormal1 yhatnormal2
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
       yhatnormal0 |         74    6125.026    1394.416   3348.871   10099.73
       yhatnormal1 |         74    6129.828    1231.907    4294.25   9564.826
       yhatnormal2 |         74    6129.512    1224.339   4293.026   9322.334
      In these regressions, the only variables that change are mpg and gear_ratio. I don't know if I am missing something with the predict command. In my real case, the predicted incomes are the same for the two scenarios even though the explanatory variables take completely different values.












      • #4
        first, I agree with Andrew Musau that poisson is a better way to go here

        second, your question in #3 is not clear to me: are you concerned that #'s 1 and 2 are similar, or that they differ from #0, or...? I see two issues: (1) #0 is based on actual data, but #'s 1 and 2 are based on random data generated from a uniform distribution; (2) your way of calculating the predicted values, although common many years ago, is less common now; you might want to take a look at an article I wrote in STB29 and a program there called -predlog- and, of course, the references (available for free at the Stata web site)



        • #5
          Originally posted by Rich Goldstein
          first, I agree with Andrew Musau that poisson is a better way to go here

          second, your question in #3 is not clear to me: are you concerned that #'s 1 and 2 are similar, or that they differ from #0, or...? I see two issues: (1) #0 is based on actual data, but #'s 1 and 2 are based on random data generated from a uniform distribution; (2) your way of calculating the predicted values, although common many years ago, is less common now; you might want to take a look at an article I wrote in STB29 and a program there called -predlog- and, of course, the references (available for free at the Stata web site)
          Thank you for your answer, Rich. I will look at your command -predlog-.

          As I mentioned, I am trying to conduct an analysis of the impact of climate change on farmers' income by considering two climate scenarios: an optimistic scenario and a pessimistic scenario with different values of precipitation and temperature.

          My main problem is that I don't understand why I end up with similar values when I try to predict income with these two completely different scenarios.
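
          For comparison, a common way to set up this kind of scenario exercise is to fit the model once on the observed data and then predict with the same coefficients after substituting the scenario values. Nobody in this thread proposed this, so treat the sketch below as an illustration under that assumption. It uses the auto dataset, random stand-in scenario values, an arbitrary seed, and hypothetical names (corr, mpg_obs, yhat0 to yhat2).

          ```stata
          sysuse auto, clear
          set seed 12345                      // arbitrary seed, only for reproducibility
          generate ln_price = ln(price)

          * Hypothetical scenario values standing in for the climate scenarios
          generate mpg1 = runiform(0,50)
          generate mpg2 = runiform(0,100)

          * Fit ONCE on the observed data and store the retransformation factor
          regress ln_price mpg gear_ratio displacement length, vce(robust)
          scalar corr = exp(0.5*e(rmse)^2)
          predict double lyhat0, xb
          generate double yhat0 = exp(lyhat0)*corr

          * Keep a copy of the observed values so they can be restored
          clonevar mpg_obs = mpg

          * Scenario 1: same coefficients, scenario values substituted for mpg
          replace mpg = mpg1
          predict double lyhat1, xb
          generate double yhat1 = exp(lyhat1)*corr

          * Scenario 2: same coefficients, second set of scenario values
          replace mpg = mpg2
          predict double lyhat2, xb
          generate double yhat2 = exp(lyhat2)*corr

          * Restore the observed values and compare the three predictions
          replace mpg = mpg_obs
          summarize yhat0 yhat1 yhat2
          ```

          Because the coefficients are held fixed, any difference between yhat1 and yhat2 comes only from the scenario values, which is not the case when each scenario is estimated as a separate regression.
          
          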






          • #6
            without a data example (use -dataex- as described in the FAQ), I would just be guessing at this point and I am not interested in guessing right now



            • #7
              My main problem is that I don't understand why I end up with similar values when I try to predict income with these two completely different scenarios.
              Linear prediction is straightforward. The main issue is that the contribution of the variables mpg and gear_ratio to the magnitude of the prediction is small, i.e., their coefficients are not large. The biggest impact comes from the constant terms in these models.

              Code:
              sysuse auto.dta, clear
              generate ln_price=ln(price)
              
              generate mpg1 = runiform(0,50)
              generate gear_ratio1 = runiform(1,3)
              generate mpg2 = runiform(0,100)
              generate gear_ratio2 = runiform(1,5)
              
              sum mpg* gear*
              
              regress ln_price mpg gear_ratio displacement length, vce(robust)
              
              quietly predict lyhat0
              
              generate yhatnormal0 = exp(lyhat0)*exp(0.5*e(rmse )^2)
              
              regress ln_price mpg1 gear_ratio1 displacement length, vce(robust)
              
              quietly predict lyhat1
              
              generate yhatnormal1 = exp(lyhat1)*exp(0.5*e(rmse )^2)
              
              regress ln_price mpg2 gear_ratio2 displacement length, vce(robust)
              
              quietly predict lyhat2
              
              sum lyhat*
              
              generate yhatnormal2 = exp(lyhat2)*exp(0.5*e(rmse )^2)
              
              sum yhatnormal0 yhatnormal1 yhatnormal2
              
              di _b[mpg2]*mpg2 + _b[gear_ratio2]*gear_ratio2+ _b[displacement]*displacement+ _b[length]*length+ _b[_cons]
              l mpg2 gear_ratio2 displacement length lyhat2 in 1
              
              di  _b[_cons]
              di _b[mpg2]*mpg2 + _b[gear_ratio2]*gear_ratio2

              Results:

              Code:
              . 
              . di _b[mpg2]*mpg2 + _b[gear_ratio2]*gear_ratio2+ _b[displacement]*displacement+ _b[length]*length+ _b[_cons]
              8.4891705
              
              . 
              . l mpg2 gear_ratio2 displacement length lyhat2 in 1
              
                   +---------------------------------------------------+
                   |     mpg2   gear_r~2   displa~t   length    lyhat2 |
                   |---------------------------------------------------|
                1. | 17.36788    2.00012        121      186   8.48917 |
                   +---------------------------------------------------+
              
              . 
              . 
              . 
              . di  _b[_cons]
              7.6561261
              
              . 
              . di _b[mpg2]*mpg2 + _b[gear_ratio2]*gear_ratio2
              .03847502

              For the first observation, only (.03847502/8.48917)*100 = 0.45% of the magnitude of the predicted value is due to these two variables. The same holds for the other model.
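
              The same share can be computed for every observation rather than only the first. The following is a rough sketch that refits the third model on freshly drawn random values, so its numbers will not match the output above; contrib2 and share2 are names made up for this illustration.

              ```stata
              sysuse auto, clear
              generate ln_price = ln(price)
              generate mpg2 = runiform(0,100)
              generate gear_ratio2 = runiform(1,5)

              regress ln_price mpg2 gear_ratio2 displacement length, vce(robust)
              predict double lyhat2, xb

              * Part of the linear prediction that comes from the two scenario variables
              generate double contrib2 = _b[mpg2]*mpg2 + _b[gear_ratio2]*gear_ratio2

              * That part as a percentage of the magnitude of the full linear prediction
              generate double share2 = 100*abs(contrib2/lyhat2)

              summarize contrib2 share2
              ```

              If share2 is small across the sample, the predictions are dominated by the constant and the other covariates, which is consistent with the explanation above.
              
              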
              Last edited by Andrew Musau; 09 Oct 2021, 14:02.
