Difference between two predictions after a log-linear regression

Ousmane Ndiaye

Join Date: Oct 2021
Posts: 3
#1

09 Oct 2021, 07:57

Hello everyone,

I am working with cross-sectional data and trying to simulate the impact of climate change on farmers' incomes under different climate scenarios, but I am not sure how to proceed. I am using a log specification and have accounted for the logarithmic transformation when predicting, but I am having trouble determining the predicted change.

Here is the code I used:

Code:

regress logrevpast plumoymensV sqplumoymensV plumoyssecV sqplumoyssecV tempmoy sqtempmoy age ib0.sexehead taillemen densite distville distforage patexpss ib3.agrozone, vce(robust)

quietly predict lyhat0

generate yhatnormal0 = exp(lyhat0)*exp(0.5*e(rmse)^2)

regress logrevpast precsmp1 sqprecsmp1 plumoyssecV sqplumoyssecV tempsmp1 sqtemsmp1 age ib0.sexehead taillemen densite distville distforage patexpss ib3.agrozone, vce(robust)

quietly predict lyhat1

generate yhatnormal1 = exp(lyhat1)*exp(0.5*e(rmse)^2)

regress logrevpast precspp1 sqprecspp1 plumoyssecV sqplumoyssecV tempspp1 sqtemspp1 age ib0.sexehead taillemen densite distville distforage patexpss ib3.agrozone, vce(robust)

quietly predict lyhat2

generate yhatnormal2 = exp(lyhat2)*exp(0.5*e(rmse)^2)

Code:

sum yhatnormal0 yhatnormal1 yhatnormal2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
 yhatnormal0 |        918     1858089     1107744   373023.8    4974372
 yhatnormal1 |        918     1842156     1299330     246173    7328710
 yhatnormal2 |        918     1842156     1299330     246173    7328710

What I don't understand is why I get the same results for the predicted values yhatnormal1 and yhatnormal2 even though the climate variables in these two models are completely different. Is there a condition I need to specify in the Stata command?

Thank you in advance for your help.

Andrew Musau

Join Date: Oct 2014

Posts: 10195
#2

09 Oct 2021, 08:44

You do not include enough detail for us to see the difference between these two models. Instead of using regress with the dependent variable in logs and then exponentiating the predicted log values, use poisson with the -vce(robust)- option. See an argument for this in https://blog.stata.com/2011/08/22/us...tell-a-friend/. Otherwise, show your exact commands and output from Stata or, better yet, provide a sample of your data if you want specific comments with regard to #1.
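A minimal sketch of this suggestion, using the auto dataset as a stand-in for the real data (the covariates and the variable name yhat_pois are illustrative assumptions, not taken from #1). The outcome stays in levels, so the prediction needs no retransformation:

```stata
* Sketch only: auto.dta stands in for the farm-income data
sysuse auto.dta, clear

* Poisson regression of the outcome in levels, with robust standard errors
poisson price mpg gear_ratio displacement length, vce(robust)

* After poisson, -predict, n- returns the predicted mean exp(xb) in levels
predict yhat_pois, n

* Compare observed and predicted values
summarize price yhat_pois
```

Because the model is fitted on the level of the outcome, no exp(0.5*e(rmse)^2) correction is involved.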

Ousmane Ndiaye

Join Date: Oct 2021
Posts: 3
#3

09 Oct 2021, 12:33

Hi Andrew, thanks for your answer. I tried what you suggested and used Poisson regression, but I ran into the same issue. Please find an example below:

Code:

sysuse auto.dta
generate ln_price=ln(price)

generate mpg1 = runiform(0,50)
generate gear_ratio1 = runiform(1,3)
generate mpg2 = runiform(0,100)
generate gear_ratio2 = runiform(1,5)



Code:

sum mpg*

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         mpg |         74     21.2973    5.785503         12         41
        mpg1 |         74    26.56594    14.73477   1.427843   49.02557
        mpg2 |         74    52.86307    28.62948   .1363096   96.71743


Code:

sum gear_ratio*

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
  gear_ratio |         74    3.014865    .4562871       2.19       3.89
 gear_ratio1 |         74     1.94509    .6035447   1.006104   2.988914
 gear_ratio2 |         74    3.108425    1.186182    1.06802   4.964527

Code:

regress ln_price mpg gear_ratio displacement length, vce(robust)

quietly predict lyhat0

generate yhatnormal0 = exp(lyhat0)*exp(0.5*e(rmse)^2)

regress ln_price mpg1 gear_ratio1 displacement length, vce(robust)

quietly predict lyhat1

generate yhatnormal1 = exp(lyhat1)*exp(0.5*e(rmse)^2)

regress ln_price mpg2 gear_ratio2 displacement length, vce(robust)

quietly predict lyhat2

generate yhatnormal2 = exp(lyhat2)*exp(0.5*e(rmse)^2)

sum yhatnormal0 yhatnormal1 yhatnormal2

Code:

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
 yhatnormal0 |         74    6125.026    1394.416   3348.871   10099.73
 yhatnormal1 |         74    6129.828    1231.907    4294.25   9564.826
 yhatnormal2 |         74    6129.512    1224.339   4293.026   9322.334

In these regressions, the only variables that change are mpg and gear_ratio. I don't know if I am missing something with the predict command. In my real case, the predicted incomes are identical for the two scenarios even though the explanatory variables take completely different values.

Rich Goldstein

Join Date: Mar 2014

Posts: 4463
#4

09 Oct 2021, 12:54

First, I agree with Andrew Musau that poisson is a better way to go here.

Second, your question in #3 is not clear to me: are you concerned that #'s 1 and 2 are similar, or that they differ from #0, or...? I see two issues: (1) #0 is based on actual data, but #'s 1 and 2 are based on random data generated from a uniform distribution; (2) your way of calculating the predicted values, although common many years ago, is less common now. You might want to take a look at an article I wrote in STB29 and a program there called -predlog- and, of course, the references (available for free at the Stata web site).
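For reference, the back-transformation used in #1 and #3 appears to rest on the usual lognormal argument. A sketch, assuming normal, homoskedastic errors, with $\sigma$ estimated by e(rmse) in the code:

```latex
\ln y = x\beta + \varepsilon, \qquad \varepsilon \mid x \sim N(0, \sigma^2)
\quad\Longrightarrow\quad
E[y \mid x] = \exp(x\beta)\,\exp\!\left(\sigma^2 / 2\right)
```

If the errors are not normal or not homoskedastic, the factor $\exp(\sigma^2/2)$ can be misleading.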
Ousmane Ndiaye

Join Date: Oct 2021

Posts: 3
#5

09 Oct 2021, 13:42

Thank you for your answer, Rich. I will look at your command -predlog-.

As I mentioned, I am trying to conduct an analysis of the impact of climate change on farmers' income by considering two climate scenarios: an optimistic scenario and a pessimistic scenario with different values of precipitation and temperature.

My main problem is that I don't understand why I end up with similar values when I try to predict income with these two completely different scenarios.
Rich Goldstein

Join Date: Mar 2014

Posts: 4463
#6

09 Oct 2021, 13:55

Without a data example (use -dataex- as described in the FAQ), I would just be guessing at this point, and I am not interested in guessing right now.

Andrew Musau

Join Date: Oct 2014
Posts: 10195
#7

09 Oct 2021, 13:59

My main problem is that I don't understand why I end up with similar values when I try to predict income with these two completely different scenarios.

Linear prediction is straightforward. The main issue is that the contribution of the variables mpg and gear_ratio to the magnitude of the prediction is small, i.e., their coefficients are not large. The biggest impact comes from the constant terms in these models.

Code:

sysuse auto.dta, clear
generate ln_price=ln(price)

generate mpg1 = runiform(0,50)
generate gear_ratio1 = runiform(1,3)
generate mpg2 = runiform(0,100)
generate gear_ratio2 = runiform(1,5)

sum mpg* gear*

regress ln_price mpg gear_ratio displacement length, vce(robust)

quietly predict lyhat0

generate yhatnormal0 = exp(lyhat0)*exp(0.5*e(rmse)^2)

regress ln_price mpg1 gear_ratio1 displacement length, vce(robust)

quietly predict lyhat1

generate yhatnormal1 = exp(lyhat1)*exp(0.5*e(rmse)^2)

regress ln_price mpg2 gear_ratio2 displacement length, vce(robust)

quietly predict lyhat2

sum lyhat*

generate yhatnormal2 = exp(lyhat2)*exp(0.5*e(rmse)^2)

sum yhatnormal0 yhatnormal1 yhatnormal2

di _b[mpg2]*mpg2 + _b[gear_ratio2]*gear_ratio2+ _b[displacement]*displacement+ _b[length]*length+ _b[_cons]
l mpg2 gear_ratio2 displacement length lyhat2 in 1

di  _b[_cons]
di _b[mpg2]*mpg2 + _b[gear_ratio2]*gear_ratio2

Results:

Code:

. 
. di _b[mpg2]*mpg2 + _b[gear_ratio2]*gear_ratio2+ _b[displacement]*displacement+ _b[length]*length+ _b[_cons]
8.4891705

. 
. l mpg2 gear_ratio2 displacement length lyhat2 in 1

     +---------------------------------------------------+
     |     mpg2   gear_r~2   displa~t   length    lyhat2 |
     |---------------------------------------------------|
  1. | 17.36788    2.00012        121      186   8.48917 |
     +---------------------------------------------------+

. 
. 
. 
. di  _b[_cons]
7.6561261

. 
. di _b[mpg2]*mpg2 + _b[gear_ratio2]*gear_ratio2
.03847502

For the first observation, only (.03847502/8.48917)*100 = 0.45% of the magnitude of the predicted log value is due to these two variables. The same holds for the other model.
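The same check can be extended from the first observation to the whole sample. This is a minimal sketch; the seed and the variable names contrib and share are assumptions added for illustration, and the numbers will differ from those above because the random draws differ:

```stata
* Sketch only: repeat the decomposition for every observation
sysuse auto.dta, clear
set seed 12345                      // assumed seed, for reproducibility only
generate ln_price = ln(price)
generate mpg2 = runiform(0,100)
generate gear_ratio2 = runiform(1,5)

regress ln_price mpg2 gear_ratio2 displacement length, vce(robust)
predict lyhat2                      // linear prediction (xb) is the default

* Part of the linear prediction that comes from the two scenario variables
generate contrib = _b[mpg2]*mpg2 + _b[gear_ratio2]*gear_ratio2

* That part as a percentage of the predicted log price
generate share = 100*contrib/lyhat2

summarize contrib share
```

If the summary of share stays close to zero across observations, the two scenario variables barely move the prediction, which matches the single-observation calculation.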

Last edited by Andrew Musau; 09 Oct 2021, 14:02.
