  • Back transforming logarithmic regression prediction with Duan's smearing estimator

    Hi

    I am running an income regression in Stata 12.

    Because the relationship between income and its determinants is non-linear, I use the logarithm of income as the dependent variable in the OLS regression, so the predicted values (lsincome) are also on the log scale. I need to transform these back to the original (non-logarithmic) scale, and at first I used a simple formula:

    Code:
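    * OLS on log income with robust standard errors
    reg logincome i.educ age age2 married male black hispan1 speakengwell i.occ, nocons robust

    * Store the root mean squared error of the regression
    scalar RMSE = e(rmse)

    * Linear prediction on the log scale
    predict lsincome, xb

    * Back-transform using the normal-theory correction exp(sigma^2/2)
    gen predincome = exp(lsincome)*exp(RMSE^2/2)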
    However, I found out that the residuals need to be normally distributed for this correction to work, and mine are NOT. I read about Duan's smearing estimator here: https://davegiles.blogspot.com/2014/12/s.html, which is supposed to work even if the residuals are not normally distributed. How do I implement this in Stata?

    (Please find data example below)

    Many thanks

    Stella

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float logincome byte(educ age) float age2 byte(married male black) float hispan1 byte speakengwell float occ
    10.985292 5 28  784 0 0 1 0 1 18
    10.778956 3 33 1089 0 1 0 0 1 24
     9.998797 5 28  784 0 1 0 0 1  5
    11.112448 5 31  961 0 0 0 0 1  5
       10.859 4 35 1225 0 0 0 0 1 10
    9.1049795 4 30  900 1 0 0 0 1 15
    10.819778 4 34 1156 1 1 1 0 1 23
     9.392662 5 27  729 0 0 0 0 1 10
     10.83958 5 31  961 1 0 0 0 1 12
     9.714746 3 20  400 0 0 1 0 1 19
     8.556414 3 24  576 0 1 1 0 1 15
     9.480368 4 19  361 0 1 0 0 1 23
     9.350102 3 26  676 0 0 1 0 1  8
    10.596635 5 30  900 1 0 0 0 1 12
     10.22557 5 27  729 0 0 0 0 1 10
    10.819778 3 31  961 1 1 0 0 1 23
     10.37349 3 31  961 1 1 0 0 1 23
     9.852194 4 27  729 0 1 1 0 1 24
    10.404263 3 25  625 1 1 0 0 1 23
    10.714417 3 35 1225 0 0 0 1 0 19
     9.952278 5 29  841 0 1 0 0 1 24
     9.836279 4 24  576 0 0 1 0 1 12
    10.596635 4 23  529 1 1 0 0 1 19
    10.308952 4 34 1156 0 1 1 0 1 24
     10.23996 3 30  900 1 1 1 0 1 14
    10.714417 5 28  784 1 1 0 0 1  4
     9.546813 6 24  576 0 1 0 0 1 19
     9.903487 3 34 1156 1 0 0 0 1 24
    10.071963 4 30  900 0 0 1 0 1 13
    10.714417 3 26  676 1 1 0 0 1 25
     9.615806 4 33 1089 1 0 0 0 1 18
     12.25009 5 27  729 0 0 0 0 1  2
    10.341743 4 23  529 0 1 0 0 1  4
     9.680344 4 25  625 0 1 0 0 1 19
     9.574984 3 35 1225 0 1 0 0 1 17
    10.911446 5 32 1024 1 0 0 0 1 12
    10.778956 5 26  676 1 0 0 0 1 12
    10.819778 5 34 1156 0 1 0 0 1 21
     9.952278 3 25  625 0 1 1 0 1 24
     10.23996 4 28  784 0 1 1 0 1 14
     10.04325 4 30  900 1 1 0 0 1 19
     10.04325 3 35 1225 1 1 0 0 1 24
    10.518673 4 25  625 0 1 1 0 1 14
    10.460242 3 32 1024 0 0 1 0 1 13
     9.615806 4 35 1225 0 0 1 0 1 19
     9.472705 5 30  900 0 0 0 0 1 10
     10.23996 5 29  841 1 1 0 0 1 25
     8.006368 5 25  625 0 0 1 0 1 15
    10.341743 4 22  484 1 1 1 0 1 19
    10.434115 5 34 1156 0 0 0 0 1 19
    9.1049795 3 33 1089 1 0 1 0 1 18
    10.668956 5 25  625 1 1 0 0 1 19
     9.798127 5 31  961 0 0 0 0 1 19
     9.903487 5 27  729 1 1 0 0 1 25
      11.0021 3 33 1089 1 1 0 0 1 23
    10.308952 3 23  529 1 1 0 0 1 11
       10.859 5 28  784 0 1 0 0 1 18
    10.819778 3 33 1089 0 1 0 0 1 23
    10.518673 5 23  529 0 0 0 0 1 19
     9.472705 3 28  784 1 0 0 0 1 18
    10.878047 4 28  784 0 1 0 0 1  4
     11.03489 6 35 1225 1 0 0 0 1  1
    10.586837 4 25  625 0 1 0 0 1 23
     7.244227 4 21  441 0 0 1 0 1 24
    11.440354 4 35 1225 1 1 0 0 1 19
     9.903487 1 31  961 1 1 0 1 0 21
      11.0021 4 33 1089 1 1 0 0 1 14
     9.740969 4 34 1156 0 0 1 0 1 13
    11.326596 3 34 1156 1 1 0 0 1  1
    10.596635 5 24  576 0 0 0 0 1  9
     8.881836 4 33 1089 0 0 0 0 1 18
    10.596635 5 31  961 1 0 0 0 1 12
    10.341743 5 30  900 0 1 0 0 0  3
    10.553205 6 32 1024 0 0 0 0 1 10
    10.434115 4 27  729 0 1 0 0 1 23
    10.165852 5 30  900 1 0 0 0 1 11
     9.769957 4 21  441 0 1 0 0 1 25
    10.292146 4 34 1156 0 0 1 0 1 19
    10.491274 5 26  676 1 1 0 0 1  3
    10.621327 5 25  625 0 0 0 0 1 18
    11.571195 6 29  841 0 0 0 0 1  9
     10.89859 5 29  841 1 1 0 0 1 12
     10.91509 3 31  961 1 1 0 1 0  1
     9.952278 3 21  441 0 1 0 0 1 26
     7.937375 5 34 1156 1 0 0 0 1 10
    10.714417 4 24  576 0 1 0 0 1 24
    10.714417 5 33 1089 1 0 1 0 1 12
     9.667766 5 27  729 1 0 0 0 1 19
    11.512925 3 32 1024 0 1 1 0 1 24
    10.668956 5 27  729 1 1 0 0 1  4
    10.308952 3 30  900 0 0 1 0 1 13
     9.409191 1 34 1156 0 0 1 0 1 18
    10.308952 6 34 1156 1 0 0 0 1  5
     10.08581 4 27  729 0 1 0 0 1  1
     9.126959 4 34 1156 0 0 1 0 1 13
     10.12663 4 31  961 1 1 0 0 1 16
      7.17012 4 34 1156 1 0 0 0 1 19
     10.08581 4 33 1089 1 1 0 0 1 24
    11.711777 6 28  784 1 0 0 0 1 12
    10.106428 5 24  576 1 1 0 0 1 18
    end
    label values educ educ
    label def educ 1 "No high school", modify
    label def educ 3 "High school graduate", modify
    label def educ 4 "Some college", modify
    label def educ 5 "College graduate", modify
    label def educ 6 "Post graduate", modify
    label values age AGE
    label def AGE 19 "19", modify
    label def AGE 20 "20", modify
    label def AGE 21 "21", modify
    label def AGE 22 "22", modify
    label def AGE 23 "23", modify
    label def AGE 24 "24", modify
    label def AGE 25 "25", modify
    label def AGE 26 "26", modify
    label def AGE 27 "27", modify
    label def AGE 28 "28", modify
    label def AGE 29 "29", modify
    label def AGE 30 "30", modify
    label def AGE 31 "31", modify
    label def AGE 32 "32", modify
    label def AGE 33 "33", modify
    label def AGE 34 "34", modify
    label def AGE 35 "35", modify
    label values occ occ
    label def occ 1 "Management, Business, Science, and Arts", modify
    label def occ 2 "Business Operations Specialists", modify
    label def occ 3 "Financial Specialists", modify
    label def occ 4 "Computer and Mathematical", modify
    label def occ 5 "Architecture and Engineering", modify
    label def occ 8 "Community and Social Services", modify
    label def occ 9 "Legal", modify
    label def occ 10 "Education, Training, and Library", modify
    label def occ 11 "Arts, Design, Entertainment, Sports, and Media", modify
    label def occ 12 "Healthcare Practitioners and Technicians", modify
    label def occ 13 "Healthcare Support", modify
    label def occ 14 "Protective Service", modify
    label def occ 15 "Food Preparation and Serving", modify
    label def occ 16 "Building and Grounds Cleaning and Maintenance", modify
    label def occ 17 "Personal Care and Service", modify
    label def occ 18 "Sales and Related", modify
    label def occ 19 "Office and Administrative Support", modify
    label def occ 21 "Construction", modify
    label def occ 23 "Installation, Maintenance, and Repair", modify
    label def occ 24 "Production", modify
    label def occ 25 "Transportation and Material Moving", modify
    label def occ 26 "Military Specific", modify
    Last edited by Stella Pipping; 23 Aug 2018, 05:33.

  • #2
    there are user-written commands that will do this (e.g., my -predlog-; use -search- or -findit- to locate and download); the formula is also fairly simple: (1) the smearing factor is the mean of the exponentiated residuals from the model; (2) multiply the exponentiated predicted value by the smearing factor
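The two steps just described could be sketched in Stata roughly as follows. This is only a minimal illustration, assuming the -dataex- example from post #1 is in memory and keeping the same model specification; the names xb_hat, ehat, exp_ehat, smear and predincome_duan are hypothetical and do not come from the thread or from -predlog-.

```stata
* Assumes the -dataex- example from post #1 is already in memory

* Same log-income model as in post #1
regress logincome i.educ age age2 married male black hispan1 speakengwell i.occ, nocons robust

* Linear prediction and residuals on the log scale (hypothetical names)
predict double xb_hat if e(sample), xb
predict double ehat if e(sample), residuals

* Step (1): smearing factor = mean of the exponentiated residuals
generate double exp_ehat = exp(ehat)
summarize exp_ehat, meanonly
scalar smear = r(mean)

* Step (2): exponentiated prediction times the smearing factor
generate double predincome_duan = exp(xb_hat) * scalar(smear)

display "Smearing factor: " scalar(smear)
```

The smearing factor takes the place of exp(RMSE^2/2) in the formula from post #1, so the two sets of predictions can be compared directly.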

    note, however, that applied as described in (1) and (2), this assumes the smearing factor does not change as your "X" changes (a single factor is used for every observation), and that is not necessarily the case; you might be better off using Poisson regression; see Bill Gould's blog at
    https://blog.stata.com/2011/08/22/us...tell-a-friend/
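As a rough sketch of the Poisson alternative, assuming the -dataex- example from post #1 is in memory: income is first put back in levels, then modelled directly, so the predictions need no retransformation. The names inc_level and predincome_pois are hypothetical, and including a constant (unlike the -nocons- model in post #1) is an assumption made here for illustration.

```stata
* Assumes the -dataex- example from post #1 is already in memory

* Income in levels (hypothetical variable name)
generate double inc_level = exp(logincome)

* Poisson regression on income in levels; robust standard errors because
* income is not a count variable.
* Assumption: a constant is included, unlike the -nocons- model in post #1
poisson inc_level i.educ age age2 married male black hispan1 speakengwell i.occ, vce(robust)

* Predicted income in levels (exp(xb)), no smearing factor required
predict double predincome_pois if e(sample), n
```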



    • #3
      Stella:
      as an aside to Rich's helpful reply, you may want to consider -glm- with a log link, which makes obtaining predictions on the raw scale straightforward (see, among others, https://www.herc.ox.ac.uk/downloads/...mic-evaluation, pages 104-106; https://www.stata.com/bookstore/heal...s-using-stata/, Chapter 5).
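A minimal sketch of the -glm- route, assuming the -dataex- example from post #1 is in memory. The gamma family is only an assumption for illustration (the post does not name a family), as is the inclusion of a constant; inc_raw and predincome_glm are hypothetical names. Convergence on such a small example is not guaranteed.

```stata
* Assumes the -dataex- example from post #1 is already in memory

* Income on the raw scale (hypothetical variable name)
generate double inc_raw = exp(logincome)

* GLM with a log link; family(gamma) is an illustrative assumption,
* and a constant is included, unlike the -nocons- model in post #1
glm inc_raw i.educ age age2 married male black hispan1 speakengwell i.occ, family(gamma) link(log) vce(robust)

* Predicted mean income on the raw scale
predict double predincome_glm if e(sample), mu
```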
      Please consider using factor variables notation for creating interactions and squared terms (see -help fvvarlist-):
      Code:
      age age2
      can be replaced more efficiently by the following (which also works well with the -margins- and -marginsplot- commands):

      Code:
      c.age##c.age
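To illustrate why the factor-variable form helps, here is a hedged sketch using the model from post #1 with c.age##c.age in place of age age2, assuming the -dataex- example is in memory. The age grid 20(5)35 is an arbitrary choice for illustration, and the predictions shown are on the log scale.

```stata
* Assumes the -dataex- example from post #1 is already in memory

* Model from post #1, with the quadratic in age written as c.age##c.age
regress logincome i.educ c.age##c.age married male black hispan1 speakengwell i.occ, nocons vce(robust)

* Average predicted log income at selected ages; -margins- updates the
* squared term automatically because it knows age enters twice
margins, at(age = (20(5)35))

* Plot the predictions from -margins-
marginsplot
```

With separate age and age2 variables, -margins- would vary age while holding age2 fixed, which is why the interaction notation is preferred.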
      Last edited by Carlo Lazzaro; 23 Aug 2018, 06:01.
      Kind regards,
      Carlo
      (StataNow 18.5)
