
  • How do I interpret a level-log and a log-log model when my independent variable is already a percentage?

    Hi everyone

    I have a regression in which I would like to estimate the effect of migrant share (from 0% to 100%) on years of schooling. Because of skewness, I log-transformed migrant share. As my explanatory variable is already a percentage, I do not know how to interpret the coefficient. Could someone help me with that?

    Regression: reg yearsofschooling ln(migrantshare)
    The beta (coefficient) that I get is 6.8.

    Finally, how would it change if I use a log-log model? reg ln(yearsofschooling) ln(migrantshare)
    The coefficient, in this case, is 1.38


    Thanks a lot!!!


  • #2
    Well, a skewed distribution is not, by itself, a reason to log-transform a variable. What you should do is examine graphically the relationship between years of schooling and migrant share, and see if it looks highly non-linear. If it's reasonably linear, there is no reason to transform anything. If it's not, then you should graphically explore log or other transforms on either variable or both.
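
    A minimal sketch of that graphical check, assuming the data are in memory and the variables are named yearsofschooling and migrantshare as in #1. The name ln_ms is a hypothetical helper variable, and the lowess smoother is just one reasonable choice for eyeballing curvature:

    ```stata
    * Assumption: variables yearsofschooling and migrantshare exist as named in #1.
    * Raw scale: scatter plus a lowess smooth to see whether the relationship bends.
    twoway (scatter yearsofschooling migrantshare) ///
           (lowess yearsofschooling migrantshare)

    * Logged predictor (ln_ms is a hypothetical name; it is missing where
    * migrantshare is 0, since ln(0) is undefined).
    generate ln_ms = ln(migrantshare)
    twoway (scatter yearsofschooling ln_ms) ///
           (lowess yearsofschooling ln_ms)
    ```

    Whichever plot shows the smooth closer to a straight line suggests the more suitable scale for the predictor; the same comparison can be repeated with a logged outcome.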

    Nevertheless, just for the sake of explaining the approach to interpreting models where a variable has been log transformed, let's assume that the log-transform of migrant share was indeed appropriate.

    The coefficient in a linear regression represents the rate of change in the outcome variable per unit change in the predictor. Since your original variable is measured in percent, a unit change in your variable is 1 percentage point. Now, the change in log migrant_share that corresponds to a 1 percentage point change in migrant_share depends on what the starting value of migrant_share is. So your problem does not have a single answer. You have to pick a baseline value of migrant_share as an example. People often pick the mean value for this. For the sake of illustration, let's say that the mean value of migrant_share is 28%. Then a 1 percentage point increase brings us to 29%. Using natural logs, log(0.29) - log(0.28) = log(0.29/0.28) = 0.035 to 3 decimal places. Your coefficient is 6.8, so the corresponding change in years of schooling is 6.8*0.035 = 0.238. You could then summarize this calculation by stating that, starting from a mean migrant share of 28%, a 1 percentage point increase in migrant share is associated with a 0.238 year increase in schooling.
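
    The same arithmetic can be reproduced with display, as a rough sketch. The 28% baseline is the illustrative value used above, and the baselines of 5 and 60 in the loop are hypothetical values added only to show how much the answer moves with the starting point. Carried at full precision the result is about 0.239 rather than 0.238, which is purely a rounding difference:

    ```stata
    * Change in ln(migrant share) for a move from 28% to 29%
    display ln(0.29/0.28)

    * Implied change in years of schooling (6.8 is the coefficient from #1)
    display 6.8*ln(0.29/0.28)

    * Same calculation at other (hypothetical) baselines, in percentage points
    foreach base of numlist 5 28 60 {
        display "baseline `base'%: " %6.3f 6.8*ln((`base'+1)/`base')
    }
    ```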

    In the log-log model, the same 1 percentage point increase from 28% changes log years of schooling by 1.38*0.035 = 0.0483, so years of schooling increases by a multiplicative factor of exp(0.0483) = 1.049. This could be summarized as: starting from a mean migrant share of 28%, a 1 percentage point increase in migrant share is associated with a 4.9% increase in years of schooling.
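
    As a quick check of the log-log arithmetic, again assuming the illustrative 28% baseline and the coefficient of 1.38 from #1, the multiplicative factor can be computed either by exponentiating or directly as a power of the ratio. Without rounding the intermediate log difference, the factor is about 1.0496:

    ```stata
    * Change in ln(years of schooling) for a move from 28% to 29%
    display 1.38*ln(0.29/0.28)

    * Multiplicative factor on years of schooling, two equivalent ways
    display exp(1.38*ln(0.29/0.28))
    display (0.29/0.28)^1.38

    * Expressed as a percent change
    display 100*((0.29/0.28)^1.38 - 1)
    ```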

    Again, though, the key issue here is which model (linear, log-linear, linear-log, or log-log) actually best reflects the real world data generating process. You need to look into that and not get distracted by things like skewness.



    • #3
      Also, remember that you can get predicted values at specified values of the rhs variables using the margins command. While Clyde's answer is much more elegant, predictive margins are often simpler.
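
      One way this might look, as a sketch only: it assumes the variable names from #1, a migrantshare variable stored in percentage points and strictly positive, and the illustrative 28% and 29% values from #2. The name ln_migrantshare is a hypothetical helper variable.

      ```stata
      * Assumption: migrantshare is in percentage points (0-100) and > 0.
      generate ln_migrantshare = ln(migrantshare)
      regress yearsofschooling ln_migrantshare

      * Predicted years of schooling at migrant shares of 28% and 29%
      margins, at(ln_migrantshare = (`=ln(28)' `=ln(29)'))

      * Difference between the two predictions (29% versus 28%)
      margins, at(ln_migrantshare = (`=ln(28)' `=ln(29)')) contrast(atcontrast(r))
      ```

      Because the model is linear in the logged variable, the contrast should match the hand calculation of the coefficient times ln(29/28).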

