  • Multi-level regression with log-transformed variables

    I'm running regression models for my project on physical activity and social mobility.

    I ran the following commands to get the log of my outcome variables of interest (and to get rid of my 0 values):
    gen logtotalpa=log(totalpa+0.01)
    gen logtotalmod=log(totalmod+0.01)
    gen logtotalvig=log(totalvig+0.01)
    gen logtotalwalk=log(totalwalk+0.01)
    gen logsum_vigandmod=log(sum_vigandmod+0.01)

    I then ran the following linear regression commands:
    1. mixed logtotalpa scale_1 scale_2 i.age i.gender i.area i.parenthood || country:
    2. mixed logtotalmod scale_1 scale_2 i.age i.gender i.area i.parenthood || country:
    3. mixed logtotalvig scale_1 scale_2 i.age i.gender i.area i.parenthood || country:
    4. mixed logtotalwalk scale_1 scale_2 i.age i.gender i.area i.parenthood || country:
    5. mixed logsum_vigandmod scale_1 scale_2 i.age i.gender i.area i.parenthood || country:
    My issue is mainly with 2 and 3: the number of observations is way lower than the others so I'm wondering if it's something I've done wrong when creating the log variables, particularly logtotalmod and logtotalvig.
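
    One quick way to check whether the log step itself is dropping observations is to compare missing-value counts before and after the transformation. This is only a sketch using the variable names above; it assumes the gen commands have already been run.

    ```stata
    * Missing counts before and after the transformation
    count if missing(totalmod)
    count if missing(logtotalmod)

    * Observations that became missing only because of the log step.
    * This is 0 unless totalmod holds values at or below -0.01.
    count if missing(logtotalmod) & !missing(totalmod)
    ```

    If the last count is 0, the transformation is not what reduces the number of observations, and the same check can be repeated for totalvig and logtotalvig.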

    I've attached tables of frequencies for the variables above in case you can figure out something I'm missing.


    Any advice would be greatly appreciated!

  • #2
    It is probably better to use log(x + 1) rather than log(x + 0.01). I suspect that is the problem. It is even better to use a Poisson or negative binomial model rather than a log transformation. See Cameron and Trivedi (2009) for more details.

    Cameron, A. C., & Trivedi, P. K. (2009). Microeconometrics Using Stata (Vol. 5, p. 706). College Station, TX: Stata Press.
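
    A minimal sketch of the count-model alternative, reusing the variable names from the original post. It assumes the outcome is a non-negative, count-like variable and that your Stata version includes mepoisson and menbreg (Stata 13 or later). Whether a count model suits these physical activity measures is a judgment call.

    ```stata
    * Multilevel Poisson on the untransformed outcome; zeros need no added constant
    mepoisson totalmod scale_1 scale_2 i.age i.gender i.area i.parenthood || country:

    * Multilevel negative binomial, an option if the outcome is overdispersed
    menbreg totalmod scale_1 scale_2 i.age i.gender i.area i.parenthood || country:
    ```

    Like mixed, these commands drop any observation with a missing value on a model variable, so they will not by themselves recover observations lost to missing data.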



    • #3
      Thanks for replying, Chris! I tried both log(x+1) and the Poisson model, but the models still had the same number of observations.


      • #4
        Try looking at the non-transformed dependent variables. If you do an OLS regression with the non-transformed dependent variables, do you also lose observations? I suspect it is due to the dependent variable.
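
        The comparison could be sketched as follows, using the covariates from the original post. The stored result e(N) holds the number of observations used by the most recent estimation command.

        ```stata
        * OLS on the untransformed outcomes, same covariates as the mixed models
        regress totalpa scale_1 scale_2 i.age i.gender i.area i.parenthood
        display "N for totalpa:  " e(N)

        regress totalmod scale_1 scale_2 i.age i.gender i.area i.parenthood
        display "N for totalmod: " e(N)
        ```

        If the second N is already lower here, the loss comes from the data rather than from the log transformation.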



        • #5
          It seems pretty clear that you have missing data on some of your dependent variables, and probably on some explanatory variables too. No transformation is going to change that. Please use the code delimiters and show us what happens when you type

          Code:
          sum totalpa totalmod totalvig totalwalk
          Actually, I just looked at your output. For the latter two variables you only have 20,300 non-missing values. For age, there are only 19,645 non-missing observations. And so on. By default, Stata's estimation commands use only complete cases, that is, observations with no missing values on any variable in the model.
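
          To see where the complete cases are lost, something like the following may help. It is a sketch that assumes all the listed variables are numeric; touse_mod is a hypothetical name for the marker variable, and the /// line continuation only works in a do-file.

          ```stata
          * Missing-value counts for every variable used in the models
          misstable summarize totalpa totalmod totalvig totalwalk sum_vigandmod ///
              scale_1 scale_2 age gender area parenthood

          * Flag observations that are complete on everything in model 2
          mark touse_mod
          markout touse_mod totalmod scale_1 scale_2 age gender area parenthood country
          count if touse_mod

          * Cross-check against the sample the model actually used
          mixed logtotalmod scale_1 scale_2 i.age i.gender i.area i.parenthood || country:
          count if e(sample)
          ```

          The two counts should agree if missing values are the only reason observations are dropped.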
