Margins from a mixed model with a log-transformed outcome

Jan Hultgren

Join Date: Jun 2014

Posts: 12
#1

Margins from a mixed model with a log-transformed outcome

21 Apr 2021, 22:47

I run a mixed model of a log-transformed outcome using ‘mixed’ (and the dfmethod(kroger) option with REML estimation because the sample size is only 36). I want to calculate predictive margins with confidence intervals. The ‘margins’ command produces estimates on the log scale so I cannot just back-transform them as E(Y|X) = EXP(XB). I figure I should instead estimate them as E(Y|X) = EXP(XB)*EXP(σ^2/2), where σ^2 is the variance of the errors.

How can I obtain σ^2 and incorporate it in a margins command? It will probably be something like this:

Code:

margins, expression(exp(predict(xb))*(exp(???)))

This is my model command, simplified:

Code:

mixed lnyvar i.xvar1 i.xvar2 || id:, reml dfmethod(kroger)

Grateful for your help!
Tags: None

Andrew Musau

Join Date: Oct 2014
Posts: 10190

22 Apr 2021, 02:25

On retrieving the var(Residual) estimate after mixed:

Code:

webuse pig
mixed weight week || _all: R.id || _all: R.week, reml dfmethod(kroger)
di exp(`=e(b)[1, "lnsig_e:_cons"]')^2

Res.:

Code:

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
_all: Identity               |
                   var(R.id) |   15.15273   3.224447       9.98523    22.99448
-----------------------------+------------------------------------------------
_all: Identity               |
                 var(R.week) |   .1113996   .1075948      .0167785    .7396272
-----------------------------+------------------------------------------------
               var(Residual) |   4.296981   .3133894       3.72463    4.957282
------------------------------------------------------------------------------
LR test vs. linear model: chi2(2) = 476.10                Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

.
. di exp(`=e(b)[1, "lnsig_e:_cons"]')^2
4.2969807

Comment

Jan Hultgren

Join Date: Jun 2014

Posts: 12
#3

22 Apr 2021, 05:03

Thanks Andrew! I doubt if I can just insert the value in a margins command. I think it would include it as though it were a known value measured without error. I'm pretty sure it needs to account for the standard error of the variance, but I don't know how to achieve that.
Comment

Andrew Musau

Join Date: Oct 2014
Posts: 10190

22 Apr 2021, 06:54

I do not really follow what you are doing, so the following may not be relevant. If you run

Code:

margins, expression(exp(predict(xb))

are you not just averaging the predictions over observations? For example:

Code:

webuse pig, clear
mixed weight week || _all: R.id || _all: R.week, reml dfmethod(kroger)
margins if id==2, expression(exp(predict(xb))) 
margins if id==2, expression(exp(predict(xb))) by(week) post
local id2obs 1.week
forval i=2/9{
    local id2obs "`id2obs' + `i'.week"
}
lincom (`id2obs')/9

Res.:

Code:

. margins if id==2, expression(exp(predict(xb))) 

Predictive margins                              Number of obs     =          9

Expression   : exp(predict(xb))

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   5.31e+31   3.32e+31     1.60   0.110    -1.20e+31    1.18e+32
------------------------------------------------------------------------------

. 
. margins if id==2, expression(exp(predict(xb))) by(week) post

Predictive margins                              Number of obs     =          9

Expression   : exp(predict(xb))
over         : week

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        week |
          1  |   1.27e+11   7.93e+10     1.60   0.110    -2.87e+10    2.82e+11
          2  |   6.31e+13   3.83e+13     1.65   0.099    -1.19e+13    1.38e+14
          3  |   3.14e+16   1.86e+16     1.69   0.092    -5.08e+15    6.79e+16
          4  |   1.56e+19   9.13e+18     1.71   0.087    -2.27e+18    3.35e+19
          5  |   7.77e+21   4.52e+21     1.72   0.085    -1.08e+21    1.66e+22
          6  |   3.87e+24   2.26e+24     1.71   0.087    -5.61e+23    8.30e+24
          7  |   1.93e+27   1.14e+27     1.69   0.092    -3.12e+26    4.16e+27
          8  |   9.58e+29   5.81e+29     1.65   0.099    -1.81e+29    2.10e+30
          9  |   4.77e+32   2.98e+32     1.60   0.110    -1.08e+32    1.06e+33
------------------------------------------------------------------------------

. 
. local id2obs 1.week

. 
. forval i=2/9{
  2. 
.     local id2obs "`id2obs' + `i'.week"
  3. 
. }

. 
. lincom (`id2obs')/9

 ( 1)  .1111111*1bn.week + .1111111*2.week + .1111111*3.week + .1111111*4.week + .1111111*5.week +
       .1111111*6.week + .1111111*7.week + .1111111*8.week + .1111111*9.week = 0

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   5.31e+31   3.32e+31     1.60   0.110    -1.20e+31    1.18e+32
------------------------------------------------------------------------------

Comment

Jan Hultgren

Join Date: Jun 2014

Posts: 12
#5

22 Apr 2021, 07:42

Well, I guess the margins themselves should be possible to calculate correctly using the point estimate of the error variance, as you suggested in #2. They are group averages of the observations. But I think correct confidence intervals need to take the standard error of the point estimate of the variance into account as well. The error variance presented in the model output is also an estimate, with a random error described by its standard error. So I need the margins command to also take that error into account.

Can someone else confirm my reasoning and suggest a solution?
Comment
Jan Hultgren

Join Date: Jun 2014

Posts: 12
#6

24 Apr 2021, 01:47

In StataNews 34-2, Chuck Huber made this argument (In the spotlight: Interpreting models for log-transformed outcomes). He suggested the following margins command after running gsem (lnwage is the outcome.

Code:

margins, expression(exp(predict(eta))*(exp((_b[/var(e.lnwage)])/2)))

Huber:

gsem also estimated the standard error of that variance, and margins will incorporate that standard error into its calculations

My problem is that I run mixed instead of gsem and the code provided by Huber therefore doesn't work. I'm looking for the appropriate code to make margins integrate the standard error of the estimated error variance. I assume that the point estimates are correct without this correction, while the confidence intervals are incorrect .

Grateful for any help.
Comment
Joseph Coveney

Join Date: Apr 2014

Posts: 4406
#7

24 Apr 2021, 04:15

The residual variance's point estimate after -mixed- is obtained from the coefficient vector.

Code:

exp(_b[lnsig_e:_cons])^2

I can't vouch for its correctness for your intended use, but the analogue of Chuck Huber's -margins- expression would be (after fitting the model with -mixed-)

Code:

tempname var scalar define `var' = exp(_b[lnsig_e:_cons])^2 margins <whatever>, expression( exp(predict(xb)) * ( exp( (`var')/2) ) )

Have you examined the residuals

Code:

predict double residuals, residuals pnorm residuals qnorm residuals

after fitting the -mixed- model to the untransformed outcome variable values? And they're not acceptably normal-like?

One hitch that I would think complicates the analysis of transformed hierarchical outcome data is that usually you assume that it's only the residuals that are skew, and the random effect of the cluster is still normally distributed. That is, the data-generating process is something like

Code:

observed value = <fixed effects> + random effect (assumed normal) + exp(normally distributed error)

and so logarithmically transforming the observed values makes a mess of the random effect's estimates. If the quantile plots of the predicted residuals are unacceptable, then have you considered just leaving things in the log-transformed state, and interpreting, reporting and discussing from there?
Comment
Jan Hultgren

Join Date: Jun 2014

Posts: 12
#8

24 Apr 2021, 13:41

Thanks a lot, Joseph!

The code you provided does the trick and I do get reasonable confidence intervals. But even more interesting is the rest of you message. When testing the same model with the outcome variable untransformed, I get roughly Normal first- and second-level residuals - at least as close to Normal as with the transformation. The margins actually look much the same, although there are some important differences. Nevertheless, this convinces me to construct a model with the outcome variable untransformed. Only that I will do it once again from start, accounting for possible confounding by extraneous factors. Great, this really helped me!
Comment

Announcement

Margins from a mixed model with a log-transformed outcome

Comment

Comment

Comment

Comment

Comment

Comment

Comment