  • Help with pseudo R^2 of random effect after running xtmixed

    Dear Statalist,
    I'm running a mixed-effects linear regression, and I'm stuck on some post-estimation calculations. The idea of the study is to assess whether a given surgeon (random effect) influences surgical outcomes after adjusting for (and measuring) the influence of fixed patient and surgeon factors. I've done some testing and ensured that mixed models are appropriate here. I'm mostly interested in % variability. I'm using a pseudo R^2 method to calculate the % variability contributed by the fixed effects (by inputting the chi2 output from testparm); this method has previously been accepted in our field. However, I'm stuck trying to assess the % variability contributed by the random effect of Surgeon_ID. Here is the model:

    mixed EBL_OR i.agecat male race c.Preop_GFR i.bmicat i.cci_cat2 c.Tm1Sz1 i.renal || Surgeon_No:, var mle
    One method I found is to use the variance partition coefficient (VPC), or equivalently the intraclass correlation coefficient (ICC), to assess the % variability contributed by "between surgeon" factors. However, the % variability contributed by the fixed effects (using the pseudo R^2) and by the random effect (using the VPC) doesn't add up to 100%. If the VPC is 20%, for example, then it follows that the "within group" variability is 80%. But the pseudo R^2 estimate I get for all the fixed factors is not 80%.
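
    (For reference, the VPC/ICC in a two-level random-intercept model is defined from the fitted variance components as

    VPC = ICC = var(Surgeon_No intercept) / (var(Surgeon_No intercept) + var(residual))

    so 1 - VPC is the corresponding "within surgeon" share.)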

    Maybe this has to do with using two different methods to calculate variability? In that case, is there a method for calculating the pseudo R^2 from the random effect? I've scoured the web and can't find much, unfortunately. Thanks for any help!

    Regards,
    Julien

  • #2
    Dear all,
    Any suggestions? To try and simplify the question, I'm looking for a way to measure the pseudo r^2 for the random effects component in a mixed effects linear regression with random intercept. I have a method for the pseudo r^2 for the fixed components that relies on the chi-sq post-estimation output, but I haven't figured out how to use the var-covar output of the random effects to achieve a solution that is similar to a pseudo r^2.

    An alternative that might work: go ahead with the ICC calculation for the "between groups" effect, and then use the within-group remainder (1 - ICC) to apportion each individual fixed effect's contribution to % variability.

    Thanks again

    • #3
      I'm not quite sure why you expect the sum of the fixed and random effects variability to add up to 100%. You still have the unexplained variability not accounted for by the model.

      But a better presentation of your problem would perhaps elicit more responses. Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question.

      Section 12.1 is particularly pertinent:

      12.1 What to say about your commands and your problem

      Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!
      ...
      Never say just that something "doesn't work" or "didn't work", but explain precisely in what sense you didn't get what you wanted.
      Show us your command, the output of the command, and explain the calculations you are doing in terms of that output. Less abstraction, more concrete. The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

      • #4
        Thanks William, and I apologize. I'll try to clarify.

        I'm not quite sure why you expect the sum of the fixed and random effects variability to add up to 100%. You still have the unexplained variability not accounted for by the model.
        I'm actually getting fixed effects plus random effects of over 100%. Here's how I get the fixed effects, by estimating the pseudo R^2:

        Code:
        mixed EBL_OR i.agecat male race c.Preop_GFR i.bmicat i.cci_cat2 c.Tm1Sz1 i.renal || Surgeon_No:, var mle
        testparm i.agecat male race c.Preop_GFR i.bmicat i.cci_cat2 c.Tm1Sz1 i.renal
        mat N_g = e(N_g)                         // e(N_g) holds the number of groups (surgeons)
        scalar ng = N_g[1,1]
        display (r(chi2)/ng)/(1+(r(chi2)/ng))    // pseudo R^2 from the Wald chi2 of testparm
        .86388682

        The random-effects part is just:
        Code:
        estat icc
        .1742844
        I found it odd that the % variability from the fixed effects (86.4%) and the random effect (17.4%) would add up to more than 100%. But then again, I'm using two different techniques to get at the % variability. I don't know whether there is a pseudo-R^2 method for getting the random effect; alternatively, perhaps I could use the post-estimation command "estat icc" to obtain the within-group variability (82.6% in this case) and then use that to get each covariate's individual fixed-effect contribution to % variability.
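
        (For completeness, the estat icc figure can be reproduced by hand from the fitted variance components. A minimal sketch, using the lns1_1_1:_cons and lnsig_e:_cons parameter names that appear later in this thread:)

        Code:
        * Reproduce estat icc from the stored variance components (sketch)
        mixed EBL_OR i.agecat male race c.Preop_GFR i.bmicat i.cci_cat2 c.Tm1Sz1 i.renal || Surgeon_No:, var mle
        scalar v_u = exp(2*_b[lns1_1_1:_cons])   // between-surgeon variance
        scalar v_e = exp(2*_b[lnsig_e:_cons])    // residual variance
        di "ICC (between-surgeon share) = " v_u/(v_u + v_e)
        di "within-surgeon share        = " v_e/(v_u + v_e)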

        Much appreciated

        • #5
          To revive a dead horse and generate some discussion:

          I've worked out two more ways of getting at the variance decomposition; however, they give me very different results. Any thoughts? For the sake of simplicity, let's start with the fixed effects.

          Code:
          // McFadden's pseudo R2
          * Constant-only model
          xtmixed EBL_OR || Surgeon_No:, var
          scalar llr = e(ll)                  // restricted (null) log likelihood

          * Plain regression, used to build a single fixed-effects risk score
          regress EBL_OR i.agecat male race c.Preop_GFR i.bmicat i.cci_cat2 c.Tm1Sz1 i.renal
          predict allrisk, xb                 // linear prediction (risk score)
          scalar llf = e(ll)                  // stored but not used below

          * Mixed model with the risk score as the fixed covariate
          xtmixed EBL_OR allrisk || Surgeon_No:, var
          scalar llu = e(ll)                  // unrestricted log likelihood

          * Pseudo R2 from the addition of the fixed effects
          scalar pr2 = 1 - llu/llr
          di "Pseudo-R2:fixed " pr2
          Pseudo-R2:fixed .14296773
          So per this method, 14.3% of the variability of the outcome EBL_OR is explained by the addition of the fixed covariates.


          Code:
          // Variance-components approach
          * Constant-only model
          xtmixed EBL_OR || Surgeon_No:
          matrix b = e(b)
          local id_var_col = colnumb(b, "lns1_1_1:_cons")    // column of the Surgeon_No log-SD
          scalar id_variance = exp(b[1, `id_var_col'])^2     // between-surgeon variance
          local res_var_col = colnumb(b, "lnsig_e:_cons")    // column of the residual log-SD
          scalar res_variance = exp(b[1, `res_var_col'])^2   // residual variance
          scalar var_r = id_variance + res_variance          // total variance, null model

          * Mixed model with the risk score
          xtmixed EBL_OR allrisk || Surgeon_No:
          matrix b = e(b)
          local id_var_col = colnumb(b, "lns1_1_1:_cons")
          scalar id_variance = exp(b[1, `id_var_col'])^2
          local res_var_col = colnumb(b, "lnsig_e:_cons")
          scalar res_variance = exp(b[1, `res_var_col'])^2
          scalar var_u = id_variance + res_variance          // total variance, adjusted model

          * Proportional reduction in total variance from adding the fixed effects
          di "pR2 " 1 - var_u/var_r
          pR2 .04467612
          This gives me 4.5% of the variability being explained by the fixed effects. Am I missing something here?

          • #6
            I think your confusion arises from how you interpret the output of the different approaches.

            So per this method, 14.3% of the variability of the outcome EBL_OR is explained by the addition of the fixed covariates
            1. McFadden's pseudo R2 statistic (intended for binary dependent variables) cannot be interpreted in the same way as you interpret R2 in the linear model. Instead, it gives you the improvement in the log likelihood that results from including additional predictors in the model. Because you can estimate OLS using maximum likelihood, it is easy to verify that R2 and pseudo R2 are not the same. Here, I estimate a model with regress (least squares) and glm (maximum likelihood), compute McFadden's pseudo R2 for the latter, and show that it is not the same as R2 (this is nonsensical, intended only for illustration purposes).

            Code:
            . clear
            
            . sysuse auto
            (1978 Automobile Data)
            
            . quietly regress price mpg weight
            
            . scalar R2 = e(r2)
            
            . quietly glm price, family(gaussian) link(identity)
            
            . scalar llr = e(ll)
            
            . quietly glm price mpg weight, family(gaussian) link(identity)
            
            . scalar llu = e(ll)
            
            . scalar pseudo_R2= 1 - llu/llr
            
            . di R2
            .29338912
            
            . di pseudo_R2
            .01846909

            2. Usually, if a Stata command does not give you an R2 statistic, there is a good reason behind it. You can read here about the difficulty of the concept of R2 in a mixed model.

            https://www.r-bloggers.com/mixed-model-r2-updated/
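
            (Following up on that link: the marginal and conditional R2 of Nakagawa and Schielzeth discussed there can be computed along these lines. This is only a sketch, reusing the model from this thread and the lns1_1_1:_cons / lnsig_e:_cons parameter names used in the posts above:)

            Code:
            * Sketch: Nakagawa-Schielzeth marginal and conditional R2 after mixed
            mixed EBL_OR i.agecat male race c.Preop_GFR i.bmicat i.cci_cat2 c.Tm1Sz1 i.renal || Surgeon_No:, var mle
            predict double xbhat, xb                   // fixed-portion linear prediction
            quietly summarize xbhat
            scalar var_f = r(Var)                      // variance of the fixed-effect predictions
            scalar var_u = exp(2*_b[lns1_1_1:_cons])   // between-surgeon variance
            scalar var_e = exp(2*_b[lnsig_e:_cons])    // residual variance
            di "marginal R2 (fixed only):      " var_f/(var_f + var_u + var_e)
            di "conditional R2 (fixed+random): " (var_f + var_u)/(var_f + var_u + var_e)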

            • #7
              Appreciate your response Andrew.

              Indeed, I've come to realize that obtaining an R^2 statistic for mixed models is fraught with peril. However, it seems there are some workarounds that are accepted in our field. One is to calculate the % decrease in variance from the constant-only model upon the addition of the fixed covariates. I've adapted that technique in the third approach (the code had some errors; updated below). It is based on this article by Sheetz et al: https://www.ncbi.nlm.nih.gov/pubmed/25272279

              Does anyone see a problem with what I've done below?

              Code:
              // Variance-components approach (corrected)
              * Constant-only model
              xtmixed EBL_OR || Surgeon_No:
              matrix b = e(b)
              local id_var_col_r = colnumb(b, "lns1_1_1:_cons")     // column of the Surgeon_No log-SD
              scalar id_variance_r = exp(b[1, `id_var_col_r'])^2    // between-surgeon variance
              local res_var_col_r = colnumb(b, "lnsig_e:_cons")     // column of the residual log-SD
              scalar res_variance_r = exp(b[1, `res_var_col_r'])^2  // residual variance
              scalar var_r = id_variance_r + res_variance_r         // total variance

              * Fully adjusted mixed model
              xtmixed EBL_OR i.agecat male race c.Preop_GFR i.bmicat i.cci_cat2 c.Tm1Sz1 i.renal || Surgeon_No:
              matrix b = e(b)
              local id_var_col_u = colnumb(b, "lns1_1_1:_cons")     // column of the Surgeon_No log-SD
              scalar id_variance_u = exp(b[1, `id_var_col_u'])^2    // between-surgeon variance
              local res_var_col_u = colnumb(b, "lnsig_e:_cons")     // column of the residual log-SD
              scalar res_variance_u = exp(b[1, `res_var_col_u'])^2  // residual variance
              scalar var_u = id_variance_u + res_variance_u         // total variance

              * Proportional reduction in the between-surgeon variance from adding the fixed effects
              di "pR2 " 1 - id_variance_u/id_variance_r
              pR2 .0759

              The same group does an equivalent analysis of a different question here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4777662/
              But in Figure 1 there, they estimate how much of the between-hospital variation is due to fixed effects ("Patient characteristics, procedural volume, and hospital characteristics") and to random effects ("The remaining variation"), and the two shares sum to 100%.
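
              (Worth noting: in that framing the two shares sum to 100% by construction, because the "remaining variation" is defined as the complement of the explained share. Continuing from the scalars computed in the code above, a minimal sketch:)

              Code:
              * The two shares of the between-surgeon variance are complements by construction
              di "share explained by the fixed covariates: " 1 - id_variance_u/id_variance_r
              di "remaining (random) share:                " id_variance_u/id_variance_r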


              Is it safe to say that, in the example above, 7.6% of the between-surgeon variance is attributable to the fixed effects and the other 92.4% to the random effect?

              Thanks for the help with this difficult topic.

              • #8
                In #6, read "Because you can estimate OLS using maximum likelihood" as "Because you can estimate the linear model using both OLS and maximum likelihood".
