inflated coefficients in logistic multilevel models?

Katharina Rogen

#1

07 Feb 2017, 09:16

Hello all,

I am estimating logistic multilevel models (patients in hospitals) using the meqrlogit command. I am mainly interested in how skin color affects the probability of receiving a certain treatment.

So generally my basic syntax looks like this:

meqrlogit treatment skincolor controlvarlevel1 controlvarlevel2 || hospitals:

I have no problem estimating my models; however, the coefficient for skin color looks a bit odd and differs considerably from the effect that I get when estimating the same model with clustered standard errors (logit treatment skincolor controlvarlevel1 controlvarlevel2, vce(cluster hospitals)). In the multilevel model, the effect of skin color is more than twice as large as in the model with clustered standard errors (example output at the bottom of this post). Moreover, I have estimated a linear probability model (mixed treatment skincolor controlvarlevel1 controlvarlevel2 || hospitals:), and the effect of skin color there is comparable to the (marginal) effect in the model with clustered standard errors.

So I assumed that the coefficient in the logistic multilevel model is a bit off. My question now is whether this "inflated" coefficient might be due to the rescaling of the variance on the lowest level in logistic models (to π²/3 ≈ 3.29). If differences between patients were what matters most for receiving the treatment, the variance on the lowest level should be comparatively large; if this variance is then rescaled to 3.29, could this lead to larger coefficients instead? And whether or not that is the case, is there a good way to deal with this?
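For reference, here is a sketch of the latent-variable formulation that the 3.29 figure comes from, written for a random-intercept logit. The notation is chosen here for illustration and does not appear in Stata's output: i indexes patients, j hospitals, σ_u² is what Stata reports as var(_cons), and ρ is the latent-scale intraclass correlation.

```latex
\begin{aligned}
y^{*}_{ij} &= \beta_0 + \mathbf{x}_{ij}^{\prime}\boldsymbol{\beta} + u_j + \varepsilon_{ij},
  \qquad \mathrm{treatment}_{ij} = \mathbf{1}\{y^{*}_{ij} > 0\},\\
u_j &\sim N(0, \sigma_u^2), \qquad
  \varepsilon_{ij} \sim \mathrm{Logistic}(0, 1), \qquad
  \operatorname{Var}(\varepsilon_{ij}) = \frac{\pi^2}{3} \approx 3.29,\\
\rho &= \frac{\sigma_u^2}{\sigma_u^2 + \pi^2/3}.
\end{aligned}
```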

Thank you,

Katharina

------------------------------------------------------------------------------

Example: logistic multilevel model

meqrlogit treatment i.skincolor || hospital:

Refining starting values:

Iteration 0: log likelihood = -875.10667
Iteration 1: log likelihood = -753.56783

Performing gradient-based optimization:

Iteration 0: log likelihood = -748.07931
Iteration 1: log likelihood = -746.49785

Mixed-effects logistic regression               Number of obs      =      8005
Group variable: hospital                        Number of groups   =       721

Integration points =   7                        Wald chi2(2)       =     20.24
Log likelihood = -746.48846                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
   treatment |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   skincolor |
 medium skin |   -.025434   .2384883    -0.11   0.915    -.4928625    .4419945
 darker skin |  -.9323545   .2362081    -3.95   0.000    -1.395314   -.4693951
             |
       _cons |   3.527563   .2945631    11.98   0.000      2.95023    4.104896
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
hospital: Identity           |
                  var(_cons) |   14.11532   1.947717      10.77052    18.49885
------------------------------------------------------------------------------
LR test vs. logistic regression: chibar2(01) =   456.45 Prob >= chibar2 = 0.0000

Example: logit with clustered standard errors

logit treatment i.skincolor, vce(cluster hospital)

Iteration 0: log pseudolikelihood = -980.03726
Iteration 1: log pseudolikelihood = -974.73265

Logistic regression                             Number of obs      =      8005
Log pseudolikelihood = -974.71422               Pseudo R2          =    0.0054

------------------------------------------------------------------------------
             |               Robust
   treatment |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   skincolor |
 medium skin |  -.0200408    .103726    -0.19   0.847      -.22334    .1832585
 darker skin |  -.3890346    .089785    -4.33   0.000    -.5650099   -.2130592
             |
       _cons |   1.386294   .1024637    13.53   0.000     1.185469     1.58712
------------------------------------------------------------------------------

margins, dydx(*)

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   skincolor |
 medium skin |  -.0032258   .0166943    -0.19   0.847    -.0359461    .0294944
 darker skin |  -.0694805   .0158784    -4.38   0.000    -.1006016   -.0383595
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
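Because i.skincolor is the only regressor in this example, every patient in the same skin-color group gets the same prediction, so the discrete changes reported by -margins, dydx(*)- reduce to a difference of two inverse-logit probabilities. A minimal Python sketch reproducing them by hand; the coefficients are copied from the clustered-SE output above, and the function and constant names are hypothetical.

```python
import math


def invlogit(xb):
    """Inverse logit: converts a linear predictor (log odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-xb))


# Coefficients copied from the -logit ..., vce(cluster hospital)- output.
B_CONS = 1.386294      # _cons (base level of skincolor)
B_MEDIUM = -0.0200408  # medium skin
B_DARKER = -0.3890346  # darker skin

p_base = invlogit(B_CONS)
# Discrete change from the base level, as described in the -margins- note.
dydx_medium = invlogit(B_CONS + B_MEDIUM) - p_base
dydx_darker = invlogit(B_CONS + B_DARKER) - p_base

print(f"Pr(treatment) at base level: {p_base:.4f}")
print(f"dy/dx medium skin: {dydx_medium:.7f}")
print(f"dy/dx darker skin: {dydx_darker:.7f}")
```

With additional covariates in the model, -margins- would instead average this difference over the observed covariate values, so the hand calculation would no longer be a single subtraction.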

Example: linear probability model

mixed treatment i.skincolor || hospital:

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0: log likelihood = -653.34893
Iteration 1: log likelihood = -653.34893

Computing standard errors:

Mixed-effects ML regression                     Number of obs      =      8005
Group variable: hospital                        Number of groups   =       721

Log likelihood = -653.34893                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
   treatment |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   skincolor |
 medium skin |  -.0017033   .0148745    -0.11   0.909    -.0308568    .0274502
 darker skin |  -.0619637    .014662    -4.23   0.000    -.0907006   -.0332267
             |
       _cons |     .79759   .0163144    48.89   0.000     .7656144    .8295657
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
hospital: Identity           |
                  var(_cons) |   .1120834   .0073523       .098561    .1274611
-----------------------------+------------------------------------------------
               var(Residual) |   .0611654   .0025791      .0563136    .0664351
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =   687.67 Prob >= chibar2 = 0.00
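The variance components of the two multilevel outputs can be set side by side as intraclass correlations. This is a sketch assuming the usual latent-variable convention, in which the logit model's patient-level variance is fixed at π²/3 (the 3.29 mentioned above) while the linear probability model estimates var(Residual). The two numbers are on different scales (latent log odds versus probability) and are not directly comparable. Values are copied from the outputs above; the names are hypothetical.

```python
import math

# Variance of the standard logistic distribution: the fixed patient-level
# (level-1) variance on the latent scale of a logit model.
LOGISTIC_LEVEL1_VAR = math.pi ** 2 / 3


def icc(var_hospital, var_patient):
    """Share of the total variance located between hospitals."""
    return var_hospital / (var_hospital + var_patient)


# meqrlogit: var(_cons) is estimated, the level-1 variance is fixed at pi^2/3.
icc_logit = icc(14.11532, LOGISTIC_LEVEL1_VAR)
# mixed (linear probability model): both components are estimated.
icc_lpm = icc(0.1120834, 0.0611654)

print(f"pi^2/3 = {LOGISTIC_LEVEL1_VAR:.4f}")
print(f"Latent-scale ICC, multilevel logit: {icc_logit:.3f}")
print(f"ICC, linear probability model:      {icc_lpm:.3f}")
```

In both models the computed ICC is above 0.5, i.e. most of the variance lies between hospitals rather than between patients.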
Clyde Schechter

#2

07 Feb 2017, 11:18

I disagree with your overall interpretation. I don't think the multilevel coefficient is inflated. The single-level model coefficient has unresolved confounding bias. Using a clustered variance estimator can reduce the bias in the standard errors that results from within-hospital correlation, but it does not adjust for differences in the level of treatment frequency among the hospitals. To do that, you need the second level of the model (or something equivalent to it).

As for interpreting the coefficients of a logistic model, remember that the only units they have are the inverses of the units of the predictor variables they correspond to. Their interpretation is as the logarithms of the odds ratios. For that to work, rescaling the lowest-level variance to that of the standard logistic distribution is crucial. It is not a problem to deal with; it is the solution to the problem. Yes, if you used probit instead of logistic modeling, you would get different coefficients because the lowest-level variance is scaled to 1 in a probit model. But probit coefficients do not give odds ratios when exponentiated; they are in units of standard normal deviates. So that, too, is not a problem to be solved; it is the solution to making probit regression work.

More to your specific situation, the linear probability model is a different model altogether from logit or probit. Its coefficients have yet another interpretation (differences in probability), and for them to have that interpretation, the lowest-level variance must be estimated from the residuals of the model. Again, Stata handles that automatically for you. You should have no expectation that the coefficients from the linear probability model and a logistic model will resemble each other. If they are not very close to zero, they will in general have the same sign, and the z/t statistics should be fairly close (which they all are in what you show). But that's where the similarity ends.
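To make the comparison concrete, the sketch below takes the darker-skin estimate from each of the three outputs in post #1 and computes z = coefficient / standard error. Because the coefficients are on different scales (log odds versus probability), only their signs and z statistics are compared. The numbers are copied from the posted output; the dictionary keys and variable names are hypothetical.

```python
# Darker-skin estimates (coefficient, standard error) copied from post #1.
estimates = {
    "meqrlogit (log odds)": (-0.9323545, 0.2362081),
    "logit, clustered SE (log odds)": (-0.3890346, 0.089785),
    "mixed LPM (probability)": (-0.0619637, 0.014662),
}

# z statistic = coefficient / standard error for each model.
z_stats = {name: b / se for name, (b, se) in estimates.items()}

# True when every coefficient has the same sign.
same_sign = len({b < 0 for b, _ in estimates.values()}) == 1

for name, z in z_stats.items():
    print(f"{name:32s} z = {z:6.2f}")
print("All coefficients have the same sign:", same_sign)
```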

But as between the multilevel and the single-level logistic models, the multilevel model is the clear winner here.
Katharina Rogen

#3

08 Feb 2017, 10:07

Dear Mr. Schechter,

Thank you so much! That has been really helpful and cleared things up. So I will "trust" my multilevel model.

For my final presentation of results, I've decided to show the coefficients (or rather, odds ratios) as well as the predicted probabilities for skin color (in a separate table), which I will calculate after model estimation (-predict-), to put the large coefficients a bit into perspective (I think).
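Since the coefficients are log odds ratios, the odds ratios and their 95% confidence limits are obtained by exponentiating the coefficient and both interval endpoints. A minimal sketch using the fixed-effect estimates from the -meqrlogit- output in post #1 (Stata can also report these directly through the -or- option of -meqrlogit-). In a random-intercept model, these odds ratios compare patients who share the same hospital effect. The names below are hypothetical.

```python
import math

# meqrlogit fixed effects: (coefficient, lower 95% limit, upper 95% limit).
log_odds = {
    "medium skin": (-0.025434, -0.4928625, 0.4419945),
    "darker skin": (-0.9323545, -1.395314, -0.4693951),
}

# Exponentiate the point estimate and both CI endpoints to get odds ratios.
odds_ratios = {
    level: tuple(math.exp(v) for v in values) for level, values in log_odds.items()
}

for level, (odds_ratio, lower, upper) in odds_ratios.items():
    print(f"{level}: OR = {odds_ratio:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```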

Thank you,

Katharina
Maarten Buis

#4

09 Feb 2017, 01:54

Originally posted by Katharina Rogen

the predicted probabilities for skincolor (in a different table), which I will calculate after the model estimation (-predict-)

The -margins- command is explicitly designed for creating such tables.
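In a random-intercept logit there are at least two reasonable probability-scale summaries: the predicted probability for a hospital whose random effect is zero, and the probability averaged over the estimated distribution of hospital effects. Which one -margins- reports after -meqrlogit- depends on the predict() options it is given, so the meqrlogit postestimation documentation is the place to check. The Python sketch below only illustrates the arithmetic. It assumes normally distributed random intercepts with variance var(_cons) and ignores estimation uncertainty, which -margins- would handle through standard errors. The function names and the "base level" label are hypothetical, because the posted output does not show which skincolor category is the base.

```python
import math

# Fixed-part estimates and hospital variance from the meqrlogit output in post #1.
B_CONS = 3.527563
EFFECTS = {"base level": 0.0, "medium skin": -0.025434, "darker skin": -0.9323545}
SD_HOSPITAL = math.sqrt(14.11532)  # square root of var(_cons)


def invlogit(xb):
    """Convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-xb))


def prob_at_zero(effect):
    """Predicted probability for a hospital whose random effect is exactly 0."""
    return invlogit(B_CONS + effect)


def prob_averaged(effect, n=4000, z_max=8.0):
    """Average of invlogit(xb + u) over u ~ N(0, var(_cons)), by the midpoint rule."""
    step = 2 * z_max / n
    total = 0.0
    for i in range(n):
        z = -z_max + (i + 0.5) * step  # grid point on the standard normal scale
        density = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        total += invlogit(B_CONS + effect + SD_HOSPITAL * z) * density * step
    return total


for level, effect in EFFECTS.items():
    print(f"{level:12s} u = 0: {prob_at_zero(effect):.3f}   "
          f"averaged over hospitals: {prob_averaged(effect):.3f}")
```

With a hospital variance this large, the two summaries differ noticeably, so it is worth stating in the table which one is reported.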

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------