
  • inflated coefficients in logistic multilevel models?

    Hello all,

    I am estimating logistic multilevel models (patients in hospitals) using the meqrlogit command. I am mainly interested in how skincolor affects the probability of receiving a certain treatment.

    So generally my basic syntax looks like this:

    meqrlogit treatment skincolor controlvarlevel1 controlvarlevel2 || hospitals:

    I have no problem estimating my models. However, the coefficient for skincolor looks a bit odd and differs considerably from the effect that I get when estimating the same model with clustered standard errors (logit treatment skincolor controlvarlevel1 controlvarlevel2, vce(cluster hospitals)). In the multilevel model, the effect of skincolor is more than twice as big as the effect in the model with clustered standard errors (example output at the bottom of the post). Moreover, I have estimated a linear probability model (mixed treatment skincolor controlvarlevel1 controlvarlevel2 || hospitals: ) and the effect of skincolor there is comparable to the (marginal) effect in the model with clustered standard errors.

    So I assumed that the coefficient in the logistic multilevel model is a bit off. My question now is whether this "inflated" coefficient might be due to the rescaling of the variance on the lowest level in logistic models (to 3.29). If the differences between patients were what is most relevant for receiving the treatment, the variance on the lowest level should be comparatively large; if this variance then gets rescaled to 3.29, could this lead to larger coefficients? And whether or not that is the case, is there a (good) way to deal with this?

    Thank you,

    Katharina

    ------------------------------------------------------------------------------

    example logistic multilevel

    meqrlogit treatment i.skincolor || hospital:

    Refining starting values:

    Iteration 0: log likelihood = -875.10667
    Iteration 1: log likelihood = -753.56783

    Performing gradient-based optimization:

    Iteration 0: log likelihood = -748.07931
    Iteration 1: log likelihood = -746.49785

    Mixed-effects logistic regression               Number of obs      =      8005
    Group variable: hospital                        Number of groups   =       721

    Integration points =   7                        Wald chi2(2)       =     20.24
    Log likelihood = -746.48846                     Prob > chi2        =    0.0000

    ------------------------------------------------------------------------------
       treatment |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       skincolor |
     medium skin |   -.025434   .2384883    -0.11   0.915    -.4928625    .4419945
     darker skin |  -.9323545   .2362081    -3.95   0.000    -1.395314   -.4693951
                 |
           _cons |   3.527563   .2945631    11.98   0.000      2.95023    4.104896
    ------------------------------------------------------------------------------

    ------------------------------------------------------------------------------
      Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
    -----------------------------+------------------------------------------------
    hospital: Identity           |
                      var(_cons) |   14.11532   1.947717      10.77052    18.49885
    ------------------------------------------------------------------------------
    LR test vs. logistic regression: chibar2(01) =   456.45 Prob>=chibar2 = 0.0000


    example clustered model

    logit treatment i.skincolor, vce(cluster hospital)

    Iteration 0: log pseudolikelihood = -980.03726
    Iteration 1: log pseudolikelihood = -974.73265

    Logistic regression                             Number of obs      =      8005
    Log pseudolikelihood = -974.71422               Pseudo R2          =    0.0054

    ------------------------------------------------------------------------------
                 |               Robust
       treatment |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       skincolor |
     medium skin |  -.0200408    .103726    -0.19   0.847      -.22334    .1832585
     darker skin |  -.3890346    .089785    -4.33   0.000    -.5650099   -.2130592
                 |
           _cons |   1.386294   .1024637    13.53   0.000     1.185469     1.58712
    ------------------------------------------------------------------------------

    margins, dydx(*)

    ------------------------------------------------------------------------------
                 |            Delta-method
                 |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       skincolor |
     medium skin |  -.0032258   .0166943    -0.19   0.847    -.0359461    .0294944
     darker skin |  -.0694805   .0158784    -4.38   0.000    -.1006016   -.0383595
    ------------------------------------------------------------------------------
    Note: dy/dx for factor levels is the discrete change from the base level.


    example linear probability model

    mixed treatment i.skincolor || hospital:

    Performing EM optimization:

    Performing gradient-based optimization:

    Iteration 0: log likelihood = -653.34893
    Iteration 1: log likelihood = -653.34893

    Computing standard errors:

    Mixed-effects ML regression                     Number of obs      =      8005
    Group variable: hospital                        Number of groups   =       721

    Log likelihood = -653.34893                     Prob > chi2        =    0.0000

    ------------------------------------------------------------------------------
       treatment |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       skincolor |
     medium skin |  -.0017033   .0148745    -0.11   0.909    -.0308568    .0274502
     darker skin |  -.0619637    .014662    -4.23   0.000    -.0907006   -.0332267
                 |
           _cons |     .79759   .0163144    48.89   0.000     .7656144    .8295657
    ------------------------------------------------------------------------------

    ------------------------------------------------------------------------------
      Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
    -----------------------------+------------------------------------------------
    hospital: Identity           |
                      var(_cons) |   .1120834   .0073523       .098561    .1274611
    -----------------------------+------------------------------------------------
                   var(Residual) |   .0611654   .0025791      .0563136    .0664351
    ------------------------------------------------------------------------------
    LR test vs. linear regression: chibar2(01) = 687.67 Prob >= chibar2 = 0.00

  • #2
    I disagree with your overall interpretation. I don't think the multilevel coefficient is inflated. The single-level model coefficient has unresolved confounding bias. The use of a clustered variance estimator can reduce the bias in the standard errors that results from within-hospital correlation, but it does not adjust for differences in treatment frequency among the hospitals. To do that you need the second level of the model (or something equivalent to it).

    As for interpreting the coefficients of a logistic model, remember that the only units they have are the inverses of the units of the predictor variables they correspond to. They are interpreted as the logarithms of odds ratios. For that to work, the rescaling of the lowest-level variance to that of the standard logistic distribution is crucial. It is not a problem to deal with; it is the solution to the problem. Yes, if you used probit instead of logistic modeling you would get different coefficients, because a probit model scales the lowest-level variance to 1. But probit coefficients do not give odds ratios when exponentiated; they are shifts on the standard normal (z-score) scale. So that, too, is not a problem to be solved; it is what makes probit regression work.

    More to your specific situation, the linear probability model is a different model altogether from logit or probit. Its coefficients have a still different interpretation, and for them to have that interpretation, the lowest-level variance must be estimated from the residuals in the model. Again, that is handled automatically for you by Stata. You should have no expectation that the coefficients from the linear probability model and a logistic model would resemble each other. Unless they are very close to zero, they will in general have the same sign, and the z/t statistics should be fairly close (which they all are in what you show). But that's where the similarity ends.
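
    The rescaling discussed here can be written out in latent-variable form. This is only a sketch: the symbols y*, u_j, ε, σ_u² and ρ are notation introduced for illustration (they do not appear in the posts), and it assumes a normally distributed hospital intercept. The numbers come from the var(_cons) estimate in the example output of post #1.

    ```latex
    \begin{aligned}
    y^{*}_{ij} &= \mathbf{x}_{ij}'\boldsymbol{\beta} + u_j + \varepsilon_{ij},
      \qquad y_{ij} = 1 \iff y^{*}_{ij} > 0, \qquad u_j \sim N(0, \sigma_u^{2}),\\
    \operatorname{Var}(\varepsilon_{ij}) &= \pi^{2}/3 \approx 3.29 \;\text{(logit)},
      \qquad \operatorname{Var}(\varepsilon_{ij}) = 1 \;\text{(probit)},\\
    \rho &= \frac{\sigma_u^{2}}{\sigma_u^{2} + \pi^{2}/3}
          = \frac{14.115}{14.115 + 3.290} \approx 0.81 .
    \end{aligned}
    ```

    Because the lowest-level variance is fixed rather than estimated, the coefficients and the hospital variance are both expressed relative to it. On that scale, roughly 81% of the latent variance in this example lies between hospitals.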

    But as between the multilevel and the single level logistic model, the multilevel is the clear winner here.
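
    As a complementary way to see why the two logistic coefficients differ in size, a well-known approximation (Zeger, Liang and Albert, 1988) relates the hospital-specific coefficient from a random-intercept logit to the population-averaged coefficient from a single-level logit. It is not part of the argument above, and it assumes a normal random intercept that is independent of the covariates; β_SS, β_PA and c are notation introduced here.

    ```latex
    \beta_{\mathrm{PA}} \approx \frac{\beta_{\mathrm{SS}}}{\sqrt{1 + c^{2}\sigma_u^{2}}},
    \qquad c = \frac{16\sqrt{3}}{15\pi} \approx 0.588,
    \qquad
    \sqrt{1 + 0.3458 \times 14.115} \approx 2.425,
    \qquad
    \frac{-0.932}{2.425} \approx -0.384 .
    ```

    With the estimates from post #1, the darker-skin coefficient of -0.932 shrinks to about -0.384 on the population-averaged scale, close to the -0.389 from the single-level logit. Under those assumptions, most of the gap reflects the two models answering different questions (within a given hospital versus averaged over hospitals) rather than one of them being inflated.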



    • #3
      Dear Mr. Schechter,

      thank you so much! That has been really helpful and did clear things up! So I will "trust" my multilevel model.

      For my final presentation of results I've decided to show the coefficients (or odds ratios), as well as the predicted probabilities for skincolor (in a different table), which I will calculate after the model estimation (-predict-), to put the large coefficients a bit in perspective (I think).

      Thank you,

      Katharina



      • #4
        Originally posted by Katharina Rogen
        the predicted probabilities for skincolor (in a different table), which I will calculate after the model estimation (-predict-)
        The command margins is explicitly designed for creating such tables.
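
        A minimal sketch of what that could look like, using the variable names from the example output in post #1. The predict(mu fixedonly) choice is an assumption about what is wanted: it sets the hospital random intercept to zero, so the table shows probabilities for a "typical" hospital rather than averages over all hospitals. The options available may differ across Stata versions.

        ```stata
        * Sketch only: variable names follow the example output in post #1.
        * Refit the multilevel model.
        meqrlogit treatment i.skincolor || hospital:

        * Predicted probability of treatment for each skincolor category,
        * with the hospital random intercept set to zero (fixed portion only).
        margins skincolor, predict(mu fixedonly)

        * Discrete change in that probability relative to the base category.
        margins, dydx(skincolor) predict(mu fixedonly)
        ```

        With a hospital variance as large as the one estimated here, probabilities for a typical hospital can differ noticeably from population-averaged ones, so it is worth stating in the table which of the two is being reported.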
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------

