  • Stratification vs. Interaction in Multilevel Regression

    Hello everyone!

    I am conducting a multilevel analysis using melogit in Stata 17. My data are nested, and I want to fit a multilevel logistic regression.

    In my dataset, the outcome is obesity (variable = BMI_catOb), the exposure is the density of unhealthy retailers around schools, categorized into tertiles (variable = ter_DE_Unhealthy), and I want to test for an interaction between ter_DE_Unhealthy and type of school (public = 1, private = 2; variable = type_school).

    1. I added an interaction term and I got the below results.

    Code:
    . melogit BMI_catOb i.ter_DE_Unhealthy##i.type_school || sclid:, or

    Mixed-effects logistic regression               Number of obs     =  2,465
    Group variable: sclid                           Number of groups  =     50

                                                    Obs per group:
                                                                  min =     20
                                                                  avg =   49.3
                                                                  max =     52

    Integration method: mvaghermite                 Integration pts.  =      7

                                                    Wald chi2(5)      =  17.97
    Log likelihood = -1127.8462                     Prob > chi2       = 0.0030
    ----------------------------------------------------------------------------------------------
                       BMI_catOb | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
    -----------------------------+----------------------------------------------------------------
                ter_DE_Unhealthy |
                              2  |   1.739632   .3244362     2.97   0.003      1.20701    2.507287
                              3  |   1.961274    .377236     3.50   0.000     1.345291    2.859304
                                 |
                     type_school |
                        private  |   1.946864    .423782     3.06   0.002     1.270723    2.982773
                                 |
    ter_DE_Unhealthy#type_school |
                      2#private  |    .477391   .1624196    -2.17   0.030     .2450617    .9299784
                      3#private  |   .6114784   .1829374    -1.64   0.100     .3401929      1.0991
                                 |
                           _cons |   .1227784   .0187516   -13.73   0.000     .0910166    .1656242
    -----------------------------+----------------------------------------------------------------
    sclid                        |
                       var(_cons)|   .0255674   .0343918                      .001831     .3570048
    ----------------------------------------------------------------------------------------------
    Based on the above table, I deduce that for public schools (type_school = 1, the base category) the odds ratio for tertile #2 of ter_DE_Unhealthy is 1.739 and the OR for tertile #3 is 1.961. Would that be correct?

    2. Then I tried to stratify by school type. Below are the results for public schools (type_school = 1).

    Code:
    . melogit BMI_catOb i.ter_DE_Unhealthy if type_school==1 || sclid:, or

    Mixed-effects logistic regression               Number of obs     =  1,751
    Group variable: sclid                           Number of groups  =     35

                                                    Obs per group:
                                                                  min =     49
                                                                  avg =   50.0
                                                                  max =     52

    Integration method: mvaghermite                 Integration pts.  =      7

                                                    Wald chi2(2)      =  12.12
    Log likelihood = -772.43028                     Prob > chi2       = 0.0023
    ----------------------------------------------------------------------------------
           BMI_catOb | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
    -----------------+----------------------------------------------------------------
    ter_DE_Unhealthy |
                  2  |   1.745646    .340815     2.85   0.004     1.190611    2.559425
                  3  |   1.967592   .3969151     3.36   0.001     1.325029    2.921762
                     |
               _cons |   .1217391   .0194386   -13.19   0.000     .0890256    .1664735
    -----------------+----------------------------------------------------------------
    sclid            |
           var(_cons)|   .0439233    .047755                      .0052148     .369957
    ----------------------------------------------------------------------------------
    Based on the above table, the ORs are 1.745 and 1.967 for tertile #2 and tertile #3, respectively. I am surprised that these ORs are not the same as in the interaction model. What could be the reason? Is it related to the melogit command?

    If I use an ordinary logistic regression instead, I get the same ORs in the interaction model and the stratified model (cf. below): the OR for tertile #2 of ter_DE_Unhealthy is 1.733 and for tertile #3 it is 1.954, and I get exactly the same coefficients when I stratify by type of school.

    1-Interaction:
    Code:
    . logistic BMI_catOb i.ter_DE_Unhealthy##type_school
    
    Logistic regression                                     Number of obs =  2,465
                                                            LR chi2(5)    =  22.41
                                                            Prob > chi2   = 0.0004
    Log likelihood = -1128.1909                             Pseudo R2     = 0.0098
    
    ----------------------------------------------------------------------------------------------
                       BMI_catOb | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
    -----------------------------+----------------------------------------------------------------
                ter_DE_Unhealthy |
                              2  |   1.733965   .3015293     3.17   0.002     1.233161    2.438151
                              3  |   1.954811   .3493899     3.75   0.000     1.377104    2.774871
                                 |
                     type_school |
                        private  |   1.942005   .3921945     3.29   0.001     1.307216     2.88505
                                 |
    ter_DE_Unhealthy#type_school |
                      2#private  |   .4783882   .1502061    -2.35   0.019     .2585336    .8852051
                      3#private  |   .6124526    .168066    -1.79   0.074     .3576764    1.048708
                                 |
                           _cons |   .1241535   .0177497   -14.59   0.000     .0938137    .1643053
    ----------------------------------------------------------------------------------------------
    Note: _cons estimates baseline odds.
    2-Stratification:
    Code:
    . logistic BMI_catOb i.ter_DE_Unhealthy if type_school==1
    
    Logistic regression                                     Number of obs =  1,751
                                                            LR chi2(2)    =  16.16
                                                            Prob > chi2   = 0.0003
    Log likelihood = -773.02716                             Pseudo R2     = 0.0103
    
    ----------------------------------------------------------------------------------
           BMI_catOb | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
    -----------------+----------------------------------------------------------------
    ter_DE_Unhealthy |
                  2  |   1.733965   .3015293     3.17   0.002     1.233161    2.438152
                  3  |   1.954811   .3493899     3.75   0.000     1.377104    2.774872
                     |
               _cons |   .1241535   .0177497   -14.59   0.000     .0938137    .1643053
    ----------------------------------------------------------------------------------
    Many thanks !!
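(In the fixed-effects logistic model, the tertile main effects are the public-school ORs, which is why 1.733965 and 1.954811 match the stratified output exactly; the private-school ORs are recoverable by multiplying each main-effect OR by its interaction OR. A quick check of that arithmetic, using the numbers from the -logistic- interaction output above:)

```python
# ORs copied from the -logistic- interaction output above
or_t2, or_t3 = 1.733965, 1.954811        # tertile main effects (public schools)
ixn_t2, ixn_t3 = 0.4783882, 0.6124526    # tertile#private interaction ORs

# Implied ORs for private schools = main-effect OR * interaction OR
print(round(or_t2 * ixn_t2, 4))  # -> 0.8295
print(round(or_t3 * ixn_t3, 4))  # -> 1.1972
```

These products are what a -logistic- fit restricted to type_school==2 would report for the two tertile ORs.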

  • #2
    Notice the values aren't exactly the same, but they are very close. I am tempted to attribute the difference to rounding error after floating-point arithmetic.



    • #3
      For fixed-effects models, stratification and interaction are two ways of estimating the same thing. You can stratify by estimating a separate model for each level of your stratification factor, or you can estimate a single model with each covariate interacted with the stratification factor. The equation implied for each level of the stratification variable then reproduces the same model as if you had estimated them separately. Note, however, that you might see slight differences due to rounding, or to efficiency differences between estimating one model and estimating many.

      The single-model approach has some convenience: if you are willing to assume the same effect for a given covariate across strata, you only need to omit that covariate's interaction with the stratification factor. In that situation, though, you would not necessarily expect the same estimates as described above, because of the (implied) constraint.
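(The equivalence for categorical predictors follows because, in a saturated model, the interaction OR is exactly the ratio of the two stratum-specific ORs, so main effect times interaction reconstructs the second stratum's OR. A small sketch with hypothetical cell counts, made up purely for illustration:)

```python
# Hypothetical cell counts: cells[school_type][tertile] = (cases, controls)
cells = {
    "public":  {1: (20, 180), 2: (33, 167), 3: (36, 164)},
    "private": {1: (30, 170), 2: (24, 176), 3: (28, 172)},
}

def odds(cell):
    cases, controls = cell
    return cases / controls

def stratum_or(school, tertile):
    """OR for a tertile vs. tertile 1 within one school type."""
    return odds(cells[school][tertile]) / odds(cells[school][1])

for t in (2, 3):
    main = stratum_or("public", t)          # main-effect OR (base stratum)
    ixn = stratum_or("private", t) / main   # interaction OR in the saturated model
    # main x interaction reproduces the stratified private-school OR exactly
    assert abs(main * ixn - stratum_or("private", t)) < 1e-12
print("pooled interaction model reproduces the stratified ORs")
```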
      Last edited by Leonardo Guizzetti; 27 Jul 2023, 16:07.



      • #4
        More arguments in favour of the single-model approach can be found here (in the answer by user172180). Regarding Peter Flom's comment about the stratified analysis being easier for lay audiences: I think that by using -margins- to display fitted values and -contrast- with the @ operator, you can present results from the single model that are just as easy for a lay audience to follow. YMMV. HTH.
        --
        Bruce Weaver
        Email: [email protected]
        Version: Stata/MP 18.5 (Windows)



        • #5
          I'm guessing the differences are due to the fact that if you stratify, a separate variance parameter is estimated for each school type. If you pool, even with full interactions, you are imposing a common variance of the unobserved effects. So I don't think the difference is rounding error. In a linear model you wouldn't see any difference, because the OLS estimates of the mean parameters are the same whether you pool with full interactions or run separate regressions.
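(The linear-model claim is easy to check numerically: pooled OLS with full interactions returns exactly the per-group coefficients. A self-contained sketch with made-up data; the tiny normal-equations solver and the numbers are illustrative only:)

```python
def fit_ols(X, y):
    """Solve the normal equations X'X b = X'y by Gaussian elimination."""
    k = len(X[0])
    A = [[sum(X[i][p] * X[i][q] for i in range(len(y))) for q in range(k)]
         for p in range(k)]
    rhs = [sum(X[i][p] * y[i] for i in range(len(y))) for p in range(k)]
    for col in range(k):                       # elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    coef = [0.0] * k                           # back substitution
    for r in range(k - 1, -1, -1):
        s = rhs[r] - sum(A[r][c] * coef[c] for c in range(r + 1, k))
        coef[r] = s / A[r][r]
    return coef

# Two groups with different intercepts and slopes (hypothetical data)
x0, y0 = [1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]   # group 0
x1, y1 = [1, 2, 3, 4], [1.0, 1.4, 2.1, 2.5]   # group 1

# Separate fits: [intercept, slope] per group
b0 = fit_ols([[1, x] for x in x0], y0)
b1 = fit_ols([[1, x] for x in x1], y1)

# Pooled fit with full interaction: columns [cons, x, group, x#group]
X = [[1, x, 0, 0] for x in x0] + [[1, x, 1, x] for x in x1]
bp = fit_ols(X, y0 + y1)

# Group-0 line from the pooled model is (bp[0], bp[1]);
# group-1 line is (bp[0] + bp[2], bp[1] + bp[3]) -- identical to b0 and b1
print([round(b, 6) for b in bp])  # -> [0.15, 1.94, 0.3, -1.42]
```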



          • #6
            if you use full interactions you are imposing a common variance of the unobserved effects.
            This makes sense to me. I did not consider the variance of the unobserved effects when I was trying to think this through this afternoon.
            Last edited by Daniel Schaefer; 27 Jul 2023, 21:54.



            • #7
              Thank you all for your answers !
