  • Multilevel logistic regression

    Hi,
    I am running a multilevel logistic regression on DHS data for Ukraine, and I am not sure about the results. I need an expert opinion from someone who can confirm that I did it right. I want to investigate the factors associated with tobacco smoking. For the random intercept I used region, which is a categorical variable (North, South, East, West). My level-1 variables are age, gender, work status, highest education level, and marital status. I ran the random-intercept model first, without introducing the level-1 variables. I used xtmelogit smoke || region:


    Random-effects Parameters Estimate Std. Err. [95% Conf. Interval]

    region: Identity
    var(_cons) .0660989 .0433564 .0182752 .23907

    LR test vs. logistic model: chibar2(01) = 122.11 Prob >= chibar2 = 0.0000


    Then I created a risk score by running a logistic regression on all the predictor variables, and later used that smokerisk variable in the fixed part of the model.

    . xi:logistic smoke age i.ms i.highedu i.wealthi i.cwork i.gender
    i.ms _Ims_0-5 (naturally coded; _Ims_0 omitted)
    i.highedu _Ihighedu_0-3 (naturally coded; _Ihighedu_0 omitted)
    i.wealthi _Iwealthi_1-5 (naturally coded; _Iwealthi_1 omitted)
    i.cwork _Icwork_0-1 (naturally coded; _Icwork_0 omitted)
    i.gender _Igender_0-1 (naturally coded; _Igender_0 omitted)

    Logistic regression Number of obs = 12,210
    LR chi2(15) = 2764.91
    Prob > chi2 = 0.0000
    Log likelihood = -4906.05 Pseudo R2 = 0.2198


    smoke Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]

    age 1.004373 .0034311 1.28 0.201 .997671 1.011121
    _Ims_1 1.314048 .1226059 2.93 0.003 1.094437 1.577727
    _Ims_2 3.14551 .4351557 8.28 0.000 2.39847 4.125227
    _Ims_3 1.641184 .289318 2.81 0.005 1.161722 2.318528
    _Ims_4 3.149873 .3790349 9.53 0.000 2.488084 3.987686
    _Ims_5 3.116891 .5685119 6.23 0.000 2.180043 4.45634
    _Ihighedu_1 3.54832 3.780326 1.19 0.235 .4397094 28.63386
    _Ihighedu_2 .4787269 .4745827 -0.74 0.457 .0685891 3.341341
    _Ihighedu_3 .3140836 .3114419 -1.17 0.243 .0449783 2.193247
    _Iwealthi_2 .8653848 .070379 -1.78 0.075 .7378766 1.014927
    _Iwealthi_3 1.242929 .1050261 2.57 0.010 1.053224 1.466804
    _Iwealthi_4 1.108127 .1003171 1.13 0.257 .9279647 1.323267
    _Iwealthi_5 1.047131 .093587 0.52 0.606 .8788707 1.247604
    _Icwork_1 1.589846 .1090885 6.76 0.000 1.38979 1.818699
    _Igender_1 12.57861 .7383958 43.13 0.000 11.21153 14.11238
    _cons .1005307 .1002092 -2.30 0.021 .0142501 .7092155

    Note: _cons estimates baseline odds.

    . predict smokerisk, xb

    . xtmelogit smoke smokerisk || region:, var

    Refining starting values:

    Iteration 0: log likelihood = -4876.2025 (not concave)
    Iteration 1: log likelihood = -4871.9337
    Iteration 2: log likelihood = -4871.5384

    Performing gradient-based optimization:

    Iteration 0: log likelihood = -4871.5384
    Iteration 1: log likelihood = -4871.2656
    Iteration 2: log likelihood = -4871.2629
    Iteration 3: log likelihood = -4871.2629

    Mixed-effects logistic regression Number of obs = 12,210
    Group variable: region Number of groups = 5

    Obs per group:
    min = 1,889
    avg = 2,442.0
    max = 3,145

    Integration points = 7 Wald chi2(1) = 2230.59
    Log likelihood = -4871.2629 Prob > chi2 = 0.0000


    smoke Coef. Std. Err. z P>|z| [95% Conf. Interval]

    smokerisk .9993888 .0211604 47.23 0.000 .9579151 1.040862
    _cons -.0028393 .1065424 -0.03 0.979 -.2116585 .2059799



    Random-effects Parameters Estimate Std. Err. [95% Conf. Interval]

    region: Identity
    var(_cons) .0509872 .0341417 .0137241 .1894251

    LR test vs. logistic model: chibar2(01) = 69.57 Prob >= chibar2 = 0.0000

    So the variance dropped from .0660989 to .0509872.
    This gives (.0660989 - .0509872)/.0660989 = .22862256, i.e. almost 23 percent of the regional variance can be explained by personal characteristics.
    When I try to run this model with the cluster number, the results get messed up: instead of decreasing, the variance increases. I am confused; kindly advise.
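    The arithmetic behind that figure, as a quick check (in Python rather than Stata, purely for illustration):

```python
# Proportional change in the region-level variance component between the
# null model and the model with the smokerisk fixed effect.
# (Values copied from the xtmelogit outputs above.)
var_null = 0.0660989  # var(_cons), xtmelogit smoke || region:
var_full = 0.0509872  # var(_cons), xtmelogit smoke smokerisk || region:

pcv = (var_null - var_full) / var_null
print(f"proportional change in variance = {pcv:.8f}")  # ~0.2286, i.e. almost 23%
```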

  • #2
    When I try to run this model with cluster number
    What does this mean? Please show the code and the output.

    I will also make the general comment that while it is not strictly speaking wrong to use a 4-level variable, region, as a random intercept, it is dicey. That is a very small sample of region-space, and the variance component estimate at that level will be pretty meaningless as a result. Moreover, from the names of the regions, north, south, east, and west, it sounds like the regions are in fact all of the regions in the country. I think it would be more appropriate to just do this as a simple logistic regression with region as one of the predictor variables, rather than doing a multi-level model in this case.

    Finally, get rid of the -xi:- prefix. It doesn't actually do anything here except make your output harder to read. Keep all the i.'s, but eliminate -xi:- itself. Then you are using factor-variable notation (see -help fvvarlist-), which is much better and also opens up the possibility of using the -margins- command later to help interpret your findings.



    • #3
      Hi,
      Thank you so much for the response and suggestions. Could you help me understand this output? I tried to run the multilevel model with the cluster number, and I get the following results:
      xtmelogit smoke || cluster_num:, var

      Refining starting values:

      Iteration 0: log likelihood = -6173.3441
      Iteration 1: log likelihood = -6139.9731
      Iteration 2: log likelihood = -6138.9532

      Performing gradient-based optimization:

      Iteration 0: log likelihood = -6138.9532
      Iteration 1: log likelihood = -6138.9519
      Iteration 2: log likelihood = -6138.9519

      Mixed-effects logistic regression Number of obs = 12,210
      Group variable: cluster_num Number of groups = 500

      Obs per group:
      min = 1
      avg = 24.4
      max = 71

      Integration points = 7 Wald chi2(0) = .
      Log likelihood = -6138.9519 Prob > chi2 = .


      smoke Coef. Std. Err. z P>|z| [95% Conf. Interval]

      _cons -1.382858 .0384116 -36.00 0.000 -1.458144 -1.307573



      Random-effects Parameters Estimate Std. Err. [95% Conf. Interval]

      cluster_num: Identity
      var(_cons) .4145091 .0477905 .3306705 .5196042

      LR test vs. logistic model: chibar2(01) = 299.10 Prob >= chibar2 = 0.0000

      . logistic smoke age i.wealthi i.gender i.cwork i.highedu

      Logistic regression Number of obs = 12,210
      LR chi2(10) = 2575.18
      Prob > chi2 = 0.0000
      Log likelihood = -5000.9126 Pseudo R2 = 0.2048


      smoke Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]

      age 1.011971 .0029652 4.06 0.000 1.006176 1.017799

      wealthi
      Poorer .8028593 .0642915 -2.74 0.006 .6862412 .9392951
      Middle 1.177615 .0979283 1.97 0.049 1.000504 1.386078
      Richer 1.079127 .0961087 0.86 0.393 .9062822 1.284938
      Richest .975636 .0858883 -0.28 0.779 .8210202 1.159369

      gender
      Male 10.80233 .5715927 44.97 0.000 9.738161 11.98278

      cwork
      Yes 1.741987 .1161309 8.33 0.000 1.528618 1.985139

      highedu
      Primary 5.171662 5.526414 1.54 0.124 .6368544 41.99718
      Secondary .6259353 .6236485 -0.47 0.638 .088805 4.411855
      Higher .4003688 .3989989 -0.92 0.358 .0567768 2.82325

      _cons .0934087 .0936148 -2.37 0.018 .0131011 .6659882

      Note: _cons estimates baseline odds.

      . predict smokerisk, xb

      . xtmelogit smoke smokerisk || cluster_num:, var

      Refining starting values:

      Iteration 0: log likelihood = -4836.0464
      Iteration 1: log likelihood = -4817.1706
      Iteration 2: log likelihood = -4813.0308

      Performing gradient-based optimization:

      Iteration 0: log likelihood = -4813.0308
      Iteration 1: log likelihood = -4813.0239
      Iteration 2: log likelihood = -4813.0239

      Mixed-effects logistic regression Number of obs = 12,210
      Group variable: cluster_num Number of groups = 500

      Obs per group:
      min = 1
      avg = 24.4
      max = 71

      Integration points = 7 Wald chi2(1) = 2087.12
      Log likelihood = -4813.0239 Prob > chi2 = 0.0000


      smoke Coef. Std. Err. z P>|z| [95% Conf. Interval]

      smokerisk 1.10148 .0241103 45.68 0.000 1.054225 1.148735
      _cons .0231843 .0526071 0.44 0.659 -.0799237 .1262924



      Random-effects Parameters Estimate Std. Err. [95% Conf. Interval]

      cluster_num: Identity
      var(_cons) .675071 .073819 .544842 .8364276

      LR test vs. logistic model: chibar2(01) = 375.78 Prob >= chibar2 = 0.0000
      The variance increased to 0.675071, and I don't understand this. I think I should take your advice, but for learning purposes I would like to know how to interpret this result. Or is it just meaningless?
      Thank you so much for the help.
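      For concreteness, the size of the unexpected change (a quick Python check, values copied from the outputs above):

```python
# Cluster-level variance component before and after adding the smokerisk
# fixed effect -- it increased rather than decreased.
var_null = 0.4145091  # xtmelogit smoke || cluster_num:, var
var_full = 0.675071   # xtmelogit smoke smokerisk || cluster_num:, var

change = (var_full - var_null) / var_null
print(f"relative change in var(_cons): {change:+.1%}")  # about a 63% increase
```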



      • #4
        In #1 you are trying to calculate proportions of variance attributable to regional and personal characteristics in some way that I don't understand. These are logistic regressions, multi-level at that, and you are applying a logic that really only works for linear regressions with R-squared. The logistic model is more complicated and there is no statistic that is truly analogous to a linear regression's R-squared, nor can the estimated variance components be compared in the way you are using them. In particular, you cannot in a logistic regression partition the total outcome variance into a sum of the variance of the predicted values (xb) and the variance of the residuals (or residuals + higher level residuals). The sum of the residual variance(s) and the variance of the xb's will change from one model to the next. Remember, in a logistic model, there is no orthogonality between residuals (at any level) and the predicted values, so we do not have a Pythagorean situation the way we do in linear regression.

        Basically, there is no reason to expect any particular magnitude or direction of change in the variance components at any level based on the addition of new predictors to the model.



        • #5
          Okay, thank you so much.



          • #6

            Mixed-effects logistic regression Number of obs = 12,210
            Group variable: cluster_num Number of groups = 500

            Obs per group:
            min = 1
            avg = 24.4
            max = 71

            Integration method: mvaghermite Integration pts. = 7

            Wald chi2(3) = 2057.40
            Log likelihood = -4875.2058 Prob > chi2 = 0.0000

            smoke Coef. Std. Err. z P>|z| [95% Conf. Interval]

            cwork
            Yes .4715606 .0706877 6.67 0.000 .3330152 .610106

            gender
            Male 2.636477 .0595999 44.24 0.000 2.519663 2.753291
            age .0138215 .003147 4.39 0.000 .0076535 .0199895
            _cons -3.276884 .1278783 -25.63 0.000 -3.527521 -3.026247

            cluster_num
            var(_cons) .6691699 .0731469 .5401217 .829051

            LR test vs. logistic model: chibar2(01) = 374.55 Prob >= chibar2 = 0.0000

            .

            Can you please give me suggestions about the above results?



            • #7
              The best suggestion I can give you is to repost it placing it between code delimiters so it is easily readable. (See Forum FAQ #12 for instructions on using code delimiters.)

              But that isn't what you had in mind, I suppose. What kind of suggestion are you looking for? Suggestion about what?



              • #8
                The model

                Code:
                  melogit smoke age i.cwork i.gender || cluster_num:
                my data
                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str15 id str3 ccode long cluster_num byte(age smoke ms cwork) float gender
                "       16140  2" "UA5" 16 30 0 1 1 0
                "19142  2"        "UA5" 19 37 1 4 1 1
                "        3 75  2" "UA5"  3 31 0 1 1 0
                "       25161  2" "UA5" 25 47 0 1 1 0
                "13140  2"        "UA5" 13 20 1 0 0 1
                "       20133  2" "UA5" 20 34 0 1 0 0
                "       19121  3" "UA5" 19 40 0 1 0 0
                "14 71  4"        "UA5" 14 22 0 0 1 1
                "       24 73  2" "UA5" 24 46 0 1 1 0
                "5 37  2"         "UA5"  5 26 1 4 1 1
                "       13 51  2" "UA5" 13 27 0 2 1 0
                "       23 59  1" "UA5" 23 46 0 4 0 0
                "17257  3"        "UA5" 17 26 1 0 1 1
                "        4 33  1" "UA5"  4 48 0 1 1 0
                "20 34  2"        "UA5" 20 21 0 0 1 1
                "       12 13  2" "UA5" 12 31 1 2 1 0
                "17 54  2"        "UA5" 17 49 0 5 1 1
                "       16 59  1" "UA5" 16 49 0 1 1 0
                "14 51  2"        "UA5" 14 46 1 4 0 1
                "9 86  2"         "UA5"  9 22 0 1 1 1
                "       15100  2" "UA5" 15 42 0 1 1 0
                "        2 52  1" "UA5"  2 49 0 3 1 0
                "       14136  2" "UA5" 14 44 0 5 0 0
                "        8120  2" "UA5"  8 30 0 1 1 0
                "       17211  2" "UA5" 17 48 1 1 1 0
                "7 91  2"         "UA5"  7 22 1 0 1 1
                "        8120  2" "UA5"  8 30 0 1 1 0
                "       23128  1" "UA5" 23 48 1 2 1 0
                "       11 71  2" "UA5" 11 32 1 3 0 0
                "       16120  3" "UA5" 16 42 0 1 1 0
                "        3 40  1" "UA5"  3 44 1 1 1 0
                "       12 77  2" "UA5" 12 41 0 3 1 0
                "       20 39  1" "UA5" 20 47 0 1 0 0
                "       21139  1" "UA5" 21 30 1 2 0 0
                "       19 48  2" "UA5" 19 39 0 1 0 0
                "        5 21  2" "UA5"  5 44 0 1 1 0
                "       13121  2" "UA5" 13 30 0 1 0 0
                "        9 76  2" "UA5"  9 49 0 1 1 0
                "5 53  1"         "UA5"  5 42 1 1 1 1
                "       24183  2" "UA5" 24 48 0 1 0 0
                "        4 82  1" "UA5"  4 26 0 1 1 0
                "       19 48  2" "UA5" 19 39 0 1 0 0
                "        9 80  2" "UA5"  9 25 0 1 0 0
                "9 53  3"         "UA5"  9 21 1 1 1 1
                "       12100  2" "UA5" 12 37 0 4 1 0
                "       20133  2" "UA5" 20 34 0 1 0 0
                "       11166  6" "UA5" 11 24 0 1 1 0
                "       21139  1" "UA5" 21 30 1 2 0 0
                "       13  4  3" "UA5" 13 45 0 4 1 0
                "       21129  1" "UA5" 21 24 1 2 0 0
                "       23 42  2" "UA5" 23 27 1 1 0 0
                "       20105  2" "UA5" 20 37 0 1 0 0
                "15 73  1"        "UA5" 15 45 1 1 1 1
                "       19100  1" "UA5" 19 49 0 1 0 0
                "       10126  2" "UA5" 10 26 0 1 1 0
                "6 29  3"         "UA5"  6 17 0 0 0 1
                "        6 45  1" "UA5"  6 41 1 3 0 0
                "       21 40  2" "UA5" 21 43 0 1 1 0
                "        6 99  2" "UA5"  6 37 0 1 1 0
                "4106  2"         "UA5"  4 24 1 0 1 1
                "       25 12  2" "UA5" 25 37 0 1 1 0
                "16 79  1"        "UA5" 16 41 0 5 1 1
                "        4  8  3" "UA5"  4 40 0 1 0 0
                "       18 67  2" "UA5" 18 28 0 1 0 0
                "       21129  1" "UA5" 21 24 1 2 0 0
                "        2118  3" "UA5"  2 27 0 1 1 0
                "       10104  2" "UA5" 10 30 0 1 1 0
                "        6 88  1" "UA5"  6 34 0 1 0 0
                "       26  8  3" "UA5" 26 28 1 1 0 0
                "       12 82  1" "UA5" 12 34 0 1 1 0
                "       19 59  1" "UA5" 19 35 0 4 0 0
                "9 93  1"         "UA5"  9 38 1 1 1 1
                "15 80  3"        "UA5" 15 15 1 0 0 1
                "       17100  1" "UA5" 17 49 0 1 1 0
                "       16 13  1" "UA5" 16 42 0 1 1 0
                "       24175  3" "UA5" 24 21 0 1 0 0
                "        8120  2" "UA5"  8 30 0 1 1 0
                "13112  2"        "UA5" 13 19 1 0 1 1
                "       15  6  2" "UA5" 15 29 0 2 0 0
                "3 46  6"         "UA5"  3 17 1 0 0 1
                "       10109  3" "UA5" 10 28 1 0 1 0
                "       16110  1" "UA5" 16 40 1 2 1 0
                "        6 45  1" "UA5"  6 41 1 3 0 0
                "       23  3  1" "UA5" 23 45 0 1 1 0
                "       20 39  1" "UA5" 20 47 0 1 0 0
                "25 12  3"        "UA5" 25 38 1 1 1 1
                "25148  1"        "UA5" 25 41 0 0 0 1
                "16 69  1"        "UA5" 16 35 1 0 0 1
                "       25 31  2" "UA5" 25 19 0 1 0 0
                "12 82  2"        "UA5" 12 30 1 1 1 1
                "       19121  3" "UA5" 19 40 0 1 0 0
                "       18152  2" "UA5" 18 34 0 1 1 0
                "       26  8  3" "UA5" 26 28 1 1 0 0
                "       24132  2" "UA5" 24 28 1 1 0 0
                "       11 83  2" "UA5" 11 32 0 1 0 0
                "       26 95  1" "UA5" 26 28 0 4 1 0
                "       25 31  2" "UA5" 25 19 0 1 0 0
                "       16 24  1" "UA5" 16 45 0 1 0 0
                "19121  2"        "UA5" 19 46 1 1 0 1
                "       19  7  2" "UA5" 19 46 0 1 1 0
                end
                label values smoke MV463A
                label def MV463A 0 "No", modify
                label def MV463A 1 "Yes", modify
                label values ms MV501
                label def MV501 0 "Never married", modify
                label def MV501 1 "Married", modify
                label def MV501 2 "Living together", modify
                label def MV501 3 "Widowed", modify
                label def MV501 4 "Divorced", modify
                label def MV501 5 "Not living together", modify
                label values cwork MV714
                label def MV714 0 "No", modify
                label def MV714 1 "Yes", modify
                label values gender gender
                label def gender 0 "Female", modify
                label def gender 1 "Male", modify
                I want your opinion about the results. Is this multilevel model better than a simple logistic regression? And I want to confirm my interpretation of the results. According to the results, there is almost 67% variance between clusters. Is this how the interpretation of the cluster_num var(_cons) is done?



                • #9
                  According to the results, there is almost 67% variance between clusters. Is this how the interpretation of the cluster_num var(_cons) is done?
                  No, that is not correct. The 0.67 number represents the absolute variance of the cluster intercepts; it is not a percent of variance. The percent of variance can be gotten by running -estat icc-. Or, you can calculate the percent of variance by hand: the variance at the residual level in a logistic model is always pi^2/3, which is about 3.29. So the proportion of variance at the cluster level is 0.67/(0.67+3.29), which is approximately 0.17, so 17% of the unexplained variance is at the cluster level.
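                  That calculation can be sketched in a few lines (Python here, just to show the arithmetic):

```python
import math

# Cluster-level variance from the melogit output above.
var_cluster = 0.6691699

# On the latent scale, the level-1 residual variance of a logistic
# model is fixed at pi^2 / 3 (about 3.29).
var_residual = math.pi ** 2 / 3

# Intraclass correlation: share of latent variance at the cluster level.
icc = var_cluster / (var_cluster + var_residual)
print(f"pi^2/3 = {var_residual:.2f}")  # 3.29
print(f"ICC    = {icc:.3f}")           # about 0.17, i.e. 17%
```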

                  That said, what conclusion are you drawing from this? The proportion of variance at the cluster level is whatever it is: a high or low level doesn't say anything about the validity of your model one way or another.

                  As for whether the multilevel model is better than a simple logistic regression, the answer is yes. If you had only a very small proportion of variance at the cluster level, then arguably a flat logistic regression model would be just as good--but at 17%, the multi-level model is accounting better for the data. If you like to make these decisions using statistical tests, the output of your -melogit- command itself includes such a test at the very end where it says:
                  Code:
                  LR test vs. logistic model: chibar2(01) = 374.55 Prob >= chibar2 = 0.0000
                  So the likelihood ratio test of multilevel vs logistic model has a very low p-value. (I don't recommend using statistical tests for model selection--I'm just showing you how to do it if you want to use that approach.) You could also calculate AIC and BIC for both models and see how much change there is (-estat ic- gives these statistics following a regression.)
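                  That chibar2 value is just twice the log-likelihood gap between the mixed model and its single-level counterpart; for example, the test reported with the smokerisk model in #3 can be reproduced by hand (Python, for illustration):

```python
# LR statistic = 2 * (logL of mixed model - logL of flat logistic model).
# Log likelihoods copied from the #3 output.
ll_flat = -5000.9126   # flat -logistic- fit from #3
ll_mixed = -4813.0239  # xtmelogit smoke smokerisk || cluster_num:

chibar2 = 2 * (ll_mixed - ll_flat)
print(f"chibar2(01) = {chibar2:.2f}")  # 375.78, matching the reported test
```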







                  • #10
                    Thank you so much for the help.



                    • #11
                      Hi!
                      Can you explain how to find the odds ratios using the multilevel model? What is the code?



                      • #12
                        use the "or" option; see
                        Code:
                        help melogit
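                        Equivalently, an odds ratio is just the exponentiated log-odds coefficient, so the fixed-effect estimates reported in #6 could be converted by hand (illustrative Python, not Stata):

```python
import math

# Log-odds coefficients from the melogit output in #6.
coefs = {"cwork = Yes": 0.4715606, "gender = Male": 2.636477, "age": 0.0138215}

# Odds ratio = exp(coefficient); e.g. gender = Male gives an OR of about 13.96.
for name, b in coefs.items():
    print(f"{name}: OR = {math.exp(b):.3f}")
```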



                        • #13
                          Thank you.
                          Is this equation is allright?
                          Beta0+Beta1*(gender)ij+u1j*(gender)ij+beta2*(cwork )ij+u2j*(cwork)ij+beta3*(ms)ij+u3j*(ms)ij+uoj
                          beta0 fixed intercept
                          uoj random intercept variance
                          beta1*(gender)ij general effect of the level 1 variable (gender)ij
                          (gender)ij is the observed value of the predictor variable for respondent i in cluster j.
                          u1j residual term associated with the deviation of the specific effect of the level 1 predictor gender in a cluster from the overall affect of gender across all clusters.
                          Thank you so much for your time and effort.

