Increase in the variance component of level 2 when individual-level covariates are added to a multilevel logistic regression

Luis Ortiz

Join Date: Dec 2014

Posts: 97
#1

13 Oct 2023, 08:13

Dear all,

In the following (empty) model, in which I try to explain the expectation of university graduation among teenagers in PISA, I take into account the nesting of individuals (teenagers) in schools (level 2) and countries (level 3). The model does not include any covariates at any level. To judge to what extent a multilevel model is justified, I want to know how the variance is distributed; in other words, how much of it is due to observations being grouped at each level.

This is why I run 'estat icc' after the empty model. The output tells me that 13.2% of the residual variance is accounted for by the clustering of individuals into countries, far below the 39.8% accounted for by the nesting of individuals in schools within countries.

```
. xtmelogit expect_ISCED5A if fisced4!=5 || country3: || schoolid:

Refining starting values:
[Iterations omitted]

Mixed-effects logistic regression               Number of obs     =    152,968

----------------------------------------------------------------------------
                |     No. of       Observations per group       Integration
 Group variable |     groups    Minimum    Average    Maximum        points
----------------+-----------------------------------------------------------
       country3 |         28      3,130    5,463.1     11,565             7
       schoolid |      6,159          1       24.8        242             7
----------------------------------------------------------------------------

                                                Wald chi2(0)      =          .
Log likelihood = -86627.398                     Prob > chi2       =          .

------------------------------------------------------------------------------
expect_IS~5A | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _cons |  -.3936643   .1619796    -2.43   0.015    -.7111385   -.0761901
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
-----------------------------+------------------------------------------------
country3: Identity           |
                   sd(_cons) |   .8508562   .1152485      .6524704    1.109562
-----------------------------+------------------------------------------------
schoolid: Identity           |
                   sd(_cons) |   1.211522   .0149724       1.18253    1.241226
------------------------------------------------------------------------------
LR test vs. logistic model: chi2(2) = 37745.26        Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

. estat icc

Residual intraclass correlation

------------------------------------------------------------------------------
                       Level |        ICC   Std. err.     [95% conf. interval]
-----------------------------+------------------------------------------------
                    country3 |     .13207     .03105      .0821347     .205565
           schoolid|country3 |   .3998355   .0219032      .3577718    .4434301
------------------------------------------------------------------------------
```
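For reference, both ICCs can be reproduced by hand from the reported standard deviations. This is a sketch that assumes the usual latent-variable definition for logistic random-intercept models, in which the student-level residual variance is fixed at π²/3; the symbols σ²_c (country), σ²_s (school) and T (total) are notation introduced here, not Stata output. The arithmetic also shows that the schoolid|country3 ICC is the correlation between two students in the same school (and therefore the same country), so it includes the country component rather than measuring the school level on its own.

```latex
\begin{align*}
\sigma^2_c &= 0.8508562^2 \approx 0.72396, \qquad
\sigma^2_s = 1.211522^2 \approx 1.46779, \qquad
\pi^2/3 \approx 3.28987 \\
T &= \sigma^2_c + \sigma^2_s + \pi^2/3 \approx 5.48162 \\
\rho_{\text{country3}} &= \sigma^2_c / T \approx 0.72396 / 5.48162 \approx 0.1321 \\
\rho_{\text{schoolid} \mid \text{country3}} &= (\sigma^2_c + \sigma^2_s) / T \approx 2.19175 / 5.48162 \approx 0.3998
\end{align*}
```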

My puzzle comes when I add a number of individual-level covariates to the previous (empty) model. The share corresponding to schools within countries decreases (not much), but, to my surprise, the share of residual variance attributed to the country level increases instead of decreasing (see below: 0.29 instead of 0.13).

```
. xtmelogit expect_ISCED5A immig3 famstruc3 Above_mode Below_mode PV1MATH PV1READ positive_att vocational ib4.fisced4 if fisced4!=5 || country3: || schoolid:
```

```
. estat icc

Residual intraclass correlation

------------------------------------------------------------------------------
                       Level |        ICC   Std. err.     [95% conf. interval]
-----------------------------+------------------------------------------------
                    country3 |    .294492   .0567162      .1964395    .4161412
           schoolid|country3 |   .3608058    .051421      .2672012    .4663342
------------------------------------------------------------------------------
```

Could anybody provide an explanation for this? I would have expected that part of the country-level variance would have been absorbed or captured by the individual-level variables introduced in the second model (compositional effect), but the opposite happens.

Many thanks for your attention

Luis Ortiz
Tags: empty model, ICC, Multilevel Analysis, variance component
Clyde Schechter

Join Date: Apr 2014

Posts: 30115
#2

13 Oct 2023, 12:49

Something is wrong with the output you are showing for the first model. The individual-level residual variance is missing. While it can be calculated from the variance components shown and the ICCs, it should be in your output.

Moreover, you don't show the full output from the adjusted model, only the results from -estat icc-. But the dynamics of variance partitioning cannot be understood from -estat icc- in isolation. The variance components at all three levels of the model are going to shift, and the total unexplained variance is going to decline. The ICCs are ratios of variance components, so the change in an ICC depends on the relative change of its numerator and its denominator. When everything is in play, you really need to look at the variance components themselves to understand what happens to the distribution of unexplained variance across levels.
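To make the ratio argument concrete, here is a sketch using the variance components that appear later in the thread (once the var option is used), and assuming a fixed student-level variance of π²/3 in both models, as for any logistic model. T₁ and T₂ denote the total latent variance, and the subscripts 1 and 2 the empty and adjusted models (notation introduced here). The country ICC more than doubles because its numerator roughly doubles while the denominator shrinks slightly:

```latex
\begin{align*}
T_1 &= 0.72396 + 1.46779 + 3.28987 = 5.48162 \\
T_2 &= 1.51572 + 0.34131 + 3.28987 = 5.14690 \\
\frac{\rho_{c,2}}{\rho_{c,1}}
  &= \frac{\sigma^2_{c,2} / \sigma^2_{c,1}}{T_2 / T_1}
   \approx \frac{1.51572 / 0.72396}{5.14690 / 5.48162}
   \approx \frac{2.0937}{0.9389} \\
\rho_{c,2} &\approx 0.13207 \times \frac{2.0937}{0.9389} \approx 0.2945
\end{align*}
```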
Joseph Coveney

Join Date: Apr 2014

Posts: 4420
#3

13 Oct 2023, 18:17

Originally posted by Luis Ortiz

. . . to my, surprise the residual variance to be attributed to country level increases, instead of decreasing. . .

Could it be that one or more of your predictors correlates with the outcome and the random-effects estimator is no longer statistically consistent? For example, I've heard that these international student-performance surveys suffer from selection bias, where some countries put a more representative sample of their entire student population through the test while others are rather selective as to which students sit for the examination.

By the way, is that Above_mode Below_mode pair of predictors for the mode of the school? For the country? Overall? (I assume that there's an At_mode category that you're omitting, that is, that the two predictors are not mutually exclusive and collectively [jointly] exhaustive.)
Joseph Coveney

Join Date: Apr 2014

Posts: 4420
#4

14 Oct 2023, 03:27

Originally posted by Joseph Coveney

. . . predictors correlates with the outcome. . .

I hope that the intention was clear despite the precaffeinated rambling.
Luis Ortiz

Join Date: Dec 2014

Posts: 97
#5

16 Oct 2023, 12:35

Dear Clyde and Joseph,

Thanks for your attention to my query.

I was not aware of doing anything wrong with my first model. My data are hierarchical, with three levels: students, schools and countries. The first model is meant to be an empty one; it contains only the dependent variable (expect_ISCED5A), which records whether the interviewee expects to graduate from university (yes/no). The condition "if fisced4!=5" discards all cases in which the father's education was not declared or was unknown. It does not change the fact that the model is an empty one.

Stata does not seem to report the individual-level residual variance by default; am I missing something here? Is there an option I should add to xtmelogit so that Stata reports the residual variance at the individual level? I have tried again with the var option for both models. What you describe is precisely what I am trying to do, Clyde: to understand how the distribution of unexplained variance across levels changes from one model to the other.

```
. xtmelogit expect_ISCED5A if fisced4!=5 || country3: || schoolid:, var

Refining starting values:

Iteration 0:   Log likelihood = -86753.948
Iteration 1:   Log likelihood = -86672.615
Iteration 2:   Log likelihood = -86633.841

Performing gradient-based optimization:

Iteration 0:   Log likelihood = -86633.841
Iteration 1:   Log likelihood = -86627.843
Iteration 2:   Log likelihood = -86627.402
Iteration 3:   Log likelihood = -86627.398

Mixed-effects logistic regression               Number of obs     =    152,968

----------------------------------------------------------------------------
                |     No. of       Observations per group       Integration
 Group variable |     groups    Minimum    Average    Maximum        points
----------------+-----------------------------------------------------------
       country3 |         28      3,130    5,463.1     11,565             7
       schoolid |      6,159          1       24.8        242             7
----------------------------------------------------------------------------

                                                Wald chi2(0)      =          .
Log likelihood = -86627.398                     Prob > chi2       =          .

------------------------------------------------------------------------------
expect_IS~5A | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _cons |  -.3936643   .1619796    -2.43   0.015    -.7111385   -.0761901
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
-----------------------------+------------------------------------------------
country3: Identity           |
                  var(_cons) |   .7239562   .1961198      .4257177    1.231127
-----------------------------+------------------------------------------------
schoolid: Identity           |
                  var(_cons) |   1.467786   .0362788      1.398376    1.540642
------------------------------------------------------------------------------
LR test vs. logistic model: chi2(2) = 37745.26        Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.
```

The example in the 'xtmelogit postestimation' entry of the Stata manual (attached) is a two-level model, but sd(_cons) is reported only for the second level (in this case, "patient"). In other words, the residual variance at the individual level is not reported there either; am I wrong?

I did not paste the full output of the second model only to keep my post from getting excessively long, but here it is. The model adds individual-level covariates (students' attributes) to the previous model.

```
. xtmelogit expect_ISCED5A immig3 famstruc3 Above_mode Below_mode PV1MATH PV1READ positive_att vocational ib4.fisced4 if fisced4!=5 || country3: || schoolid
> :, var

Refining starting values:

Iteration 0:   Log likelihood = -69328.053  (not concave)
Iteration 1:   Log likelihood = -68828.807
Iteration 2:   Log likelihood = -68005.858

Performing gradient-based optimization:

Iteration 0:   Log likelihood = -68005.858
Iteration 1:   Log likelihood = -67759.343
Iteration 2:   Log likelihood = -67757.538
Iteration 3:   Log likelihood = -67757.537

Mixed-effects logistic regression               Number of obs     =    139,889

----------------------------------------------------------------------------
                |     No. of       Observations per group       Integration
 Group variable |     groups    Minimum    Average    Maximum        points
----------------+-----------------------------------------------------------
       country3 |         27      2,940    5,181.1     10,984             7
       schoolid |      6,000          1       23.3        232             7
----------------------------------------------------------------------------

                                                Wald chi2(11)     =   20448.68
Log likelihood = -67757.537                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------------
    expect_ISCED5A | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------------+----------------------------------------------------------------
            immig3 |   .7391065   .0293237    25.21   0.000     .6816332    .7965799
         famstruc3 |  -.0785518   .0165989    -4.73   0.000    -.1110851   -.0460185
        Above_mode |   .1960051   .0298145     6.57   0.000     .1375698    .2544404
        Below_mode |  -.5960308   .0234743   -25.39   0.000    -.6420397    -.550022
           PV1MATH |   .0055413   .0001219    45.45   0.000     .0053024    .0057803
           PV1READ |   .0054015   .0001207    44.74   0.000     .0051649    .0056381
      positive_att |   .1264264   .0036177    34.95   0.000     .1193359     .133517
        vocational |  -1.662125   .0332901   -49.93   0.000    -1.727372   -1.596877
                   |
           fisced4 |
 Lower sec or less |  -1.279878   .0231672   -55.25   0.000    -1.325285   -1.234471
         Upper sec |  -1.058469   .0199266   -53.12   0.000    -1.097525   -1.019414
  Upper vocational |  -.8408695    .025305   -33.23   0.000    -.8904665   -.7912725
                   |
             _cons |  -4.835085   .2544421   -19.00   0.000    -5.333783   -4.336388
------------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
-----------------------------+------------------------------------------------
country3: Identity           |
                  var(_cons) |   1.515721   .4137598      .8876853    2.588089
-----------------------------+------------------------------------------------
schoolid: Identity           |
                  var(_cons) |   .3413107   .0131528      .3164812    .3680881
------------------------------------------------------------------------------
LR test vs. logistic model: chi2(2) = 26247.51        Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.
```
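Putting the two var outputs side by side (a sketch using the same π²/3 formula that -estat icc- applies; the notation σ²_c and σ²_s for the country and school components and T for the total latent variance is introduced here) shows that the school-level component fell sharply, from about 27% to under 7% of the total. The schoolid|country3 ICC moved only from 0.40 to 0.36 because it also contains the country component, which rose. The outputs also show that the two models are fitted on different samples (152,968 vs 139,889 observations; 28 vs 27 countries), presumably because of missing values on the added covariates, so the comparison is not strictly like for like.

```latex
\begin{align*}
\text{empty model:}\quad \sigma^2_s / T &\approx 1.46779 / 5.48162 \approx 0.2678 \\
\text{adjusted model:}\quad \sigma^2_s / T &\approx 0.34131 / 5.14690 \approx 0.0663 \\
\text{adjusted model:}\quad (\sigma^2_c + \sigma^2_s) / T &\approx 1.85703 / 5.14690 \approx 0.3608
\end{align*}
```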

Regarding your question about Above_mode and Below_mode, Joseph: yes, your intuition is right; the reference category ("at_mode") is omitted from the command. This variable "indicates whether students are at a modal grade in a country or whether they are above or below the modal grade" (PISA, 2003, Technical Report).

Again, many thanks for your attention to my initial post and to this follow-up.

All the best

Luis Ortiz
Clyde Schechter

Join Date: Apr 2014

Posts: 30115
#6

16 Oct 2023, 12:52

Sorry, I didn't read your outputs carefully enough. I thought they came from -mixed-. For -meqrlogit- (formerly known as -xtmelogit-), and also for -melogit-, there is no output of the residual-level variance component because, by definition, it is fixed at π²/3 (about 3.29), the variance of the standard logistic distribution. This also means that when you add variables, there is no way for variance to shift into or out of the bottom level; it can only be absorbed by the fixed effects or shift among the higher levels.
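Written in latent-variable form (a standard textbook formulation; the symbols below are introduced here for illustration, with i indexing students, j schools and k countries), the model makes this point explicit: the student-level error has the standard logistic distribution, whose variance is fixed, so adding covariates can only change β, σ²_c and σ²_s.

```latex
\begin{align*}
y^*_{ijk} &= \mathbf{x}_{ijk}'\beta + u_k + v_{jk} + \varepsilon_{ijk},
  \qquad y_{ijk} = 1 \iff y^*_{ijk} > 0 \\
u_k &\sim N(0, \sigma^2_c), \qquad v_{jk} \sim N(0, \sigma^2_s), \qquad
  \varepsilon_{ijk} \sim \text{Logistic}(0, 1) \\
\operatorname{Var}(\varepsilon_{ijk}) &= \pi^2/3 \approx 3.290
\end{align*}
```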

I'm afraid I don't have any more specific thoughts about your original question.
Luis Ortiz

Join Date: Dec 2014

Posts: 97
#7

17 Oct 2023, 05:46

Many thanks for your guidance, Clyde.

All the best

Luis