  • Increase in the variance component of level 2 when individual-level covariates are added to a multilevel logistic regression

    Dear all,

    In the following (empty) model, where I try to explain the expectation of university graduation among teenagers in PISA, I consider the nesting of individuals (teenagers) within schools (level 2) and countries (level 3). The model has no covariates at any level. To judge to what extent a multilevel model is justified, I want to know how the variance is distributed across levels; in other words, how much of it is due to the grouping of observations at each level.

    This is why I run 'estat icc' after the empty model. The output tells me that the share of the residual variance accounted for by the clustering of individuals into countries is 13.2%, far below the 39.8% accounted for by the nesting of individuals in schools within countries.

    Code:
    xtmelogit expect_ISCED5A if fisced4!=5 || country3: || schoolid:

    Refining starting values:

    [Iterations omitted]

    Mixed-effects logistic regression               Number of obs     =    152,968

    ----------------------------------------------------------------------------
                    |     No. of       Observations per group       Integration
     Group variable |     groups    Minimum    Average    Maximum      points
    ----------------+-----------------------------------------------------------
           country3 |         28      3,130    5,463.1     11,565           7
           schoolid |      6,159          1       24.8        242           7
    ----------------------------------------------------------------------------

                                                    Wald chi2(0)      =          .
    Log likelihood = -86627.398                     Prob > chi2       =          .

    ------------------------------------------------------------------------------
    expect_IS~5A | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           _cons |  -.3936643   .1619796    -2.43   0.015    -.7111385   -.0761901
    ------------------------------------------------------------------------------

    ------------------------------------------------------------------------------
      Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
    -----------------------------+------------------------------------------------
    country3: Identity           |
                       sd(_cons) |   .8508562   .1152485      .6524704    1.109562
    -----------------------------+------------------------------------------------
    schoolid: Identity           |
                       sd(_cons) |   1.211522   .0149724       1.18253    1.241226
    ------------------------------------------------------------------------------
    LR test vs. logistic model: chi2(2) = 37745.26            Prob > chi2 = 0.0000

    Note: LR test is conservative and provided only for reference.

    estat icc

    Residual intraclass correlation

    ------------------------------------------------------------------------------
                           Level |        ICC   Std. err.     [95% conf. interval]
    -----------------------------+------------------------------------------------
                        country3 |     .13207     .03105      .0821347     .205565
               schoolid|country3 |   .3998355   .0219032      .3577718    .4434301
    ------------------------------------------------------------------------------

    My puzzle comes when I add a number of individual-level covariates to the previous (empty) model. The residual intraclass correlation for schools within countries decreases (not by much), but, to my surprise, the one attributed to the country level increases instead of decreasing (see below: 0.29 instead of 0.13).

    Code:
    xtmelogit expect_ISCED5A immig3 famstruc3 Above_mode Below_mode PV1MATH PV1READ positive_att vocational ib4.fisced4 if fisced4!=5 || country3: || schoolid:

    estat icc

    Residual intraclass correlation

    ------------------------------------------------------------------------------
                           Level |        ICC   Std. err.     [95% conf. interval]
    -----------------------------+------------------------------------------------
                        country3 |    .294492   .0567162      .1964395    .4161412
               schoolid|country3 |   .3608058    .051421      .2672012    .4663342
    ------------------------------------------------------------------------------

    Could anybody provide an explanation for this? I would have expected that part of the country-level variance would have been absorbed or captured by the individual-level variables introduced in the second model (compositional effect), but the opposite happens.

    Many thanks for your attention

    Luis Ortiz

  • #2
    Something is wrong with the output you are showing for the first model. The individual-level residual variance is missing. While it can be calculated from the variance components shown and the ICCs, it should be in your output.

    Moreover, you don't show the full output from the adjusted model. You only show the results from -estat icc-. But the dynamics of variance partitioning cannot be understood just from -estat icc- in isolation. The variance components at all three levels of the model are going to shift, and the total unexplained variance is going to decline. The ICCs are ratios of variance components: the change in an ICC depends on the relative change of its numerator and denominator. When everything is in play, you really need to look at the variance components themselves to understand what happens to the distribution of unexplained variance across levels.
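
    As a sketch of the ratio point, the two residual ICCs of a three-level random-intercept model can be written as below. The symbols are illustrative notation, not taken from the Stata output: \sigma^2_c is the country-level variance component, \sigma^2_s the school-level component, and \sigma^2_e the individual-level residual variance.

    ```latex
    \[
    \rho_{\text{country}}
      = \frac{\sigma^2_{c}}{\sigma^2_{c} + \sigma^2_{s} + \sigma^2_{e}},
    \qquad
    \rho_{\text{school}\mid\text{country}}
      = \frac{\sigma^2_{c} + \sigma^2_{s}}{\sigma^2_{c} + \sigma^2_{s} + \sigma^2_{e}}
    \]
    ```

    Because \sigma^2_s sits in both denominators, a fall in the school-level component alone is enough to raise \rho_{\text{country}}, even if \sigma^2_c does not move at all.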



    • #3
      Originally posted by Luis Ortiz
      . . . to my, surprise the residual variance to be attributed to country level increases, instead of decreasing. . .
      Could it be that one or more of your predictors correlates with the outcome and the random-effects estimator is no longer statistically consistent? For example, I've heard that these international student-performance surveys suffer from selection bias, where some countries put a more representative sample of their entire student population through the test while others are rather selective as to which students sit for the examination.

      By the way, is that Above_mode Below_mode pair of predictors for the mode of the school? For the country? Overall? (I assume that there's an At_mode category that you're omitting, that is, that the two predictors are not mutually exclusive and collectively [jointly] exhaustive.)



      • #4
        Originally posted by Joseph Coveney
        . . . predictors correlates with the outcome. . .
        I hope that the intention was clear despite the precaffeinated rambling.



        • #5
          Dear Clyde and Joseph,

          Thanks for your attention to my query.

          I was not aware of doing anything wrong with my first model. My data are hierarchical, with three levels: students, schools and countries. The first model is meant to be an empty one; it contains only the dependent variable (expect_ISCED5A), which records whether the interviewee expects to graduate from university (yes/no). The "if fisced4!=5" condition is meant to discard all cases where the father's education was not declared or was unknown. It does not change the fact that the model is an empty one.

          Stata does not seem to provide the individual-level residual variance by default; am I missing anything here? Is there any option that I should add to xtmelogit so that Stata provides the residual variance at the individual level? I have tried again with the option var for both models. What you said is precisely what I am trying to do, Clyde: to find out how the distribution of unexplained variance across levels changes from one model to the other.

          Code:
          xtmelogit expect_ISCED5A if fisced4!=5 || country3: || schoolid:, var

          Refining starting values:

          Iteration 0:  Log likelihood = -86753.948
          Iteration 1:  Log likelihood = -86672.615
          Iteration 2:  Log likelihood = -86633.841

          Performing gradient-based optimization:

          Iteration 0:  Log likelihood = -86633.841
          Iteration 1:  Log likelihood = -86627.843
          Iteration 2:  Log likelihood = -86627.402
          Iteration 3:  Log likelihood = -86627.398

          Mixed-effects logistic regression               Number of obs     =    152,968

          ----------------------------------------------------------------------------
                          |     No. of       Observations per group       Integration
           Group variable |     groups    Minimum    Average    Maximum      points
          ----------------+-----------------------------------------------------------
                 country3 |         28      3,130    5,463.1     11,565           7
                 schoolid |      6,159          1       24.8        242           7
          ----------------------------------------------------------------------------

                                                          Wald chi2(0)      =          .
          Log likelihood = -86627.398                     Prob > chi2       =          .

          ------------------------------------------------------------------------------
          expect_IS~5A | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                 _cons |  -.3936643   .1619796    -2.43   0.015    -.7111385   -.0761901
          ------------------------------------------------------------------------------

          ------------------------------------------------------------------------------
            Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
          -----------------------------+------------------------------------------------
          country3: Identity           |
                            var(_cons) |   .7239562   .1961198      .4257177    1.231127
          -----------------------------+------------------------------------------------
          schoolid: Identity           |
                            var(_cons) |   1.467786   .0362788      1.398376    1.540642
          ------------------------------------------------------------------------------
          LR test vs. logistic model: chi2(2) = 37745.26            Prob > chi2 = 0.0000

          Note: LR test is conservative and provided only for reference.

          The following example, drawn from the 'xtmelogit postestimation' entry in the Stata manual, consists of a two-level model, but sd(_cons) is only provided for the second level (in this case, "patient"). In other words, the residual variance at the individual level is not provided there either; am I wrong?

          [Attached image: Sin título.png]


          I did not paste the output of the second model earlier only to avoid making my post excessively long. Here it is. The model adds individual-level covariates (students' attributes) to the previous model.

          Code:
          xtmelogit expect_ISCED5A immig3 famstruc3 Above_mode Below_mode PV1MATH PV1READ positive_att vocational ib4.fisced4 if fisced4!=5 || country3: || schoolid:, var

          Refining starting values:

          Iteration 0:  Log likelihood = -69328.053  (not concave)
          Iteration 1:  Log likelihood = -68828.807
          Iteration 2:  Log likelihood = -68005.858

          Performing gradient-based optimization:

          Iteration 0:  Log likelihood = -68005.858
          Iteration 1:  Log likelihood = -67759.343
          Iteration 2:  Log likelihood = -67757.538
          Iteration 3:  Log likelihood = -67757.537

          Mixed-effects logistic regression               Number of obs     =    139,889

          ----------------------------------------------------------------------------
                          |     No. of       Observations per group       Integration
           Group variable |     groups    Minimum    Average    Maximum      points
          ----------------+-----------------------------------------------------------
                 country3 |         27      2,940    5,181.1     10,984           7
                 schoolid |      6,000          1       23.3        232           7
          ----------------------------------------------------------------------------

                                                          Wald chi2(11)     =   20448.68
          Log likelihood = -67757.537                     Prob > chi2       =     0.0000

          ------------------------------------------------------------------------------------
              expect_ISCED5A | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
          -------------------+----------------------------------------------------------------
                      immig3 |   .7391065   .0293237    25.21   0.000     .6816332    .7965799
                   famstruc3 |  -.0785518   .0165989    -4.73   0.000    -.1110851   -.0460185
                  Above_mode |   .1960051   .0298145     6.57   0.000     .1375698    .2544404
                  Below_mode |  -.5960308   .0234743   -25.39   0.000    -.6420397    -.550022
                     PV1MATH |   .0055413   .0001219    45.45   0.000     .0053024    .0057803
                     PV1READ |   .0054015   .0001207    44.74   0.000     .0051649    .0056381
                positive_att |   .1264264   .0036177    34.95   0.000     .1193359     .133517
                  vocational |  -1.662125   .0332901   -49.93   0.000    -1.727372   -1.596877
                             |
                     fisced4 |
          Lower sec or less  |  -1.279878   .0231672   -55.25   0.000    -1.325285   -1.234471
                  Upper sec  |  -1.058469   .0199266   -53.12   0.000    -1.097525   -1.019414
           Upper vocational  |  -.8408695    .025305   -33.23   0.000    -.8904665   -.7912725
                             |
                       _cons |  -4.835085   .2544421   -19.00   0.000    -5.333783   -4.336388
          ------------------------------------------------------------------------------------

          ------------------------------------------------------------------------------
            Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
          -----------------------------+------------------------------------------------
          country3: Identity           |
                            var(_cons) |   1.515721   .4137598      .8876853    2.588089
          -----------------------------+------------------------------------------------
          schoolid: Identity           |
                            var(_cons) |   .3413107   .0131528      .3164812    .3680881
          ------------------------------------------------------------------------------
          LR test vs. logistic model: chi2(2) = 26247.51            Prob > chi2 = 0.0000

          Note: LR test is conservative and provided only for reference.

          Regarding your question about Above_mode and Below_mode, Joseph: yes, your intuition is right. The reference category ("at_mode") is omitted from the command line. This variable "indicates whether students are at a modal grade in a country or whether they are above or below the modal grade" (PISA, 2003, Technical Report).

          Again, many thanks for your attention to my initial post, and to this subsequent correction

          All the best

          Luis Ortiz


          • #6
            Sorry, I didn't read your outputs carefully enough. I thought they came from -mixed-. For -meqrlogit- (formerly known as -xtmelogit-), and also for -melogit-, there is no output of the residual-level variance component because, by definition, it is fixed at π²/3, the variance of the standard logistic distribution. This also means that when you add variables, there is no way for variance to shift into or out of the bottom level: it can only be absorbed by the fixed effects or shift among the higher levels.
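
            As a rough check, the ICCs reported by -estat icc- can be reproduced by hand from the var(_cons) estimates posted in #5, taking the level-1 variance as \pi^2/3 \approx 3.2899. The subscripts 0 (empty model) and 1 (model with covariates) are labels introduced here for illustration only.

            ```latex
            \[
            \rho_{\text{country},0}
              = \frac{0.7240}{0.7240 + 1.4678 + 3.2899} \approx 0.1321,
            \qquad
            \rho_{\text{school}\mid\text{country},0}
              = \frac{0.7240 + 1.4678}{0.7240 + 1.4678 + 3.2899} \approx 0.3998
            \]
            \[
            \rho_{\text{country},1}
              = \frac{1.5157}{1.5157 + 0.3413 + 3.2899} \approx 0.2945,
            \qquad
            \rho_{\text{school}\mid\text{country},1}
              = \frac{1.5157 + 0.3413}{1.5157 + 0.3413 + 3.2899} \approx 0.3608
            \]
            ```

            On these numbers the country-level component itself roughly doubles (0.72 to 1.52) while the school-level component falls from 1.47 to 0.34, so the higher country ICC reflects a larger numerator and not only a smaller denominator. The two models are also fitted to different samples (152,968 versus 139,889 observations; 28 versus 27 countries), so the components are not strictly comparable.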

            I'm afraid I don't have any more specific thoughts about your original question.



            • #7
              Many thanks for your guidance, Clyde.

              All the best

              Luis

