
  • Extremely large variance in multilevel model

    Dear everyone,

    I'm running a three-level logistic regression using meglm, and I'm getting extremely high variance estimates (on the order of 1e+08) for the level-2 and level-3 random intercepts. I'm trying to understand why the variances are so high and how to fix this.

    For my model, I am using the following code:


    meglm h11 [pw=v005] || v000: || v001:, link(logit) family(bernoulli)

    where h11 is a binary (0/1) outcome variable, v000 and v001 are the variables identifying the levels, and v005 is the level-1 weight variable. My sample size is ~300,000 at level 1, ~2,000 at level 2, and 30 at level 3. I get similarly large variance estimates when I include five or more other explanatory variables.

    Any comments would be much appreciated. Thank you.

  • #2
    Also cross-posted at http://stats.stackexchange.com/quest...ltilevel-model

    Please note our cross-posting policy, which is that you should tell us about it. http://www.statalist.org/forums/help#crossposting

    There Tim commented: "Please describe your data and your model in greater details, because otherwise it is a guessing game"



    • #3
      How are the probability weights calculated for your data? Are they individual-level probability weights that capture the inverse probability of a subject being selected, or are the weights generated through some type of stratified clustered random sampling design? As Nick (and Tim, as quoted by Nick) stated above, it really is impossible to help you without a better understanding of your data. Do you get similarly large variance estimates when you fit a two-level model? Are there actually three levels, or are you trying to specify a cross-classified model? What do the levels represent? What frequency do you observe for the outcome variable? Is it extremely rare? How many level-two and level-three units do you have? Without knowing your data in at least some minimal level of detail, these are some of the questions people would need answered in order to provide advice or suggestions on how best to move forward.



      • #4
        Thank you very much for your comments, and I apologize; I was not aware of the cross-posting policy. I will make sure to follow it next time.

        As for my data:

        1. The data consist of three levels (not cross-classified); the levels are:
        Level 1: children (n≈300,000) / Level 2: district (n≈2,000) / Level 3: country (n=30)
        2. The outcome (y=1) occurs in around 20% of Level 1 units.
        3. The variance estimates from two-level models are:
        With children and district levels: small variance estimates (~0.5)
        With children and country levels: large variance (~1e+08)
        4. The weights are generated from a stratified cluster sampling design.
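
        For reference, the two two-level comparisons in item 3 could be fit as follows (a sketch, assuming the same variables and weight specification as in the original post):

        Code:
        * children nested in districts only (level 2)
        meglm h11 [pw=v005] || v001:, link(logit) family(bernoulli)
        * children nested in countries only (level 3)
        meglm h11 [pw=v005] || v000:, link(logit) family(bernoulli)

        The second model is the one producing the large variance estimate.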


        I have tried the following in meglm, but I still did not get more reasonable variances:
        1. Increasing the number of level-3 units (countries) to 50 with extra data (variance was still ~1e+08)
        2. startgrid(), using values from 0.0005 to 100 for both the level-2 and level-3 random-effect variances
        (startgrid(0.0000005 0.00005 0.0005 0.005 0.05 50 100 _cons[v000] _cons[v000>v001]))
        3. startvalues(zero)
        4. intpoints(15) (reduced the variance by half, but still on the order of 1e+08)
        5. Changing the family to binomial
        6. Removing the weights

        I also attempted to constrain the variances using fixed(), but I got a "fix() is not allowed" error.


        Your input will be much appreciated. Thank you so much for your time in advance.
        Last edited by Yoosoo Chun; 31 Oct 2015, 14:42.



        • #5
          Are there any countries for which y = 0 in all or almost all observations (or y = 1 in all or almost all observations)?
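
          A quick way to check this (a sketch, using the variable names from the thread) is to summarize the outcome within each country:

          Code:
          * mean and count of the binary outcome h11 within each country (v000)
          tabstat h11, by(v000) statistics(mean count)

          Countries with a mean very close to 0 or 1 would be the suspects.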



          • #6
            Thank you very much for the suggestion.

            I examined the outcome distribution as you suggested, and also tried several other things, but there still seems to be a problem:

            1. Removing extreme countries
            There were three countries in which fewer than 10% of observations had y=1.
            Removing these three countries reduced the variance to ~1e+07. The new sample sizes were Level 1: 270,000 / Level 3: 27.

            2. intpoints(195) (the maximum allowed)
            This also reduced the variance by a factor of 10 (to ~1e+07).

            3. Reducing the sample size
            I kept the number of countries the same, but removed half of the level-2 units (districts) and the corresponding children.
            The variance was then estimated at ~0.5.

            I'm not sure what #3 is indicating, but I certainly don't wish to lose the power I have with the full sample.

            I'd really appreciate any further suggestions.
            Last edited by Yoosoo Chun; 31 Oct 2015, 18:28.



            • #7
              It would surprise me considerably if the effect seen in your third attempt in #6 were really due to the reduced sample size. It is more likely that there is something odd about the data in some of the level-2 units you removed. And if the offending level-2 units are somewhat clustered in a small subset of the level-3 units, that might account for the problem surfacing as an unreasonable variance component at level 3.

              You might want to look at the distribution of your outcome variable in the units you removed, as well as the sample sizes in those units, and see if there is anything odd about some of them.

              Here are some other thoughts. Consider fitting a probit model instead of a logit model and see if the problem persists. (-help meprobit-). If you get a fit with probit, remember that the scale in a probit model is different from that of the logit model when examining regression coefficients.

              Another possibility is trying to fit the logit model using -meqrlogit-: same model, but a different estimator.
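
              In command form, the two alternatives suggested in this post would look something like this (a sketch using the variables from #1; note that the weight handling differs between the commands):

              Code:
              * probit instead of logit, same three-level structure
              meprobit h11 [pw=v005] || v000: || v001:
              * same logit model, different estimator
              meqrlogit h11 || v000: || v001: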

              By the way, in future posts, if you have not solved the problem, it would be more helpful to post the exact command(s) used, followed by the full Stata output (in a code block -- see the FAQ), rather than just an approximate value of the variance component. It is possible that other parts of the output give clues to what's going wrong.



              • #8
                Are you interested in estimating the random intercept of countries? I don't remember the references off the top of my head, but if you search the 2013 AERA archives you should be able to find several papers that evaluated the stability of parameter estimates in the HLM/MLM context when the number of higher-level units is small. There are certainly cases where 20 or so higher-level units would be sufficient, but my guess - along with what Clyde has already suggested - is that you have some issues with the rarity of cases across the higher-level units.

                What is the outcome measure? If it is a child-specific outcome, why would you expect the between-country variance to be smaller, and is it meaningful to make between-country comparisons on your outcome? For example, assessments like PISA are fine for making simple descriptive comparisons, but in the context of a country like the US, where state- and district-level education policies affect student outcomes more directly than federal policy, it wouldn't be meaningful to attribute differences to the country level without adding myriad covariates to adjust the parameter estimates for these different context effects (and in doing so making it much more difficult to fit the model to the data). If your case is similar, it might be worth considering -svyset-ting your data and fitting the model with district/country-level fixed effects.
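
                As a sketch of that last suggestion (the psu and strata variable names here are placeholders; they depend on the actual survey design, which the thread has not described):

                Code:
                * declare the survey design (psu_var and strata_var are hypothetical names)
                svyset psu_var [pw=v005], strata(strata_var)
                * logit with country fixed effects instead of a country random intercept
                svy: logit h11 i.v000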



                • #9
                  Thank you for your suggestions.

                  Upon trying meprobit and meqrlogit, I found that meqrlogit produces reasonable variance estimates. Please find below the melogit and meqrlogit outputs from the same data; meprobit led to similar results as melogit. While meqrlogit worked, I still need to fit my model using meglm in order to incorporate the sampling weights (which meqrlogit does not support).

                  With meglm, I tried other likelihood maximization algorithms via the technique() option (bhhh, dfp, bfgs), but the models never converged.
                  Is there a way to set a limit on the estimated variance values in meglm?

                  Thank you so much for your time!


                  Code:
                   melogit h11  [pw=v005] || v000:||v001:, difficult intpoints(20)
                  
                  Fitting fixed-effects model:
                  
                  Iteration 0:   log likelihood = -113652.15  
                  Iteration 1:   log likelihood = -113640.33  
                  Iteration 2:   log likelihood = -113640.33  
                  
                  Refining starting values:
                  
                  Grid node 0:   log likelihood = -249117.87
                  
                  Fitting full model:
                  
                  numerical derivatives are approximate
                  nearby values are missing
                  Iteration 0:   log pseudolikelihood = -249117.87  
                  numerical derivatives are approximate
                  nearby values are missing
                  Iteration 1:   log pseudolikelihood = -248721.41  
                  Iteration 2:   log pseudolikelihood = -248110.31  
                  Iteration 3:   log pseudolikelihood = -245698.21  
                  Iteration 4:   log pseudolikelihood = -244314.84  
                  Iteration 5:   log pseudolikelihood = -244016.72  
                  Iteration 6:   log pseudolikelihood = -243946.86  
                  Iteration 7:   log pseudolikelihood = -243942.81  
                  Iteration 8:   log pseudolikelihood = -243935.27  (not concave)
                  Iteration 9:   log pseudolikelihood = -243934.95  (not concave)
                  Iteration 10:  log pseudolikelihood = -243933.34  (not concave)
                  Iteration 11:  log pseudolikelihood = -243933.26  
                  Iteration 12:  log pseudolikelihood = -243931.53  
                  Iteration 13:  log pseudolikelihood = -243931.37  
                  Iteration 14:  log pseudolikelihood = -243931.18  
                  Iteration 15:  log pseudolikelihood = -243931.14  
                  Iteration 16:  log pseudolikelihood = -243931.14  
                  
                  Mixed-effects GLM                               Number of obs     =    268,092
                  Family:               Bernoulli
                  Link:                     logit
                  
                  -------------------------------------------------------------
                                  |     No. of       Observations per Group
                   Group Variable |     Groups    Minimum    Average    Maximum
                  ----------------+--------------------------------------------
                             v000 |         29      2,859    9,244.6     27,537
                             v001 |     14,049          1       19.1         76
                  -------------------------------------------------------------
                  
                  Integration method: mvaghermite                 Integration pts.  =         20
                  
                                                                  Wald chi2(0)      =          .
                  Log pseudolikelihood = -243931.14               Prob > chi2       =          .
                                                    (Std. Err. adjusted for 29 clusters in v000)
                  ------------------------------------------------------------------------------
                               |               Robust
                           h11 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                         _cons |  -2.217616   .0285623   -77.64   0.000    -2.273597   -2.161635
                  -------------+----------------------------------------------------------------
                  v000         |
                     var(_cons)|   1.65e+08    7267308                      1.51e+08    1.80e+08
                  -------------+----------------------------------------------------------------
                  v000>v001    |
                     var(_cons)|   1.90e+08   2.04e+07                      1.54e+08    2.34e+08
                  ------------------------------------------------------------------------------

                  Code:
                  . meqrlogit h11 || v000:||v001:,
                  
                  
                  Refining starting values: 
                  
                  Iteration 0:   log likelihood = -110014.88  
                  Iteration 1:   log likelihood = -109514.32  
                  Iteration 2:   log likelihood = -109430.66  
                  
                  Performing gradient-based optimization: 
                  
                  Iteration 0:   log likelihood = -109430.66  
                  Iteration 1:   log likelihood = -109398.22  
                  Iteration 2:   log likelihood = -109387.13  
                  Iteration 3:   log likelihood = -109386.38  
                  Iteration 4:   log likelihood = -109386.36  
                  Iteration 5:   log likelihood = -109386.36  
                  
                  Mixed-effects logistic regression               Number of obs     =    268,103
                  
                  ----------------------------------------------------------------------------
                                  |     No. of       Observations per Group       Integration
                   Group Variable |     Groups    Minimum    Average    Maximum      Points
                  ----------------+-----------------------------------------------------------
                             v000 |         29      2,859    9,244.9     27,537           7
                             v001 |     14,049          1       19.1         76           7
                  ----------------------------------------------------------------------------
                  
                                                                  Wald chi2(0)      =          .
                  Log likelihood = -109386.36                     Prob > chi2       =          .
                  
                  ------------------------------------------------------------------------------
                           h11 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                         _cons |  -1.879505   .0812511   -23.13   0.000    -2.038754   -1.720256
                  ------------------------------------------------------------------------------
                  
                  ------------------------------------------------------------------------------
                    Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
                  -----------------------------+------------------------------------------------
                  v000: Identity               |
                                    var(_cons) |   .1889171   .0502624      .1121514    .3182276
                  -----------------------------+------------------------------------------------
                  v001: Identity               |
                                    var(_cons) |   .3993397   .0112475      .3778925    .4220041
                  ------------------------------------------------------------------------------
                  LR test vs. logistic model: chi2(2) = 8891.93             Prob > chi2 = 0.0000
                  
                  Note: LR test is conservative and provided only for reference.
                  
                  .



                  • #10
                    Well, we've exhausted the "usual suspects" approach to this problem, or at least the suspects that I usually think of. But I do notice one thing that may be a clue. The number of observations in your successful meqrlogit model is 268,103, but in your unsuccessful melogit model it is 268,092. So, somehow, even though you have used exactly the same variables in both commands, you have lost 11 observations. Most likely these 11 observations have a pweight that is zero or missing. While I doubt that 11 observations really make the difference, it makes me wonder whether the pweighted distribution of your outcome variable has rare outcomes (i.e., nearly all 0 or nearly all 1) in some level-3 or level-2 units. Have you looked at the distribution of pweighted outcomes in all of your level-2 and level-3 units?
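
                    One way to inspect that (a sketch using the variable names from the thread; wmean is an illustrative new variable) is to compute the pweighted mean of the outcome within each unit, and to check for the lost observations:

                    Code:
                    * pweighted mean of h11 within each country (v000)
                    bysort v000: egen double wsum  = total(v005)
                    bysort v000: egen double wysum = total(h11*v005)
                    gen double wmean = wysum/wsum
                    * check for zero or missing weights (the 11 lost observations)
                    count if missing(v005) | v005 == 0

                    Repeating the egen step with "bysort v000 v001:" would give the same check at the district level.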
