Negative lower confidence interval boundary for variance with nlcom after a multilevel logistic regression

manuel dewez

Join Date: Feb 2020
Posts: 8

Negative lower confidence interval boundary for variance with nlcom after a multilevel logistic regression

08 Mar 2021, 04:52

Hello,

I am conducting a multilevel logistic regression to identify determinants of the availability of diagnostic tests.
I am using a multilevel approach because I want to examine the effects of medical surgeries’ characteristics (level 1) and the effects of country (level 2) on the availability of the test in the surgery.
I am using 5 surgery explanatory variables (three are binary variables and two are continuous variables) and two country explanatory variables (both categorical). I have 847 observations (i.e. 847 surgeries) and 19 countries. The continuous variables are centered to the mean.

I am using the nlcom command to obtain the country variance after the melogit command.
My problem is that I obtain a negative lower boundary for the 95 %CI of the variance, but I understand the variance cannot be negative.

Does anyone have a suggestion about why could this happen?
Is there another command in stata instead of nlcom that I could use to avoid having negative boundaries for variance?

Thank you !

Code:

melogit dipstick_available_for_use ib2.public_private2 ib2.group_solo2 centered_distance_lab99 centered_pc_turna_time3 i.bloods_taker3 ib3.expend_capita_quart i.source_financing2 [pweight=scaled_pweight] || country:

Code:

Mixed-effects logistic regression               Number of obs     =        847
Group variable:         country                 Number of groups  =         19

                                                Obs per group:
                                                              min =          3
                                                              avg =       44.6
                                                              max =        162

Integration method: mvaghermite                 Integration pts.  =          7

                                                Wald chi2(10)     =      62.55
Log pseudolikelihood = -223.54672               Prob > chi2       =     0.0000
                                                    (Std. Err. adjusted for 19 clusters in country)
---------------------------------------------------------------------------------------------------
                                  |               Robust
       dipstick_available_for_use |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------------------------+----------------------------------------------------------------
                  public_private2 |
                         Private  |   .6230582   1.365527     0.46   0.648    -2.053326    3.299443
                                  |
                      group_solo2 |
Solo practice. I am the only h..  |   .4136781   .5851945     0.71   0.480    -.7332821    1.560638
                                  |
          centered_distance_lab99 |   -.036124    .031088    -1.16   0.245    -.0970553    .0248072
          centered_pc_turna_time3 |   .1929808   .1399966     1.38   0.168    -.0814074     .467369
                  3.bloods_taker3 |  -.8310664   .6179098    -1.34   0.179    -2.042147    .3800147
                                  |
              expend_capita_quart |
                              0-  |  -3.333408   1.727199    -1.93   0.054    -6.718656    .0518407
                           2400-  |  -1.755793   1.432835    -1.23   0.220    -4.564098    1.052511
                           2900-  |    -2.2454   1.490375    -1.51   0.132     -5.16648     .675681
                                  |
                source_financing2 |
                             gov  |   1.308304   .7677665     1.70   0.088     -.196491    2.813098
                         vhi_oop  |   1.075064   1.279517     0.84   0.401    -1.432742    3.582871
                                  |
                            _cons |   4.341836   1.405842     3.09   0.002     1.586436    7.097237
----------------------------------+----------------------------------------------------------------
country                           |
                        var(_cons)|    1.22608   1.076046                      .2195218    6.847939
---------------------------------------------------------------------------------------------------

Code:

nlcom _b[/var(_cons[country])], cformat(%9.3f)

Code:

       _nl_1:  _b[/var(_cons[country])]

------------------------------------------------------------------------------
dipsti~r_use |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _nl_1 |      1.226      1.076     1.14   0.255       -0.883       3.335
------------------------------------------------------------------------------

Tags: None

Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#2

08 Mar 2021, 18:22

Originally posted by manuel dewez View Post

...
I am using the nlcom command to obtain the country variance after the melogit command.
My problem is that I obtain a negative lower boundary for the 95 %CI of the variance, but I understand the variance cannot be negative.
...

Code:

... --------------------------------------------------------------------------------------------------- | Robust dipstick_available_for_use | Coef. Std. Err. z P>|z| [95% Conf. Interval] ----------------------------------+---------------------------------------------------------------- public_private2 | Private | .6230582 1.365527 0.46 0.648 -2.053326 3.299443 | group_solo2 | Solo practice. I am the only h.. | .4136781 .5851945 0.71 0.480 -.7332821 1.560638 | centered_distance_lab99 | -.036124 .031088 -1.16 0.245 -.0970553 .0248072 centered_pc_turna_time3 | .1929808 .1399966 1.38 0.168 -.0814074 .467369 3.bloods_taker3 | -.8310664 .6179098 -1.34 0.179 -2.042147 .3800147 | expend_capita_quart | 0- | -3.333408 1.727199 -1.93 0.054 -6.718656 .0518407 2400- | -1.755793 1.432835 -1.23 0.220 -4.564098 1.052511 2900- | -2.2454 1.490375 -1.51 0.132 -5.16648 .675681 | source_financing2 | gov | 1.308304 .7677665 1.70 0.088 -.196491 2.813098 vhi_oop | 1.075064 1.279517 0.84 0.401 -1.432742 3.582871 | _cons | 4.341836 1.405842 3.09 0.002 1.586436 7.097237 ----------------------------------+---------------------------------------------------------------- country | var(_cons)| 1.22608 1.076046 .2195218 6.847939 ---------------------------------------------------------------------------------------------------

Code:

nlcom _b[/var(_cons[country])], cformat(%9.3f)

Code:

_nl_1: _b[/var(_cons[country])] ------------------------------------------------------------------------------ dipsti~r_use | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- _nl_1 | 1.226 1.076 1.14 0.255 -0.883 3.335 ------------------------------------------------------------------------------

The melogit model already produces an estimate for the variance of the random effects: 1.22608.

As you're aware, most of the parameters in the results table are averages, e.g. what's the average effect of being in a solo practice on the log-odds (which is related to probability) of having a diagnostic test available. The random effects parameters differ slightly: those are estimates of the variance of the random effects, on their raw scale. So, if you are actually interested in the variance of the county-level random effects, you already have it.

Moreover, the coefficients table already presented a 95% confidence interval for the variance. It does not cross zero. Moreover, if you look carefully, it's asymmetric. If you do the math on all the betas (keeping in mind that they're in log-odds units), you'll see that they're symmetric. (The same will not hold if you had exponentiated them to get odds ratios, which are inherently positive.)

Now, you might be wondering how certain we are about that estimate of the variance of the random effects. More specifically, is that variance so small that we can ignore it, in a statistical sense? If the 95% CI in the results table included 0, that's one indicator that the estimated variance of the random effects is pretty small (or that our uncertainty about the magnitude of that variance is pretty large). Another indicator is that many of the mixed effects models present a likelihood ratio test at the bottom of the table. If one was supplied, you didn't copy it in your code, but it's a test comparing this model to an identical model without a random intercept. If that test rejects, it's a potential sign that the random effects model explains the data better than an equivalent model without a random intercept.

Instead, you chose to use nlcom on the estimated beta for the variance of the random intercept. It seems like you may now be focusing on the variance of the (variance of the random intercept). If I'm reading you correctly, this is not what you wanted to know, because you already had the variance of the random intercept. Now, you did notice that the standard error (which is the standard deviation of the sample mean - that is, the uncertainty around our betas) of that random intercept is identical in nlcom to the results table, yet the CIs differ. I do know that nlcom calculates its SEs using a different method. I mentioned that in the results table, the CI is asymmetric. If you go to the nlcom result and you take 1.96 * the standard error +/- the beta, you'll replicate the CI presented by nlcom. And yet, we know that can't be correct. In this case, you already had the correct CI in the results table.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment
manuel dewez

Join Date: Feb 2020

Posts: 8
#3

09 Mar 2021, 02:52

Thank you Wiewen,

The model doesn’t provide a LRT, I think because the LR test is not possible for models with robust vce? And I think the model provides robust vce because I am using a pweight? (Do you know how to do a LRT test in this situation, shall I use the test command?)

If I understand correctly, nlcom is just applying an algebraic function to the variance coefficient (variance coeff ±1.96*SE), while the CI in the results table with melogit uses another method, and I could rely on the latter.
Comment
Joseph Coveney

Join Date: Apr 2014

Posts: 4420
#4

09 Mar 2021, 04:10

Originally posted by manuel dewez View Post

If I understand correctly, nlcom is just applying an algebraic function to the variance coefficient (variance coeff ±1.96*SE), while the CI in the results table with melogit uses another method, and I could rely on the latter.

I don't see it in the documentation, but the estimation metric for the variance is in the logarithmic transform. So, you need to work in that. You can still use -nlcom-, but you'll need to do a little dipsy-doodle in order to get the confidence bounds that are displayed in the regression output table from -melogit-. See below. Begin at the "Begin here" comment; the stuff above it is just setup (creating an artificial example for illustration).

.ÿ
.ÿversionÿ16.1

.ÿ
.ÿclearÿ*

.ÿ
.ÿsetÿseedÿ`=strreverse("1596727")'

.ÿ
.ÿquietlyÿsetÿobsÿ19

.ÿgenerateÿbyteÿcidÿ=ÿ_n

.ÿgenerateÿdoubleÿcid_uÿ=ÿrnormal()

.ÿ
.ÿgenerateÿdoubleÿwgt2ÿ=ÿ1

.ÿ
.ÿforvaluesÿiÿ=ÿ1/2ÿ{
ÿÿ2.ÿÿÿÿÿgenerateÿbyteÿpr`i'ÿ=ÿruniformint(1,ÿ5)
ÿÿ3.ÿ}

.ÿ
.ÿquietlyÿexpandÿ45

.ÿforvaluesÿiÿ=ÿ3/4ÿ{
ÿÿ2.ÿÿÿÿÿgenerateÿbyteÿpr`i'ÿ=ÿruniform()ÿ<ÿ0.5
ÿÿ3.ÿ}

.ÿforvaluesÿiÿ=ÿ5/7ÿ{
ÿÿ2.ÿÿÿÿÿgenerateÿdoubleÿpr`i'ÿ=ÿruniform(-0.5,ÿ0.5)
ÿÿ3.ÿ}

.ÿ
.ÿgenerateÿdoubleÿwgt1ÿ=ÿruniform()

.ÿ
.ÿgenerateÿbyteÿoutÿ=ÿrbinomial(1,ÿinvlogit(cid_u))

.ÿ
.ÿmelogitÿoutÿi.(pr1-pr4)ÿc.(pr5-pr7)ÿ[pweight=wgt1]ÿ||ÿcid:ÿ,ÿpweight(wgt2)ÿnolog

Mixed-effectsÿlogisticÿregressionÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿNumberÿofÿobsÿÿÿÿÿ=ÿÿÿÿÿÿÿÿ855
Groupÿvariable:ÿÿÿÿÿÿÿÿÿÿÿÿÿcidÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿNumberÿofÿgroupsÿÿ=ÿÿÿÿÿÿÿÿÿ19

ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿObsÿperÿgroup:
ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿminÿ=ÿÿÿÿÿÿÿÿÿ45
ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿavgÿ=ÿÿÿÿÿÿÿ45.0
ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿmaxÿ=ÿÿÿÿÿÿÿÿÿ45

Integrationÿmethod:ÿmvaghermiteÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿIntegrationÿpts.ÿÿ=ÿÿÿÿÿÿÿÿÿÿ7

ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿWaldÿchi2(12)ÿÿÿÿÿ=ÿÿÿÿÿÿÿÿÿÿ.
Logÿpseudolikelihoodÿ=ÿ-269.85907ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿProbÿ>ÿchi2ÿÿÿÿÿÿÿ=ÿÿÿÿÿÿÿÿÿÿ.
ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ(Std.ÿErr.ÿadjustedÿforÿ19ÿclustersÿinÿcid)
------------------------------------------------------------------------------
ÿÿÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿRobust
ÿÿÿÿÿÿÿÿÿoutÿ|ÿÿÿÿÿÿCoef.ÿÿÿStd.ÿErr.ÿÿÿÿÿÿzÿÿÿÿP>|z|ÿÿÿÿÿ[95%ÿConf.ÿInterval]
-------------+----------------------------------------------------------------
ÿÿÿÿÿÿÿÿÿpr1ÿ|
ÿÿÿÿÿÿÿÿÿÿ2ÿÿ|ÿÿ-.4838884ÿÿÿÿ.853293ÿÿÿÿ-0.57ÿÿÿ0.571ÿÿÿÿ-2.156312ÿÿÿÿ1.188535
ÿÿÿÿÿÿÿÿÿÿ3ÿÿ|ÿÿ-.3038654ÿÿÿ.5488734ÿÿÿÿ-0.55ÿÿÿ0.580ÿÿÿÿ-1.379638ÿÿÿÿ.7719068
ÿÿÿÿÿÿÿÿÿÿ4ÿÿ|ÿÿ-.0028527ÿÿÿÿ.863401ÿÿÿÿ-0.00ÿÿÿ0.997ÿÿÿÿ-1.695088ÿÿÿÿ1.689382
ÿÿÿÿÿÿÿÿÿÿ5ÿÿ|ÿÿÿ.4597574ÿÿÿ.4421482ÿÿÿÿÿ1.04ÿÿÿ0.298ÿÿÿÿ-.4068371ÿÿÿÿ1.326352
ÿÿÿÿÿÿÿÿÿÿÿÿÿ|
ÿÿÿÿÿÿÿÿÿpr2ÿ|
ÿÿÿÿÿÿÿÿÿÿ2ÿÿ|ÿÿ-.8195069ÿÿÿÿÿ.84544ÿÿÿÿ-0.97ÿÿÿ0.332ÿÿÿÿ-2.476539ÿÿÿÿ.8375251
ÿÿÿÿÿÿÿÿÿÿ3ÿÿ|ÿÿ-.5659286ÿÿÿ.7779242ÿÿÿÿ-0.73ÿÿÿ0.467ÿÿÿÿ-2.090632ÿÿÿÿ.9587748
ÿÿÿÿÿÿÿÿÿÿ4ÿÿ|ÿÿ-1.506734ÿÿÿ.7929109ÿÿÿÿ-1.90ÿÿÿ0.057ÿÿÿÿÿ-3.06081ÿÿÿÿ.0473432
ÿÿÿÿÿÿÿÿÿÿ5ÿÿ|ÿÿÿ-.965058ÿÿÿ.6423658ÿÿÿÿ-1.50ÿÿÿ0.133ÿÿÿÿ-2.224072ÿÿÿÿ.2939558
ÿÿÿÿÿÿÿÿÿÿÿÿÿ|
ÿÿÿÿÿÿÿ1.pr3ÿ|ÿÿÿ.0804555ÿÿÿ.2114218ÿÿÿÿÿ0.38ÿÿÿ0.704ÿÿÿÿ-.3339237ÿÿÿÿ.4948347
ÿÿÿÿÿÿÿ1.pr4ÿ|ÿÿÿ.0584114ÿÿÿ.1643895ÿÿÿÿÿ0.36ÿÿÿ0.722ÿÿÿÿ-.2637861ÿÿÿÿÿ.380609
ÿÿÿÿÿÿÿÿÿpr5ÿ|ÿÿÿ.2882664ÿÿÿÿ.406448ÿÿÿÿÿ0.71ÿÿÿ0.478ÿÿÿÿÿ-.508357ÿÿÿÿÿ1.08489
ÿÿÿÿÿÿÿÿÿpr6ÿ|ÿÿ-.5911282ÿÿÿ.2330431ÿÿÿÿ-2.54ÿÿÿ0.011ÿÿÿÿ-1.047884ÿÿÿÿ-.134372
ÿÿÿÿÿÿÿÿÿpr7ÿ|ÿÿÿ.4611187ÿÿÿ.2715873ÿÿÿÿÿ1.70ÿÿÿ0.090ÿÿÿÿ-.0711827ÿÿÿÿÿÿ.99342
ÿÿÿÿÿÿÿ_consÿ|ÿÿÿ.9103593ÿÿÿ.8161172ÿÿÿÿÿ1.12ÿÿÿ0.265ÿÿÿÿ-.6892011ÿÿÿÿÿ2.50992
-------------+----------------------------------------------------------------
cidÿÿÿÿÿÿÿÿÿÿ|
ÿÿÿvar(_cons)|ÿÿÿ.2075716ÿÿÿ.1403586ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ.0551558ÿÿÿÿ.7811691
------------------------------------------------------------------------------

.ÿ
.ÿ*
.ÿ*ÿBeginÿhere
.ÿ*
.ÿquietlyÿnlcomÿln(_b[/var(_cons[cid])])

.ÿ
.ÿdisplayÿinÿsmclÿasÿtextÿ"95%ÿCIÿ=ÿ["ÿ///
>ÿÿÿÿÿÿÿÿÿasÿresultÿ%9.7fÿexp(r(b)[1,ÿ1]ÿ+ÿinvnormal(0.025)ÿ*ÿsqrt(r(V)[1,ÿ1]))ÿ///
>ÿÿÿÿÿÿÿÿÿ",ÿ"ÿ%9.7fÿexp(r(b)[1,ÿ1]ÿ+ÿinvnormal(0.975)ÿ*ÿsqrt(r(V)[1,ÿ1]))ÿ"]"
95%ÿCIÿ=ÿ[0.0551558,ÿ0.7811691]

.ÿ
.ÿexit

endÿofÿdo-file

.
1 like
Comment
manuel dewez

Join Date: Feb 2020

Posts: 8
#5

09 Mar 2021, 04:42

Thanks a lot Joseph,

I used your code and it provides the same 95% ci for then country variance with nlcom and melogit.

I need to calculate a statistic which is called the median odds ratio (MOR), which is the median value of the distribution of ORs obtained when randomly picking two individuals with the same covariate values from two different countries, and comparing the one with some characteristics of the country (eg expenditure per capita) to the one from another category of this country characteristic. In simple terms, the MOR can be interpreted as the median increased odds of reporting the outcome if an individual moves to another country with a different health expenditure. The MOR is calculated as:

MOR = exp (√2 x country variance x the inverse cumulative standard normal distribution function x (0.75))

or in stata coding (with nlcom to get the 95%CI of the MOR) :

Code:

nlcom exp(sqrt(2*_b[/var(_cons[country])]*invnormal(0.75))), cformat(%9.2f)

Would it be valid to use your suggestion on this equation too?
Comment

Announcement

Negative lower confidence interval boundary for variance with nlcom after a multilevel logistic regression

Comment

Comment

Comment

Comment