
  • Multi-level model (xtmelogit) vs. adjusting for PSU and strata (svy: logit)?

    What is the appropriate way to specify models that incorporate two levels of clustering (if that is the right term)? I initially used xtmelogit (level 1 = child, level 2 = sibling groups, level 3 = counties). These are experimental data; the intervention was implemented separately in 9 counties and served children (many in sibling groups). A colleague recommended that, since I don’t care about estimating county-level impacts, xtmelogit might be overkill and I could run models simply adjusting for strata (county) and PSU (sibling group), which I then did using svy: logit. (If I understand correctly, this suggestion is also made by the authors of GLLAMM.) However, results from the two approaches differ, which makes me think either that I’m doing something wrong or that one approach is better than the other. Can anyone advise? Thank you in advance!

    Below I've provided some sample output and definitions of my key variables.

    EXPER: 1 = treatment, 0 = control (independent variable of interest)
    MOMCLOSE: 1 = good outcome, 0 = bad outcome
    siteid = county identifier (level 3 id, with dummy indicators called site#)
    randcid = case id/sibling group id (level 2 id)
    fpcvar = fpc, calculated per county as the number of respondents divided by the number of youth in the original sample


    . svyset randcid, strata(siteid) fpc(fpcvar)

            pweight: <none>
                VCE: linearized
        Single unit: missing
           Strata 1: siteid
               SU 1: randcid
              FPC 1: fpcvar

    MODEL 1

    .
    . foreach var in momclose {
    2. svy: logit `var' exper, or
    3. }
    (running logit on estimation sample)

    Survey: Logistic regression

    Number of strata   =         9                Number of obs     =        303
    Number of PSUs     =       263                Population size   =        303
                                                  Design df         =        254
                                                  F(   1,    254)   =       0.21
                                                  Prob > F          =     0.6451

    ------------------------------------------------------------------------------
                 |             Linearized
        momclose | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           exper |   1.053215   .1184012     0.46   0.645     .8440495    1.314214
           _cons |   .6947368   .0540147    -4.68   0.000     .5961066    .8096862
    ------------------------------------------------------------------------------

    MODEL 2
    .
    . foreach var in momclose {
    2. svy: logit `var' exper site268 site269 site271 site272 site273 site274 sit
    > e275 site276, or
    3. }
    (running logit on estimation sample)

    Survey: Logistic regression

    Number of strata   =         9                Number of obs     =        303
    Number of PSUs     =       263                Population size   =        303
                                                  Design df         =        254
                                                  F(   9,    246)   =       6.36
                                                  Prob > F          =     0.0000

    ------------------------------------------------------------------------------
                 |             Linearized
        momclose | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           exper |   1.135454   .1281456     1.13   0.261     .9091681     1.41806
         site268 |   1.169825    .282946     0.65   0.517     .7265324    1.883593
         site269 |   .5536779   .1434254    -2.28   0.023     .3324338    .9221662
         site271 |   .8257458   .1561435    -1.01   0.312     .5690086    1.198323
         site272 |   1.252144    .280899     1.00   0.317     .8049818    1.947701
         site273 |    1.53346   .2869998     2.28   0.023     1.060719    2.216892
         site274 |   .5546835   .1121758    -2.91   0.004     .3724597    .8260591
         site275 |   3.282409   .9750136     4.00   0.000     1.828688    5.891772
         site276 |   .7522824   .1431926    -1.50   0.136     .5171112    1.094405
           _cons |   .6761679   .0953349    -2.78   0.006     .5122318    .8925705
    ------------------------------------------------------------------------------

    . svyset, clear

    .
    MODEL 3

    . xtmelogit momclose exper || siteid: || randcid: , or

    Refining starting values:

    Iteration 0: log likelihood = -206.31694 (not concave)
    Iteration 1: log likelihood = -203.61326
    Iteration 2: log likelihood = -202.51347

    Performing gradient-based optimization:

    Iteration 0: log likelihood = -202.51347
    Iteration 1: log likelihood = -202.47848
    Iteration 2: log likelihood = -202.4783
    Iteration 3: log likelihood = -202.4783

    Mixed-effects logistic regression               Number of obs     =        303

    -------------------------------------------------------------------------
                    |   No. of       Observations per Group      Integration
     Group Variable |   Groups    Minimum    Average    Maximum       Points
    ----------------+--------------------------------------------------------
             siteid |        9         20       33.7         57            7
            randcid |      263          1        1.2          4            7
    -------------------------------------------------------------------------

                                                    Wald chi2(1)      =       0.11
    Log likelihood = -202.4783                      Prob > chi2       =     0.7348

    ------------------------------------------------------------------------------
        momclose | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           exper |   1.144452    .455809     0.34   0.735     .5243044     2.49811
           _cons |   .5649229   .1772804    -1.82   0.069     .3054012    1.044979
    ------------------------------------------------------------------------------

    ------------------------------------------------------------------------------
      Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
    -----------------------------+------------------------------------------------
    siteid: Identity             |
                       sd(_cons) |   .2495608   .4194843      .0092555    6.729036
    -----------------------------+------------------------------------------------
    randcid: Identity            |
                       sd(_cons) |     1.9571   .7909925      .8863118    4.321548
    ------------------------------------------------------------------------------
    LR test vs. logistic regression:     chi2(2) =     6.42   Prob > chi2 = 0.0404

    Note: LR test is conservative and provided only for reference.

    .
    .
    end of do-file

  • #2
    Welcome to Statalist, Sharon! It's difficult to read much of your post. Please read FAQ 12 and repost using CODE delimiters to display commands and results.
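    (For example, a sketch of the CODE delimiters that FAQ 12 describes; paste commands and output between the tags so the alignment is preserved:)

    Code:
    [CODE]
    . svy: logit momclose exper, or
    ... pasted output ...
    [/CODE]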
    Steve Samuels
    Statistical Consulting
    [email protected]

    Stata 14.2



    • #3
      I managed to read your post after all, but I had to copy & paste into a text editor to do it. So, please, next time use CODE delimiters.

      The "fpc()" in your svyset statement is invalid. To use the theory of the fpc, the sampled observations must have been selected by random numbers, i.e. they must come from a random sampling design. With a random sample, the value for the fpc() option at each site is the sampling fraction \(n/N\), or \(N\), where \(N\) is the size of the target population and \(n\) is the size of the random sample. The value you have entered is the response rate; responders are not a random sample.
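      To make that concrete, here is a minimal sketch, not something to run on your data as they stand, of what a valid fpc() would look like if the sibling groups had actually been a random sample within each county. It assumes two hypothetical variables: samp_frac, the per-county sampling fraction of sibling groups, and grouppop, the number of sibling groups in the county's target population.

      Code:
      * hypothetical sketch: valid only if sibling groups were randomly sampled within county
      * samp_frac = (no. of sampled sibling groups)/(no. of sibling groups in the county), i.e. n/N
      svyset randcid, strata(siteid) fpc(samp_frac)

      * equivalently, fpc() also accepts the population count N itself
      svyset randcid, strata(siteid) fpc(grouppop)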

      To comment further, I'd like to know the actual study design, including how treatments were assigned.

      Below I've abstracted the results for the three models you showed. The estimated odds ratios for the models are well within one standard error of one another, and the CIs substantially overlap.

      Models 1 & 2 are very different. Model 1 does not adjust for site differences and would, if properly weighted, reproduce the proportion positive in treatment and control for the nine sites. Model 2 is a better model: it both stratifies on site and includes site predictors. A difference between unadjusted and adjusted odds ratios is a common phenomenon in epidemiology, so it is not a surprise. Models 2 & 3 have similar predictor lists, so the closeness of their odds ratios is to be expected.

      Code:
      ------------------------------------------------------------------------------
                   |             Linearized
          momclose | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
      Model 1
             exper |   1.053215   .1184012     0.46   0.645     .8440495    1.314214
      Model 2
             exper |   1.135454   .1281456     1.13   0.261     .9091681     1.41806
      Model 3
             exper |   1.144452    .455809     0.34   0.735     .5243044     2.49811
      I would recommend the following logit model, which should give results similar to those in Model 2 but doesn't require that there was a random sampling design. It is model-based, not design-based. It does, however, require that outcomes, conditional on treatment and site, were independent. With a sampling design, such independence is initially induced by the random sampling and then by the randomization. However, non-independence can be introduced after selection, for example if the intervention at a site is applied to groups rather than individuals, or if treatment is one-to-one and there is more than one treatment provider.

      I suggest that you use Stata's factor variables. I assume that you have a variable "site" with a baseline value (270?) and other values 268, 269, 271, 272, 273, 274, 275, 276. You have enough data to look for interactions between treatment and site, so I show that model as well.

      Code:
      logit `var' exper site268 site269 site271 ///
          site272 site273 site274 site275 site276, vce(cluster randcid)
      /* or, equivalently, with factor variables */
      logit `var' exper i.site, vce(cluster randcid)

      /* treatment-by-site interaction model */
      logit `var' i.site##exper
      testparm i.site#i.exper   // joint test of the interaction terms
      To actually interpret these ORs, you'll need to apply margins to the model predictions. For the outcome "momclose" that you display, the predicted site positive rates range from about 28% to 76%. If observed proportions are in this range, predictions from linear and logit models will be very close. The advantage of the linear model is that effects are in terms of differences in proportions, which are much easier to understand than odds ratios. The disadvantage is that CIs for proportions can extend beyond the ends of the [0,1] interval; however, this doesn't usually happen to the CIs for differences. The linear models would be:
      Code:
      reg `var' exper i.site, vce(cluster randcid)
      reg `var' i.site##exper
      To decide between linear & logit models, you can compare their predictions to the observed proportions.
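      As a rough sketch of both steps (margins and the prediction check), with momclose standing in for `var' and phat as a made-up name for the predicted probabilities, the comparison might look something like this:

      Code:
      * sketch only: fit the interaction model, then look at adjusted rates
      logit momclose i.site##i.exper, vce(cluster randcid)
      margins site#exper              // model-predicted positive rates by site and arm
      margins, dydx(exper)            // average difference in predicted proportions

      * compare model-based predictions with the observed proportions
      predict phat, pr
      table site exper, contents(mean phat mean momclose)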
      Last edited by Steve Samuels; 01 Dec 2015, 08:10.
      Steve Samuels
      Statistical Consulting
      [email protected]

      Stata 14.2

