Dear Statalisters,
I am estimating a random-intercept logistic regression of the effect of neighborhood income on a binary outcome. The data are repeated cross-sections from a regional US survey that samples households as primary sampling units. There is no prior stratification or multi-stage sampling. Households in certain age and racial groups are, however, oversampled, and the survey provides a sampling (balancing) weight to correct for this sampling bias.
Estimating the models with and without these weights produces coefficient estimates that differ by 7% on average, but the difference exceeds 20% for covariates related to the oversampling process (e.g. dummies for years in which a certain group was oversampled, or dummies for the older age groups that are oversampled in every survey wave). The random-intercept variance differs far more between the weighted and unweighted models: for one of my outcome variables it is 0.62 weighted vs 0.09 unweighted, i.e. the unweighted estimate is about 86% smaller.
I can imagine this partly reflects the difference between the weighted and unweighted linear predictors (XB, the fixed part of the model), which are used to predict the random intercepts in each model, but I wonder whether that alone could make the two variance estimates differ so starkly. Although this question may be specific to my substantive case, I am worried that the difference I am seeing is due to incorrect coding or to something I am doing wrong statistically.
Your advice is greatly appreciated.
Thanks,
Mustafa Hussein
Postdoctoral Fellow
Drexel School of Public Health
Philadelphia, PA
/********************
Unweighted Model
********************/
gllamm y inc_medhh yr2 yr3 yr4 yr5 yr6 ///
if studysam==1, i(stcotrkyr) link(logit) family(binomial) cluster(stcotrkyr) adapt eform
*OUTPUT:
number of level 1 units = 55528
number of level 2 units = 5748
Condition Number = 44.992215
gllamm model
log likelihood = -16904.222
Robust standard errors for clustered data: cluster(stcotrkyr)
-----------------------------------------------------------------------------------
y | exp(b) Std. Err. z P>|z| [95% Conf. Interval]
------------------+----------------------------------------------------------------
inc_medhh | 1.035441 .0052315 6.89 0.000 1.025238 1.045746
yr2 | 1.062166 .0539682 1.19 0.235 .9614868 1.173388
yr3 | 1.27745 .0673384 4.65 0.000 1.152058 1.416489
yr4 | 1.263517 .0661614 4.47 0.000 1.140275 1.400079
yr5 | 1.301564 .0696947 4.92 0.000 1.171889 1.445589
yr6 | 1.303302 .0716153 4.82 0.000 1.170233 1.451503
_cons | 6.886174 .3346781 39.70 0.000 6.26049 7.574389
-----------------------------------------------------------------------------------
Variances and covariances of random effects
------------------------------------------------------------------------------
***level 2 (stcotrkyr)
var(1): .08584678 (.02259006)
------------------------------------------------------------------------------
/********************
Weighted Model
********************/
gllamm y inc_medhh yr2 yr3 yr4 yr5 yr6 ///
if studysam==1, i(stcotrkyr) pweight(rescwt_y) ///
link(logit) family(binomial) adapt eform
// Level-1 (individual) survey sampling weights were rescaled following Rabe-Hesketh & Skrondal (2006); level-2 (neighborhood) weights are all set to 1.
*OUTPUT
number of level 1 units = 55528
number of level 2 units = 5748
Condition Number = 47.481972
gllamm model
log likelihood = -17983.648
Robust standard errors
-----------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
------------------+----------------------------------------------------------------
inc_medhh | .0515644 .0061971 8.32 0.000 .0394184 .0637105
yr2 | .0636029 .0600941 1.06 0.290 -.0541793 .1813851
yr3 | .1850658 .0629961 2.94 0.003 .0615957 .3085358
yr4 | .1318522 .0630226 2.09 0.036 .0083301 .2553743
yr5 | .1809153 .0647305 2.79 0.005 .0540458 .3077849
yr6 | .1653617 .0676986 2.44 0.015 .0326749 .2980485
_cons | 1.938919 .0586822 33.04 0.000 1.823904 2.053934
-----------------------------------------------------------------------------------
Variances and covariances of random effects
------------------------------------------------------------------------------
***level 2 (stcotrkyr)
var(1): .6220461 (.03799536)
------------------------------------------------------------------------------
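A note for anyone comparing the two tables above: the unweighted output is reported as exp(b) (odds ratios), while the weighted output is reported as Coef. (log odds), so the rows are not on the same scale as printed. The lines below are only a rough sketch of how the two sets of numbers could be put on a common scale by hand; the values are typed in from the output above rather than taken from stored results, and inc_medhh is used purely as an example row.

```stata
* Unweighted table shows exp(b); take the log to compare with the weighted Coef. column
display "inc_medhh, unweighted (log odds): " ln(1.035441)
display "inc_medhh, weighted   (log odds): " .0515644

* Relative difference in the coefficient, taking the weighted estimate as the base
display "relative difference in coefficient: " (.0515644 - ln(1.035441)) / .0515644

* Relative gap in the random-intercept variance, again with the weighted estimate as the base
display "relative difference in variance:    " (.6220461 - .08584678) / .6220461
```

The last line is the calculation behind the roughly 86% figure quoted for the random-intercept variance.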