Hi,
I'm looking for assistance in correctly specifying a random effects logistic model using complex survey data.
I am using Demographic and Health Survey (DHS) data, which are collected in a two-stage cluster design. Clusters are selected within strata using probability proportional to size, and households are then randomly selected within clusters. The data of interest are collected from individuals within these households. The only available weight variable is for households (all individuals within a household are assigned the same weight). DHS does not produce cluster or strata-level weights. A final complication is that I am trying to run my model with pooled data from 29 surveys (147,175 observations).
The end goal is to adjust for the random effect of the survey (n=29). It is unclear to me how to properly svyset my data to account for the pooled data as well as for the survey design, given the limitations of the available weight variables.
For any individual survey I would normally use:
Code:
svyset psu [pw=hhwgt], strata(strata)
svy: logistic outcome exposure a b c d, or
In a melogit model I am having trouble figuring out the proper svyset command. I want to be able to account for random effects of the survey in my pooled model. For example:
Code:
svyset survey || psu, strata(strata) || _n, weight(hhwgt)
svy: melogit outcome exposure a b c d, or || survey:
This code produces an error: "too many weight variables svyset; there are more svyset weight variables than levels specified in the model. an error occurred when svy executed melogit"
If I run instead:
Code:
svy: melogit outcome exposure a b c d, or || survey: || psu:
I get the following error:
"numerical overflow;
You have attempted something that, in the midst of the
necessary calculations, has resulted in something too large
for Stata to deal with accurately. Most commonly, this is
an attempt to estimate a model (say with regress) with more
than 2,147,483,647 effective observations. This effective
number could be reached with far fewer observations if you
were running a frequency-weighted model."
If I group the survey and strata:
Code:
egen stratagroup=group(survey strata)
svyset stratagroup || _n, weight(hhwgt)
svy: melogit outcome exposure a b c d, or || stratagroup:
or if I group survey and psu:
Code:
egen psugroup=group(survey psu)
svyset psugroup || _n, weight(hhwgt)
svy: melogit outcome exposure a b c d, or || psugroup:
then the model converges and gives output.
I'm just uncertain that either of these provides the proper specification of the survey design.
Advice would be most welcome.
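P.S. To make the pooled design concrete, here is a minimal sketch of how I understand the single-level (non-mixed) declaration would look once the 29 surveys are stacked. It assumes the variables are named as in the commands above (survey, strata, psu, hhwgt); stratum_id and psu_id are names I made up for illustration. It only describes the sampling design and does not by itself address the survey-level random effect.

```stata
* Sketch only, under the assumptions stated above.
* stratum_id and psu_id are hypothetical names, not DHS variables.

* Stratum and cluster codes restart within each survey, so build
* identifiers that are unique across the pooled file.
egen stratum_id = group(survey strata)
egen psu_id     = group(survey psu)

* Declare the pooled design: PSUs nested in survey-specific strata,
* with the household weight as the sampling weight.
svyset psu_id [pweight = hhwgt], strata(stratum_id)

* List strata with their PSU and observation counts to check the nesting.
svydescribe
```

Running svydescribe before any model is just a sanity check that every stratum contains the clusters I expect.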