Hi. I am using the Demographic and Health Survey (DHS) dataset to examine the double burden of malnutrition in mother-child pairs. Based on the two-stage complex sample design used by DHS, I originally set the survey design in Stata as follows:
Where sample_weight is the mother's individual sample weight, v021 is the primary sampling unit (enumeration area, EA), and v023 is the stratification variable used in the sample design. I am investigating independent variables at the child, mother, and household levels. As I am investigating mother-child pairs, I assume there are no multilevel effects at the mother level. However, as some households contain multiple mother-child pairs, I am trying to account for potential household-level effects for variables such as household wealth and household source of water. Within the DHS dataset, clusters (EAs) are selected in the first stage of sampling, and households are selected from each cluster in the second stage. My supervisor suggested clustering the standard errors at the household level. To do this, I was thinking of setting the survey design as follows, where v002 is the household number.
Would this be sufficient to account for household-level variations?
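To gauge how common the multiple-pair situation is, something like the following could be run first. This is only a sketch: it assumes each row of the data is one mother-child pair and that v001 (the DHS cluster number) is available, since v002 only numbers households within a cluster. The names n_pairs and hh_tag are made up for illustration.

```stata
* Sketch only: assumes one row per mother-child pair and that v001
* (cluster number) is present; n_pairs and hh_tag are illustrative names.

* Count the mother-child pairs in each household (cluster + household number)
bysort v001 v002: generate n_pairs = _N

* Flag exactly one row per household so each household is counted once
egen hh_tag = tag(v001 v002)

* Distribution of pairs per household
tabulate n_pairs if hh_tag
```

If nearly all households contribute a single pair, the household stage should matter little in practice.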
The logit I am trying to estimate is:
Where DBM is the binary outcome indicating whether a double burden is present, and the independent variables are a variety of child-, mother-, and household-level variables. I have never worked with multilevel or complex design data before, so apologies if this seems trivial or if I have made a mistake.
Code:
svyset v021 [pw=sample_weight], strata(v023)
Code:
svyset v021 [pweight=sample_weight], strata(v023) vce(linearized) singleunit(missing) || v002, strata(v023)
Code:
svy: logit DBM i.b4 i.age_child i.sizeatbirth i.v013 i.v106 c.v218 i.short_structure i.v394 c.v453 i.v463a i.brfeeding c.brsfed_time i.v130 i.v190 i.v025 c.HHhead_age i.v113 i.v714 i.v701 i.v501 i.v151 i.v116 i.v131 c.v212 i.v467b i.m3b i.v401 c.m14 i.v460, or
