  • Complex Survey Weights

    I am using two waves (2007 and 2013/14) of the Demographic and Health Survey (DHS) data, which use a multistage stratified sampling design. Because the multistage cluster design departs from simple random sampling, I need to weight the data to obtain unbiased estimates, and to adjust for stratification and clustering to correct the standard errors. The weights are provided by the DHS and are specific to the recode files.

    The guidance from the DHS is straightforward for cross-sections. My study cohort includes 52,656 children born between 2000 and 2013. This cross-section of births was then expanded into a panel dataset at the child-month level, with each child observed for up to 59 months. The outcome variable is binary (0/1 mortality).

    I've applied the following adjustments (and later use the svy prefix), but I think I am not weighting the data correctly to account for multiple observations on the same child. I know this wasn't possible in Stata v13, but I think it is possible in Stata v14. Any help would be greatly appreciated.

    Code:
    gen wt= v005/1000000
    egen strata=group(v000 v025 ADM1_CODE) // strata also includes the survey (identified by v000) in the group command
    egen v001r = group(v000 v001) // cluster also includes the survey in the group command
    svyset v001r [pw=wt], strata(strata) singleunit(centered)
    
    DATA DICTIONARY
    Variable     Description
    v005         women's individual sample weight
    v000         country code and survey phase
    v025         urban/rural
    ADM1_CODE    geographic province
    v001         survey cluster
    id           child-level unique id
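
The child-month expansion described in this post can be sketched roughly as follows. This is a minimal illustration on toy data, not the poster's actual code: the variables months_obs (number of months each child is followed) and died (1 if the child died during follow-up) are hypothetical names that do not appear in the thread.

```stata
* Toy data: two hypothetical children (months_obs and died are assumed names)
clear
input id months_obs died
1 3 1
2 2 0
end

* One row per child-month; all other variables (weights, strata, cluster)
* would simply be copied onto each new row
expand months_obs
bysort id: gen month = _n

* fail is 1 only in the last observed month of a child who died, else 0
by id: gen byte fail = (died == 1 & _n == _N)

list id month fail, sepby(id)
```

Because expand duplicates whole rows, the child's weight, stratum, and cluster identifiers are carried unchanged onto every child-month record.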

  • #2
    It's hard to answer this without more information on your analysis plans. You say the data were "expanded into a panel dataset." Do you really mean that each of the 52,656 children was observed monthly for nearly five years? Or do you mean that you have, in effect, discrete-time survival data, that is, one record per child per month, coded 0 for mortality unless and until the child dies? If you are going to be doing survival analysis, you probably only need the base weight.

    As to your construction of survey design variables, it's hard to give advice without knowing more about the actual design. For example, what is "survey phase"?
    Richard T. Campbell
    Emeritus Professor of Biostatistics and Sociology
    University of Illinois at Chicago
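
The discrete-time survival reading suggested in #2 could look something like the sketch below. It runs on simulated data, so every number in it is made up; only the svyset line mirrors the first post. The age grouping (agegrp) and the logit link are illustrative assumptions, not something the thread prescribes.

```stata
* Simulated design: 40 hypothetical clusters in 4 strata, 10 children each
clear
set seed 12345
set obs 40
gen v001r = _n
gen strata = mod(_n, 4) + 1
expand 10
gen id = _n
gen wt = 0.5 + runiform()          // one base weight per child (made up)

* Child-month records: up to 12 months per child in this toy example
expand 12
bysort id: gen month = _n
gen byte fail = runiform() < 0.05  // made-up monthly hazard

* Keep records up to and including the month of death only
bysort id (month): gen cumfail = sum(fail)
drop if cumfail - fail > 0
drop cumfail

* Same design declaration as in the first post
svyset v001r [pw=wt], strata(strata) singleunit(centered)

* Discrete-time hazard model with a grouped baseline (grouping is assumed)
egen agegrp = cut(month), at(1 4 7 13)
svy: logit fail i.agegrp
```

Here each child's single base weight is carried on all of that child's records, and the standard errors are computed at the cluster level, which contains all of a child's records.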


    • #3
      Over the past couple of months I have been reviewing the literature on DHS survey weights as well as weighting in general. In particular, the paper "What Are We Weighting For?" by Solon, Haider, and Wooldridge (http://www.nber.org/papers/w18859) has been incredibly helpful. I keep going back to why and how I'm weighting. However, I'm still not convinced I have my regression weighted and adjusted correctly, and I need additional clarification.

      To follow up on my post from July, I have two cross-sections (2007 and 2013/14) of a birth history. Since I know when the child was born and when the child dies, I can expand it to follow the kids up to 5 years. I can also include time-varying spatial data (like disease risk). The "fail" variable is, indeed, 0 in the month the kid is living and 1 in the month he dies.

      My concern is that I'm using a repeated weight for each kid, sometimes up to 59 times. If the weight is meant to adjust for the probability that a kid appears in the sample, I'm not sure I can use the provided weight repeatedly. The overwhelming guidance in the literature is to weight the data. Is there a way to tell Stata that this weight is repeated in my sample? I don't think there is a way for me to create a new weight that would adjust for the repeated observations without access to the sampling frame. I do appreciate any guidance.
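
One way to look at the repeated-weight worry, offered as a sketch rather than a definitive answer: in a child-month file the child's weight is normally constant across that child's records, and all of those records sit in the same cluster, so cluster-level variance estimation already covers the within-child dependence. The checks below, run on a few made-up records, confirm that structure before any model is fit. The variable names follow the first post; the values are invented.

```stata
* Toy child-month records (values are made up)
clear
input id v001r strata wt month
1 10 1 0.8 1
1 10 1 0.8 2
2 10 1 1.3 1
3 11 1 1.1 1
3 11 1 1.1 2
end

* The weight and design variables should not vary within a child
bysort id: assert wt == wt[1]
by id: assert strata == strata[1] & v001r == v001r[1]

* Declare the design as in the first post and summarize it
svyset v001r [pw=wt], strata(strata) singleunit(centered)
svydescribe
```

If either assert fails on the real data, the expansion step has altered the design variables and should be revisited before estimation.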