
  • DHS Data - Sample weights and stratification - Nigeria 2008 and 2018

    Dear all,

    I am currently using DHS data with Stata for the first time. I intend to estimate effects with a difference-in-differences estimation on Nigerian DHS data from 2008 and 2018 (individual/women's recode). I would like to know more about the right way to weight the data and account for the stratification.

    In the literature I found that some scholars combine (append) two datasets (DHS Year A and DHS Year B) and, when running their regression, account for the women's sampling weight simply by including [pweight=v005]. As far as I understood from the DHS forum and manuals, in this case we do not have to divide the sample weight by 1,000,000, as pweight can handle it without doing so. My question is whether it is really that easy to use the pweight option on the full combined dataset, given that it includes women from two distinct surveys whose sampling weights were calculated for their original dataset (Year A or Year B). Do I therefore have to reweight the sample, or is it really possible to just use [pweight=v005], since the data come from different women and different years but the same country?
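To illustrate the scaling point in the paragraph above, here is a minimal Python sketch (not Stata, and not DHS code). It only shows the arithmetic fact that a weighted mean does not change when every weight is multiplied or divided by the same constant. The outcome values and the raw weights are made-up numbers, and `weighted_mean` is a hypothetical helper written for this example.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(v * w) / sum(w)."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical outcomes and hypothetical raw weights stored as large integers,
# in the style of a weight variable that must be divided by 1,000,000.
outcomes = [1.0, 0.0, 1.0, 1.0]
raw_weights = [1250000, 800000, 950000, 1000000]

# The same weights after dividing by 1,000,000.
scaled_weights = [w / 1_000_000 for w in raw_weights]

mean_raw = weighted_mean(outcomes, raw_weights)
mean_scaled = weighted_mean(outcomes, scaled_weights)

# The constant cancels between numerator and denominator.
print(mean_raw, mean_scaled)
```

This covers a single constant applied to all observations. It says nothing about whether weights from two different surveys are comparable with each other, which is the separate question raised above.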

    In addition, I am unsure whether I have to account for the stratification, which in the case of Nigeria was done by state and by rural/urban area. Some papers account for it; others ignore it.

    Lastly, I am unsure whether I have to use the svyset command at all when using DHS data. Again, some papers just declare the data as panel data using the xtset command, while others suggest svyset to account for the DHS survey design.

    In a paper that asks similar research questions, DHS data from two years from the same country are used and the authors also employ a difference-in-differences estimation. First, they declare the data as panel data using the xtset command, and then they run their regression model directly, including only [pweight=v005] and vce(cluster v001) at the end.

    I would really appreciate any help with generating the most robust results and with understanding DHS data better in general.
    Greetings

  • #2
    Welcome to Statalist. You will increase your chances of a useful answer by following the FAQ on asking questions: provide Stata code in code delimiters, readable Stata output, and sample data using dataex. Remember that we are not generally from your research area, so many of us would not know what DHS data means. You increase your chances of a useful answer by describing the problem in general statistical terms. We don't even know precisely what estimator you intend to use.

    How you should weight your sample depends on the details of the data collection, which we are not experts on. Stata has a set of survey tools designed for this kind of problem. I don't work with them, so I can't help you much more on that.

    It is not clear to me what the panel unit would be in a panel estimator here. Since you don't necessarily have the same individuals in both surveys, the panel unit can't be the individual.


    • #3
      Dear Phil, thank you for your reply. I hoped to catch the attention of people who are familiar with DHS data and sampling by mentioning it in the headline, which is why I did not go into detail about sampling methods etc.

      The data I am using are from two different years in one country (2008 and 2018). The individuals (women 15-49 years old) are not the same. The sample weights included in the DHS data are calculated and normalized by the DHS team for the women in each dataset. I guess my question is the more general one of whether sampling weights have to be reweighted if you have data from one country but from two years and two different groups of individuals (no matter whether it is DHS data or other data). I was just unsure how authors could simply use the included sample weight (in the DHS case, v005, the individual sampling weight) when they combine two datasets from different years but do not reweight or denormalize the sampling weights.
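As a rough illustration of what "denormalizing" could mean here, the following Python sketch rescales each survey's weights so that they sum to a population total for that survey's year, and compares estimates before and after. All numbers are made up, the population totals are placeholders rather than real figures for any country, and `rescale_to_total` and `weighted_mean` are hypothetical helpers. It assumes that normalized weights sum to the sample size within each survey, and it is a sketch of the arithmetic only, not a statement of what any particular paper or manual prescribes.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(v * w) / sum(w)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def rescale_to_total(weights, target_total):
    """Rescale one survey's weights so that they sum to target_total."""
    factor = target_total / sum(weights)
    return [w * factor for w in weights]


# Hypothetical survey A (earlier year): 3 respondents, weights sum to 3.
y_a = [1.0, 0.0, 0.0]
w_a = [1.2, 0.8, 1.0]
pop_a = 300.0  # placeholder population total for year A

# Hypothetical survey B (later year): 6 respondents, weights sum to 6.
y_b = [1.0, 1.0, 0.0, 1.0, 0.0, 1.0]
w_b = [0.9, 1.1, 1.0, 1.3, 0.7, 1.0]
pop_b = 400.0  # placeholder population total for year B

w_a_pop = rescale_to_total(w_a, pop_a)
w_b_pop = rescale_to_total(w_b, pop_b)

# Within one survey, rescaling by a constant leaves the estimate unchanged.
mean_a_before = weighted_mean(y_a, w_a)
mean_a_after = weighted_mean(y_a, w_a_pop)

# Across surveys, the share each survey contributes to a pooled estimate moves
# from its sample-size share to its population share.
share_a_before = sum(w_a) / (sum(w_a) + sum(w_b))
share_a_after = sum(w_a_pop) / (sum(w_a_pop) + sum(w_b_pop))

pooled_before = weighted_mean(y_a + y_b, w_a + w_b)
pooled_after = weighted_mean(y_a + y_b, w_a_pop + w_b_pop)

print(mean_a_before, mean_a_after)
print(share_a_before, share_a_after)
print(pooled_before, pooled_after)
```

In this toy setup, a quantity computed separately within each survey year is unaffected by the rescaling, while a quantity that pools both years is affected. Whether that matters for a given regression depends on the model specification.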

      Maybe this clears it up a bit. I have not yet started "coding" in Stata (I want to get the weights right before estimating anything); otherwise, I hope someone familiar with DHS data can help.
      Thanks


      • #4
        Dear all,

        I am working on couples' fertility preferences and associated factors using the EDHS 2016 couples data with dyadic multilevel logistic regression. In this data there are two weights (the women's individual sample weight and the men's individual sample weight). My question is whether it is necessary to use a sample weight for a dyadic (couple) data analysis. If yes, which one is appropriate to use?
