  • Post-stratification weights - seeking advice on survey setting data from Tanzania.

    Dear forum participants,

    I am writing because we are hoping to get a quick expert glance at our Stata code for survey-setting our data from Tanzania. At our organization we have limited capacity when it comes to survey weights, so we would really appreciate it if anyone experienced could let us know whether our approach is reasonable.

    Some background - in our survey in Tanzania we oversampled adolescents so that they would make up 33% of the sample. This is an artificially high proportion for the 5 regions where we collected data, and so we have to correct for the weight of adolescent respondents in the dataset.

    Please see do-file attached - I have used DHS Tanzania 2015-16 age distribution by region and sex as auxiliary data, and have created 20 different weights (by interview type -- adolescent/adult --, sex --men/women--, and region -- Dar, Tabora, Iringa, Kagera, Dodoma). I have used two different methods for computing the weights:

    1. I computed postweights as per instructions in this resource shared by our data collection team: http://www.stata.com/manuals14/svy.pdf (page 54). These are not weights per se, but rather poststratum population sizes (DHS-extracted proportions of adolescents and adults multiplied by the Ns in each region of our sample, which -- since they were selected using PPS -- should be proportional to the population size). I called these mypostweights.

    2. I have computed sampling weights manually using the instructions in this resource, also shared by our data collection team: https://www.atlas.illinois.edu/suppo...y-analysis.pdf. These are computed as the proportion of adolescents (of each sex and region) in the population divided by the proportion of adolescents in our sample (of each sex and region), and likewise for the adults. I called these myweights.
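    To make the arithmetic behind the two methods concrete, here is a minimal sketch in Python (not Stata, and not taken from the do-file). It covers a single region-by-sex group, and every number in it is hypothetical: the sample counts and the population shares are invented for illustration, not extracted from the survey or from DHS. The names sample_n, pop_share and implied are likewise made up.

```python
# Hypothetical numbers for ONE region-by-sex group (not survey or DHS values).
sample_n = {"adolescent": 100, "adult": 200}      # respondents interviewed
pop_share = {"adolescent": 0.20, "adult": 0.80}   # assumed population shares

n_total = sum(sample_n.values())

# Method 1: poststratum "sizes" = population share x sample size of the group.
mypostweights = {k: pop_share[k] * n_total for k in sample_n}

# Method 2: ratio weight = population share / sample share.
myweights = {k: pop_share[k] / (sample_n[k] / n_total) for k in sample_n}

# Per-respondent weight implied by method 1 when every respondent starts
# with the same weight: poststratum size / number of respondents in the cell.
implied = {k: mypostweights[k] / sample_n[k] for k in sample_n}

for k in sample_n:
    print(k, round(mypostweights[k], 3), round(myweights[k], 3), round(implied[k], 3))
```

    Under the equal-starting-weight assumption, the per-respondent weight implied by method 1 coincides with the ratio weight from method 2, which is one way to see why the two approaches can agree on the full sample.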

    As you will see in the do-file, I have tried three weighting methods -- (1) post-stratification svyset command without finite population correction using mypostweights, (2) post-stratification svyset command with finite population correction using mypostweights, and (3) simple pweights (sampling weights) using myweights. I have tested these 3 methods on 9 examples, and provided results as annotations. All methods give me almost identical results (differing only in the second decimal place, which I assume is due to rounding in the computation of the weights). Option 2 (with fpc), as we discussed in the call, gives me narrower CIs, but the same point estimates.

    The one issue I run into is that I get different weighted means depending on whether I use the post-stratification svyset command (option 1) or the pweights (option 3) when the means I am trying to weight have as a denominator only a sub-group of the sample (e.g. test examples 4, 8 and 9 in the do-file -- age at first marriage, participation in feminine tasks in household, relationship control). As I understand from the Stata user manual (the resource cited in point 1 above), the post-stratification command (option 1) will try to adjust for non-response, and so when the subset of adolescents is very small, this command will create weights that adjust the sub-sample to make it look like the population -- which will sometimes raise, and not reduce, the weight of adolescents when there are very few of them in the sub-sample.

    I would very much appreciate it if you could take a look and share any thoughts. Mainly, I would be interested to hear from you about:

    (1) whether I am computing the postweights (mypostweight) for the post-stratification command correctly

    (2) whether I am understanding correctly what happens when weighting means of smaller subgroups within our sample when using the post-stratification command method

    (3) whether the sampling weights (myweights) I have computed can be reliably used as pweights (option 3) instead of the post-stratification command (option 1)

    Any input will be hugely appreciated.

    Best,
    Kristina
    Poll options:
      • Weighting option 1 is equivalent to weighting option 3
      • Weighting option 1 is NOT equivalent to weighting option 3 - option 1 is preferable
      • Weighting option 1 is NOT equivalent to weighting option 3 - option 3 is preferable
    Last edited by Kristina Vlahovicova; 15 Aug 2017, 13:43.

  • #2
    I don't have time to study a 500-line do-file, and I found your post difficult to follow. Still, I'll venture a few comments:

    1. Your weight calculations appear to be incorrect. Post-stratification techniques process design weights to produce the poststratified weights. The examples in the Stata manual (unfortunately) and in the Illinois pdf are for equally weighted data. Oversampling of adolescents means that the design weights for adolescents and adults will differ. (First-stage PPS does not imply equal final weights.) I don't know enough details to be sure, but if weights would otherwise be equal, then divide the children's weights by three.

    2. I doubt that you need 20 weights. You can create a single poststratification weight that balances the sample on several dimensions at a time, for example (age, gender, region) or (age-gender, age-region, gender-region). See commands and references in this post. (Of the four commands mentioned, I've personally used only survwgt rake and ipfraking.)

    3. To analyze a subpopulation, you will need the subpop option in your svy analysis commands.
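    As a rough illustration of the raking idea in point 2, here is a minimal sketch of iterative proportional fitting in plain Python. It is not the code of survwgt rake or ipfraking, and it does not reproduce their options. The function name rake, the two dimensions (age and region), the sample composition and the target totals are all hypothetical.

```python
def rake(records, margins, base_weight=1.0, max_iter=100, tol=1e-10):
    """Iterative proportional fitting (raking) sketch.

    records: list of dicts holding one category per dimension
    margins: {dimension: {category: target weighted total}}
    Returns one weight per record.
    """
    weights = [base_weight] * len(records)
    for _ in range(max_iter):
        max_change = 0.0
        for field, targets in margins.items():
            # Current weighted total in each category of this dimension.
            totals = {}
            for rec, w in zip(records, weights):
                totals[rec[field]] = totals.get(rec[field], 0.0) + w
            # Scale each weight so this dimension matches its targets.
            for i, rec in enumerate(records):
                factor = targets[rec[field]] / totals[rec[field]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:  # a full pass changed nothing noticeable
            break
    return weights


# Hypothetical sample: adolescents are over-represented.
records = (
    [{"age": "adolescent", "region": "A"}] * 30
    + [{"age": "adult", "region": "A"}] * 30
    + [{"age": "adolescent", "region": "B"}] * 20
    + [{"age": "adult", "region": "B"}] * 40
)
# Hypothetical target totals; both dimensions sum to the same grand total.
margins = {
    "age": {"adolescent": 24.0, "adult": 96.0},
    "region": {"A": 60.0, "B": 60.0},
}

weights = rake(records, margins)


def weighted_total(field, category):
    return sum(w for rec, w in zip(records, weights) if rec[field] == category)


for field, targets in margins.items():
    for category in targets:
        print(field, category, round(weighted_total(field, category), 4))
```

    The sketch balances two marginal distributions rather than one fully crossed cell variable, so each adjustment category contains more respondents than it would under full cross-classification.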
    Last edited by Steve Samuels; 19 Aug 2017, 13:09.
    Steve Samuels
    Statistical Consulting

    Stata 14.2

    • #3
      I apologize: your weight calculations are okay. I did take a look at your do-file, and you are apparently post-stratifying on one age-gender-region variable with 20 categories. Post-stratification is relatively robust to exact starting values as long as sample category totals are large.

      The fpc affects only standard errors, not point estimates, so the near identity of results for options 1 & 2 is expected. However, your "myweights" are not design or sampling weights, as you say: the Illinois document makes it clear that it calculates post-stratified weights. Thus I would expect 1, 2, and 3 to be nearly identical for the complete sample.
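      A small numeric sketch of the fpc remark, in Python rather than Stata. It assumes simple random sampling without replacement, which is simpler than the PPS design in this thread. The outcome values in y and the population size N are hypothetical.

```python
import math
import statistics

# Hypothetical 0/1 outcome for n = 20 sampled units.
y = [0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0]
N = 200  # hypothetical population size
n = len(y)

# The point estimate does not involve the fpc at all.
mean = statistics.mean(y)

# Standard error of the mean without and with the finite population
# correction factor sqrt(1 - n/N), under simple random sampling.
se_without_fpc = statistics.stdev(y) / math.sqrt(n)
se_with_fpc = se_without_fpc * math.sqrt(1 - n / N)

print("mean:", round(mean, 4))
print("SE without fpc:", round(se_without_fpc, 4))
print("SE with fpc:", round(se_with_fpc, 4))
```

      The mean is the same either way; only the standard error, and therefore the width of the confidence interval, shrinks when the correction is applied.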

      For subgroups, however, some of your category totals are likely to be small, accounting for the lack of equivalence there. When categories are small, you can run into problems. I recommend that instead of your single dimension with 20 categories, you use one of the raking methods I referred to, with fewer, hence larger, categories in each dimension. In problem cases, raking will work better the closer the proportions of weights in a category are to the population proportions. So specify design weights for children that reflect the actual selection probability. You say they were over-sampled by a factor of three; down-weighting by 3 should work. However, depending on the selection protocol, the actual selection probabilities for adolescents might have varied from HH to HH (1 for some HH, <1 for others). Standard errors are increased when selection probabilities and weights vary. So you will get standard errors that are too small if you hide the variability by treating weights as equal or by dividing children's weights by three.
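      To illustrate the lack of equivalence for subgroups mentioned at the start of the paragraph above, here is a hedged Python sketch. It assumes that restricting the data to a subgroup causes the post-stratification adjustment to be recomputed on the retained respondents only, so that each cell's weights are rescaled to the same poststratum total. That is an assumption about the mechanism, not something verified against the do-file. All counts and outcome values are hypothetical.

```python
# Hypothetical poststratum totals and full-sample cell counts.
postweight = {"adolescent": 6.0, "adult": 24.0}
full_n = {"adolescent": 10, "adult": 20}

# Hypothetical subgroup (e.g. ever-married respondents) with an outcome:
# only 2 of the 10 adolescents qualify, but 16 of the 20 adults do.
subgroup = (
    [("adolescent", 16.0), ("adolescent", 17.0)]
    + [("adult", 20.0)] * 8
    + [("adult", 22.0)] * 8
)


def weighted_mean(rows, weight_by_cell):
    num = sum(weight_by_cell[cell] * y for cell, y in rows)
    den = sum(weight_by_cell[cell] for cell, _ in rows)
    return num / den


# Fixed weights computed once on the full sample (the pweight approach).
fixed_w = {c: postweight[c] / full_n[c] for c in postweight}

# Weights re-derived inside the subgroup: the same totals are spread over
# fewer respondents, so the scarce cell is weighted up.
sub_n = {c: sum(1 for cell, _ in subgroup if cell == c) for c in postweight}
refit_w = {c: postweight[c] / sub_n[c] for c in postweight}

mean_fixed = weighted_mean(subgroup, fixed_w)
mean_refit = weighted_mean(subgroup, refit_w)

print("fixed weights:", fixed_w, "mean:", round(mean_fixed, 3))
print("refit weights:", refit_w, "mean:", round(mean_refit, 3))
```

      In this toy case each adolescent carries half the weight of an adult under the fixed weights but twice the weight of an adult after the adjustment is redone within the subgroup, which matches the pattern described in the original question.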

      Moreover, standard errors for subgroups defined by "if" clauses will be in error, because the denominators are random. Get proper standard errors by invoking the subpop() option, as I suggested.
      Last edited by Steve Samuels; 19 Aug 2017, 14:57.
