Dear forum participants,
I am writing in the hope of getting a quick expert look at our Stata code for survey-setting our data from Tanzania. Our organization has limited capacity when it comes to survey weights, so we would really appreciate it if anyone experienced could tell us whether our approach is reasonable.
Some background: in our survey in Tanzania we oversampled adolescents so that they would make up 33% of the sample. This is an artificially high proportion for the 5 regions where we collected data, so we have to correct for the weight of adolescent respondents in the dataset.
Please see the attached do-file. I have used the DHS Tanzania 2015-16 age distribution by region and sex as auxiliary data, and have created 20 different weights, one for each combination of interview type (adolescent/adult), sex (men/women), and region (Dar, Tabora, Iringa, Kagera, Dodoma). I have used two different methods for computing the weights:
1. I computed postweights as per instructions in this resource shared by our data collection team: http://www.stata.com/manuals14/svy.pdf (page 54). These are not weights per se, but rather poststratum population sizes (DHS-extracted proportions of adolescents and adults multiplied by the Ns in each region of our sample, which -- since they were selected using PPS -- should be proportional to the population size). I called these mypostweights.
2. I have computed sampling weights manually using the instructions in this resource, also shared by our data collection team: https://www.atlas.illinois.edu/suppo...y-analysis.pdf. These are computed as the proportion of adolescents (of each sex and region) in the population divided by the proportion of adolescents in our sample (of each sex and region). The same goes for the adults. I called these myweights.
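In symbols, this is how I understand the two constructions. The notation is mine, not taken from either resource, and it assumes the proportions are taken within each region: for a cell g (interview type by sex) in region r, P_g is the DHS share of the region's population in that cell, n_r is our sample size in the region, and n_g is our sample count in the cell.

```latex
\[
\texttt{mypostweights}_g = P_g \, n_r ,
\qquad
\texttt{myweights}_g = \frac{P_g}{n_g / n_r} ,
\qquad\text{so that}\qquad
\frac{\texttt{mypostweights}_g}{n_g} = \texttt{myweights}_g .
\]
```

If I have set this up correctly, dividing each poststratum size by the number of respondents in that cell gives back myweights, so I would expect the two approaches to agree whenever the whole sample is used.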
As you will see in the do-file, I have tried three weighting methods: (1) svyset with post-stratification using mypostweights, without a finite population correction (fpc); (2) svyset with post-stratification using mypostweights, with an fpc; and (3) simple pweights (sampling weights) using myweights. I have tested these 3 methods on 9 examples and provided the results as annotations. All methods give me almost identical results (differing only in the second decimal place, which I assume is due to rounding in the computation of the weights). Option 2 (with fpc), as we discussed in the call, gives me narrower CIs but the same point estimates.
The one issue I run into is that the post-stratification svyset setup (option 1) and the pweights (option 3) give different weighted means when the mean is computed over only a subgroup of the sample (e.g. test examples 4, 8 and 9 in the do-file: age at first marriage, participation in feminine tasks in the household, relationship control). As I understand from the Stata manual (the resource cited in point 1 above), post-stratification (option 1) will try to adjust for non-response, so when the subset of adolescents is very small, it will create weights that make the sub-sample resemble the population, which will sometimes raise, rather than reduce, the weight of adolescents when there are very few of them in the sub-sample.
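For readers who prefer not to open the attachment, the three setups have roughly the following shape. This is a simplified sketch rather than the do-file itself: `poststratum` (an identifier for the 20 cells), `popsize` (a population-size variable for the fpc) and `outcome` are placeholder names, and `_n` assumes each respondent is treated as its own sampling unit.

```stata
* Option 1: post-stratification, no finite population correction
* poststratum = placeholder id for the 20 type-by-sex-by-region cells
svyset _n, poststrata(poststratum) postweight(mypostweights)
svy: mean outcome

* Option 2: post-stratification with finite population correction
* popsize = placeholder; Stata requires it to be constant within each stratum
svyset _n, poststrata(poststratum) postweight(mypostweights) fpc(popsize)
svy: mean outcome

* Option 3: ratio weights used directly as sampling weights
svyset _n [pweight = myweights]
svy: mean outcome
```

Each `svyset` call replaces the previous survey settings, so the three blocks can be run one after another on the same data.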
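To make the subgroup case concrete, a subgroup mean can be requested in two ways in Stata. The sketch below uses placeholder names (`evermarried` is a hypothetical 0/1 indicator and `agefirstmarr` a hypothetical outcome variable), and is not copied from the do-file.

```stata
* (a) restrict the estimation sample with an if-condition
svy: mean agefirstmarr if evermarried == 1

* (b) keep the full sample in the design and declare the subgroup
*     as a subpopulation (evermarried must be a 0/1 indicator)
svy, subpop(evermarried): mean agefirstmarr
```

I have not checked whether the two forms treat the post-stratification adjustment differently, which may be part of question (2) below.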
I would very much appreciate it if you could take a look and share any thoughts. In particular, I would like to hear:
(1) whether I am computing the postweights (mypostweights) for the post-stratification command correctly
(2) whether I am understanding correctly what happens when weighting means of smaller subgroups within our sample when using the post-stratification command method
(3) whether the sampling weights (myweights) I have computed can be reliably used as pweights (option 3) instead of the post-stratification command (option 1)
Any input will be hugely appreciated.
Best,
Kristina