Dear all,
I want to calculate weighted means of variable x and don't know how to combine the weights provided in the data set with post-stratification weights that I calculated on my own.
I am working with cross-sectional individual-level survey data in Stata 15.
The data set comes with two different weights: (i) a sampling design weight that accounts for unequal selection probabilities of the sample units (the inverse of the probability of being in the sample) and (ii) calibrated weights that additionally use calibration margins based on gender and region.
Because the age distribution in the sample is not the same as the age distribution in the population, I want to further apply post-stratification weights considering the age structure (in addition to gender and region) when calculating the weighted means of x.
I know that I could calculate post-stratification weights by dividing the share of each gender-region-age group in the population (N) by the share of the same gender-region-age group in the sample (n) and then use these weights as pweights (pweight = N/n) when calculating means.
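To make that calculation concrete, here is a minimal sketch of what I have in mind. The variable names gender, region, agegrp and pop_share are placeholders (not the names in my data), and I assume the census share of each gender-region-age cell has already been merged in as pop_share and that none of the cell variables are missing.

```stata
* Sketch only: gender, region, agegrp and pop_share are hypothetical names.
* pop_share = census share of the gender-region-age cell (N), already merged in.

* Unweighted sample share of each cell (n)
egen cell = group(gender region agegrp)
bysort cell: gen double samp_share = _N   // number of observations in the cell
quietly count                             // total number of observations in r(N)
replace samp_share = samp_share / r(N)

* Post-stratification weight as described above: N / n
gen double ps_weight = pop_share / samp_share

* Weighted mean of x using only this weight
mean x [pweight = ps_weight]
```

This uses only the post-stratification weight and ignores the design and calibrated weights, which is exactly the part I am unsure about.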
My question is: How do I combine these weights with the calibrated weights provided in the sample? Or do I need to combine them with the sampling design weights somehow?
I do have information on strata, psu and ssu - but (i) this information is missing for 1/3 of my observations and (ii) I do not know how this information relates to my problem. The information on the share of gender-region-age groups in the population (N) comes from census data.
I know this is a very specific problem, but if you could at least point me to some applied readings on "combining sampling design weights with post-stratification weights", I would be very grateful.
Best regards,
Stephanie
*----------------------------------------------------------
I also tried the following:
1. Collapse the dataset using the calibrated weights provided in the dataset:
Code:
* calibrated_weight stands for the calibrated weight variable provided with the data
collapse (mean) x [pweight = calibrated_weight], by(age)
2. Merge shares of age groups from census data
3. Calculate means manually
Code:
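* N = population share of the age group (from census data); shares sum to 1
gen x_share = x * N   // e.g. mean of x in age group 30-34 * the share of 30-34-year-olds in the population
egen x_mean = total(x_share)   // sum over age groups = post-stratified mean of x
drop x_share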
But (i) I am not sure whether this is a valid option and (ii) it makes it hard to compare the means with and without the post-stratification weights, so I am not that happy with this approach.