Dear all,
I am puzzled about the use of post-stratification weights with svyset. I computed a design weight (designweight), which is the inverse of each individual's inclusion probability in the sample, standardized to a mean of 1. I then computed a final weight (final_weight) to match a known population distribution with:
survwgt post designweight, by(poststratum) totvar(poststratumpopulation) gen(final_weight)
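As far as I understand it, this step rescales the design weights within each poststratum so that they sum to the known population total of that poststratum. A minimal sketch of that adjustment done by hand, assuming poststratumpopulation holds the population count of the respondent's poststratum on every row (sum_dw and final_weight_check are hypothetical names used only for this check):

```stata
* Sum of the design weights within each poststratum
bysort poststratum: egen double sum_dw = total(designweight)

* Rescale so the weights in each poststratum add up to its population total
generate double final_weight_check = designweight * poststratumpopulation / sum_dw

* The weighted total per poststratum should now equal poststratumpopulation
table poststratum, contents(sum final_weight_check mean poststratumpopulation)
```

If this matches the variable generated by survwgt, both svyset specifications use the same point-estimate weights and differ only in how the variance is computed.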
I compared two options for the analysis, which give different standard errors:
svyset psu [pweight= designweight], poststrata(poststratum) postweight(poststratumpopulation) vce(linearized) singleunit(missing)
Number of strata = 1           Number of obs   = 2564
Number of PSUs   = 17          Population size = 24250
N. of poststrata = 14          Design df       = 16

                  Linearized
          Mean    Std. Err.    [95% Conf. Interval]
age   20.14886    .1186727     19.89728    20.40043
svyset psu [pweight=final_weight], vce(linearized) singleunit(missing)
Number of strata = 1           Number of obs   = 2564
Number of PSUs   = 17          Population size = 24045.5
                               Design df       = 16

                  Linearized
          Mean    Std. Err.    [95% Conf. Interval]
age   20.14893    .2864927      19.5416    20.75627
I would love to have small standard errors, but which option is valid? Note that the “age” variable has a few missing values here (n=20, 0.8%).
Many thanks for your attention!
Christian Meyer