Svy: means generating different results and macros

Steve Samuels

#16

20 Jul 2018, 14:21

There are two subpopulations in this problem: 1) primeage2 = oldage and 2) primeage2 = primeage. svy: mean runs separately in each, and a stratum could be a singleton in one or both of the groups. Therefore I would modify the by lines in Clyde's code to start with:

Code:

by primeage2

Then keep the observations with psu_count == 1 and drop duplicates, leaving one record for each singleton stratum in each group. I hope that Kassandra will do this calculation.
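Clyde's code from the earlier post is not reproduced here, so what follows is an independent sketch of the same counting idea under assumed names: stratum and psu stand in for the real design variables, and the eight-row dataset is made up. Putting primeage2 first in the by-groups counts distinct PSUs per stratum separately within each age group.

```stata
* Toy data (hypothetical): stratum id, PSU id, age group (1 = primeage, 2 = oldage)
clear
input stratum psu primeage2
1 1 1
1 2 1
1 1 2
2 3 1
2 3 2
2 4 2
3 5 1
3 6 2
end
label define agegrp 1 "primeage" 2 "oldage"
label values primeage2 agegrp

* Tag the first observation of each PSU within each age group and stratum
bysort primeage2 stratum psu: gen byte first_psu = (_n == 1)

* Number of distinct PSUs per stratum within each age group
by primeage2 stratum: egen psu_count = total(first_psu)

* List each singleton stratum once per age group, then return to the full data
preserve
keep if psu_count == 1
keep primeage2 stratum
duplicates drop
list, sepby(primeage2) noobs
restore
```

In this toy data, strata 2 and 3 are singletons for primeage, while strata 1 and 3 are singletons for oldage, which illustrates why the check has to be done within each group.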

But to step back: what should be done with these problem strata? As Clyde points out, the ideal is to attach each singleton stratum to a "similar" stratum, for example a geographic neighbor. If there are many singletons, this attachment process will be a lot of work. Moreover, the set of singleton strata changes with each new subpopulation analyzed, so new attachments will be required each time. I think this effort is worthwhile only for one (or two) major study questions.

The alternative, then, is an automated approach. Stata's svyset offers two: singleunit(scaled) and singleunit(centered). As the manual says:

singleunit(scaled) results in a scaled version of singleunit(certainty). The scaling factor comes from using the average of the variances from the strata with multiple sampling units for each stratum with one sampling unit.

singleunit(centered) specifies that strata with one sampling unit are centered at the grand mean instead of the stratum mean.
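A minimal sketch of the two options side by side, on a hypothetical toy survey: the variables stratum, psu, wt and y and the data are made up, and stratum 10 is deliberately collapsed to a single PSU. Using over(primeage2) to get both group means in one call is an assumption about the analysis, not necessarily how the original was run.

```stata
* Toy survey (hypothetical): 10 strata of 20 obs, 2 PSUs each
clear
set seed 12345
set obs 200
gen int stratum = ceil(_n/20)
gen int psu = ceil(_n/10)
* Collapse stratum 10 to one PSU so that it becomes a singleton
replace psu = 19 if stratum == 10
gen double wt = 1 + runiform()
gen double y = rnormal(50, 10)
gen byte primeage2 = 1 + (runiform() < 0.3)
label define agegrp 1 "primeage" 2 "oldage"
label values primeage2 agegrp

* scaled: singleton strata borrow the average variance of the multi-PSU strata
svyset psu [pweight = wt], strata(stratum) singleunit(scaled)
svy: mean y, over(primeage2)
estimates store scaled

* centered: singleton strata are centered at the grand mean
svyset psu [pweight = wt], strata(stratum) singleunit(centered)
svy: mean y, over(primeage2)
estimates store centered

* Compare point estimates and standard errors
estimates table scaled centered, se
```

The point estimates are identical under both options; only the standard errors differ, so the comparison table shows directly how much each option changes them.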

So, scaled relies on the assumption that the unknown variances of the singleton strata are approximately equal to the average variance of the multiple-unit strata. This assumption might be reasonable in many studies, but perhaps less so in BRFSS, where each state conducts its own study. centered is conservative and will inflate standard errors. A third approach is to reassign all singleton strata to a single new stratum, which can be implemented with the two lines of code in Post #13. This method has two advantages: 1) it doesn't make the variance assumption of the scaled option; and 2) I surmise that it's less likely than centered to inflate variances. I could be wrong, but the two can be compared in any case.
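The pooling idea can be sketched as follows. This is not the code from Post #13, which is not reproduced here; it is an independent whole-sample sketch on toy data, with hypothetical variable names and a hypothetical pooled stratum code of 999. The pooled stratum needs at least two PSUs, so pooling only helps when there are two or more singletons. For a subpopulation analysis, the singleton count would instead be done within each group of primeage2.

```stata
* Toy survey (hypothetical): 12 strata of 20 obs, 2 PSUs each
clear
set seed 2018
set obs 240
gen int stratum = ceil(_n/20)
gen int psu = ceil(_n/10)
* Make strata 10-12 singletons by collapsing each to one PSU
replace psu = 2*stratum - 1 if stratum >= 10
gen double wt = 1 + runiform()
gen double y = rnormal(50, 10)

* Count distinct PSUs in each stratum
bysort stratum psu: gen byte first_psu = (_n == 1)
by stratum: egen psu_count = total(first_psu)

* Pool every singleton stratum into one new stratum (code 999 is arbitrary)
gen int stratum_pooled = cond(psu_count == 1, 999, stratum)

* Pooled design: no singleton strata remain, so no singleunit() option is needed
svyset psu [pweight = wt], strata(stratum_pooled)
svy: mean y
estimates store pooled

* Original strata with the conservative centered option, for comparison
svyset psu [pweight = wt], strata(stratum) singleunit(centered)
svy: mean y
estimates store centered

estimates table pooled centered, se
```

Here the pooled stratum 999 contains three PSUs (one from each former singleton), so it contributes a proper between-PSU variance instead of an assumed or grand-mean-centered one.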

Last edited by Steve Samuels; 20 Jul 2018, 14:49.

Steve Samuels
Statistical Consulting

Stata 14.2