OK to delete cases because of stratum with single sampling unit?

Richard Williams

21 Feb 2018, 04:53

OK, I've read about this problem several times but now it has finally happened to me (or rather, a student of mine). Suppose you run

Code:

use http://www.stata-press.com/data/r15/nhanes2b, clear
svyset psuid [pweight=finalwgt], strata(stratid)
svy: mean hdresult

You get this:

Code:

. svy: mean hdresult
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      31       Number of obs   =       8,720
Number of PSUs   =      60       Population size =  98,725,345
                                 Design df       =          29

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
    hdresult |   49.67141          .             .           .
--------------------------------------------------------------
Note: Missing standard error because of stratum with single
      sampling unit.

One way to make the error go away is to drop the singletons:

Code:

svydescribe hdresult, generate(oneunit)
svy, subpop(if oneunit==0): mean hdresult

Code:

. svy, subpop(if oneunit==0): mean hdresult
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      29      Number of obs   =        9,786
Number of PSUs   =      58      Population size =  109,915,685
                                Subpop. no. obs =        8,508
                                Subpop. size    =   96,086,827
                                Design df       =           29

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
    hdresult |   49.63489   .3934587      48.83018    50.43961
--------------------------------------------------------------
Note: 2 strata omitted because they contain no subpopulation
      members.

But is that a terrible way to do it? I've read that you are supposed to reassign strata instead, e.g. merge a singleton stratum with, say, a neighboring area. But, at least in my student's case, it is not clear what a reasonable merge would be. So how do you decide how to merge in such cases?
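A minimal Stata sketch of one way to do such a merge, using the NHANES II example above. The rule is an assumption chosen only for illustration: each singleton stratum is folded into the nearest lower-numbered stratum that has more than one sampling unit, or into the nearest higher-numbered one if no lower stratum exists. Numeric adjacency is only a stand-in for real similarity, which should come from knowing how the strata were formed. The variable names newstrat and newpsu are hypothetical.

```stata
use http://www.stata-press.com/data/r15/nhanes2b, clear
svyset psuid [pweight=finalwgt], strata(stratid)

* oneunit == 1 flags observations in strata with a single sampling unit
svydescribe hdresult, generate(oneunit)

* Hypothetical rule: fold each singleton stratum into the nearest
* lower-numbered non-singleton stratum (or the nearest higher one)
generate newstrat = stratid
quietly levelsof stratid if oneunit == 1, local(singles)
foreach s of local singles {
    quietly summarize stratid if stratid < `s' & oneunit == 0, meanonly
    if r(N) > 0 {
        replace newstrat = r(max) if stratid == `s'
    }
    else {
        quietly summarize stratid if stratid > `s' & oneunit == 0, meanonly
        replace newstrat = r(min) if stratid == `s'
    }
}

* PSU codes (1, 2) repeat across strata, so give every original PSU its
* own ID before PSUs from two strata end up in one merged stratum
egen newpsu = group(stratid psuid)

svyset newpsu [pweight=finalwgt], strata(newstrat)
svy: mean hdresult
```

Recoding the PSUs matters because svy treats PSU codes as nested within strata; after two strata are combined, PSU 1 from each would otherwise be read as the same unit.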

In my student's case, I suspect just dropping cases will be no big deal, since only 10 cases out of 1500 are affected. But we still would like to do it the best way possible, if we can figure out how.
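A quick way to check how large the loss would be before dropping anything is to count the affected cases. This sketch reuses the NHANES II example and the oneunit flag created by svydescribe; the local macro names n_all and n_lost are illustrative.

```stata
use http://www.stata-press.com/data/r15/nhanes2b, clear
svyset psuid [pweight=finalwgt], strata(stratid)
svydescribe hdresult, generate(oneunit)

* Cases usable for the analysis at all
quietly count if !missing(hdresult)
local n_all = r(N)

* Usable cases that sit in singleton strata and would be dropped
quietly count if oneunit == 1 & !missing(hdresult)
local n_lost = r(N)

display "cases lost: `n_lost' of `n_all' (" %4.1f 100*`n_lost'/`n_all' "%)"
```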

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam

Pramma Elayaperumal

18 Mar 2018, 19:28

This is a question I'm looking for the answer to as well!

I'm using the NIS data, and I don't think NIS publishes the exact criteria for each stratum, which makes it difficult to decide how to reassign singleton PSUs. Should one just delete the singleton PSUs as Mr. Williams did? Reassign them to the previous stratum numerically, or to the subsequent one? For example, part of my svydescribe output looks like this (below). Should 2112 be merged with 2111 or 2113? I don't think NIS publishes any good information on which is better or why. Any input would be greatly appreciated!

                              #Obs with  #Obs with  #Obs per included Unit
            #Units    #Units   complete    missing  ----------------------
Stratum   included   omitted       data       data     min    mean     max
--------------------------------------------------------------------------
2111            15         6         22         15       1     1.5       3
2112            1*         2          1         12       1     1.0       1
2113             5         3         57         21       3    11.4      34
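When there is no substantive basis for picking a neighbor, one option is to try both directions and see whether the answer changes. The sketch below does this on the NHANES II example from the first post, since that file can be loaded directly; the nearest-neighbor rule and the variable names strat_lower, strat_higher, and newpsu are assumptions for illustration. If no non-singleton stratum exists in a given direction, that stratum is left as it is and the standard error for that direction stays missing.

```stata
use http://www.stata-press.com/data/r15/nhanes2b, clear
svyset psuid [pweight=finalwgt], strata(stratid)
svydescribe hdresult, generate(oneunit)

* Unique PSU IDs so merged strata do not mix up PSU codes
egen newpsu = group(stratid psuid)
quietly levelsof stratid if oneunit == 1, local(singles)

foreach dir in lower higher {
    generate strat_`dir' = stratid
    foreach s of local singles {
        * Nearest non-singleton stratum in the chosen direction
        if "`dir'" == "lower" {
            quietly summarize stratid if stratid < `s' & oneunit == 0, meanonly
            local target = r(max)
        }
        else {
            quietly summarize stratid if stratid > `s' & oneunit == 0, meanonly
            local target = r(min)
        }
        * Merge only if such a neighbor exists
        if `target' < . {
            quietly replace strat_`dir' = `target' if stratid == `s'
        }
    }
    quietly svyset newpsu [pweight=finalwgt], strata(strat_`dir')
    quietly svy: mean hdresult
    display "`dir' neighbor: mean = " %8.4f _b[hdresult] "  SE = " %8.4f _se[hdresult]
}
```

If the two standard errors are close, the choice of neighbor is unlikely to matter much; if they differ noticeably, that is worth reporting.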

Steve Samuels

22 Mar 2018, 15:23

Sorry I missed this when I was away. I see no reason for omitting singleton strata. When there isn't sufficient information to identify "neighboring" or "similar" strata for merging, I use the singleunit(scaled) or singleunit(centered) option in svyset. The centered option is conservative and I would usually choose scaled.
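A sketch of this suggestion on the NHANES II example, comparing the two options side by side. As the Stata documentation describes them, scaled gives each singleton stratum the average variance contribution of the strata that have several sampling units, while centered measures the singleton stratum's deviation from the grand mean instead of from its (undefined) stratum mean.

```stata
use http://www.stata-press.com/data/r15/nhanes2b, clear

* Re-declare the design with each singleunit() choice and compare SEs
*   scaled:   singleton strata receive the average variance contribution
*             of the strata with more than one sampling unit
*   centered: singleton strata are centered at the grand mean rather
*             than at the stratum mean
foreach method in scaled centered {
    quietly svyset psuid [pweight=finalwgt], strata(stratid) singleunit(`method')
    quietly svy: mean hdresult
    display "singleunit(`method'): SE = " %8.4f _se[hdresult]
}
```

Unlike dropping the singleton strata, both options keep every case in the estimation sample.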

Last edited by Steve Samuels; 22 Mar 2018, 15:26.

Steve Samuels
Statistical Consulting

Stata 14.2
Marcos Almeida

22 Mar 2018, 16:48

I was about to say practically the same as Steve Samuels. Just as a side note: selecting ‘certainty’ for singleunit() may also tackle the issue. If I understood correctly, the main difference compared to the ‘centered’ option is that, instead of centering at the grand mean, we take the single-unit values themselves. A couple of months ago, I faced a similar problem and decided to choose this strategy.
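For comparison, the same NHANES II example with singleunit(certainty). Stata documents this option as treating singleton strata as certainty units that contribute nothing to the variance, which is consistent with the reading above: centering a lone unit at its own value leaves no deviation. Because scaled and centered can only add to that variance, certainty should give the smallest standard error of the three options.

```stata
use http://www.stata-press.com/data/r15/nhanes2b, clear

* certainty: each singleton stratum is treated as a certainty unit and
* contributes zero to the variance estimate
svyset psuid [pweight=finalwgt], strata(stratid) singleunit(certainty)
svy: mean hdresult
```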

Best regards,

Marcos
Richard Williams

22 Mar 2018, 17:51

Thanks much! I never seem to encounter these svy problems myself but my students do.
