  • OK to delete cases because of stratum with single sampling unit?

    OK, I've read about this problem several times but now it has finally happened to me (or rather, a student of mine). Suppose you run

    Code:
    use http://www.stata-press.com/data/r15/nhanes2b, clear
    svyset psuid [pweight=finalwgt], strata(stratid)
    svy: mean hdresult
    You get this:

    Code:
    . svy: mean hdresult
    (running mean on estimation sample)
    
    Survey: Mean estimation
    
    Number of strata =      31       Number of obs   =       8,720
    Number of PSUs   =      60       Population size =  98,725,345
                                     Design df       =          29
    
    --------------------------------------------------------------
                 |             Linearized
                 |       Mean   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
        hdresult |   49.67141          .             .           .
    --------------------------------------------------------------
    Note: Missing standard error because of stratum with single
          sampling unit.
    One way to make the error go away is to drop the singletons:

    Code:
    svydescribe hdresult, gen(oneunit)
    svy, subpop(if oneunit==0): mean hdresult
    Code:
    . svy, subpop(if oneunit==0): mean hdresult
    (running mean on estimation sample)
    
    Survey: Mean estimation
    
    Number of strata =      29      Number of obs   =        9,786
    Number of PSUs   =      58      Population size =  109,915,685
                                    Subpop. no. obs =        8,508
                                    Subpop. size    =   96,086,827
                                    Design df       =           29
    
    --------------------------------------------------------------
                 |             Linearized
                 |       Mean   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
        hdresult |   49.63489   .3934587      48.83018    50.43961
    --------------------------------------------------------------
    Note: 2 strata omitted because they contain no subpopulation
          members.
    But is that a terrible way to do it? I've read that you are supposed to reassign strata, e.g., merge them with, say, a neighboring area. But, at least in my student's case, it is not clear what a reasonable merge would be. So how do you decide how to merge in such cases?

    In my student's case, I suspect just dropping cases will be no big deal, since only 10 cases out of 1500 are affected. But we still would like to do it the best way possible, if we can figure out how.
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    Stata Version: 17.0 MP (2 processor)

    WWW: https://www3.nd.edu/~rwilliam

  • #2
    This is a question I'm looking for the answer to as well!

    I'm using the NIS data, and I don't think NIS publishes the exact criteria for each stratum, which makes it difficult to decide how to reassign singleton PSUs. Should one just delete the singleton PSUs, as Mr. Williams did? Reassign them to the numerically previous stratum, or to the subsequent one? For example, part of my data looks like this (below). Should 2112 be merged with 2111 or 2113? I don't think NIS publishes any good information on which is better or why. Any input would be greatly appreciated!
                                       #Obs with  #Obs with #Obs per included Unit
                #Units     #Units   complete    missing --------------------
    Stratum   included    omitted       data       data    min   mean    max
    2111            15          6         22         15      1    1.5      3
    2112            1*          2          1         12      1    1.0      1
    2113             5          3         57         21      3   11.4     34

    • #3


      Sorry I missed this when I was away. I see no reason for omitting singleton strata. When there isn't sufficient information to identify "neighboring" or "similar" strata for merging, I use the singleunit(scaled) or singleunit(centered) option in svyset. The centered option is conservative and I would usually choose scaled.
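A minimal sketch of the two svyset options mentioned above, reusing the nhanes2b example from the first post (no other data or variables are assumed). singleunit(centered) centers strata that have a single sampling unit at the grand mean instead of the stratum mean; singleunit(scaled) uses the average of the variance contributions from the strata with multiple PSUs for each singleton stratum. The labels centered and scaled given to estimates store are arbitrary names chosen for this illustration.

```stata
* Reload the NHANES II example from the first post
use http://www.stata-press.com/data/r15/nhanes2b, clear

* centered: singleton strata are centered at the grand mean (the more conservative choice)
svyset psuid [pweight=finalwgt], strata(stratid) singleunit(centered)
svy: mean hdresult
estimates store centered

* scaled: each singleton stratum gets the average variance contribution
* of the strata that do have multiple PSUs
svyset psuid [pweight=finalwgt], strata(stratid) singleunit(scaled)
svy: mean hdresult
estimates store scaled

* Show both sets of results side by side, with standard errors
estimates table centered scaled, se
```

Unlike the subpop() approach, no cases are dropped here. singleunit() changes only how the variance is computed, so the point estimate is the same under both options and only the standard error differs.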
      Last edited by Steve Samuels; 22 Mar 2018, 15:26.
      Steve Samuels
      Statistical Consulting

      Stata 14.2

      • #4
        I was about to say practically the same as Steve Samuels. Just as a side note: selecting ‘certainty’ for singleunit may tackle the issue as well. If I understood correctly, the main difference compared with the ‘centered’ option is this: instead of centering the singleton strata at the grand mean, we take the singleton values themselves, so those strata contribute nothing to the variance. A couple of months ago, I faced a similar problem and decided to use this strategy.
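For comparison, a sketch of the certainty option on the same nhanes2b example from the first post. With singleunit(certainty), each stratum that has only one sampling unit is treated as a certainty unit and contributes nothing to the standard error; as with centered and scaled, the point estimate itself is unchanged.

```stata
* Reload the NHANES II example from the first post
use http://www.stata-press.com/data/r15/nhanes2b, clear

* certainty: singleton strata are treated as certainty units and
* add nothing to the variance estimate
svyset psuid [pweight=finalwgt], strata(stratid) singleunit(certainty)
svy: mean hdresult
```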
        Best regards,

        Marcos

        • #5
          Thanks much! I never seem to encounter these svy problems myself, but my students do.