  • How to sample with unequal probability in a given population

    Hi Statalist experts,

    Suppose I have data on a population of N subjects, each with variables X1-X5 and a variable p specifying its sampling probability, so that subjects have unequal probabilities of being selected. Suppose I would like to select a sample of size n that takes these sampling probabilities into account. How would I do that? I am looking for a function that parallels the "sample(x, size, replace = , prob = )" function in R.

    Thank you in advance!

    Best,

    Eva

  • #2
    Originally posted by Eva Li
    I am looking for a function that parallels the "sample(x, size, replace = , prob =)" function in R.
    Maybe
    Code:
    help sample
    if replace = FALSE, and
    Code:
    help bsample
    if replace = TRUE?

    The first helpfile shows an example of unequal probability sampling.
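For the replace = TRUE case, as far as I can tell -bsample- has no option for unequal selection probabilities, so here is a minimal base-Stata sketch of R's sample(x, n, replace = TRUE, prob = p) using cumulative-weight (inverse-CDF) draws. It assumes a simulated population of 1,000 subjects with hypothetical positive weights in p; the sample size of 50 and the helper variables id, cum, and times are illustrative names, not part of any command.

```stata
* Sketch: draw n subjects WITH replacement, each draw picking subject i
* with probability p_i / sum(p), like R's sample(x, n, replace = TRUE, prob = p).
* Assumptions (hypothetical): simulated population of 1,000 subjects,
* positive weights in p, n = 50; id, cum, times are illustrative names.
version 15.1
clear
set seed 2018
set obs 1000
generate long id = _n
generate double p = runiform()          // hypothetical selection weights

local n 50                              // desired sample size
generate double cum = sum(p)            // running total of the weights
local total = cum[_N]                   // sum of all weights
generate long times = 0                 // number of times each subject is drawn

* Inverse-CDF draw: subject i is chosen when u falls in
* [cum[i-1], cum[i]), an interval of length p_i.
forvalues d = 1/`n' {
    local u = runiform() * `total'
    quietly count if cum <= `u'          // subjects lying wholly below u
    local i = min(r(N) + 1, _N)          // guard against rounding at the top
    quietly replace times = times + 1 in `i'
}

* Keep the drawn subjects, with one copy per draw.
quietly drop if times == 0
quietly expand times
```

Each draw scans the whole dataset, so for a very large N a faster lookup would be worthwhile; the point here is only that every draw picks subject i with probability p_i / sum(p).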



    • #3
      I don't know how to do this with -sample-, but the user-written -gsample- available at SSC can do it, I think, via weights, something like this:
      Code:
      gsample 50 [weight = p], wor gen(mysample)  // Sample of 50 w/o replacement, probability of selection proportional to p
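For comparison, a base-Stata sketch of sampling without replacement: give every subject a random key E/p, where E is a standard exponential draw, and keep the n smallest keys (the Efraimidis-Spirakis method). This is equivalent to drawing subjects one at a time with probability proportional to p among those not yet drawn, which is how R's sample(x, n, prob = p) behaves without replacement. I have not checked which algorithm -gsample, wor- uses, so the two need not agree in detail. The population, the weights, and the names id, key, and mysample are hypothetical; rexponential() requires Stata 14 or later.

```stata
* Sketch: draw n subjects WITHOUT replacement by successive sampling
* proportional to p (Efraimidis-Spirakis keys: smallest n of E/p win).
* Assumptions (hypothetical): simulated population of 1,000 subjects,
* weights in p > 0, n = 50; id, key, mysample are illustrative names.
version 15.1
clear
set seed 2018
set obs 1000
generate long id = _n
generate double p = runiform()             // hypothetical selection weights

local n 50                                 // desired sample size
generate double key = rexponential(1) / p  // exponential draw with rate p
sort key                                   // smallest keys first
generate byte mysample = (_n <= `n')       // 1 = selected, 0 = not selected
sort id                                    // restore the original order
```

Under this scheme the final inclusion probabilities are not exactly proportional to p once n is a sizeable share of N; if p is meant as an exact inclusion probability, a different design is needed.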



      • #4
        Originally posted by Mike Lacy
        I don't know how to do this with -sample-
        Well, according to the helpfile, sample if/in . . . leaves everything else alone, and so you can cull the other observations at your convenience. So, wouldn't it be something like the following?

        .
        . version 15.1

        .
        . clear *

        .
        . set seed `=strreverse("1490322")'

        .
        . input byte (grp pct)

                  grp       pct
          1. 1 10
          2. 2  5
          3. 3 25
          4. end

        .
        . quietly expand 100

        . generate double thingy = rnormal()

        .
        . tabulate grp

                grp |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  1 |        100       33.33       33.33
                  2 |        100       33.33       66.67
                  3 |        100       33.33      100.00
        ------------+-----------------------------------
              Total |        300      100.00

        .
        . *
        . * Begin here
        . *
        . quietly levelsof grp, local(grps)

        . foreach grp of local grps {
          2.         summarize pct if grp == `grp', meanonly
          3.         display in smcl as text _newline(1) "Group `grp'", "Random `r(mean)'% sample"
          4.         sample `r(mean)' if grp == `grp'
          5. }

        Group 1 Random 10% sample
        (90 observations deleted)

        Group 2 Random 5% sample
        (95 observations deleted)

        Group 3 Random 25% sample
        (75 observations deleted)

        .
        . tabulate grp

                grp |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  1 |         10       25.00       25.00
                  2 |          5       12.50       37.50
                  3 |         25       62.50      100.00
        ------------+-----------------------------------
              Total |         40      100.00

        .
        . exit

        end of do-file


        .
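If p is each subject's own inclusion probability (between 0 and 1) rather than a relative weight, the per-group idea in #4 can be taken down to the subject level: keep each subject independently with probability p (Poisson sampling). The catch is that the sample size is then random, with expected value equal to the sum of p, rather than a fixed n. A minimal sketch, with a hypothetical simulated population and hypothetical probabilities; id and mysample are illustrative names.

```stata
* Sketch: Poisson sampling, keeping each subject independently with
* probability p; p is assumed to lie in [0, 1], and the sample size is random.
* Assumptions (hypothetical): simulated population of 1,000 subjects.
version 15.1
clear
set seed 2018
set obs 1000
generate long id = _n
generate double p = 0.05 * runiform()      // hypothetical inclusion probabilities

quietly summarize p, meanonly
display "Expected sample size: " r(sum)    // E[n] = sum of p

generate byte mysample = runiform() < p    // 1 = selected, 0 = not selected
count if mysample                          // realized sample size
```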

