  • Take a random sample of unique values of a variable

    Hello,

    How can I take a sample of unique values of a variable, and then filter my dataset according to those drawn values?

    That is, assume my variable product contains multiple product identifiers (strings). More than one row of the data is attached to a given product, and different products appear with different frequencies, so simply taking a sample of the rows won't suffice.

    How can I take a sample of size N of the product identifiers and then filter the data to keep observations linked to those identifiers that were drawn?

    I tried something like
    Code:
    levelsof product, local(unique_values)
    but I didn't know how to use the sample command with a local macro.

    Thank you

    Last edited by Arthur Carvalho Brito; 27 Aug 2023, 21:45. Reason: sample

  • #2
    Maybe something along the following lines.
    Code:
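    * Copy the data to a new frame and reduce it to one row per distinct product
    frame copy default Selection
    frame Selection {
        contract product
        * Keep N randomly chosen products; replace N with the desired count
        quietly sample N, count
    }
    * Link each observation to its product in Selection; unmatched rows get a missing link
    quietly frlink m:1 product, frame(Selection)
    quietly keep if !mi(Selection)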
    You'll need to (i) set the seed in order to help assure reproducibility and (ii) change N above to the sample size you want.
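
    As a rough sketch of those two adjustments, the same approach with a seed and a sample size filled in might look like the following. The seed 12345 and the size 50 are arbitrary placeholders, not values from this thread, and the two final cleanup lines are an optional addition.
    Code:
    * Placeholder values: pick your own seed and sample size
    set seed 12345
    local N 50

    frame copy default Selection
    frame Selection {
        * One row per distinct product, sorted by product
        contract product
        quietly sample `N', count
    }
    quietly frlink m:1 product, frame(Selection)
    quietly keep if !mi(Selection)

    * Optional cleanup: remove the link variable and the helper frame
    drop Selection
    frame drop Selection
    Because contract leaves the Selection frame sorted by product, rerunning this with the same seed on the same data should draw the same products.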

    • #3
      Please see Section 2 of https://journals.sagepub.com/doi/pdf...867X0800800408 for a recommendation to use the term distinct, not unique, for the meaning intended here.