  • Creating a subsample with predefined mean values of variables

    Hello StataList community,

    I am working with a dataset that contains two groups: "group1" and "group2." My goal is to create a smaller subsample of "group2" observations that have the same mean values of the binary variables "var1" and "var2" as found in "group1."

    It's important to note that the mean values of both "var1" and "var2" are higher for "group1" than for "group2" in the full sample. Consequently, a random subsample of "group2" is inappropriate, and I must carefully select "semi-random" observations from group2 to ensure that the means of var1 and var2 in the group2 subsample are equal to (or close to) the variable means of group1.

    (As a side note, the size of the group2 subsample should be about 1/100000 of the size of the full "group2" sample.)

    The structure of the dataset is as follows:
    Code:
    clear
    input byte(group var1 var2)
    1 1 0
    1 1 0
    1 1 1
    1 1 0
    1 1 1
    1 1 1
    1 0 1
    1 0 0
    1 0 0
    1 0 0
    2 0 0
    2 1 0
    2 0 0
    2 0 0
    2 1 0
    2 1 1
    2 0 1
    2 0 0
    2 0 0
    2 1 0
    end
    I would appreciate any guidance on how to achieve this in Stata. If you could provide me with the necessary code or steps, I would be extremely grateful.

    Thank you in advance for your assistance!

    Best regards,
    Marvin

    PS:
    I am aware of the commands
    Code:
    splitsample
    and
    Code:
    rsz
    but neither of them seems to do the trick.

  • #2
    This sounds bad. There are potentially many subsamples that would satisfy that condition, and I'm not sure what any statistical results would mean once you've done it.

    Here's one way, though you lose observations in group1.

    Code:
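    * cem is community-contributed; if it is not installed, try: ssc install cem
    * means of var1 and var2 by group before matching
    tabstat var1 var2, by(group)
    * treatment indicator: 1 for group 1, 0 for group 2
    generate group1 = group == 1
    * match on var1 and var2; k2k keeps equal numbers from each group per stratum
    cem var1 var2, tr(group1) k2k
    tabstat var1 var2 if cem_matched, by(group) stats(mean N)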
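
    If dropping group1 observations is a problem, a rough alternative is to draw from group2 within each var1 x var2 cell, in proportion to that cell's share in group1. This is only a minimal sketch, not what cem does: target_n is a hypothetical subsample size picked to suit the toy data, and cell, n1_cell, share1, quota, u and in_sub are made-up variable names. Rounding the quotas means the means will usually be close to, not exactly equal to, those of group1.

    ```stata
    clear
    input byte(group var1 var2)
    1 1 0
    1 1 0
    1 1 1
    1 1 0
    1 1 1
    1 1 1
    1 0 1
    1 0 0
    1 0 0
    1 0 0
    2 0 0
    2 1 0
    2 0 0
    2 0 0
    2 1 0
    2 1 1
    2 0 1
    2 0 0
    2 0 0
    2 1 0
    end

    set seed 12345
    * hypothetical size of the group2 subsample (an assumption for this toy data)
    local target_n 3

    * one cell per combination of var1 and var2
    egen cell = group(var1 var2)

    * share of group1 observations that fall in each cell
    bysort cell: egen n1_cell = total(group == 1)
    quietly count if group == 1
    generate double share1 = n1_cell / r(N)

    * number of group2 observations to draw from each cell
    generate quota = round(share1 * `target_n')

    * random order within group and cell; flag the first quota group2 rows
    generate double u = runiform()
    bysort group cell (u): generate byte in_sub = (group == 2) & (_n <= quota)

    * compare group1 with the selected group2 subsample
    tabstat var1 var2 if group == 1 | in_sub, by(group) stats(mean n)
    ```

    If a cell in group2 has fewer observations than its quota, the subsample simply comes out smaller than target_n, so it is worth checking the counts in the final table.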


    • #3
      Thank you, George! That was exactly what I was looking for.
      And thanks for the warning - I am aware of the selection issues.
