  • How to randomly assign cases to groups specifying the % to assign to each group?

    Good morning,
    I'm working on a longitudinal dataset and I need to randomly assign cases to groups in the first wave. The assignment has to be done within each country separately (because the distribution of cases across groups is based on different percentages in each country) and only at specific levels of education, and I need to specify the percentage of observations that goes to each group.

    In this hypothetical example I want to randomly assign 60% of the cases in wave 1, country 110, with education 4 to group 2, while the remaining 40% go to group 3 (there are 4 groups in total, so 0% go to groups 1 and 4):
    id wave country education group
    1 1 220 4 .
    2 2 110 4 .
    3 1 110 5 .
    4 1 110 4 2
    5 1 110 4 3
    6 1 110 4 2
    7 1 110 4 2
    8 1 110 4 3
    9 1 110 4 2
    10 1 110 4 2
    11 1 110 4 2
    12 1 110 4 3
    13 1 110 4 3
    14 1 110 4 2
    15 1 110 4 3

  • #2
    First, you'll increase your chances of a helpful answer by following the FAQ on asking questions: provide Stata code in code delimiters, Stata output, and sample data using dataex. Also, try to reduce the code you include to what is needed to demonstrate your problem. You should also search for the major terms in your question from Stata's command window (findit majorterm) and in the subject index of the documentation (under index at the bottom of the list of documentation).

    You can draw a uniform random number for each observation (g select=runiform()), then
    g group=1 if select<.6
    replace group=2 if select>=.6 & select<.

    I don't know what you want with the education bit, but you can add additional conditions easily.
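
As a minimal sketch of what "additional conditions" could look like for the example in the question: the code below restricts the assignment to wave 1, country 110, education 4, and uses the group numbers 2 and 3 from the question. It assumes the variables wave, country, education and group already exist (with group missing where not yet assigned); the seed value is arbitrary and only there for reproducibility. Each case is assigned independently with probability .6, so the realized share in group 2 will be close to 60% but not exactly 60%.

```stata
* Assumption: wave, country, education and group already exist in memory.
set seed 20170425   // arbitrary seed, only for reproducibility

* One uniform(0,1) draw per observation
gen double select = runiform()

* Assign only within the target cell: wave 1, country 110, education 4
replace group = 2 if select <  .6 & wave == 1 & country == 110 & education == 4
replace group = 3 if select >= .6 & wave == 1 & country == 110 & education == 4
```

For another country or education level, the same two replace lines can be repeated with different cutoffs and conditions.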


    • #3
      Thank you for the answer, and sorry for the seemingly confused post. I'm still new on this forum. I searched for a while and saw some similar topics where runiform was suggested as a solution. The main point is that, as you wrote it, the command randomizes over the whole sample and NOT within each of the desired subpopulations separately. Thus, once you add the "replace ... if" condition, you are not assigning specific portions of a specific subpopulation to the group in question, but rather a random mix of everything. The solution itself was easy (I'm quite embarrassed that I didn't think of it earlier), but I really tried to solve it without success and that's why I asked. Continuing the small and elementary example from above, it was sufficient to type:

      by country wave education, sort: gen random=runiform(0,1)
      *THEN you can proceed with replace group=X if random>.60 etc.
      Last edited by Dominik Balazka; 25 Apr 2017, 15:32.
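
If the percentages need to hold exactly within each country/wave/education cell, rather than only in expectation, one possible variant is to sort the cases randomly inside each cell and split them by their relative rank. This is a sketch under the same assumptions as the example in the question (variables country, wave, education and group exist; 60% go to group 2 and 40% to group 3 in wave 1, country 110, education 4). The names u and share and the seed value are hypothetical choices for illustration.

```stata
* Assumption: country, wave, education and group already exist in memory.
set seed 12345   // arbitrary seed, only for reproducibility

* Random sort key, one draw per observation
gen double u = runiform()

* Within each cell, order cases by the random key and compute each
* case's relative rank (_n = position in cell, _N = cell size)
bysort country wave education (u): gen double share = _n/_N

* The first 60% of the randomly ordered cases in the target cell go to
* group 2, the rest to group 3
replace group = cond(share <= .6, 2, 3) ///
    if wave == 1 & country == 110 & education == 4
```

With the 12 qualifying cases in the example data, this rule puts 7 cases in group 2 and 5 in group 3, since 7/12 is below .6 and 8/12 is above it. Which ids land in which group depends on the random draws.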
