  • randomselect

    Hello,

    I am trying to randomly select a subsample of participants from my data set. I found the command randomselect useful for this, but I don't know how to set the seed in my syntax so that the same observations are selected on subsequent runs of the do-file.

    Basically, I want to select two groups based on the following characteristics:

    Group 1: N=3000, smokers, 50% female, aged 50-80
    Group 2: N=3000, non-smokers, 50% female, aged 20-80

    Here is my syntax (with the seed command integrated but not working as expected):


    Code:
    * smokers: 1500 from each gender
    randomselect if smoking == 1 & gender == 1, gen(sample_1) n(1500) seed(7492001)
    
    randomselect if smoking == 1 & gender == 0 & sample_1 != 1, gen(sample_2) n(1500) seed(7492001)
    
    * non-smokers: 1500 from each gender
    randomselect if smoking == 0 & gender == 1, gen(sample_3) n(1500) seed(7492001)
    
    randomselect if smoking == 0 & gender == 0 & sample_1 != 1, gen(sample_4) n(1500) seed(7492001)
    
    * combine the four draws into one indicator (0 = drawn smokers, 1 = drawn non-smokers)
    g sample_smoking = 0 if inlist(1, sample_1, sample_2)
    replace sample_smoking = 1 if inlist(1, sample_3, sample_4)
    
    drop sample_1-sample_4

    Thank you in advance for any comment!

    Giovanni

  • #2
    See from around slide 40 here:
    https://github.com/BPLIM/Workshops/b...y_Radyakin.pdf
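
    Separately from the slides, one common way to get the same draw on every run uses only built-in commands: set the seed once, put the data in a fixed order, and rank a uniform random number within each cell. The following is a minimal sketch, not a tested solution for your data. It assumes a unique identifier variable called id and an age variable called age (both names are hypothetical), and it reuses smoking and gender from your post.

    Code:
    * Sketch only: id and age are assumed (hypothetical) variable names
    set seed 7492001                  // set once, before any random draw
    sort id                           // fixed row order, so each draw lands on the same row every run
    generate double u = runiform()    // one uniform draw per observation
    
    * eligible: smokers aged 50-80 or non-smokers aged 20-80, gender not missing
    generate byte eligible = !missing(gender) & ///
        ((smoking == 1 & inrange(age, 50, 80)) | ///
         (smoking == 0 & inrange(age, 20, 80)))
    
    * within each smoking-by-gender cell, flag the 1500 eligible rows with the smallest u
    bysort smoking gender eligible (u): generate byte sampled = eligible & (_n <= 1500)
    drop u

    The sort on a unique identifier matters as much as the seed: the random numbers are assigned row by row, so if the row order differs between runs, the same seed will still pick different observations.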
