Select a random sample of 100 observations and check the similarity of two variables between sampled and non-sampled data

salman iqbal

Join Date: Apr 2021

Posts: 7
#1


14 Apr 2021, 11:55

I have a dataset with the columns participant_id, age, and sex.
I need to select a random sample of 100 participants and check whether the age and sex distributions of the participants in the sample are similar to those of the participants not in the sample.
Tags: data, distribution, Random sample
Ken Chui

Join Date: Aug 2014

Posts: 1058
#2

14 Apr 2021, 12:03

I am not sure if there is an elegant approach, but a workaround is to generate a random variable, sort by it, and then tag the first 100 cases, like this:

Code:

gen some_random_var = runiform()
sort some_random_var
gen first_100 = (_n <= 100)
salman iqbal

Join Date: Apr 2021

Posts: 7
#3

14 Apr 2021, 12:20

Originally posted by Ken Chui

I am not sure if there is an elegant approach, but a workaround is to generate a random variable, sort by it, and then tag the first 100 cases, like this:

Code:

gen some_random_var = runiform()
sort some_random_var
gen first_100 = (_n <= 100)

What about checking the similarity of the distributions of sex and age between the sampled and unsampled data?
Kindly write the full code; I am unable to understand. I would be very grateful to you.

Leonardo Guizzetti

Join Date: Jul 2016
Posts: 2403

14 Apr 2021, 12:31

You can do what you are asking, but what is the logic behind it? A simple random sample will be similar to the population from which it came, by definition. So what is it you are ultimately wanting to do?

The code in #2 is complete for the stated purpose of your question and does the following:

Code:

gen some_random_var = runiform()    // <-- assign each observation a random value between 0 and 1
sort some_random_var                // <-- sort the observations by the random variable, which puts them in a random order
gen first_100 = (_n <= 100)         // <-- create an indicator variable: 1 for the first 100 observations in the random order, 0 otherwise
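
For readers who want to see how the comparison asked about in #3 might look, here is a minimal sketch, not a definitive analysis. It assumes the variables are named age (numeric) and sex (categorical), as described in #1. The seed value and the choice of tests (t test, Kolmogorov-Smirnov, chi-squared) are assumptions made for illustration, not something stated in this thread.

```stata
* Sketch only: assumes variables age (numeric) and sex (categorical) exist
set seed 12345                          // assumption: any fixed seed makes the draw reproducible
gen double some_random_var = runiform() // double precision reduces the chance of ties
sort some_random_var
gen byte first_100 = (_n <= 100)        // 1 = sampled, 0 = not sampled

* Age: descriptive comparison, then tests of equal means and equal distributions
tabstat age, by(first_100) statistics(n mean sd median)
ttest age, by(first_100)
ksmirnov age, by(first_100)

* Sex: cross-tabulation with column percentages and a Pearson chi-squared test
tabulate sex first_100, column chi2
```

Large p-values here would only indicate no detectable difference between the two groups; with a simple random sample, any differences that do appear are due to chance.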


salman iqbal

Join Date: Apr 2021

Posts: 7
#5

14 Apr 2021, 12:41

Originally posted by Leonardo Guizzetti

You can do what you are asking, but what is the logic behind it? A simple random sample will be similar to the population from which it came, by definition. So what is it you are ultimately wanting to do?

The code in #2 is complete for the stated purpose of your question and does the following:

Code:

gen some_random_var = runiform()    // <-- assign each observation a random value between 0 and 1
sort some_random_var                // <-- sort the observations by the random variable, which puts them in a random order
gen first_100 = (_n <= 100)         // <-- create an indicator variable: 1 for the first 100 observations in the random order, 0 otherwise

Yes, I agree that a simple random sample will be similar to the population from which it came, but I need to confirm it with commands.

Last edited by salman iqbal; 14 Apr 2021, 13:06.
Rich Goldstein

Join Date: Mar 2014

Posts: 4485
#6

14 Apr 2021, 12:59

Sounds like homework to me. Please read the FAQ.