Random selection of observations in a variable

Antara Sharma

Join Date: Dec 2019

Posts: 2
#1

04 Apr 2020, 03:49

Hi,

I have a dataset with 1500 observations for each of 46 countries, but I want to work with a random selection of countries from these 46. How can I make a random selection of these countries?

Thank you.
Nick Cox

Join Date: Mar 2014

Posts: 35699
#2

04 Apr 2020, 04:14

Suppose you want to select 23 out of 46. Here's one way to do it. I can't use your variable names, or even be certain that you have panel data, because you don't really tell us much about your data.

Code:

* your choice
set seed 9876
bysort id (year) : gen random = runiform() if _n == 1
by id: replace random = random[1]
egen select = group(random id)
su select
assert r(max) == 46
* wanted countries are 1 to 23, or 24 to 46 or any 23 out of 46
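A minimal sketch of how the resulting ranks could be used to keep the chosen countries, run on toy data. The dataset, the variable names id and year, the 5 observations per country, and the flag chosen are all assumptions for illustration, not the original poster's data.

```stata
* Toy data (hypothetical): 46 countries, 5 observations each
clear
set obs 230
gen id = ceil(_n / 5)
bysort id: gen year = 2000 + _n

* One random number per country, copied to all of its observations
set seed 9876
bysort id (year): gen random = runiform() if _n == 1
by id: replace random = random[1]

* Rank the countries by their random number: select runs from 1 to 46
egen select = group(random id)
su select
assert r(max) == 46

* Keep the 23 countries with the smallest random numbers
gen byte chosen = select <= 23
keep if chosen
```

Because every country received an independent uniform draw, any block of 23 ranks is an equally valid random selection; the first 23 are used here only for convenience.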

If this doesn't help (enough), please read and act on https://www.statalist.org/forums/help#stata

All that said, why do you want to do this? I sometimes encounter this argument:

My data are a grab-bag sample with selection biases, etc.

And I need a random sample to justify inference.

So, I will take a random sample from the data I have.

I don't get the notion that random sampling the data you have makes up for biases in selecting the data you have. That's just throwing away good data to no useful end.

The biases will remain; you have just punished yourself into taking a smaller sample.

The priority is not to have a good conscience about statistical inference; it's to understand the data you have.

"Would you take these results seriously if they were a random sample of the same size?" can be a fair question, but your most important question is what patterns you can find in your data.

Last edited by Nick Cox; 04 Apr 2020, 04:21.
Antara Sharma

Join Date: Dec 2019

Posts: 2
#3

04 Apr 2020, 15:15

Apologies for missing out on the crucial details. I'm working on cross-sectional survey data. I want to be able to make comparisons among employees from different countries but I wanted to limit them to a few countries. I wanted to see how employee job satisfaction changes with changes in workplace socialization but I believe that would differ amongst countries.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#4

05 Apr 2020, 05:58

Nick's advice on the perils of doing so (concerning internal validation) deserves full consideration.

Just for the sake of fiddling with the code, and assuming I understood correctly (maybe not!), you could also do:

Code:

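* one row per country; clear first so that -set obs- starts from an empty dataset
clear
set obs 46
gen country = _n
set seed 1234
* keep exactly 23 rows
sample 23, count
list

*also this strategy
clear
set obs 46
gen country = _n
set seed 1234
* keep 50 percent of the rows
sample 50
list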

Then, you could generate a variable, say "touse":

Code:

gen touse = 1

After that, merge with the main dataset, and, for the estimations, use "if touse ==1".
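The steps above can be sketched end to end as follows. The main dataset here is simulated, and the names country, satisfaction, and the 10 respondents per country are assumptions for illustration only.

```stata
* Hypothetical main dataset: 46 countries, 10 respondents each
clear
set obs 460
gen country = ceil(_n / 10)
gen satisfaction = rnormal()
tempfile main
save `main'

* Draw 23 of the 46 country codes without replacement
clear
set obs 46
gen country = _n
set seed 1234
sample 23, count
gen byte touse = 1

* Attach the flag to every respondent in the main data
merge 1:m country using `main', nogenerate
replace touse = 0 if missing(touse)

* Restrict any estimation or summary command to the selected countries
summarize satisfaction if touse == 1
```

Countries that were not drawn come in from the main file with touse missing, so it is set to 0 for them; nothing is dropped and the full data remain available for comparison.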

This is a random selection without replacement.

Should you want a random selection with replacement, use -bsample- instead.
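A minimal sketch of the with-replacement variant, again on a one-row-per-country list. The variable name draws is hypothetical.

```stata
* One row per country
clear
set obs 46
gen country = _n

* Draw 23 rows with replacement: the same country can appear more than once
set seed 1234
bsample 23

* Collapse to one row per drawn country, recording how often it was drawn
contract country, freq(draws)
list
```

Collapsing with -contract- leaves country unique again, so the list can still be merged 1:m with the main dataset, with draws recording the multiplicity of each country.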

Hopefully that helps.

Last edited by Marcos Almeida; 05 Apr 2020, 06:03.

Best regards,

Marcos