  • Random selection of observations in a variable

    Hi,

    I have a dataset with 1500 observations for each of 46 countries, and I want to make a random selection of countries from these 46. How can I do that?

    Thank you.

  • #2
    Suppose you want to select 23 out of 46. Here's one way to do it. I can't use your variable names, or even be certain that you have panel data, because you don't really tell us much about your data.

    Code:
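    * set the seed so the selection is reproducible; the value is your choice
    set seed 9876
    * draw one random number per country, on its first observation
    bysort id (year) : gen random = runiform() if _n == 1
    * copy that number to every observation of the same country
    by id: replace random = random[1]

    * number the countries 1 to 46 in random order (id breaks any ties)
    egen select = group(random id)
    su select
    assert r(max) == 46

    * wanted countries are 1 to 23, or 24 to 46 or any 23 out of 46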
    If this doesn't help (enough), please read and act on https://www.statalist.org/forums/help#stata
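
    As an illustration of the last comment in the code, here is a minimal sketch that flags the first 23 countries in the random ordering. It runs on toy data (46 countries with 5 observations each), and the variable names wanted and tag are hypothetical, not taken from the original poster's dataset.

    ```stata
    * toy data: 46 countries, 5 observations each (an assumption for illustration)
    clear
    set obs 46
    gen id = _n
    expand 5

    * same randomisation as above, without the year variable
    set seed 9876
    bysort id : gen random = runiform() if _n == 1
    by id: replace random = random[1]
    egen select = group(random id)

    * flag the countries ranked 1 to 23 in the random order
    gen byte wanted = select <= 23

    * check that exactly 23 distinct countries are flagged
    egen byte tag = tag(id)
    count if tag & wanted
    assert r(N) == 23
    ```

    Analyses can then be restricted with "if wanted", which leaves the full dataset in memory rather than dropping the other countries.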

    All that said, why do you want to do this? I sometimes encounter this argument:

    My data are a grab-bag sample with selection biases, etc.

    And I need a random sample to justify inference.

    So, I will take a random sample from the data I have.
    I don't get the notion that random sampling the data you have makes up for biases in selecting the data you have. That's just throwing away good data to no useful end.

    The biases will remain; you have just punished yourself into taking a smaller sample.

    The priority is not to have a good conscience about statistical inference; it's to understand the data you have.

    "Would you take these results seriously if they were a random sample of the same size?" can be a fair question, but your most important question is what patterns you can find in your data.
    Last edited by Nick Cox; 04 Apr 2020, 04:21.


    • #3
      Apologies for leaving out the crucial details. I'm working with cross-sectional survey data. I want to compare employees from different countries, but limit the comparison to a few countries. I want to see how employee job satisfaction changes with changes in workplace socialization, and I believe that relationship would differ among countries.


      • #4
        Nick's advice on the perils of doing so (concerning internal validation) should be taken into full consideration.

        Just for the sake of fiddling with the code, and assuming I understood correctly (maybe not!), you could also do:

        Code:
        * strategy 1: draw exactly 23 of the 46 countries
        clear
        set obs 46
        gen country = _n
        set seed 1234
        sample 23, count
        list
        * strategy 2: draw 50 percent of the 46 countries
        clear
        set obs 46
        gen country = _n
        set seed 1234
        sample 50
        list
        Then, you could generate a variable, say "touse":

        Code:
        gen touse = 1
        After that, save the list of selected countries, merge it with the main dataset on the country identifier, and, for the estimations, use "if touse == 1".
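
        A minimal sketch of that merge step follows, assuming the main dataset identifies countries by a numeric variable called country. The toy main dataset (10 respondents per country) and the tempfile names main and chosen are made up for illustration. Because it uses temporary files, it should be run as a single do-file.

        ```stata
        * toy stand-in for the main survey data: 46 countries, 10 respondents each
        clear
        set obs 46
        gen country = _n
        expand 10
        tempfile main
        save `main'

        * draw 23 countries at random, without replacement, and flag them
        clear
        set obs 46
        gen country = _n
        set seed 1234
        sample 23, count
        gen byte touse = 1
        tempfile chosen
        save `chosen'

        * attach the flag to every respondent in the main data
        use `main', clear
        merge m:1 country using `chosen', nogenerate
        replace touse = 0 if missing(touse)

        * 23 countries x 10 respondents should be flagged
        count if touse == 1
        assert r(N) == 230
        ```

        Setting touse to 0 for the unselected countries is optional; "if touse == 1" also works when those values are left missing.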

        This is a random selection without replacement.

        Should you wish for a random selection with replacement, use -bsample- instead.
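
        For comparison, a small sketch of the -bsample- route, again on a toy list of 46 country codes rather than the real data. With replacement, the same country may be drawn more than once, so the 23 draws usually cover fewer than 23 distinct countries.

        ```stata
        * toy list of 46 country codes
        clear
        set obs 46
        gen country = _n
        set seed 1234

        * draw 23 observations with replacement
        bsample 23
        assert _N == 23

        * show how many countries were drawn more than once
        duplicates report country
        ```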

        Hopefully that helps.
        Last edited by Marcos Almeida; 05 Apr 2020, 06:03.
        Best regards,

        Marcos
