
  • Select a random sample of 100 observations and check the similarity of two variables between sampled and unsampled data

    I have a dataset with the columns participant_id, age, and sex.
    I need to select a random sample of 100 participants and check whether the age and sex distributions of the participants in the sample are similar to those of the participants not in the sample.

  • #2
    I am not sure if there is an elegant approach, but a workaround is to generate a random variable, sort by it, and then tag the first 100 cases like this:

    Code:
    gen some_random_var = runiform()
    sort some_random_var
    gen first_100 = (_n <= 100)
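
For anyone who needs the same 100 participants on every run, here is a minimal sketch of a reproducible variant, meant to be run instead of the three lines above rather than after them. The seed value 12345 is arbitrary, and the sketch assumes that participant_id uniquely identifies observations and that the dataset holds at least 100 of them.

```stata
* Fix the random-number seed so the same sample is drawn every run
* (12345 is an arbitrary choice, not a recommended value)
set seed 12345

* Assumption: one row per participant; isid stops with an error otherwise
isid participant_id

* Draw a uniform random number for each observation
generate double some_random_var = runiform()

* Sort by the random number; participant_id breaks any (very unlikely) ties
sort some_random_var participant_id

* Flag the first 100 observations in the random order
generate byte first_100 = (_n <= 100)

* Check that exactly 100 observations were flagged
count if first_100 == 1
assert r(N) == 100
```

Setting the seed before calling runiform() is what makes the draw repeatable; without it, each run tags a different set of 100 participants.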



    • #3
      Originally posted by Ken Chui
      I am not sure if there is an elegant approach, but a workaround is to generate a random variable, sort by it, and then tag the first 100 cases like this:

      Code:
      gen some_random_var = runiform()
      sort some_random_var
      gen first_100 = (_n <= 100)

      What about checking the similarity of the distributions of sex and age between the sampled and unsampled data?
      Could you kindly write the full code? I am unable to understand it. I would be very grateful to you.



      • #4
        You can do what you are asking, but what is the logic behind it? A simple random sample will, apart from chance variation, be similar to the population from which it came, by construction. So what are you ultimately trying to do?

        The code in #2 is complete for the stated purpose of your question and does the following:

        Code:
        gen some_random_var = runiform()    // <-- assign each observation a uniform random value between 0 and 1
        sort some_random_var                // <-- sort the observations by the random variable, which puts them in a random order
        gen first_100 = (_n <= 100)         // <-- create an indicator variable: 1 for the first 100 observations in the random order, 0 otherwise
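
Once first_100 exists, a descriptive side-by-side comparison of the two groups could look like the sketch below. It assumes the variable names from the original question (age and sex) and the indicator first_100 created above; the particular statistics requested are only one reasonable choice.

```stata
* Summary statistics for age within each group
* (first_100 = 0: not sampled, first_100 = 1: sampled)
tabstat age, by(first_100) statistics(n mean sd min p50 max)

* Cross-tabulate sex by group, adding column percentages so the
* sex distribution can be compared between the two groups
tabulate sex first_100, column
```

If the means, medians, and column percentages are close across the two groups, the sample looks similar to the remainder on these two variables.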



        • #5
          Originally posted by Leonardo Guizzetti
          You can do what you are asking, but what is the logic behind it? A simple random sample will, apart from chance variation, be similar to the population from which it came, by construction. So what are you ultimately trying to do?

          The code in #2 is complete for the stated purpose of your question and does the following:

          Code:
          gen some_random_var = runiform() // <-- assign each observation a uniform random value between 0 and 1
          sort some_random_var // <-- sort the observations by the random variable, which puts them in a random order
          gen first_100 = (_n <= 100) // <-- create an indicator variable: 1 for the first 100 observations in the random order, 0 otherwise

          Yes, I agree that a simple random sample will be similar to the population from which it came, but I need to confirm it with commands.
          Last edited by salman iqbal; 14 Apr 2021, 13:06.
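
One way to confirm this with commands, sketched here under the assumption that first_100 has already been created as in #2, is to run standard two-group tests. These tests are a common choice and not the only option, and a large p-value indicates only that no difference was detected, not that the two distributions are identical.

```stata
* Two-sample t test: compare mean age between sampled and unsampled groups
ttest age, by(first_100)

* Two-sample Kolmogorov-Smirnov test: compare the whole age distribution
ksmirnov age, by(first_100)

* Pearson chi-squared test: is sex associated with being in the sample?
tabulate sex first_100, chi2
```

Because the 100 participants were chosen at random, any difference these tests pick up reflects chance alone.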



          • #6
            Sounds like homework to me. Please read the FAQ.
