
  • A question on randomization

    I wonder if there is a command that can do the following:

    1. I have a list of 7 numbers, say (2,5,7,8,11,13,14)
    2. I want to generate a variable called "v1" with 100 observations, where each value is drawn at random from the 7 numbers in #1. In other words, the 100 observations can only take the values (2,5,7,8,11,13,14).

    Is it possible to generate such a variable in Stata?

  • #2
    Code:
    //  CREATE A DATA SET WITH THE 7 NUMBERS
    clear*
    input int (pick x)
    1 2
    2 5
    3 7
    4 8
    5 11
    6 13
    7 14
    end
    tempfile 7numbers
    save `7numbers'
    
    //  CREATE THE DESIRED DATA SET
    clear
    set seed 1234 // OR YOUR FAVORITE RANDOM NUMBER SEED
    set obs 100
    gen pick = runiformint(1, 7)
    gen sort_order = _n
    //  keep(match) DROPS ANY OF THE 7 NUMBERS THAT HAPPENED NEVER TO BE DRAWN
    merge m:1 pick using `7numbers', assert(match using) keep(match) nogenerate
    drop pick
    sort sort_order

    • #3
      Originally posted by Clyde Schechter in #2
      Awesome! The solution is so beautiful.

      Could you help me one more time?

      I created a sample dataset, shown below. For each observation (row), I would like to randomly assign the value of var1, var2, or var3 to x. Is this possible in Stata?

      Code:
      clear all
      input int (var1 var2 var3 x)
      1 2 5 .
      2 5 2 .
      3 7 6 .
      4 9 7 .
      5 11 9 .
      6 13 11 .
      7 14 34 .
      4 9 7 . 
      3 7 6 . 
      7 14 34 .
      end

      • #4
        Code:
        clear all
        input int (var1 var2 var3 x)
        1 2 5 .
        2 5 2 .
        3 7 6 .
        4 9 7 .
        5 11 9 .
        6 13 11 .
        7 14 34 .
        4 9 7 . 
        3 7 6 . 
        7 14 34 .
        end
        
        set seed 5678
        gen int pick = runiformint(1, 3)
        forvalues i = 1/3 {
            replace x = var`i' if pick == `i'
        }
        Note: If the real situation has a substantially larger number of variables, I would approach it differently, but for picking one out of three, I think this is simplest and best.

        • #5
          Here are a couple of other solutions to Victor's original question--not better, just different. I'll bet other people can suggest several other reasonable solutions.
          Code:
          clear
          set obs 100
          set seed 185434
          // #1
          mat urn = (2,5,7,8,11,13,14)
          gen x = urn[1,runiformint(1,7)]
          // #2
          local urn = "2 5 7 8 11 13 14"
          replace x = real(word("`urn'", runiformint(1,7)))

          • #6
            Originally posted by Clyde Schechter in #4
            Hi Clyde, thanks again for your beautiful solution.

            If there are many variables with different names, how would you do it?

            • #7
              Originally posted by Mike Lacy in #5
              Thanks Mike, for the diversity of solutions. Really enjoyed reading your solutions.

              • #8
                Here's the general approach, illustrated with the built-in auto.dta.

                Code:
                clear*
                sysuse auto
                
                //  CREATE A LOCAL MACRO LISTING
                //  THE VARIABLES THAT VALUES WILL BE
                //  RANDOMLY SELECTED FROM
                ds price-mpg headroom-gear_ratio
                local sources `r(varlist)'
                
                //  SET RANDOM NUMBER SEED
                set seed 9101112
                
                //  GIVE SOURCE VARIABLES NAMES WITH
                //  A COMMON PREFIX AND CREATE AN OBS IDENTIFIER
                //  SO WE CAN RESHAPE
                rename (`sources') s_=
                gen long obs_no = _n
                
                //  GO LONG
                reshape long s_, i(obs_no) j(vname) string
                //  SORT EACH OBSERVATION'S VALUES INTO RANDOM ORDER
                //  AND SELECT THE FIRST FOR NEW VARIABLE x
                gen double shuffle = runiform()
                by obs_no (shuffle), sort: gen x = s_[1]
                
                //  RESTORE ORIGINAL DATA LAYOUT AND VARIABLE NAMES
                drop shuffle
                reshape wide
                rename s_* *
                Establishing a local macro with the desired variable names can be tricky if the names are completely unsystematic and the variables are scattered haphazardly around the data set. In the worst case you just have to list them all out, but usually, as here, some wildcards or variable ranges can accomplish it more economically.
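
                One possible way to build such a list when the names share no pattern, offered only as a sketch: start from all numeric variables and subtract the ones that should not be candidates. The choice of rep78 and foreign as exclusions, and the macro name exclude, are assumptions made for this illustration.

                ```stata
                sysuse auto, clear

                // Start from every numeric variable in the data set ...
                ds, has(type numeric)
                local sources `r(varlist)'

                // ... then remove the ones that should not be candidates
                // (rep78 and foreign are arbitrary choices for this illustration)
                local exclude rep78 foreign
                local sources : list sources - exclude

                // Show the resulting list of source variables
                display "`sources'"
                ```

                In auto.dta this should leave the same nine variables that -ds price-mpg headroom-gear_ratio- selects.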

                Also, the creation of new variable names ahead of the -reshape long- command can be tricky. Since Stata limits variable names to 32 characters, if any of the source variable names are already more than 30 characters, what I've done here won't work. So sometimes this method requires some ad hoc renaming of variables to work.
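
                A quick check along these lines might catch the name-length problem before the -rename- step fails. It is a sketch that assumes the 2-character prefix s_ used above; the macro name too_long is made up for the example.

                ```stata
                sysuse auto, clear
                ds price-mpg headroom-gear_ratio
                local sources `r(varlist)'

                // Collect any source variable whose name would exceed
                // 32 characters once the 2-character prefix s_ is added
                local too_long
                foreach v of local sources {
                    if strlen("`v'") > 30 {
                        local too_long `too_long' `v'
                    }
                }

                // Report the result
                if "`too_long'" != "" {
                    display as error "rename before reshaping: `too_long'"
                }
                else {
                    display as text "all source names are short enough for the s_ prefix"
                }
                ```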

                As a rule of thumb I would say that if your problem involves a large number of variables but the number of observations is modest (and the variable names are not too difficult to work with) this is the approach I prefer. But if the number of variables is small, or if the data set contains a large number of observations (which makes the sorting and -reshape-ing very slow) then I would stick with the approach in #4.
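
                For the many-observations case, the loop in #4 could probably be adapted to differently named variables by looping over a local macro and counting positions. This is a hedged sketch, not a tested alternative to the -reshape- approach; the names n_sources and i are illustrative.

                ```stata
                clear*
                sysuse auto

                // Same source variables as in the -reshape- example
                ds price-mpg headroom-gear_ratio
                local sources `r(varlist)'
                local n_sources : word count `sources'

                set seed 9101112

                // For each observation, draw the position (1 to n_sources)
                // of the variable whose value will be copied
                gen int pick = runiformint(1, `n_sources')

                // Copy the value of the picked variable into x
                gen double x = .
                local i = 0
                foreach v of local sources {
                    local ++i
                    replace x = `v' if pick == `i'
                }
                drop pick
                ```

                This makes one pass over the data per source variable and needs no sorting or reshaping, which is why it may scale better when the number of observations is large.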

                • #9
                  Originally posted by Clyde Schechter in #8
                  I am just learning and exploring the random-number functions in Stata. Thanks again for helping me with my random questions.
