
  • #16
    Your request is somewhat unclear. I'm going to assume you mean: "From a given list of variables, randomly select three of them without replacement, and for each observation of a selected variable, assign its values to the same observation of a new variable." You might mean something different.
    Code:
    clear
    sysuse auto
    // Assemble a list of variables for illustration
    ds
    local setvars = "`r(varlist)'"
    //
    local size : word count `setvars'
    forval i = 1/3 {
       local nextvar = word("`setvars'", runiformint(1, `size'))
       di "newvar`i' will get the values of `nextvar'."
       gen newvar`i' = `nextvar'
       // Don't pick this var again
       local setvars = subinstr("`setvars'", "`nextvar'", "", 1)
       local size = `size' - 1
    }



    • #17
      Code:
      // USE THE BUILT-IN AUTO DATA SET AS AN EXAMPLE
      sysuse auto, clear
      
      // SELECT RANDOMLY THREE OF THE VARIABLES
      // price mpg headroom gear_ratio displacement turn length
      // IN EACH OBSERVATION
      
      set seed 1234
      local source_vars price mpg headroom gear_ratio displacement turn length
      local n_vars: word count `source_vars'
      
      gen long obs_no = _n
      preserve
      rename (`source_vars') var=
      rename var* var#, renumber sort
      keep obs_no var*
      reshape long var, i(obs_no) j(_j)
      gen double shuffle = runiform()
      by obs_no (shuffle), sort: keep if _n <= 3 //   KEEP 3 SELECTED AT RANDOM W/O REPLACEMENT
      by obs_no: replace _j = _n
      drop shuffle
      reshape wide var, i(obs_no) j(_j)
      tempfile selections
      save `selections'
      restore
      merge 1:1 obs_no using `selections', assert(match) nogenerate
      Added: Crossed with #16. Mike Lacy's solution and mine do different things. His solution makes one random selection of three variables and uses those same three variables for every observation in the data set. My solution picks a new random selection of three variables for each observation in the data set. This parallels what was done in response to the original question at #1 in this thread. It's up to you which of these you want.
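The per-observation selection described here rests on a shuffle-and-keep idiom: attach a random number to every row, sort by it within each group, and keep the first few rows of each group. A minimal sketch on made-up data may make that easier to see. The variables id and item, the group sizes, and the seed are hypothetical and are not part of the auto example.

```stata
clear
set seed 2022                       // arbitrary seed, only for replicability
set obs 12
gen long id = ceil(_n/4)            // hypothetical: 3 groups of 4 rows each
by id, sort: gen item = _n          // items 1..4 within each group
gen double shuffle = runiform()     // one random draw per row
// Sorting on the random draw within id puts the items in random order,
// so the first 2 rows are a random sample of 2 without replacement
by id (shuffle), sort: keep if _n <= 2
drop shuffle
list id item, sepby(id)
```

Each id ends up with two distinct items, and different ids will generally get different items, which is the same behavior as the reshape-based code with three variables per observation.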
      Last edited by Clyde Schechter; 22 Aug 2022, 09:54.



      • #18
        Mike Lacy, assuming you captured what the poster wanted, I'd like to suggest two edits to your code in #16:

        Code:
        clear
        sysuse auto
        // Assemble a list of variables for illustration
        ds
        local setvars = "`r(varlist)'"
        //
        local size : word count `setvars'
        set seed 12345
        forval i = 1/3 {
           local nextvar = word("`setvars'", runiformint(1, `size'))
           di "newvar`i' will get the values of `nextvar'."
           gen newvar`i' = `nextvar'
           // Don't pick this var again
           local setvars: list setvars - nextvar
           local size = `size' - 1
        }
        Setting a seed ensures replicability. The second edit ensures that the code runs correctly even if one variable's entire name appears inside another variable's name. This does not happen in the auto dataset, but could in the user's. For example, with two variables mpg and mpg1, subinstr() could delete the characters "mpg" from inside mpg1 instead of removing the variable mpg from the list.
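To illustrate the difference between the two removal methods, here is a small sketch using a made-up variable list. The list contents and the macro names by_subinstr and by_list are hypothetical, and no dataset is needed to run it.

```stata
// Hypothetical list in which mpg1 happens to come before mpg
local setvars "mpg1 mpg price"
local nextvar "mpg"

// String substitution removes the first match, which sits inside mpg1
local by_subinstr = subinstr("`setvars'", "`nextvar'", "", 1)
display "`by_subinstr'"      // shows: 1 mpg price

// Macro list subtraction removes only the whole word mpg
local by_list : list setvars - nextvar
display "`by_list'"          // shows: mpg1 price
```

With the string approach, a later iteration could pick the leftover "1" as if it were a variable name, whereas the list approach keeps every remaining name intact.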
        Last edited by Hemanshu Kumar; 22 Aug 2022, 09:50.



        • #19
          In the "st_keepvar(sample_vars')" code, how can I add a variable (such as id) to the randomly selected variable list? Thanks!


          Originally posted by Hua Peng (StataCorp)
          If you do not mind using some Mata,

          Code:
          cscript
          sysuse auto
          
          mata:
          // draw 20% of the variables
          sample_percent = 0.2
          // number of variables in the dataset
          nvar = st_nvar()
          // number of variables to draw
          sample = round(nvar*sample_percent)
          // randomize the variable index list then draw
          sample_vars = jumble(1::nvar)[1..sample]
          // only keep selected variables
          st_keepvar(sample_vars')
          end
          
          describe
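One possible way to always keep an identifier alongside the random draw, offered as a sketch rather than as the quoted author's method: look up the identifier's index with st_varindex(), draw only from the remaining indices, and pass both to st_keepvar(). The auto data has no id variable, so make stands in for it here. The names id_index, others, and keep_vars, and the seed, are assumptions made for illustration.

```stata
sysuse auto, clear
set seed 12345                      // arbitrary seed, only for replicability

mata:
// index of the variable that must always be kept (make stands in for id)
id_index = st_varindex("make")
nvar = st_nvar()
// indices of all other variables, so the identifier is never drawn twice
others = select(1::nvar, (1::nvar) :!= id_index)
// number of variables to draw: 20% of the non-identifier variables
sample = round(rows(others)*0.2)
// randomize the remaining indices, then take the first few
sample_vars = jumble(others)[1..sample]
// keep the identifier plus the random draw
keep_vars = (id_index \ sample_vars)'
st_keepvar(keep_vars)
end

describe
```

Drawing from the indices that exclude the identifier means the result always contains the identifier plus exactly the requested number of randomly chosen variables.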

