
  • How to randomly draw variables (NOT observations) in Stata?

    My question is in the title. Many thanks!

  • #2
    I assume you have a list of desired variables in your data set, and in each observation you wish to select the value from one of those variables, with equal probability for all of those variables, to go in a new variable. As you did not provide an example of your data, I illustrate the approach using the built-in auto.dta:

    [code]

    // USE THE BUILT-IN AUTO DATA SET AS AN EXAMPLE
    sysuse auto, clear

    // SELECT RANDOMLY ONE OF THE VARIABLES
    // price mpg headroom gear_ratio IN EACH
    // OBSERVATION

    set seed 1234
    local source_vars price mpg headroom gear_ratio
    local n_vars: word count `source_vars'

    gen long selector = runiformint(1, `n_vars')
    gen long obs_no = _n

    preserve

    rename (`source_vars') var=
    rename var* var#, renumber sort
    keep obs_no var*
    reshape long var,i(obs_no) j(selector)
    tempfile source
    save `source'

    restore
    merge 1:1 obs_no selector using `source', assert(match using) keep(match) nogenerate
    [/code]

    In the future, when asking for help with code, please show example data, and do so using the -dataex- command. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



    • #3
      Really appreciate your kindness! I am trying the code.



      • #4
        Is this code, rename (`source_vars') var=, asking Stata to rename all of the source variables? If so, what is the var= part for?



        • #5
          If you do not mind using some Mata,

          Code:
          cscript
          sysuse auto
          
          mata:
          // draw 20% of the variables
          sample_percent = 0.2
          // number of variables in the dataset
          nvar = st_nvar()
          // number of variables to draw
          sample = round(nvar*sample_percent)
          // shuffle the variable index list, then take the first `sample' entries
          sample_vars = jumble(1::nvar)[1..sample]
          // keep only the selected variables
          st_keepvar(sample_vars')
          end
          
          describe



          • #6
            In response to #4, the -rename- command prefixes each of the source variable names with var. This is done so that they all start with the same characters, which makes the -reshape- command much simpler. After the -reshape-, the value of the first variable (price) is found as the value of var in those observations where the j() variable, selector, equals 1; the value of mpg, the second variable, is found as the value of var in those observations where selector equals 2, etc.
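To make the layout after -reshape- concrete, here is a minimal sketch on the first three observations of auto.dta. It is a simplification, not the code from #2: the two -rename- steps are swapped for a single -rename- with explicit new names (var1 to var4), so the numbering of the variables is spelled out by hand.

```stata
// Sketch only: explicit names var1-var4 stand in for the two -rename- steps in #2
sysuse auto, clear
keep in 1/3
gen long obs_no = _n
keep obs_no price mpg headroom gear_ratio

// give every source variable the same stub, followed by a number
rename (price mpg headroom gear_ratio) (var1 var2 var3 var4)

// one row per observation per source variable; selector holds the number
reshape long var, i(obs_no) j(selector)

list obs_no selector var, sepby(obs_no)
```

Each original observation now occupies four rows, and selector records which source variable the value in var came from (1 for price, 2 for mpg, and so on).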



            • #7
              I see. In response to #4, so in this line of code, rename (`source_vars') var=, I don't need to specify the name of each variable, right? When I ran just this line, the following error message appeared:
              . rename (`source_vars') var=
              syntax error
              Syntax is
              rename oldname newname [, renumber[(#)] addnumber[(#)] sort ...]
              rename (oldnames) (newnames) [, renumber[(#)] addnumber[(#)] sort ...]
              rename oldnames , {upper|lower|proper}
              r(198);



              • #8
                In response to #5, thank you very much for your kindness! When I ran the line nvar = st_nvar(), the following error message appeared:

                : nvar = st_nvar(10)
                wrong number of arguments for st_nvar()



                • #9
                  It seems you introduced a typo when you copied or retyped the code. The line should be:

                  Code:
                  nvar = st_nvar()
                  not

                  Code:
                  nvar = st_nvar(10)



                  • #10
                    Many thanks! It works, though I don't really understand how.



                    • #11
                      In response to #6, so in this line of code, rename (`source_vars') var=, I don't need to specify the name of each variable, right? When I ran just this line, the following error message appeared:
                      . rename (`source_vars') var=
                      syntax error
                      Syntax is
                      rename oldname newname [, renumber[(#)] addnumber[(#)] sort ...]
                      rename (oldnames) (newnames) [, renumber[(#)] addnumber[(#)] sort ...]
                      rename oldnames , {upper|lower|proper}
                      r(198);



                      • #12
                        I ran that code to test it before I posted it and it did not produce that error.

                        I don't need to specify the name of each variable, right?

                        No, because you already do that in the command -local source_vars price mpg headroom gear_ratio- (evidently, you need to put the actual variables you want to randomly select from here, not the variables from auto.dta), which appears near the beginning of the code (just after the -set seed- command). That line of code defines the contents of local source_vars once and for all, and subsequent references to it will retrieve the names of the variables. So make sure that line of code is there.

                        I think the problem is that you are trying to run this code line by line. Because it uses local macros, that won't work. You have to run the entire code from the do-file editor, from beginning to end and without interruption, because if you run it line by line or chunk by chunk, the local macros disappear at each interruption. So run it all at once.
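A small sketch of why this matters, using the same macro names as the code in #2 (the -display- lines are added here only for illustration):

```stata
// Run these lines together, as one block, from the do-file editor
sysuse auto, clear
local source_vars price mpg headroom gear_ratio
local n_vars: word count `source_vars'

// both macros are still defined here, so the names and the count are shown
display "source_vars contains: `source_vars'"
display "number of variables:  `n_vars'"
```

If the -display- lines are instead selected and run on their own, after the -local- lines have already finished, the macros no longer exist and expand to nothing. In the same way, rename (`source_vars') var= run on its own becomes rename () var=, which is consistent with the syntax error reported in #7 and #11.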



                        • #13
                          Yes, I did run it line by line. Thank you very much! So after running this code (but without running the restore line), I should then apply sample or bsample to the newly created temp file "source" to randomly draw observations. Is this the idea?



                          • #14
                            No. Run the whole thing, including the -restore- and -merge- commands, and you will have your original data back, with an extra variable, called var, that contains random draws from the variables you chose.
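To make explicit what the new variable should contain, here is a sketch of a different and more direct route to a variable of the same kind. It is not the -reshape-/-merge- code from #2: it swaps in a plain loop over the source variables, and the name drawn is a hypothetical choice made for this illustration.

```stata
// Sketch only: a loop in place of the reshape/merge approach; "drawn" is a made-up name
sysuse auto, clear
set seed 1234
local source_vars price mpg headroom gear_ratio
local n_vars: word count `source_vars'

// pick one of the source variables, with equal probability, in each observation
gen long selector = runiformint(1, `n_vars')

// copy the value of the selected variable into the new variable
gen double drawn = .
forvalues k = 1/`n_vars' {
    local v: word `k' of `source_vars'
    replace drawn = `v' if selector == `k'
}

list price mpg headroom gear_ratio selector drawn in 1/5
```

The data set keeps all of its original observations and variables; only selector and drawn are added. As with the code in #2, this has to be run in one go because it relies on local macros.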



                            • #15
                              I have a question along this line. If I wanted to draw, not one but three values, each of them different from each other from a given set of variables, how would I go about that? (The draw of the second value will depend on the first and the draw of the third will depend on the second and the first.)
                              Last edited by Denat Ephrem Negatu; 22 Aug 2022, 08:15.
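One possible approach to this, offered as a sketch rather than a definitive answer. It assumes that "different from each other" means the three values must come from three different variables (the values themselves could still coincide numerically). The idea is to put the data in long form, attach a random number to each variable within each observation, and keep the three with the smallest random numbers, which amounts to drawing without replacement. The names which, u, and draw are hypothetical.

```stata
// Sketch only: assumes three DIFFERENT variables are wanted per observation
sysuse auto, clear
set seed 1234
gen long obs_no = _n
keep obs_no price mpg headroom gear_ratio

// long form: one row per observation per source variable
rename (price mpg headroom gear_ratio) (var1 var2 var3 var4)
reshape long var, i(obs_no) j(which)

// random order of the variables within each observation; keep the first three
gen double u = runiform()
bysort obs_no (u): keep if _n <= 3
by obs_no: gen byte draw = _n
drop u

list obs_no draw which var in 1/6, sepby(obs_no)
```

Here draw numbers the first, second, and third pick within each observation, which records the variable each value came from, and var holds the value. The result could be brought back alongside the original data with -reshape wide- and -merge-, along the lines of the code in #2.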
