  • Generate random data (number and string) based on predefined criteria

    Hello,

    I have never generated random data (numbers or strings) and would like to learn how. I created a sample of faculty names. Next, I want to create two variables. The first, "phd", would record where each person got their degree, and I want Stata to select it at random from a predefined list (Penn, UCLA, MIT). The second, "year", would record when they got their degree, and I want Stata to select a random integer between 1970 and 2016. Could you share how you would approach this? I appreciate your suggestions.

    input str10 name
    "Mary"
    "Lisa"
    "Philip"
    "Jack"
    "Jill"
    "Julia"
    "Patrick"
    "Scott"
    "Jason"
    "Donald"
    "Mia"
    end

    Best,
    Ji

  • #2
    Try this:

    Code:
    clear*
    input str10 name
    "Mary"
    "Lisa"
    "Philip"
    "Jack"
    "Jill"
    "Julia"
    "Patrick"
    "Scott"
    "Jason"
    "Donald"
    "Mia"
    end
    
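    * Draw a random integer year in [1970, 2016] for each observation
    gen year = runiformint(1970, 2016)
    
    * Map the codes 1-3 to institution names
    label define institutions    1    "Penn"    ///
                                2    "UCLA"    ///
                                3    "MIT"
                                
    * Draw a random code in [1, 3] and attach the value label
    gen phd:institutions = runiformint(1, 3)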
    This isn't exactly what you asked for, in that the variable phd is not a string variable but rather a value-labeled numeric variable. That is actually likely to work better for most things you might do with this data anyway. But if you really want a string variable, you can apply -decode- to phd to get it.
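
For the string version mentioned above, a minimal self-contained sketch of -decode- might look like the following. The variable name phd_str, the seed, and the 11-observation dataset are assumptions made only for illustration.

```stata
clear
set seed 2016                          // arbitrary seed, only for reproducibility
set obs 11                             // stand-in for the 11 faculty names

label define institutions 1 "Penn" 2 "UCLA" 3 "MIT"
gen phd:institutions = runiformint(1, 3)

* Create a string copy of the labeled variable (phd_str is a hypothetical name)
decode phd, generate(phd_str)
list phd phd_str
```

The numeric phd stays in the data, so either form can be used afterwards.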


    • #3

      Code:
      gen PhD = word("Penn UCLA MIT", ceil(3 * runiform()))


      • #4
        Thank you very much Clyde.

        For some reason, I had to modify the code as follows to get it to run in Stata:

        Code:
        gen year = 1970+int((46+1)*runiform())
        
        label define institutions 1 "Penn" ///
        2 "UCLA" ///
        3 "MIT"
        
        gen phd:institutions = 1+int((2+1)*runiform())
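
One possible reason for needing this change, which the thread does not confirm, is an older Stata release: runiformint() was added in Stata 14, whereas runiform() has been available for longer. The sketch below only checks that the int() form stays within the intended range; the number of observations and the seed are arbitrary assumptions.

```stata
clear
set seed 1234                          // arbitrary seed
set obs 10000                          // arbitrary sample size for the check

* 2016 - 1970 + 1 = 47 possible years, so int(47*u) ranges over 0, ..., 46
gen year = 1970 + int((2016 - 1970 + 1) * runiform())

* Every draw should be a whole number between 1970 and 2016
assert inrange(year, 1970, 2016)
assert year == int(year)
summarize year
```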


        • #5
          Nick, Clyde - Thanks again!

          A follow-up question: what if the predefined list comes from another Stata file, say a list of 500 higher education institutions in the U.S.? How should I approach that? Thanks.


          • #6
            In that case, you need to prepare your institution file so that each institution has a distinct number from 1 to 500 (or whatever the exact N is). Then you can start by generating a random integer between 1 and 500, and then -merge- on that number with the institutions file. So, like this:

            Code:
            clear*
            input str24 institution
            "Penn"
            "UCLA"
            "MIT"
            "Yale"
            "Columbia"
            "Edinburgh"
            "Sciences Po"
            "Gottingen"
            end
            gen long inst_id = _n
            tempfile institutions
            save `institutions'
            
            clear
            input str10 name
            "Mary"
            "Lisa"
            "Philip"
            "Jack"
            "Jill"
            "Julia"
            "Patrick"
            "Scott"
            "Jason"
            "Donald"
            "Mia"
            end
            tempfile names
            save `names'
            
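            * Fix the seed so the random draws are reproducible
            set seed 1234
            
            gen year = runiformint(1970, 2016)
            
            * Read the number of institutions from the saved file
            des using `institutions'
            local i_max `r(N)'
            
            * Draw a random institution ID, then look up its name
            gen inst_id = runiformint(1, `i_max')
            merge m:1 inst_id using `institutions', assert(match using) keep(match) nogenerate
            rename institution phd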


            • #7
              Thank you so much, Clyde. The m:1 merge is a smart solution!


              • #8
                Well, I'd love to take credit for inventing it, but it's a very standard approach to random sampling with replacement. Unfortunately, I've been doing it this way for so long that I don't recall who I first learned it from, so I can't give due credit. Anyway, I'm glad I was able to help.
