  • Generating random numbers

    Hello everyone,
    I have a dataset for over 600 households. Here's an example of it:

    Key | Parent | hh_members_count | Index | name | age
    123AA57begs | Sofia | 3 | 1 | Sam | 37
    123AA57begs | Sofia | 3 | 2 | Nancy | 15
    123AA57begs | Sofia | 3 | 3 | Mark | 2
    983aM04bb5z | Karma | 3 | 1 | Joseph | 38
    983aM04bb5z | Karma | 3 | 2 | Hariot | 4
    983aM04bb5z | Karma | 3 | 3 | Kevin | 1.5

    I would like to create a random variable called hhid that takes the same value for every member of a household but is different for each household. Here's an example:

    Key | Parent | hh_members_count | Index | name | age | hhid
    123AA57begs | Sofia | 3 | 1 | Sam | 37 | 2517
    123AA57begs | Sofia | 3 | 2 | Nancy | 15 | 2517
    123AA57begs | Sofia | 3 | 3 | Mark | 2 | 2517
    983aM04bb5z | Karma | 3 | 1 | Joseph | 38 | 3089
    983aM04bb5z | Karma | 3 | 2 | Hariot | 4 | 3089
    983aM04bb5z | Karma | 3 | 3 | Kevin | 1.5 | 3089

    The hhid shouldn't be sequential (i.e. not 1, 2, 3...). It should also have a minimum of 3 digits and a maximum of 5 digits. In other words, the hhid should be as unique as the key, except that it takes numerical values only.
    I would greatly appreciate your help in this regard. Thank you.

  • #2
    Tina:
    you may want to consider the following toy-example:
    Code:
    . clear

    . set obs 2

    . g family=1
    
    . g hh_members=_n
    
    . expand 2
    
    . replace family=2 in 3/4
    
    . label define hh_members 1 "mother" 2 "daughter"
    
    . label val hh_members hh_members
    
    . bysort family (hh_members): gen wanted=runiform() if _n==1
    
    
    . bysort family ( hh_members): replace wanted=wanted[1] if wanted==.
    
    
    . list
    
         +------------------------------+
         | family   hh_mem~s     wanted |
         |------------------------------|
      1. |      1     mother   .3488717 |
      2. |      1   daughter   .3488717 |
      3. |      2     mother   .2668857 |
      4. |      2   daughter   .2668857 |
         +------------------------------+
    
    .
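
    The toy example gives each household one shared random number, but a uniform draw on (0,1) is not yet a 3- to 5-digit integer. A minimal sketch of the same idea with integer draws follows, assuming the person-level data from the question (with Key and Index) are in memory and that runiformint() is available (Stata 14 or later). The variable clash and the seed value are illustrative choices, not part of the original example. Independent draws can repeat across households: with roughly 600 households and 99,900 possible values, the birthday-problem approximation 1 - exp(-600^2/(2*99900)) puts the chance of at least one repeat at about 0.8, so the sketch ends with a check rather than assuming uniqueness.

    ```stata
    * assumes the question's data are in memory: one row per person, with Key and Index
    set seed 2803

    * draw one integer in [100, 99999] on the first row of each household
    bysort Key (Index): gen long hhid = runiformint(100, 99999) if _n == 1

    * copy that draw to the other members of the household
    by Key: replace hhid = hhid[1]

    * flag any hhid that ended up shared by two different households
    bysort hhid (Key): gen byte clash = Key[1] != Key[_N]
    count if clash
    ```

    If the count is not zero, the draws would have to be repeated for the clashing households, or the ids sampled without replacement instead.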
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      There might well be a simpler method, but your constraints

      1. Between 100 and 99999

      2. Matching households uniquely, not individuals

      3. Random otherwise

      all have to be satisfied.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str11 Key str5 Parent byte(hh_members_count Index) str6 name double age
      "123AA57begs" "Sofia" 3 1 "Sam"     37
      "123AA57begs" "Sofia" 3 2 "Nancy"   15
      "123AA57begs" "Sofia" 3 3 "Mark"     2
      "983aM04bb5z" "Karma" 3 1 "Joseph"  38
      "983aM04bb5z" "Karma" 3 2 "Hariot"   4
      "983aM04bb5z" "Karma" 3 3 "Kevin"  1.5
      end
      
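      * keep an untouched copy of the person-level data
      save SAFECOPY, replace

      * one row per household, carrying only the identifier, so that
      * the final merge does not overwrite Index, name and age
      bysort Key : keep if _n == 1
      keep Key
      save KEY, replace
      count
      local N = r(N)

      * all integers 100..99999 in random order; keep the first N
      clear
      set obs 99900
      range id 100 99999
      set seed 2803
      gen double rnd = runiform()
      sort rnd
      keep in 1/`N'
      drop rnd

      * attach one random id to each household
      merge 1:1 _n using KEY
      assert _merge == 3
      drop _merge

      * spread the household id to every member
      merge 1:m Key using SAFECOPY
      assert _merge == 3
      drop _merge

      sort Key Index
      list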
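
      The three constraints listed above can be checked directly once the merged data are in memory. This is a minimal sketch, assuming the variables are named id and Key as in the code above; each assert stops with an error if its condition fails for any observation.

      ```stata
      * 1. every id is a whole number between 100 and 99999
      assert id == floor(id) & inrange(id, 100, 99999)

      * 2a. each household has exactly one id
      bysort Key (id): assert id[1] == id[_N]

      * 2b. each id belongs to exactly one household
      bysort id (Key): assert Key[1] == Key[_N]
      ```

      Constraint 3 (randomness) is not something an assert can confirm; it rests on sorting the candidate ids on runiform() after set seed.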
