  • Creating random id numbers for groups

    Not sure my title makes sense, but here is what I'm trying to do. I have teacher and school IDs that can be identified, so I want to create new IDs that still show which students have the same teacher or are in the same school, but that can't be traced back to the real teacher or school the way the current ones can. For example, there are multiple students in school 30721, and that school can be identified, so I want to scramble the number for all students in that school. Does this make sense? Any help appreciated. Thanks

  • #2
    The simplest way to do this is to take the distinct values of each ID, shuffle them thoroughly into a random order, and then assign sequential numbers as the new IDs. So, something like this:

    Code:
    set seed pick_an_integer_here
    frame put school_id, into(schools)
    frame schools {
        duplicates drop
        gen double shuffle = runiform()
        sort shuffle
        gen random_school_id = _n
        drop shuffle
        save school_id_crosswalk, replace
    }
    frame put teacher_id, into(teachers)
    frame teachers {
        duplicates drop
        gen double shuffle = runiform()
        sort shuffle
        gen random_teacher_id = _n
        drop shuffle
        save teacher_id_crosswalk, replace
    }
    frame put student_id, into(students)
    frame students {
        duplicates drop
        gen double shuffle = runiform()
        sort shuffle
        gen random_student_id = _n
        drop shuffle
        save student_id_crosswalk, replace
    }
     
    foreach x in school teacher student {
        frlink m:1 `x'_id, frame(`x's)
        frget random_`x'_id, from(`x's)
        drop `x'_id `x's
    }
    sort random_*
    quietly compress
    save masked_data_set, replace
    Obviously, you will need to specify a random number seed of your choice, and you will need to modify the code to reflect your actual variable names. If the ID variable names are systematic, as I have specified in this model code, then you can do all of the creation of those three frames in a -foreach- loop analogous to the one at the bottom. On the other hand, if the ID variable names are unsystematic, then you will have to spell out the final -foreach- loop as a series of three "paragraphs" of code to deal with each one separately.

    In any case, the code will save three crosswalk files that will enable you to recover the actual IDs when necessary. You should probably encrypt those crosswalk files, and the do-file containing the code, for extra security. If the data are really sensitive, store them on a different computer or on a removable storage device.
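
    As a sketch of the looped version mentioned above, assuming the variables really are named school_id, teacher_id, and student_id as in the model code, the three frame-building "paragraphs" could be collapsed like this (untested against your data, and the seed is still a placeholder you must replace):

    Code:
    set seed pick_an_integer_here
    foreach x in school teacher student {
        // copy just this ID variable into its own frame, e.g. schools
        frame put `x'_id, into(`x's)
        frame `x's {
            // keep one row per distinct ID value
            duplicates drop
            // put the distinct IDs in random order
            gen double shuffle = runiform()
            sort shuffle
            // the position in the shuffled order becomes the masked ID
            gen random_`x'_id = _n
            drop shuffle
            // crosswalk from the real ID to the masked ID
            save `x'_id_crosswalk, replace
        }
    }

    The linking loop at the bottom of the original code then runs unchanged. To recover the real IDs later, something along these lines should work, provided no observations have been added to or dropped from the masked data in the meantime (that is what assert(match) checks):

    Code:
    use masked_data_set, clear
    foreach x in school teacher student {
        // bring the real ID back in from the saved crosswalk
        merge m:1 random_`x'_id using `x'_id_crosswalk, assert(match) nogenerate
    }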
