  • Shuffling Groups of Observations

    Dear all,

    I am trying to shuffle groups of observations within my dataset (but keep some variables in the original ordering). My dataset looks something like this:

    Code:
    clear all
    input group1 group2 time v1 v2 v3 v4
    1 1 1 23 42 56 87
    1 1 2 43 12 14 93
    1 1 3 34 76 23 98
    1 2 4 65 23 92 97
    1 2 5 87 88 99 22
    1 3 6 66 55 33 22
    end
    The above is one panel; the data are tsset with group1 as the panel variable and time as the time variable. I would like to permute the order of the blocks of observations defined by group2 (within each panel). So one possible permutation would look like:

    Code:
    clear all
    input group1 group2 time
    1 2 4
    1 2 5
    1 3 6
    1 1 1
    1 1 2
    1 1 3
    end
    and another would look like:

    Code:
    clear all
    input group1 group2 time
    1 3 6
    1 1 1
    1 1 2
    1 1 3
    1 2 4
    1 2 5
    end
    Notice how the ordering within each value of group2 is preserved (i.e. the time ordering within each group2 block remains unchanged). I would appreciate any advice on how to accomplish this.

    There is a further wrinkle to this reshuffling. I want only some variables to be reshuffled, while the others stay in their original order. So the first sample permutation above would need to look like this:

    Code:
    clear all
    input group1 group2 time v1 v2 v3 v4
    1 2 4 23 42 92 97
    1 2 5 43 12 99 22
    1 3 6 34 76 33 22
    1 1 1 65 23 56 87
    1 1 2 87 88 14 93
    1 1 3 66 55 23 98
    end
    Notice how the ordering of v1 and v2 remains the same, while v3 and v4 move with the shuffled group2 blocks.

    I know there are better ways to spend Thanksgiving.

    Thank you for any pointers.

  • #2
    So, if I understand what you want to do, we can think of variables group1, v1, and v2 as invariant, and variables group2, time, v3, and v4 as permutable. But the permutable variables do not permute independently of each other: they retain the correspondence they have in the original data and permute as a block. Also, permutations are to be done within blocks of group1.

    I would begin by splitting out the permutable variables plus group1, do the permutation there, and then -merge- the result back into the original data. Thus:

    Code:
    clear all
    input group1 group2 time v1 v2 v3 v4
    1 1 1 23 42 56 87
    1 1 2 43 12 14 93
    1 1 3 34 76 23 98
    1 2 4 65 23 92 97
    1 2 5 87 88 99 22
    1 3 6 66 55 33 22
    end
    
    //    ESTABLISH A FIXED INITIAL ORDER FOR THE DATA
    sort group1 group2 time
    list, noobs clean
    
    local permutable group2 time v3 v4
    set seed 1234
    
    //    SPLIT OUT PERMUTABLE VARIABLES INTO A SEPARATE FILE
    //    ALONG WITH GROUP1
    preserve
    tempfile permute
    keep group1 `permutable'
    //    AND RANDOMLY PERMUTE THEM
    gen shuffle = runiform()
    sort group1 shuffle
    drop shuffle
    save `permute'
    
    //    NOW MERGE THIS BACK TO THE ORIGINAL DATA
    restore
    merge 1:1 _n using `permute', update replace assert(3 4 5) nogenerate
    
    list, noobs clean
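
    The code above draws a separate random number for every observation, so rows that share a group2 value can end up apart. If the group2 blocks are to stay together with time in its original order inside each block, as in the sample permutations in the question, one possible variant draws a single random number per block. This is only a sketch under that reading of the request; the double storage type and the extra sort keys after shuffle are my own additions, used to make ties between blocks harmless.

```stata
clear all
input group1 group2 time v1 v2 v3 v4
1 1 1 23 42 56 87
1 1 2 43 12 14 93
1 1 3 34 76 23 98
1 2 4 65 23 92 97
1 2 5 87 88 99 22
1 3 6 66 55 33 22
end

//    FIXED INITIAL ORDER, AS ABOVE
sort group1 group2 time

local permutable group2 time v3 v4
set seed 1234

preserve
tempfile permute
keep group1 `permutable'
//    ONE RANDOM DRAW PER group1-group2 BLOCK, COPIED TO EVERY ROW OF THE BLOCK
bysort group1 group2 (time): gen double shuffle = runiform() if _n == 1
by group1 group2 (time): replace shuffle = shuffle[1]
//    ORDER THE BLOCKS BY THEIR DRAW; time KEEPS ITS ORDER INSIDE EACH BLOCK
sort group1 shuffle group2 time
drop shuffle
save `permute'

//    MERGE BACK BY ROW POSITION, EXACTLY AS ABOVE
restore
merge 1:1 _n using `permute', update replace assert(3 4 5) nogenerate

list, noobs clean
```

    Because the merge is still done on _n, v1 and v2 stay in their original rows while group2, time, v3, and v4 arrive in the block-shuffled order. With only three blocks, some seeds will reproduce the original order.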

    • #3
      Thank you, Clyde. This is very useful.
