How do I randomly assign scores from a set of observed scores within groups?

Stacy Kehoe

Join Date: Oct 2015

Posts: 6
#1

29 Nov 2016, 12:56

Dear Community,

I have 106 groups, which I will call "strata groups", with subjects that come from two datasets, which I'll call Dataset A and Dataset B. The strata groups are the product of a stratification procedure, so the subjects are observationally similar across a vector of covariates. The subjects from Dataset A have a score that I am calling the "selection score". The subjects from Dataset B do not have this score.

Within each strata group, I would like to randomly assign selection scores to the subjects from Dataset B (who are missing scores) using the scores observed for the subjects from Dataset A. In other words, for each Dataset B subject, I would like to draw one value at random from the set of selection scores observed for the Dataset A subjects in the same strata group and assign it to that subject. The distribution of scores within each strata group is uniform (most scores are observed only once).

The strata groups have different proportions of subjects from each dataset. So, in some strata groups there is only one subject from Dataset A and many subjects from Dataset B. In that case, all of the subjects from B should have the value from A.

I have attempted to do this a number of different ways, but I have not been able to figure this out. Is someone able to offer a suitable looping code to help me generate these scores? I am using Stata/SE 14.2 on a Mac.

Stacy
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#2

29 Nov 2016, 14:53

This is too complicated to do with imaginary data. Please use the -dataex- program to post a short sample of your data so we can see how your data is laid out and what it looks like, and experiment with it.

Stacy Kehoe


29 Nov 2016, 15:36

Hi Clyde,

Sure! This is my first time using dataex so please let me know if I am doing this incorrectly. I appreciate your help with this.

Here is output for subjects in "strata groups" 10 and 65. The selection score is centered on zero, which is why there are negative and positive values. The first two outputs pasted below contain data for each group separately. The third output has all of them together.

Thank you!

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(subjectid strata_group Dataset_A selection_score)
 36685 10 1  -31
 39158 10 1 -151
 40609 10 1 -101
 40943 10 1  -21
 41262 10 1 -201
 41977 10 1 -174
 42972 10 1  -85
 50007 10 1  -13
123680 10 0    .
126954 10 0    .
127022 10 0    .
158919 10 0    .
end

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(subjectid strata_group Dataset_A selection_score)
 29899 65 1  20
 40839 65 1 -11
 44613 65 1   3
 50843 65 1  42
134743 65 0   .
135596 65 0   .
135608 65 0   .
135614 65 0   .
135879 65 0   .
136154 65 0   .
140548 65 0   .
159957 65 0   .
162065 65 0   .
169702 65 0   .
170428 65 0   .
185070 65 0   .
185220 65 0   .
185228 65 0   .
190342 65 0   .
end

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(subjectid strata_group Dataset_A selection_score)
 29899 65 1   20
 36685 10 1  -31
 39158 10 1 -151
 40609 10 1 -101
 40839 65 1  -11
 40943 10 1  -21
 41262 10 1 -201
 41977 10 1 -174
 42972 10 1  -85
 44613 65 1    3
 50007 10 1  -13
 50843 65 1   42
123680 10 0    .
126954 10 0    .
127022 10 0    .
134743 65 0    .
135596 65 0    .
135608 65 0    .
135614 65 0    .
135879 65 0    .
136154 65 0    .
140548 65 0    .
158919 10 0    .
159957 65 0    .
162065 65 0    .
169702 65 0    .
170428 65 0    .
185070 65 0    .
185220 65 0    .
185228 65 0    .
190342 65 0    .
end

Clyde Schechter


29 Nov 2016, 17:40

OK, the trick is to extract from the data a file that lists just the strata_groups and their associated non-missing selection_scores. Then we -joinby- (not -merge-) that file back to the original data, so that each observation in the original data is paired with every Dataset A selection_score in the same strata_group. Then we keep one of those pairings at random for each subjectid.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(subjectid strata_group Dataset_A selection_score)
 29899 65 1   20
 36685 10 1  -31
 39158 10 1 -151
 40609 10 1 -101
 40839 65 1  -11
 40943 10 1  -21
 41262 10 1 -201
 41977 10 1 -174
 42972 10 1  -85
 44613 65 1    3
 50007 10 1  -13
 50843 65 1   42
123680 10 0    .
126954 10 0    .
127022 10 0    .
134743 65 0    .
135596 65 0    .
135608 65 0    .
135614 65 0    .
135879 65 0    .
136154 65 0    .
140548 65 0    .
158919 10 0    .
159957 65 0    .
162065 65 0    .
169702 65 0    .
170428 65 0    .
185070 65 0    .
185220 65 0    .
185228 65 0    .
190342 65 0    .
end

tempfile original
save `original'

//    CREATE A FILE THAT CROSSWALKS STRATA_GROUP WITH NON-MISSING
//    SELECTION_SCORES
keep if Dataset_A == 1 & !missing(selection_score)
keep strata_group selection_score
rename selection_score selection_score_2

//    FORM ALL POSSIBLE PAIRS WITHIN STRATA_GROUP
joinby strata_group using `original'

//    SORT RANDOMLY THEN KEEP THE FIRST FOR EACH SUBJECTID
set seed 1234 // OR YOUR PREFERRED RANDOM NUMBER SEED
gen double shuffle1 = runiform()
gen double shuffle2 = runiform()
by subjectid (shuffle1 shuffle2), sort: keep if _n == 1
replace selection_score = selection_score_2 if missing(selection_score)
drop selection_score_2 shuffle*
sort strata_group Dataset_A subjectid

Note: I used two double-precision random numbers for the random sorting because, if your data set is large, one might encounter duplicate values of just one random number--which would make the sort order indeterminate and irreproducible. But if your data set is of only moderate size, you can get rid of shuffle2. And if your data set is small (just a few thousand observations) you can even shrink shuffle1 down to a float.
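
For comparison, the same kind of draw can probably be done without expanding the data at all. The following is only a sketch, not the method used above: it assumes Dataset_A is coded 0/1 with no missing values, that subjectid is unique, and that every Dataset A subject has a non-missing selection_score. The variable names from_b, n_a and pick are made up for illustration, and the input rows are a subset of the example data posted earlier in the thread.

```stata
* Subset of the example data posted above, so the sketch runs on its own
clear
input float(subjectid strata_group Dataset_A selection_score)
 29899 65 1   20
 40839 65 1  -11
 44613 65 1    3
 50843 65 1   42
134743 65 0    .
135596 65 0    .
 36685 10 1  -31
 39158 10 1 -151
123680 10 0    .
126954 10 0    .
end

set seed 1234 // OR YOUR PREFERRED RANDOM NUMBER SEED

//    FLAG DATASET B ROWS AND COUNT DATASET A ROWS PER STRATA_GROUP
gen byte from_b = (Dataset_A == 0)
bysort strata_group: egen long n_a = total(Dataset_A == 1)

//    PUT DATASET A ROWS FIRST WITHIN EACH STRATA_GROUP; SUBJECTID BREAKS
//    TIES SO THE ORDER (AND HENCE THE DRAWS) IS REPRODUCIBLE
sort strata_group from_b subjectid

//    FOR EACH DATASET B ROW, DRAW A POSITION BETWEEN 1 AND N_A
gen long pick = runiformint(1, n_a) if from_b & n_a > 0

//    UNDER -BY-, EXPLICIT SUBSCRIPTS COUNT FROM THE FIRST ROW OF THE GROUP,
//    SO SELECTION_SCORE[PICK] IS THE PICK-TH DATASET A SCORE IN THAT GROUP
by strata_group: replace selection_score = selection_score[pick] if from_b & n_a > 0

//    SANITY CHECK: NO SCORE LEFT MISSING WHERE A DONOR EXISTED
assert !missing(selection_score) if n_a > 0

drop from_b n_a pick
sort strata_group Dataset_A subjectid
```

Like the -joinby- code, this draws with replacement, so a strata_group with a single Dataset A subject gives that one score to all of its Dataset B subjects. One difference to be aware of: -joinby- by default drops observations that find no match, so a strata_group with no Dataset A subjects would disappear from the result, whereas in this sketch those subjects stay in the data with missing scores.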

Stacy Kehoe

#5

02 Dec 2016, 13:51

Thank you, Clyde! This worked perfectly.