Drawing a random sample of subjects for manual review

May Blake

Join Date: Feb 2020

Posts: 137
#1


26 Jun 2023, 11:42

Hello,
I am trying to draw a random subsample of my data for manual review. I have a dataset of 17,000 encounters and want to draw a random sample of 1,500. There are several factors I want to stratify the sample on:

1 - Within each period: before vs. after implementation of the surgery protocol
2 - Within each department (4 departments)
3 - Within each surgeon in the health care facility
4 - Within the protocol-use variable (whether or not the surgeon adopted the protocol)
5 - Within patient encounters, as some patients visited multiple times
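For reference, 1,500 of 17,000 encounters is about 8.8% (1500/17000 is roughly 0.0882), presumably the source of the 8.8 used in the code below. A minimal sketch of an unstratified baseline, assuming the full dataset is in memory: `sample` with the `count` option keeps an exact number of observations rather than a percentage, and `set seed` makes the draw reproducible. The `set obs`/`generate` lines only build a hypothetical stand-in dataset (`enc_id` is an invented identifier, and the seed value is arbitrary); replace them with your own data.

```stata
* Hypothetical stand-in for the 17,000-encounter file; replace with -use- of your data
clear
set obs 17000
generate long enc_id = _n            // invented encounter identifier

display 100 * 1500 / 17000           // target percentage, about 8.82
set seed 20230626                    // arbitrary seed so the draw can be repeated
sample 1500, count                   // keep exactly 1,500 observations at random
count
```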

I am using the code below, but the sampled data comes back with all cells empty:

Code:

sample 8.8, by(time dept surgeon prot enc)
Bruce Weaver

Join Date: May 2014

Posts: 1139
#2

26 Jun 2023, 12:40

Hello May Blake. I wonder if your surgeon variable is causing trouble. Presumably, surgeons are not crossed with all other variables in your list. Rather, they are clustered within HC facilities. Does it work better if you omit surgeon?

--
Bruce Weaver
Version: Stata/MP 19.5 (Windows)
May Blake

Join Date: Feb 2020

Posts: 137
#3

26 Jun 2023, 13:34

Thank you, Bruce. Surgeons are clustered within departments. Even after removing that variable, I am still getting empty cells, so I am not sure what the problem could be.
Bruce Weaver

Join Date: May 2014

Posts: 1139
#4

26 Jun 2023, 13:55

If you cross-tabulate all of those variables on the full dataset, do you see any empty cells?
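One way to run that check, sketched here on the 13 example rows May posts in #5 rather than the full data: `duplicates report` shows how many rows share each combination of the five `by()` variables (in this excerpt every combination occurs exactly once, so a within-group percentage has almost nothing to work with), and `contract` with the `zero` option lists every time x dept x prot combination, including those with no observations. In the excerpt most of those cells are empty. It also contains a fifth department code, OONC, which may be a typo for ONC; a cross-tabulation like this is a quick way to spot such values.

```stata
* The 13 example rows from post #5 (an excerpt, not the full 17,000 encounters)
clear
input str4 time str5 dept str6 surgeon str7 prot byte enc
"Pre"  "ONC"   "DANIEL" "YES"      3
"Pre"  "SURG1" "LEDDY"  "YES"      1
"Pre"  "SURG2" "EVA"    "NO"       2
"Pre"  "ONC"   "LEDDY"  "MISSING"  1
"Pre"  "SURG3" "MATT"   "MISSING"  4
"Pre"  "ONC"   "MATT"   "NO"       4
"Post" "SURG1" "LEDDY"  "NO"       6
"Post" "SURG3" "MATT"   "YES"      7
"Post" "ONC"   "DANIEL" "MISSING"  8
"Post" "SURG2" "EVA"    "MISSING"  9
"Post" "SURG2" "PRINCE" "NO"      10
"Post" "ONC"   "SADDIE" "YES"     11
"Post" "OONC"  "TRACE"  "MISSING" 12
end

* How many rows share each combination of the five by() variables?
duplicates report time dept surgeon prot enc

* Cross-tabulate time x dept x prot; the zero option keeps empty cells (_freq == 0)
contract time dept prot, zero
count if _freq == 0
list if _freq == 0, noobs sepby(time)
```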


May Blake

Join Date: Feb 2020
Posts: 137
#5

27 Jun 2023, 13:04

Hi Bruce,

Here is a sample of my data. Some fields are missing, but they are filled in with the string "MISSING"; they are not blank.

---------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str4 time str5 dept str6 surgeon str7 prot byte enc
"Pre"  "ONC"   "DANIEL" "YES"      3
"Pre"  "SURG1" "LEDDY"  "YES"      1
"Pre"  "SURG2" "EVA"    "NO"       2
"Pre"  "ONC"   "LEDDY"  "MISSING"  1
"Pre"  "SURG3" "MATT"   "MISSING"  4
"Pre"  "ONC"   "MATT"   "NO"       4
"Post" "SURG1" "LEDDY"  "NO"       6
"Post" "SURG3" "MATT"   "YES"      7
"Post" "ONC"   "DANIEL" "MISSING"  8
"Post" "SURG2" "EVA"    "MISSING"  9
"Post" "SURG2" "PRINCE" "NO"      10
"Post" "ONC"   "SADDIE" "YES"     11
"Post" "OONC"  "TRACE"  "MISSING" 12
end

------------------ copy up to and including the previous line ------------------

Listed 13 out of 13 observations
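A minimal sketch of a count-based alternative on these example rows: instead of taking a percentage within every five-way cell, `sample` with the `count` option and a coarser `by()` draws a fixed number of encounters per stratum, so no stratum is rounded down to nothing. Stratifying on time and prot and drawing 2 per cell is purely illustrative; the number per cell needed for the real review is an assumption to settle separately, and the seed value is arbitrary.

```stata
* The 13 example rows listed above
clear
input str4 time str5 dept str6 surgeon str7 prot byte enc
"Pre"  "ONC"   "DANIEL" "YES"      3
"Pre"  "SURG1" "LEDDY"  "YES"      1
"Pre"  "SURG2" "EVA"    "NO"       2
"Pre"  "ONC"   "LEDDY"  "MISSING"  1
"Pre"  "SURG3" "MATT"   "MISSING"  4
"Pre"  "ONC"   "MATT"   "NO"       4
"Post" "SURG1" "LEDDY"  "NO"       6
"Post" "SURG3" "MATT"   "YES"      7
"Post" "ONC"   "DANIEL" "MISSING"  8
"Post" "SURG2" "EVA"    "MISSING"  9
"Post" "SURG2" "PRINCE" "NO"      10
"Post" "ONC"   "SADDIE" "YES"     11
"Post" "OONC"  "TRACE"  "MISSING" 12
end

set seed 20230627                 // arbitrary seed for a reproducible draw
* Draw exactly 2 encounters within every time x prot cell
sample 2, count by(time prot)
tabulate time prot
```

Every time x prot cell in this excerpt has at least 2 rows, so the draw keeps 2 rows from each of the 6 cells.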


Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2406
#6

27 Jun 2023, 13:21

May, have you considered creating a separate flag for each of the 5 listed conditions, and then drawing a sample from each set of flagged observations?
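One possible reading of this suggestion, sketched under assumptions: give every row a random sort key, then within each stratum flag the first ceil(8.8% of the stratum size) rows. Using `ceil()` rather than rounding guarantees at least one flagged encounter per non-empty stratum, at the cost of a total slightly above 1,500. The stand-in data and its numeric codings (0/1 for time, 1 to 4 for dept, 0 to 2 for prot) are hypothetical; swap in your own variables, and add surgeon to the `bysort` list only if the resulting strata are large enough to sample from.

```stata
* Hypothetical stand-in data; replace with your own file
clear
set seed 20230627
set obs 17000
generate byte time = runiformint(0, 1)     // assumed coding: 0 = pre, 1 = post
generate byte dept = runiformint(1, 4)     // four departments
generate byte prot = runiformint(0, 2)     // assumed: 0 = no, 1 = yes, 2 = missing

* Random sort key, then flag ceil(8.8%) of each time x dept x prot stratum
generate double u = runiform()
bysort time dept prot (u): generate byte pick = _n <= ceil(0.088 * _N)
tabulate pick
* keep if pick                             // uncomment to retain only the flagged sample
```

Because the rows are sorted by a uniform random key within each stratum, flagging the first rows is equivalent to a simple random draw within that stratum.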
May Blake

Join Date: Feb 2020

Posts: 137
#7

27 Jun 2023, 13:39

Leonardo, I have not, but that does sound like what I am looking for. Unfortunately, I have no idea where to start. Would you recommend creating a sub-dataset for each condition and then drawing from that sub-dataset?
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2406
#8

27 Jun 2023, 13:43

That would be a reasonable way to get started. I'm not really sure how to help further from the data provided.
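The sub-dataset approach can be sketched as a loop: for each department, `preserve` the data, keep that department, draw 8.8% with `sample`, add the draw to a temporary file, and `restore`. This is a minimal sketch under assumptions: the stand-in data is hypothetical, and department is the only splitting variable, for brevity. For a single variable this gives the same kind of draw as `sample 8.8, by(dept)`; the loop becomes useful when each sub-dataset needs its own rule, such as a different percentage or a fixed count.

```stata
* Hypothetical stand-in data; replace with your own file
clear
set seed 20230627
set obs 17000
generate byte dept = runiformint(1, 4)   // four departments
generate long enc_id = _n                // invented encounter identifier

tempfile pool                            // accumulates the per-department draws
levelsof dept, local(depts)
local first 1
foreach d of local depts {
    preserve
    keep if dept == `d'                  // sub-dataset for one department
    sample 8.8                           // 8.8% of this department only
    if `first' {
        save `pool'
        local first 0
    }
    else {
        append using `pool'
        save `pool', replace
    }
    restore                              // back to the full data for the next pass
}
use `pool', clear
tabulate dept
```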