Subset data based on groups of observations if group meets 1 condition

Drew Nelson

Join Date: Oct 2022

Posts: 23
#1

Subset data based on groups of observations if group meets 1 condition

21 Oct 2022, 08:33

Hello,

I have a dataset containg hospital billing data. 257000 observations to be exact. I have a variable called, ID. This variable uniquely identifies patients. Each row is a billed charge so there are mutliple rows per ID. I have another variable called, Department. This variable contains a string value denoting the type of service the patient received and was charged for.

Example data:

ID Department
1 MRI
1 PT/OT
1 Surgery
2 Office
2 PTOT
2 MRI
3 PTOT
3 PTOT
4 Surgery
5 MRI
5 PTOT
5 PTOT

I need to subset my dataset to all patients that have received an MRI and analyze the other types of services (denoted in Department) that they have received. I would like to keep all same IDs if one observation is Department=="MRI". So, for example, In the example dataset, I would keep all of the observatrions for IDs: 1, 2, & 5. This would include keeping the observations where ID 1=="PT/OT", as example.

My elementary approach has been:

gen tagger="1" if Department=="MRI"
sort ID
replace tagger="1" if ID==ID[_n-1] & tag=="1"[_n-1]

I usually repeat this until all IDs have been tagged. I know this is not the most efficient way to do this. Please advise.
Tags: None
Andrew Musau

Join Date: Oct 2014

Posts: 10195
#2

21 Oct 2022, 08:41

Code:

bysort ID: egen tokeep= max(Department=="MRI") keep if tokeep

See https://www.stata.com/support/faqs/d...ble-recording/
1 like
Comment
Drew Nelson

Join Date: Oct 2022

Posts: 23
#3

21 Oct 2022, 08:45

Hi Andrew,

Thank you for the response. That unfortunately deleted all observations. The tokeep variable contained only 0s.
Comment

Hemanshu Kumar

Join Date: Mar 2015
Posts: 1400

21 Oct 2022, 08:51

Drew Nelson I don't think the problem is with the solution in #2.

Code:

clear
input byte ID str10 Department
1 "MRI"
1 "PT/OT"
1 "Surgery"
2 "Office"
2 "PTOT"
2 "MRI"
3 "PTOT"
3 "PTOT"
4 "Surgery"
5 "MRI"
5 "PTOT"
5 "PTOT"
end

bysort ID: egen tokeep= max(Department=="MRI")
keep if tokeep

produces:

Code:

  +------------------------+
  | ID   Depart~t   tokeep |
  |------------------------|
  |  1        MRI        1 |
  |  1      PT/OT        1 |
  |  1    Surgery        1 |
  |------------------------|
  |  2     Office        1 |
  |  2       PTOT        1 |
  |  2        MRI        1 |
  |------------------------|
  |  5        MRI        1 |
  |  5       PTOT        1 |
  |  5       PTOT        1 |
  +------------------------+

If it deleted all your observations, there's something about the strings that is different. Are there leading/trailing spaces, perhaps? If so, you might want to amend the code slightly to

Code:

bysort ID: egen tokeep= max(trim(Department)=="MRI")
keep if tokeep

Last edited by Hemanshu Kumar; 21 Oct 2022, 08:53.

Comment

Nick Cox

Join Date: Mar 2014

Posts: 35698
#5

21 Oct 2022, 08:51

#3 contradicts #1 as you're saying that you used the condition

Code:

Department=="MRI"

with success. From my reading of #1 Andrew Musu's code in #2 should work fine. Did you do exactly what you said in #1?
1 like
Comment
Drew Nelson

Join Date: Oct 2022

Posts: 23
#6

21 Oct 2022, 08:52

My apologies. Ran it on the wrong dataset. Worked perfectly. Thank you!
Comment

Announcement

Subset data based on groups of observations if group meets 1 condition

Comment

Comment

Comment

Comment

Comment