Hi all,
First time posting, so please forgive any errors.
I have a large dataset of patients undergoing surgery (approx. 140k observations). The relevant variables are laid out as in the sample table below.
My goal is to determine the average number of surgeries performed per year by each hospital and by each surgeon, and then to divide the patients into 4 groups (quartiles) based on hospital and surgeon volume. To do this, I think I first need to sort by year and capture the unique hospital and surgeon IDs, then find the average number of observations (i.e., surgeries performed) per year for each hospital and each surgeon, and finally generate a new variable assigning patients to quartiles of hospital and surgeon volume.
Patient ID | Hospital_ID | Year | Surgeon ID
1 | 1 | 2005 |
2 | 1 | 2006 |
3 | 2 | 2007 |
4 | 3 | 2008 |
5 | 4 | |
I've thus far tried to generate code to find the unique hospital_ID codes by using the following:
bys Hospital_ID: generate _first = (_n == 1)
This flags the first observation for each hospital ID, but honestly I have no idea where to go from here. Any insight would be greatly appreciated!
Thanks,
Chris
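One possible way to go from here, as a minimal sketch rather than a definitive answer. It assumes one observation per surgery and the variable names Hospital_ID and Year as in the post. Every new variable (hosp_n, hosp_year_tag, hosp_mean, hosp_tag, hosp_q_tmp, hosp_q) is a made-up name for illustration. It also assumes that quartile cut points should be computed across hospitals rather than across patients, which is a choice the post does not settle.

```stata
* Assumes one observation per surgery, with Hospital_ID and Year as named
* in the post. All generated variable names below are hypothetical.

* 1. Number of surgeries per hospital in each year
bysort Hospital_ID Year: generate hosp_n = _N

* 2. Average yearly volume per hospital. Tag one observation per
*    hospital-year so that each year enters the mean exactly once.
*    Observations with a missing Year are not tagged and so are ignored.
egen hosp_year_tag = tag(Hospital_ID Year)
bysort Hospital_ID: egen hosp_mean = mean(cond(hosp_year_tag, hosp_n, .))

* 3. Quartiles of average volume, with cut points taken over hospitals
*    (one tagged observation each), then copied to every patient
egen hosp_tag = tag(Hospital_ID)
xtile hosp_q_tmp = hosp_mean if hosp_tag, nquantiles(4)
bysort Hospital_ID: egen hosp_q = max(hosp_q_tmp)
drop hosp_q_tmp
```

The same three steps could be repeated for surgeons by swapping in the surgeon identifier (whatever that variable is actually called in the data) and new variable names. Note that years in which a hospital performed no surgeries do not appear in the data, so they do not count as zeros in the average. If the quartiles should instead hold roughly equal numbers of patients, dropping the `if hosp_tag` condition from xtile would compute the cut points over all observations.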