Hi, all. I have a dataset containing 26 million hospital discharge records. Each record has 25 fields containing a code for a procedure performed during the patient's stay.
I have a passel of flags that I need to set for various procedures (i.e., did the patient undergo coronary bypass surgery?) that can appear in any one of the 25 procedure fields. For obvious reasons, I'm interested in doing this as efficiently as possible.
Currently I've implemented this as follows:
This works, but is fairly slow. I'm wondering if it might be more productive to create a separate, normalized table with a unique id for patient record and a single field for procedure code, and simply run through each record of that table without -foreach-. I've tried this for one or two procedures and it seems speedy, but I don't know if it will scale.
Alternatively, I could create a third table containing the procedure code and flag variable for each procedure I'm interested in, and merge it with the normalized table.
Any thoughts on which of these approaches is likely to be the fastest would be great.
Thanks much.
I have a passel of flags that I need to set for various procedures (i.e., did the patient undergo coronary bypass surgery?) that can appear in any one of the 25 procedure fields. For obvious reasons, I'm interested in doing this as efficiently as possible.
Currently I've implemented this as follows:
Code:
foreach v of varlist icd_prcdr_cd* { replace cabg=1 if inlist(`v', "3610", "3611", "3612", "3613", "3614","3615", "3616") // flag for CABG operation replace valve=1 if strpos(`v', "350") | strpos(`v', "351") | strpos(`v', "352") // flag for valve operation etc. etc. // do this for 50 more flags }
Alternatively, I could create a third table containing the procedure code and flag variable for each procedure I'm interested in, and merge it with the normalized table.
Any thoughts on which of these approaches is likely to be the fastest would be great.
Thanks much.
Comment