Hi, I have two string variables in my dataset that each contain multiple ICD codes delimited by ";". I'd like to:
- Amalgamate a list of all unique ICD codes across these two string variables for all observations.
- Generate a dichotomous variable for each unique ICD code.
- Set each dichotomous variable to 1 if its ICD code is present in either string variable for that observation, or 0 if it is absent.
This is what my data look like:
| id | icd_primary | icd_secondaries |
| 1 | I4890 | J841;C3430;J90;J920 |
| 2 | M4802 | K100;J920 |
This is what I'd like to get to:
| id | icd_I4890 | icd_J841 | icd_C3430 | icd_J90 | icd_J920 | icd_M4802 | icd_K100 |
| 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
Thanks!
Jonathan
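
The three steps in the question can be sketched as follows. This is only an illustration in plain Python, since the post does not say which software is in use; the helper names `split_codes` and `make_indicators` are hypothetical, and the `rows` list simply mirrors the two example observations above. It assumes codes are separated by ";" with no empty or malformed entries beyond stray whitespace.

```python
# Example rows copied from the table in the post (illustrative only).
rows = [
    {"id": 1, "icd_primary": "I4890", "icd_secondaries": "J841;C3430;J90;J920"},
    {"id": 2, "icd_primary": "M4802", "icd_secondaries": "K100;J920"},
]


def split_codes(row, fields=("icd_primary", "icd_secondaries"), sep=";"):
    """Return the distinct codes found in one row, in order of appearance."""
    codes = []
    for field in fields:
        # Treat a missing or empty field as containing no codes.
        for code in (row.get(field) or "").split(sep):
            code = code.strip()
            if code and code not in codes:
                codes.append(code)
    return codes


def make_indicators(rows):
    """Build one 0/1 indicator per unique code across all rows."""
    per_row = [split_codes(r) for r in rows]

    # Step 1: unique codes across both fields and all observations,
    # kept in order of first appearance.
    unique_codes = list(dict.fromkeys(c for codes in per_row for c in codes))

    # Steps 2 and 3: create an indicator for each code and set it to 1
    # when the code is present in the observation, otherwise 0.
    wide = []
    for r, codes in zip(rows, per_row):
        present = set(codes)
        record = {"id": r["id"]}
        for c in unique_codes:
            record["icd_" + c] = 1 if c in present else 0
        wide.append(record)
    return unique_codes, wide


unique_codes, wide = make_indicators(rows)
for record in wide:
    print(record)
```

Matching whole codes after splitting, rather than searching for a substring in the original string, avoids false positives when one code is a prefix of another (for example J90 and J900).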
