I have a dataset and want to remove duplicate diagnoses within each observation.
Each row has a unique patient ID, followed by diagnosis variables diag_code_1 to diag_code_x (up to 800 for some patients). Each diagnosis variable holds a combination of 2-5 numbers and letters representing an ICD9 code. The majority of these values are repeats. What is the best code to use to delete all duplicate diagnoses within each row of the table, leaving only the unique diagnoses?
Many thanks - very grateful for any help with this.
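For illustration, here is a minimal sketch of the per-row de-duplication described above, written in plain Python because the post does not say which software is in use. It assumes each row is available as a dictionary, that the ID column is called `patient_id` (a hypothetical name), and that blank diagnosis fields should be skipped. The helper `dedupe_row` is made up for this example and is not from any library.

```python
def dedupe_row(row, id_key="patient_id", prefix="diag_code_"):
    """Return a new row with only the unique diagnosis codes, in first-seen order.

    Assumptions (not from the original post): rows are dicts, the ID column
    is named by id_key, and empty or missing codes are ignored.
    """
    # Collect the diagnosis columns and sort them by their numeric suffix,
    # so diag_code_10 comes after diag_code_9.
    diag_cols = sorted(
        (key for key in row if key.startswith(prefix)),
        key=lambda key: int(key[len(prefix):]),
    )

    seen = set()
    unique_codes = []
    for col in diag_cols:
        code = (row[col] or "").strip()
        if code and code not in seen:
            seen.add(code)
            unique_codes.append(code)

    # Rebuild the row: the ID first, then the unique codes packed to the left.
    result = {id_key: row[id_key]}
    for position, code in enumerate(unique_codes, start=1):
        result[f"{prefix}{position}"] = code
    return result


example = {
    "patient_id": "P1",
    "diag_code_1": "4019",
    "diag_code_2": "25000",
    "diag_code_3": "4019",
    "diag_code_4": "",
}
cleaned = dedupe_row(example)
```

The same idea (walk the diagnosis variables in order, keep a code only the first time it appears, and shift the survivors left) should carry over to an array loop in other packages.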