Hi,
I am cleaning health care data received from a drug registry and would be very grateful for help with the following:
The data are panel data consisting of the identifier variable "StudyID", a categorical variable "ATC" with 9 different drug regimens, and the date variable "EDATUM", which is the date on which a patient collects a given drug at the pharmacy. There is one row for every EDATUM, and since most patients collect a drug multiple times and also switch between drugs during our follow-up, the data file now consists of approximately 100 000 rows.
I want to keep only the first and the last EDATUM for every drug that a patient uses. If a patient switches to a new drug and then switches back to the previously used drug, I want to keep the first and the last EDATUM for each of the separate periods in which that drug was used.
I will later convert the data into wide format, but since the data currently have (too) many rows per patient, I believe it is better to start by dropping the EDATUM values that are not of interest (all those between the first and the last for each period of drug use). The data look like this:
ID | ATC | EDATUM |
1 | A | 2010-01-01 |
1 | A | 2010-02-02 |
1 | A | 2010-03-03 |
1 | B | 2010-04-04 |
1 | B | 2010-05-05 |
1 | B | 2010-06-06 |
1 | A | 2010-07-07 |
1 | A | 2010-08-08 |
1 | A | 2010-09-09 |
Thank you in advance!
Kind regards,
Hanne
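One way the rule described above could be sketched, here in plain Python rather than in any particular statistics package: sort the rows by patient and date, treat each uninterrupted run of the same drug as one spell, and keep only the first and last row of each spell. The function name first_last_per_spell and the tuple layout (ID, ATC, EDATUM) are assumptions made for illustration. The sketch also assumes that dates are ISO-formatted strings (so they sort correctly as text) and that a patient does not collect two different drugs on the same date.

```python
from itertools import groupby
from operator import itemgetter

# Example rows from the post, as (ID, ATC, EDATUM) tuples.
rows = [
    (1, "A", "2010-01-01"),
    (1, "A", "2010-02-02"),
    (1, "A", "2010-03-03"),
    (1, "B", "2010-04-04"),
    (1, "B", "2010-05-05"),
    (1, "B", "2010-06-06"),
    (1, "A", "2010-07-07"),
    (1, "A", "2010-08-08"),
    (1, "A", "2010-09-09"),
]


def first_last_per_spell(rows):
    """Keep the first and last row of each uninterrupted run of one drug."""
    kept = []
    # Sort by patient, then by date, so consecutive rows form spells.
    ordered = sorted(rows, key=itemgetter(0, 2))
    # groupby only merges adjacent rows with the same (ID, ATC), so a
    # switch back to an earlier drug starts a new spell.
    for _key, spell in groupby(ordered, key=itemgetter(0, 1)):
        spell = list(spell)
        kept.append(spell[0])
        if len(spell) > 1:  # a single-collection spell is kept once
            kept.append(spell[-1])
    return kept


result = first_last_per_spell(rows)
for row in result:
    print(row)
```

For the example data this keeps six rows: 2010-01-01 and 2010-03-03 for the first period on A, 2010-04-04 and 2010-06-06 for B, and 2010-07-07 and 2010-09-09 for the second period on A. A spell with only one collection is kept as a single row, which may or may not be what is wanted before reshaping to wide format.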