Dear Statalisters,
I am building a counting-process (long format) dataset from routine surveillance data in a drug-resistant TB database. Patients are put on regimens consisting of multiple drugs, started at different time points. The surveillance system captures this as drugname* and drugstartdate*. Several of these variables may hold different drugs that were started on the same date. I wish to concatenate the drug names into one list on the condition that the drug start dates are the same. This can then be repeated for subsequent drug regimens.
I am hoping for the end result to be something like:
drugregimen1 = "bedaquiline; Isoniazid; B6" drugregimenstart1 = "14 Jan 2022"
drugregimen2 = "Linezolid; bedaquiline; High Dose INH" drugregimenstart2 = "20 Oct 2022"
Initially I have this rudimentary code:
Code:
egen drugreg1 = concat(drugname1 drugname2 drugname3 ) if drugstartdate1==drugstartdate2 &drugstartdate1==drugstartdate3, punct(;)
run on this data:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str23 drugname1 int drugstartdate1 str23 drugname2 int drugstartdate2 str23 drugname3 int drugstartdate3
"Bedaquiline"  22573 "Levofloxacin" 22573 "Linezolid"   22573
"Bedaquiline"  22581 "Levofloxacin" 22581 "Linezolid"   22581
"Bedaquiline"  22824 "Levofloxacin" 22824 "Linezolid"   22824
"Bedaquiline"  22816 "Levofloxacin" 22816 "Linezolid"   22816
"Bedaquiline"  22806 "Levofloxacin" 22806 "Linezolid"   22806
"Levofloxacin" 22494 "Bedaquiline"  22494 "Clofazimine" 22494
"Linezolid"    22307 "Bedaquiline"  22307 "Terizidone"  22307
"Bedaquiline"  22564 "Levofloxacin" 22564 "Linezolid"   22564
"Bedaquiline"  22540 "Levofloxacin" 22540 "Linezolid"   22540
"Bedaquiline"  22839 "Levofloxacin" 22839 "Linezolid"   22839
end
format %td drugstartdate1
format %td drugstartdate2
format %td drugstartdate3
which returns:
Code:
     +--------------------------------------+
     |                             drugreg1 |
     |--------------------------------------|
  1. |   Bedaquiline;Levofloxacin;Linezolid |
  2. |   Bedaquiline;Levofloxacin;Linezolid |
  3. |   Bedaquiline;Levofloxacin;Linezolid |
  4. |   Bedaquiline;Levofloxacin;Linezolid |
  5. |   Bedaquiline;Levofloxacin;Linezolid |
     |--------------------------------------|
  6. | Levofloxacin;Bedaquiline;Clofazimine |
  7. |     Linezolid;Bedaquiline;Terizidone |
  8. |   Bedaquiline;Levofloxacin;Linezolid |
  9. |   Bedaquiline;Levofloxacin;Linezolid |
 10. |   Bedaquiline;Levofloxacin;Linezolid |
     +--------------------------------------+
So far so good, but the problems start now.
The rudimentary code won't stop concatenating when the drugstartdate changes, and it breaks down when regimens contain differing numbers of drugs, for instance when there are missing values or changes in the drug start date:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str23 drugname6 int drugstartdate6 str21 drugname7 long drugstartdate7 str23 drugname8 int drugstartdate8 str23 drugname9 int drugstartdate9 str21 drugname10 int drugstartdate10 str23 drugname11 int drugstartdate11
"Pyrazinamide"  22573 "High dose INH" 22573 ""            .     ""             .     ""            .     ""           .
"Pyrazinamide"  22581 "High dose INH" 22581 ""            .     ""             .     ""            .     ""           .
"Pyrazinamide"  22824 "High dose INH" 22824 "Bedaquiline" 22831 "Levofloxacin" 22831 "Clofazimine" 22831 "Terizidone" 22831
"Pyrazinamide"  22816 "High dose INH" 22816 "PYRIDIXINE"  22816 ""             .     ""            .     ""           .
"Pyrazinamide"  22806 "High dose INH" 22806 "B6"          22806 ""             .     ""            .     ""           .
"Ethambutol"    22494 "Pyrazinamide"  22494 ""            .     ""             .     ""            .     ""           .
""              .     ""              .     ""            .     ""             .     ""            .     ""           .
"High dose INH" 22564 ""              .     ""            .     ""             .     ""            .     ""           .
"Pyrazinamide"  22540 "High dose INH" 22540 "Pyridoxine"  22540 "Bco"          22540 ""            .     ""           .
"Pyrazinamide"  22839 "High dose INH" 22839 "PYRIDOXINE"  22839 ""             .     ""            .     ""           .
end
format %td drugstartdate6
format %td drugstartdate7
format %td drugstartdate8
format %td drugstartdate9
format %td drugstartdate10
format %td drugstartdate11
I have tried a loop:
Code:
capture drop drugreg*
forval i=1/5{
    egen drugreg`i'= concat(drugname`i' drugname`i+1' drugname`i+2') if drugstartdate`i' ==drugstartdate`i+1' & drugstartdate`i+2', punct(;)
}
list drugreg1 drugreg2 drugreg3 in 1/10
This doesn't work; Stata doesn't seem to recognise the `i+1'.
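A possible reason the loop above misbehaves, offered as a sketch rather than a tested fix: as far as I understand, Stata reads `i+1' as the name of a local macro, not as arithmetic, so the index has to be computed explicitly first (or inline with `=`i'+1'). The last part of the condition also lacks a comparison (& drugstartdate`i+2' on its own only tests for non-zero). The locals j and k below are names introduced purely for illustration, and the loop assumes that drugname1 to drugname7 and drugstartdate1 to drugstartdate7 all exist.

```stata
* Sketch only: compute the neighbouring indices before using them
capture drop drugreg*
forvalues i = 1/5 {
    local j = `i' + 1      // hypothetical helper: index of the next drug
    local k = `i' + 2      // hypothetical helper: index of the drug after that
    egen drugreg`i' = concat(drugname`i' drugname`j' drugname`k') ///
        if drugstartdate`i' == drugstartdate`j' ///
         & drugstartdate`i' == drugstartdate`k', punct(;)
}
list drugreg1 drugreg2 drugreg3 in 1/10
```

This only repairs the syntax. It still compares fixed windows of three drugs, so it does not by itself handle regimens of differing length.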
I have tried a more manual version:
Code:
order drugname*
order drugstartdate*
capture drop drugreg1
egen drugreg1 = concat(drugname1-drugname56) ///
    if drugstartdate1<=drugstartdate2 & drugstartdate1<=drugstartdate3 ///
     & drugstartdate1<=drugstartdate4 & drugstartdate1<=drugstartdate5 ///
     & drugstartdate1<=drugstartdate6 & drugstartdate1<=drugstartdate7 ///
     & drugstartdate1<=drugstartdate8 & drugstartdate1<=drugstartdate9 ///
     & drugstartdate1<=drugstartdate10, punct(;)
list drugreg1 in 1/10
but this code simply lumps together all the drugs that the patient has ever been on.
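One direction that might avoid the wide-format comparisons altogether is to reshape the drug slots to long, group by start date, and reshape back. The following is a minimal sketch under stated assumptions, not a tested solution: it assumes one row per patient, that every drugname# has a matching drugstartdate#, and that a drug with a missing name or missing start date can be ignored. The names patid, slot, and regimen are hypothetical and introduced only for illustration.

```stata
* Sketch only: group drugs into regimens by shared start date
gen long patid = _n                          // hypothetical patient identifier
reshape long drugname drugstartdate, i(patid) j(slot)
drop if missing(drugname) | missing(drugstartdate)

* Within patient and start date, build the "; "-separated drug list
sort patid drugstartdate slot
by patid drugstartdate: gen drugregimen = drugname if _n == 1
by patid drugstartdate: replace drugregimen = drugregimen[_n-1] + "; " + drugname if _n > 1
by patid drugstartdate: keep if _n == _N     // last row holds the full list

* Number the regimens in date order and return to one row per patient
by patid: gen regimen = _n
rename drugstartdate drugregimenstart
keep patid regimen drugregimen drugregimenstart
reshape wide drugregimen drugregimenstart, i(patid) j(regimen)
```

Patients with no recorded drugs drop out in this sketch and would need to be merged back in from the original data. Variant spellings such as "PYRIDOXINE", "Pyridoxine", and "B6" are kept as they are.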
If anyone has an idea on an elegant piece of code that will be able to lump the drug regimens together by their corresponding drugstartdates I would be most grateful...
Kind Regards
Brian Brummer