Hi!
I'm currently working on a big data set (UK Biobank) with roughly 500 000 individuals. I have made a training set of 1000 individuals that I'm currently writing the code on. I want to know whether one variable at baseline is associated with a specific disease over time. The disease in question has several codes, e.g. I48 and I480-I489. Linked to every diagnosis is a column with a date (the date of the diagnosis). The problem is that there are 243 such columns (numbered 0 to 242), and most of them contain no data.
I'm only interested in one of these diagnosis and date columns: the one with the earliest date, since it is the time to disease that is of interest.
The diagnosis is in one of the columns s_41270_0_0 - s_41270_0_242, and its date is in the matching column of ts_41280_0_0 - ts_41280_0_242. They are linked by their suffix, so the diagnosis in s_41270_0_0 was given on the date shown in ts_41280_0_0, and so on.
I have made a loop over s_41270_0_0 - s_41270_0_242 so that everyone with any of the diagnosis codes mentioned above is coded 1 for the diagnosis (afib):
* afib must already exist and be 0 for everyone (e.g. generate byte afib = 0);
* recode (0=1) leaves missing values unchanged
foreach var of varlist s_41270_0_0 - s_41270_0_242 {
    recode afib (0=1) if (`var'=="I48" | `var'=="I480" | `var'=="I481" ///
        | `var'=="I482" | `var'=="I483" | `var'=="I484" | `var'=="I485" ///
        | `var'=="I486" | `var'=="I487" | `var'=="I488" | `var'=="I489")
}
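A shorter way to build the same flag, assuming every code that starts with "I48" should count (this covers I48 and I480-I489): compare the first three characters with substr() instead of listing each code, and use replace after creating afib explicitly. The tiny data set is invented only so the snippet runs on its own; on the real data the varlist would be s_41270_0_0-s_41270_0_242.

```stata
* Invented toy data: two diagnosis columns for three people
clear
input id str4 s_41270_0_0 str4 s_41270_0_1
1 "I10" "I481"
2 "E11" ""
3 "I48" "K29"
end

* Start everyone at 0, then flag anyone with an I48* code in any column
generate byte afib = 0
foreach var of varlist s_41270_0_0-s_41270_0_1 {
    replace afib = 1 if substr(`var', 1, 3) == "I48"
}
list id afib
```

Note that substr() would also match any other code beginning with "I48"; if only the eleven listed codes are wanted, the original comparison list is the safer choice.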
Because the data set is large, I need a loop. What I need is to drop those who got an afib diagnosis (41270 columns, afib=1) before a certain date (41280 columns). What I find tricky is that there are several diagnoses and several dates per person, and only the column with the earliest date carrying one of these diagnoses is of interest. I hope I explained this clearly; English isn't my first language.
Is it possible to write a loop for this in Stata?
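A minimal sketch of one way to do this, assuming the ts_41280_0_* columns are numeric Stata daily dates (%td) and the s_41270_0_* columns hold ICD-10 codes as strings. It loops over the column pairs by their shared suffix, keeps the earliest date among the I48* codes in a new variable afib_date (a hypothetical name), and then drops everyone whose first afib diagnosis falls before a cutoff. The toy data, the three-column range and the cutoff 01jan2006 are invented for illustration; on the real data the loop would run over 0/242 and the cutoff would be the date you actually need.

```stata
* Invented toy data: three diagnosis/date pairs, dates typed as text
clear
input id str4 s_41270_0_0 str4 s_41270_0_1 str4 s_41270_0_2 str10 t0 str10 t1 str10 t2
1 "I10"  "I481" "I48" "2001-05-01" "2010-02-01" "2004-07-15"
2 "E11"  ""     ""    "2003-01-01" ""           ""
3 "I489" "K29"  ""    "2012-09-30" "2008-01-01" ""
end
* Assumption: in the real data ts_41280_0_* are already %td dates
forvalues j = 0/2 {
    generate ts_41280_0_`j' = date(t`j', "YMD")
    format ts_41280_0_`j' %td
}
drop t0 t1 t2

* Earliest date among the columns that hold an I48* code
generate afib_date = .
format afib_date %td
* On the real data, loop over j = 0/242 instead
forvalues j = 0/2 {
    * min() ignores missing values, so afib_date keeps the earlier date
    replace afib_date = min(afib_date, ts_41280_0_`j') ///
        if substr(s_41270_0_`j', 1, 3) == "I48"
}

* Drop people whose first afib diagnosis is before the (hypothetical) cutoff.
* A missing afib_date (no afib) is larger than any date, so those people stay.
drop if afib_date < td(01jan2006)
list id afib_date
```

If the cutoff differs per person, for example the baseline assessment date, compare with that variable instead, e.g. `drop if afib_date < baseline_date`, where baseline_date is a hypothetical name. If the date columns turn out to be strings, convert them with date() first, as in the toy data.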