No time overlapping, first date for each drug for the same participant

Howaida Fahmy

Join Date: Feb 2023
Posts: 14

01 Apr 2023, 13:45

Hi, I have a data set in long format. I am studying the prevalence of these medicines among thousands of individuals.
Each participant has many drug prescriptions. I want to know the first date for each drug for each individual.
I used bysort id (Edate): gen firstdate = Edate[1] to get the first date of prescription. Now I want to keep only the first date for these drugs. I tried this command:
bysort id (firstdate): gen want = _n
then
drop if want > 1
but it drops the repeated ids. Could you please help me keep only the first date of exposure to only one of these drugs?
Thanks in advance!

id	gender	Edate	year	age_5g		Asprin	Other anti- inflamatory	analgesic	corticosteriods		sedatives	firstdate
1256	2	60221	2006	30-40								16-Jan-06
1256	2	60406	2006	30-40	0	0	0	1	0	0	0	16-Jan-06
1256	2	60602	2006	30-40	0	0	0	1	0	0	0	16-Jan-06
1256	2	60725	2006	30-40								16-Jan-06
1256	2	60808	2006	30-40								16-Jan-06
1256	2	60811	2006	30-40								16-Jan-06
1256	2	61027	2006	30-40	0	0	0	1	0	0	0	16-Jan-06
1257	2	60116	2006	30-40	0	0	0	0	0	1	0	16-Jan-06
1257	2	60116	2006	30-40	0	1	0	0	0	0	0	16-Jan-06
1257	2	60116	2006	30-								16-Jan-06
1257	2	60116	2006	25-30	0	0	0	0	0	0	1	16-Jan-06
1257	2	60523	2006	25-30								16-Jan-06
1257	2	61101	2006	25-30								16-Jan-06
1257	2	61107	2006	25-30								16-Jan-06
1257	2	61230	2006	25-30	0	0	0	0	0	0	1	16-Jan-06

Clyde Schechter

Join Date: Apr 2014

Posts: 30066
#2

01 Apr 2023, 17:35

Your data example is not usable: it requires surgery to import into Stata. The helpful way to show example data is with the -dataex- command. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

I also do not understand the variables as shown. The variable Edate is obscure: there is no obvious (to me) relationship between the numbers shown and any dates. How are the dates encoded in those numbers? I also don't understand why the firstdate variable you want should be the same for both of these id's. While I don't know which, if any, of the numbers in Edate corresponds to 16-Jan-06, it is clear just from inspection that there is no common value of Edate between these two id's, so I don't understand how they can have the same value of firstdate, unless Edate, despite its name, has nothing to do with the prescribing dates. But if that's the case, what variable does tell when the drugs were prescribed?

Please post back with example data using the -dataex- command, and clarify the issues I have raised.
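
One possible reading of the Edate values in #1, offered only as an assumption because the thread never confirms it, is that they are dates written as YYMMDD with the leading zero dropped (60116 would then be 16 Jan 2006, which matches the firstdate shown for id 1257). Under that assumption, a minimal sketch of converting such numbers to Stata daily dates follows; the variable name Edate_td and the three sample rows are illustrative only.

```stata
* Assumption: Edate holds YYMMDD as a number (e.g. 60116 = 16 Jan 2006)
* and every year falls in 2000-2099. Edate_td is a hypothetical name.
clear
input long id long Edate
1256 60221
1257 60116
1257 61230
end

* Split the number into year, month and day, then build a daily date
gen int Edate_td = mdy(mod(floor(Edate/100), 100), mod(Edate, 100), 2000 + floor(Edate/10000))
format Edate_td %td
list, sepby(id)
```

If the encoding is something else, mdy() will return missing values for impossible month or day numbers, which is a quick way to see that the assumption does not hold.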
Bader Bin Adwan

Join Date: Apr 2021

Posts: 91
#3

01 Apr 2023, 17:43

Code:

help keep
Howaida Fahmy

Join Date: Feb 2023

Posts: 14
#4

01 Apr 2023, 23:28

Hi again!
Thanks for your reply! Edate is the exposure date for each medication, and I want to keep the first date of exposure to only one of these drugs. Each participant may take one, two, or three medications, which can be at different times or at the same time.

Code:

clear
input double serial_N str1 gender int(Edate year) float(age_5g Asprin corticosteriods other_Analgesics diclophenac paracetamol firstdate)
7 "2" 17135 2006 4 0 0 1 0 0 16897
7 "2" 16897 2006 4 0 0 0 0 1 16897
7 "2" 16974 2006 4 0 0 1 0 0 16897
7 "2" 16897 2006 4 0 0 1 0 0 16897
7 "2" 16926 2006 4 0 0 1 0 0 16897
8 "2" 17142 2006 4 0 0 1 0 0 16824
8 "2" 16925 2006 4 0 0 1 0 0 16824
8 "2" 16824 2006 4 0 0 1 0 0 16824
8 "2" 17017 2006 4 0 0 1 0 0 16824
10 "2" 17128 2006 5 . . . . . 16820
10 "2" 16959 2006 5 . . . . . 16820
10 "2" 17094 2006 5 . . . . . 16820
end
Thanks
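
The Edate and firstdate values in this example are Stata daily dates, that is, counts of days since 1 Jan 1960. As a small illustration of how to read them, the two first dates for serial_N 7 and 8 can be shown in calendar form with a %td display format:

```stata
* Show two of the numeric dates from the example in calendar form
display %td 16897
display %td 16824

* The same format can be attached to the variables once the data are loaded:
* format Edate firstdate %td
```

The first command shows 06apr2006 and the second 23jan2006.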

Last edited by Howaida Fahmy; 01 Apr 2023, 23:34.
Howaida Fahmy

Join Date: Feb 2023

Posts: 14
#5

02 Apr 2023, 02:49

Hi,
I just want to clarify that I created firstdate with the following command:
bysort serial_N (Edate): gen firstdate = Edate[1]
It was not in the data set from the beginning.
Thanks
Clyde Schechter

Join Date: Apr 2014

Posts: 30066
#6

02 Apr 2023, 09:53

There are still a few things that are unclear. For serial_N 10, we have only missing values for the drugs, so we do not know whether this person was exposed to any drugs or not, and if he/she was, which date would have been first. Reasonable approaches to this situation in principle would include dropping that person from the data altogether, or retaining the person but with the value of the first date set to missing. I will assume you prefer the latter. It is also unclear what you wish to do with the information about the drugs themselves. In particular, if the person was exposed to two or more drugs on the first date in which they were exposed to any, do you want to create a new observation that indicates all of the drugs that were begun on that date? Or is the information about the particular drugs not needed at this point (and later) in your work? I will assume the former.

With those assumptions about what you want:

Code:

collapse (max) Asprin-paracetamol (first) gender age_5g, by(serial_N Edate)
by serial_N (Edate): keep if _n == 1
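
The thread title asks for the first date for each drug, which is a slightly different quantity from the first date of any exposure. A minimal sketch of that per-drug version is given here, assuming the variable layout of the example in #4; the first_ prefix and the five sample rows are illustrative choices, not something taken from the thread.

```stata
* Sketch: first exposure date per person, computed separately for each drug.
* The rows below are a shortened copy of the example data in #4.
clear
input double serial_N int Edate float(Asprin corticosteriods other_Analgesics diclophenac paracetamol)
7 16897 0 0 1 0 0
7 16897 0 0 0 0 1
7 16926 0 0 1 0 0
8 16824 0 0 1 0 0
8 16925 0 0 1 0 0
end

foreach v of varlist Asprin-paracetamol {
    * Earliest Edate among rows where this drug is flagged;
    * stays missing if the person never has the drug
    by serial_N, sort: egen first_`v' = min(cond(`v' == 1, Edate, .))
    format first_`v' %td
}

* One row per person with the per-drug first dates
egen byte one_per_person = tag(serial_N)
list serial_N first_* if one_per_person
```

Unlike the collapse approach, this keeps every observation and only adds variables, so it can be combined with later steps that still need the full prescription history.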
Howaida Fahmy

Join Date: Feb 2023

Posts: 14
#7

02 Apr 2023, 10:56

Hi again,
Thanks for your reply and your help. Let me make my plan clearer. In this study, our research team will investigate the prevalence of each drug during each year of the study period. Our suggested definition of exposure is the first date of the drug prescription, and our suggested definition of non-exposed is no drug at all for 6 months. We don't want any possibility of drug overlapping. The timeline of prescription is important (when the individual was exposed to only one drug). What I posted before was the way I was thinking of proceeding, but I was not quite sure that I was doing it the right way. So I wonder whether the syntax you posted will do what I want. Could you help me, please?
Thanks again!

Last edited by Howaida Fahmy; 02 Apr 2023, 11:05.
Clyde Schechter

Join Date: Apr 2014

Posts: 30066
#8

02 Apr 2023, 11:20

What you are describing here is very different from what you said earlier, or at least as I understand things. The code shown in #6 will not do what I understand you to be asking in #7. In fact, in your example data, there is only one instance of a person being exposed to one of the drugs without also having been exposed to another drug in the preceding 6 months (which I actually define as 183 days). That would be serial_N 8, exposed to "other_Analgesics" on 23 Jan 2006. If this is what you are looking to do, the code would be:

Code:

gen `c(obs_t)' obs_no = _n
rename (Asprin-paracetamol) exposed=
reshape long exposed, i(obs_no) j(drug) string
keep if exposed == 1
rangestat (count) n_other_drugs = obs_no, interval(Edate -182 0) ///
    by(serial_N) excludeself
keep if n_other_drugs == 0

-rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer. It is available from SSC.
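
Because -rangestat- is community-contributed, it has to be installed once before the code above will run. A short sketch of that step:

```stata
* Install rangestat from SSC (needed only once per Stata installation)
ssc install rangestat

* Read the help file, including the interval(), by() and excludeself options
help rangestat
```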
Howaida Fahmy

Join Date: Feb 2023

Posts: 14
#9

02 Apr 2023, 12:13

Hi,
Thanks a lot! I tried the syntax in #6, but I think this was not the result I want, because I also want to investigate the prevalence of taking multiple drugs (what is the prevalence of taking more than two drugs). I will try to execute the syntax in #8 and get back to you. Sorry for confusing you!
Thanks for your effort and advice.

Last edited by Howaida Fahmy; 02 Apr 2023, 12:48.
Howaida Fahmy

Join Date: Feb 2023

Posts: 14
#10

02 Apr 2023, 13:25

Hi,
by(serial_N): excludeself did not work, and neither did bysort serial_N: excludeself. Further, I want to know, just for myself: if I need to keep only the first time of exposure to the drug, not the second or third time, what would be suitable syntax for this?
Best,
Howaida

Last edited by Howaida Fahmy; 02 Apr 2023, 14:18.
Howaida Fahmy

Join Date: Feb 2023

Posts: 14
#11

02 Apr 2023, 15:23

Hi,
One more thing: we discussed in our team that we need the analysis per individual, which means having each individual in one row (each serial_N in one row). Any help will be appreciated.
thanks!
Clyde Schechter

Join Date: Apr 2014

Posts: 30066
#12

02 Apr 2023, 16:34

OK. There is no command -by(serial_N): excludeself- in the code. The code shown does include:

Code:

rangestat (count) n_other_drugs = obs_no, interval(Edate -182 0) ///
    by(serial_N) excludeself

Notice that there is no : after by(serial_N) here. And the /// at the end of that first line means that the next line is a continuation of the same command. It is always my intention when posting code here that it be run from the do-file editor. And if that is not what you have done, you should go back and do it that way. If you are working from the Command window typing it in line by line, then, no, the code will not work because in the Command window continuations on the next line are not permitted.
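
For anyone who does need to work from the Command window, the same command can be written on a single line without the /// continuation. This sketch assumes the earlier steps from #8 have already been run, so that obs_no exists and the data are in long form:

```stata
* Same rangestat call as above, on one line (no /// continuation needed)
rangestat (count) n_other_drugs = obs_no, interval(Edate -182 0) by(serial_N) excludeself
```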

Further, I want to know, just for myself: if I need to keep only the first time of exposure to the drug, not the second or third time, what would be suitable syntax for this? [#10]
One more thing: we discussed in our team that we need the analysis per individual, which means having each individual in one row (each serial_N in one row). [#11]

Now I am even more confused what you want. From what you said earlier, I understood that you wanted to capture the first time a person was exposed to one of the drugs in your data in circumstances where they had not been exposed to any drug in the preceding 6 months. So if somebody was exposed for the first time to one drug on, say, January 15, and then with no other intervening observations, was exposed to a different drug on, say, July 31, more than 6 months would have gone by, so this exposure, too, would count. So this means that some individuals will have more than one observation to be captured.
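
The gap in that example can be checked with Stata's td() pseudofunction. The year 2006 is an assumption made only for illustration, since the example names the days but not the year:

```stata
* Days between the two hypothetical exposure dates (year 2006 assumed)
display td(31jul2006) - td(15jan2006)

* 1 if the earlier date falls outside the 182-day look-back used in #8
display (td(31jul2006) - td(15jan2006)) > 182
```

The difference is 197 days, so the January exposure lies outside the interval(Edate -182 0) window of the July exposure, and both observations would be kept.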

It is clear we are having severe difficulties communicating here. I have no understanding of what you are trying to get, and I have the sense that you do not understand my responses and questions. Let's give it one more try. In order to avoid wasting your time or mine, you need to post a small example data set (the same thing you posted in #4 will do if you believe it is representative of the data as a whole), and you must also show exactly what results you want to obtain from it. No more descriptions or explanations in words--clearly that is not working for us. Show exactly what the results should look like for that example data.
Howaida Fahmy

Join Date: Feb 2023

Posts: 14
#13

03 Apr 2023, 00:42

Thanks a lot! I know that I have taken a lot of your time. I think it is working perfectly now that I have executed the syntax in the last post. One final thing, if you have time: I want to know the prevalence of taking multiple drugs (how many people are taking more than two of these drugs). Is there a way to know that?

Code:

clear
input double serial_N str1 gender int(Edate year) float(age_5g Asprin corticosteriods other_Analgesics diclophenac paracetamol firstdate)
7 "2" 16897 2006 4 0 0 1 0 0 16897
7 "2" 16897 2006 4 0 0 0 0 1 16897
7 "2" 16926 2006 4 0 0 1 0 0 16897
7 "2" 16974 2006 4 0 0 1 0 0 16897
7 "2" 17135 2006 4 0 0 1 0 0 16897
8 "2" 16824 2006 4 0 0 1 0 0 16824
8 "2" 16925 2006 4 0 0 1 0 0 16824
8 "2" 17017 2006 4 0 0 1 0 0 16824
8 "2" 17142 2006 4 0 0 1 0 0 16824
10 "2" 16820 2006 5 . . . . . 16820
10 "2" 16924 2006 5 . . . . . 16820
10 "2" 16959 2006 5 . . . . . 16820
end

Clyde Schechter

Join Date: Apr 2014
Posts: 30066

#14

03 Apr 2023, 12:04

Code:

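egen n_drugs = rowtotal(Asprin-paracetamol), missing

* Stata treats missing as larger than any number, so n_drugs > 2 alone would be
* true when n_drugs is missing; cond() keeps those observations missing instead
by serial_N, sort: egen ever_multiple_drugs = max(cond(missing(n_drugs), ., n_drugs > 2))
egen person_tag = tag(serial_N)
tab ever_multiple_drugs if person_tag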
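
The code above counts drugs within a single observation. In the example data each observation flags only one drug, so a person who takes several drugs on separate rows would not be picked up. If the intended quantity is the number of different drugs a person ever receives, one possible sketch is the following; the any_ prefix, n_distinct_drugs, tag_person and the shortened sample data are illustrative assumptions.

```stata
* Sketch: number of different drugs each person is ever exposed to.
* The rows below are a shortened copy of the example data in #13.
clear
input double serial_N int Edate float(Asprin corticosteriods other_Analgesics diclophenac paracetamol)
7 16897 0 0 1 0 0
7 16897 0 0 0 0 1
7 16926 0 0 1 0 0
8 16824 0 0 1 0 0
10 16820 . . . . .
end

* any_<drug> is 1 if the person has that drug flagged on any row,
* and missing if the drug is never recorded for that person
foreach v of varlist Asprin-paracetamol {
    by serial_N, sort: egen any_`v' = max(`v')
}

* Count of different drugs; missing when no drug information exists at all
egen n_distinct_drugs = rowtotal(any_*), missing

* Tabulate once per person
egen byte tag_person = tag(serial_N)
tab n_distinct_drugs if tag_person
```

In this shortened example serial_N 7 is counted with two different drugs, serial_N 8 with one, and serial_N 10 is left out of the table because all of its drug variables are missing.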


Howaida Fahmy

Join Date: Feb 2023

Posts: 14
#15

03 Apr 2023, 23:13

Hi,
Really, I cannot thank you enough!
Thanks,
Best,