data management: identifying subjects experiencing event A, followed, after certain time X, by event B, with no further record of event A

Abdul-Kareem Abdul-Rahman

Join Date: Sep 2015

Posts: 24
#1

data management: identifying subjects experiencing event A, followed, after certain time X, by event B, with no further record of event A

20 Jun 2016, 06:03

Hi all,

I would be grateful if you could consider the following enquiry. I am a second year PhD student researching the epidemiology of mortality of alcohol misusers within primary care (general practice) records. My record is structured as per the following excerpt.

+-------------------------------------------------------------------------------------------------------------------------------------------+
| patid eeventd~e readterm cohortcodes lrtag fdtag lr startlr LR |
|-------------------------------------------------------------------------------------------------------------------------------------------|
29. | 103001 20aug2008 Alcohol consumption low-risk 1 1 1 20aug2008 1 |
30. | 103001 15nov2012 Stopped drinking alcohol former-drinking 1 1 . . . |
31. | 103001 17oct2013 Alcohol use disorder identificatn test consumptn questionnre low-risk 1 1 1 . . |
32. | 103001 17oct2013 Teetotaller non-drinking 1 1 . . . |
33. | 103001 18mar2015 Teetotaller non-drinking 1 1 . . . |
+-------------------------------------------------------------------------------------------------------------------------------------------+

Key:
patid: subject/participant identifier
eeventdate: date of the event
readterm: name of event
cohortcodes: codes used to label alcohol related observations
lrtag: label of whether the participant has any record of ‘low risk drinking’
fdtag: label of whether the participant has any record of ‘former drinking’
LR: label indicating that the participant is considered a low risk drinker
startlr: start date of alcohol use exposure (here low risk drinker)

I am trying to identify patients in my dataset who fulfil the following patterns of sequence of events:

Generally, 1)Has experienced condition or event A,
2)Followed, after a certain time X, by condition or event B
3)And after event B or consecutive series of event Bs, no longer has event A

Specifically,
4)Has alcohol use event ( e.g. low risk drinking, hazardous drinking, alcohol use disorder, etc),
5)AND, at least 1 year after the record of alcohol use, has a record of ‘former drinking’ or ‘no drinking’,
6)AND after that record of ‘former drinking’ or ‘no drinking’ (or other consecutive records of ‘former drinking’ or ‘no drinking’), has no further record of alcohol use.

I have tried using several sources, including guidance on using stata for the data management of panel data but have not been able to solve my query.

Any help would be appreciated very much.

Thank you again.

Kareem

PS

Attached Files

Last edited by Abdul-Kareem Abdul-Rahman; 20 Jun 2016, 06:09. Reason: table formatting
Tags: None

Robert Picard

Join Date: Mar 2014
Posts: 1536

20 Jun 2016, 11:34

Sounds like you are looking for patients with at least one alcohol use event and who are not drinking anymore, with at least one non-drinking event being more than a year after the most recent drinking event.

The first step is to identify events of alcohol use. You did not use dataex (from SSC) to generate your data example so I'll assume that cohortcodes is of type string:

Code:

gen a_use = 0
replace a_use = 1 if cohortcodes == "low-risk"
replace a_use = 1 if cohortcodes == "hazardous drinking"
replace a_use = 1 if cohortcodes == "alcohol use disorder"

The next step is to find the most recent alcohol use event date. Assuming that edate is a Stata numeric date, that would be:

Code:

sort patid
by patid: egen recent_aue = max(edate / a_use)

Note that when a_use is 0 (this is not an alcohol use event), the result of the division is missing. With this "trick" (which I picked up from Nick Cox), the max is based only on event dates where a_use == 1.

If you then order patient observations by date, the most recent event will be last. You can reference the most recent value of any variable using the _N subscript (see help subscripting and help _variables). So you need to target patients who have a non-missing value for recent_aue (in other words, they have at least one alcohol event) and whose most recent event is a non-drinking event and that event is more than 365 days from the most recent alcohol event. This can be coded as:

Code:

sort patid edate a_use
by patid: gen target = !mi(recent_aue) &  a_use[_N] == 0  &  (edate[_N] - recent_aue) > 365

Putting it all together:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long patid str9 eeventdate str32 readterm str15 cohortcodes byte(lrtag fdtag lr) str9 startlr byte LR
103001 "20aug2008" "Alcohol consumption"              "low-risk"        1 1 1 "20aug2008" 1
103001 "15nov2012" "Stopped drinking alcohol"         "former-drinking" 1 1 . ""          .
103001 "17oct2013" "Alcohol use disorder identificat" "low-risk"        1 1 1 ""          .
103001 "17oct2013" "Teetotaller"                      "non-drinking"    1 1 . ""          .
103001 "18mar2015" "Teetotaller"                      "non-drinking"    1 1 . ""          .
end

* convert string dates to Stata numeric dates, see: help datetime
gen edate = daily(eeventdate,"DMY")
gen lrdate = daily(startlr,"DMY")
format %td edate lrdate

* leftalign is from SSC, to install, type in Command window: ssc install leftalign
leftalign

* identify alcohol use event
gen a_use = 0
replace a_use = 1 if cohortcodes == "low-risk"
replace a_use = 1 if cohortcodes == "hazardous drinking"
replace a_use = 1 if cohortcodes == "alcohol use disorder"

* date of most recent alcohol use event per patient
sort patid
by patid: egen recent_aue = max(edate / a_use)

* target patients whose most recent event is 1 year after recent_aue
sort patid edate a_use
by patid: gen target = !mi(recent_aue) &  a_use[_N] == 0  &  (edate[_N] - recent_aue) > 365

Comment

Abdul-Kareem Abdul-Rahman

Join Date: Sep 2015
Posts: 24

22 Jun 2016, 07:44

Dear Robert,

Thank you so much for your reply.

And apologies for the instruction on using dataex that I had missed previously.

If I may record an excerpt of my dataset:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long patid float eeventdate str60 readterm int cohortcodes float(alcuse ndtag fdtag)
  103001 17764 "Alcohol consumption"                                           1 1 1 1
  103001 19312 "Stopped drinking alcohol"                                      2 0 1 1
  103001 19648 "Teetotaller"                                                   0 0 1 1
  103001 19648 "Alcohol use disorder identificatn test consumptn questionnre"  1 1 1 1
  103001 20165 "Teetotaller"                                                   0 0 1 1
  260001 17156 "Alcohol consumption"                                           1 1 . 1
  260001 17505 "Alcohol consumption"                                           1 1 . 1
  260001 17888 "Stopped drinking alcohol"                                      2 0 . 1
  260001 18253 "Drinks rarely"                                                 1 1 . 1
  260001 18618 "Drinks rarely"                                                 1 1 . 1
  260001 18984 "Drinks rarely"                                                 1 1 . 1
  260001 19367 "Trivial drinker - <1u/day"                                     1 1 . 1
  260001 19772 "Drinks rarely"                                                 1 1 . 1
  260001 20103 "Drinks rarely"                                                 1 1 . 1
 1583001 16159 "Drinks occasionally"                                           1 1 1 1
 1583001 16524 "Drinks rarely"                                                 1 1 1 1
 1583001 16785 "Teetotaller"                                                   0 0 1 1
 1583001 17153 "Teetotaller"                                                   0 0 1 1
 1583001 17587 "Teetotaller"                                                   0 0 1 1
 1583001 17953 "Alcohol consumption"                                           1 1 1 1
 1583001 18350 "Alcohol consumption"                                           1 1 1 1
 1583001 18484 "Drinks rarely"                                                 1 1 1 1
 1583001 18877 "Alcohol consumption"                                           1 1 1 1
 1583001 19256 "Drinks rarely"                                                 1 1 1 1
 1583001 19625 "Stopped drinking alcohol"                                      2 0 1 1
end
format %td eeventdate
label values cohortcodes cohortcodelab
label def cohortcodelab 0 "non-drinking", modify
label def cohortcodelab 1 "low-risk", modify
label def cohortcodelab 2 "former-drinking", modify
label def cohortcodelab 3 "increased risk", modify
label def cohortcodelab 4 "heavy drinking", modify
label def cohortcodelab 5 "hazardous", modify
label def cohortcodelab 6 "harmful", modify
label def cohortcodelab 7 "alcohol dependent", modify
label def cohortcodelab 8 "AUD", modify
label def cohortcodelab 9 "alcohol-misuse", modify
label def cohortcodelab 10 "drinking, level unknown", modify
label def cohortcodelab 11 "not known if drinking", modify

(
Key:
patid: subject/participant identifier
eeventdate: date of the event
readterm: name of event
cohortcodes: codes used to label alcohol related observations
alcuse: tag of whether the event is an alcohol drinking event or not
ndtag: label of whether the participant has any record of ‘non drinking’
fdtag: label of whether the participant has any record of ‘former drinking’
)

Thank you for the elegant guidance.

I'd just like to add that I think I may have not been specific enough regarding what I need.

I believe that the syntax you suggested would, as you mentioned, identify those patients whose most recent non-alcohol event is 1 year after the final alcohol-use event (recent_aue). Whereas I need to identify the first non-alcohol event after recent_aue.

I believe I have, with your help, been able to do this with the following syntax:

by patid: g fdinalcusetag = !mi(recent_alcuse) & (ndtag==1 | fdtag==1) // (recent_alcuse==a_use)

* identify patients' first record of non-alcohol use at least 1 year after the final (most recent) alcohol use

gsort patid eeventdate alcuse

by patid eeventdate: g alcusenil=1 if alcuse==0 /// tag for nondrinking observation
& ((eeventdate[_n] - recent_alcuse)>= 365) /// this confirms that it will only detect the observations occuring at least 1 year after the last alc use
& fdinalcusetag==1

gsort patid alcusenil eeventdate

by patid: g fdobs=_n if alcusenil==1

I hope this is ok?

Thank you so much.

Kareem

Comment

Robert Picard

Join Date: Mar 2014
Posts: 1536

22 Jun 2016, 09:31

Thanks for the dataex generated example, it clarifies the type for both the eeventdate and cohortcodes variables. I'm not sure how the accuse, ndtag, and fdtag variables were generated so I'll proceed with my strategy of defining alcohol use using cohortcodes.

It seems like the data could include more than one non-alcohol events after the last alcohol use event. If that's the case, then you need to skip those that are within a year of the last alcohol use event. If you really want the first non-alcohol event one year or more after recent_aue, then you need to identify the first event in the group of events that are 1 year after recent_aue. In the example below, I've added a few observations at the end to make clear how the grouping occurs.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long patid float eeventdate str60 readterm int cohortcodes float(alcuse ndtag fdtag)
  103001 17764 "Alcohol consumption"                                           1 1 1 1
  103001 19312 "Stopped drinking alcohol"                                      2 0 1 1
  103001 19648 "Teetotaller"                                                   0 0 1 1
  103001 19648 "Alcohol use disorder identificatn test consumptn questionnre"  1 1 1 1
  103001 20165 "Teetotaller"                                                   0 0 1 1
  260001 17156 "Alcohol consumption"                                           1 1 . 1
  260001 17505 "Alcohol consumption"                                           1 1 . 1
  260001 17888 "Stopped drinking alcohol"                                      2 0 . 1
  260001 18253 "Drinks rarely"                                                 1 1 . 1
  260001 18618 "Drinks rarely"                                                 1 1 . 1
  260001 18984 "Drinks rarely"                                                 1 1 . 1
  260001 19367 "Trivial drinker - <1u/day"                                     1 1 . 1
  260001 19772 "Drinks rarely"                                                 1 1 . 1
  260001 20103 "Drinks rarely"                                                 1 1 . 1
 1583001 16159 "Drinks occasionally"                                           1 1 1 1
 1583001 16524 "Drinks rarely"                                                 1 1 1 1
 1583001 16785 "Teetotaller"                                                   0 0 1 1
 1583001 17153 "Teetotaller"                                                   0 0 1 1
 1583001 17587 "Teetotaller"                                                   0 0 1 1
 1583001 17953 "Alcohol consumption"                                           1 1 1 1
 1583001 18350 "Alcohol consumption"                                           1 1 1 1
 1583001 18484 "Drinks rarely"                                                 1 1 1 1
 1583001 18877 "Alcohol consumption"                                           1 1 1 1
 1583001 19256 "Drinks rarely"                                                 1 1 1 1
 1583001 19620 "Stopped drinking alcohol"                                      2 0 1 1
 1583001 19625 "Stopped drinking alcohol"                                      2 0 1 1
 1583001 19725 "Stopped drinking alcohol"                                      2 0 1 1
end
format %td eeventdate
label values cohortcodes cohortcodelab
label def cohortcodelab 0 "non-drinking", modify
label def cohortcodelab 1 "low-risk", modify
label def cohortcodelab 2 "former-drinking", modify
label def cohortcodelab 3 "increased risk", modify
label def cohortcodelab 4 "heavy drinking", modify
label def cohortcodelab 5 "hazardous", modify
label def cohortcodelab 6 "harmful", modify
label def cohortcodelab 7 "alcohol dependent", modify
label def cohortcodelab 8 "AUD", modify
label def cohortcodelab 9 "alcohol-misuse", modify
label def cohortcodelab 10 "drinking, level unknown", modify
label def cohortcodelab 11 "not known if drinking", modify

* define an alcohol use event negatively
gen a_use = !inlist(cohortcodes, 0, 2)

* date of most recent alcohol use event per patient
bysort patid: egen recent_aue = max(eeventdate / a_use)

* group events before and after recent_aue
by patid: gen nae_after = eeventdate > recent_aue

* in nae_after groups, tag events that are more than a year later
bysort patid nae_after: gen nae_1year = (eeventdate - recent_aue) > 365 & !mi(recent_aue)

* the first event of a nae_1year group
bysort patid nae_after nae_1year (eeventdate): gen nae_1year1 = _n == 1 & nae_1year == 1

Comment

Abdul-Kareem Abdul-Rahman

Join Date: Sep 2015
Posts: 24

23 Jun 2016, 12:49

Dear Robert,

Thank you for your continued help.

I was wondering that I've found that when I adjust the syntax provided,
gen a_use = !inlist(cohortcodes, 0, 2)

so that the alcohol use label, a_use, also excludes the indeterminate category (label of 11 "not known if drinking")

gen a_use = !inlist(cohortcodes, 0, 2, 11)

(this is better I think than allowing this indeterminate category to be considered as alcohol use),

it creates a situation where, in a few observations, this indeterminate category is considered the first nonalcohol event.

(examples given below)

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long patid float eeventdate str60 readterm int cohortcodes float(a_use recent_alcuse nalcuse_1year nalcuse_1year1st)
 5715001 18240 "Alcohol consumption"                                           1 1 18240 0 0
 5715001 18801 "Pregnancy alcohol advice"                                     11 . 18240 1 1
 5948001 14906 "Alcohol consumption"                                           1 1 14906 0 0
 5948001 18948 "Teetotaller"                                                   0 0 14906 1 1
 5948001 18948 "Alcohol use disorder identificatn test consumptn questionnre" 11 . 14906 1 0
18583001 15259 "Alcohol consumption"                                           1 1 18561 0 0
18583001 16300 "Drinks rarely"                                                 1 1 18561 0 0
18583001 18561 "Drinks rarely"                                                 1 1 18561 0 0
18583001 18952 "Teetotaller"                                                   0 0 18561 1 1
18583001 19067 "Alcohol use disorder identificatn test consumptn questionnre" 11 . 18561 1 0
18583001 19067 "Teetotaller"                                                   0 0 18561 1 0
18583001 19667 "Teetotaller"                                                   0 0 18561 1 0
18654001 15273 "Non drinker alcohol"                                           0 0 15741 0 0
18654001 15741 "Drinks rarely"                                                 1 1 15741 0 0
18654001 19085 "Alcohol use disorder identificatn test consumptn questionnre" 11 . 15741 1 0
18654001 19085 "Teetotaller"                                                   0 0 15741 1 1
end
format %td eeventdate
format %td recent_alcuse
label values cohortcodes cohortcodelab
label def cohortcodelab 0 "non-drinking", modify
label def cohortcodelab 1 "low-risk", modify
label def cohortcodelab 2 "former-drinking", modify
label def cohortcodelab 11 "not known if drinking", modify

I would be grateful for a good solution to identify those patients who, after the last alcohol event, only has nonalcohol events (ie excluding those who have this indeterminate category after the last alcohol event)?

Thank you so much.

Kareem

Last edited by Abdul-Kareem Abdul-Rahman; 23 Jun 2016, 12:53.

Comment

Abdul-Kareem Abdul-Rahman

Join Date: Sep 2015

Posts: 24
#6

23 Jun 2016, 13:29

I believe this might do the job:

*starting from
by patid: g noalcuse_after = eeventdate > recent_alcuse

*tag those who have any indeterminate event after last alcohol use
bysort patid (eeventdate): g indfdtag=1 if noalcuse_after==1 & inlist(cohortcodes, 11, 9999)
bysort patid (indfdtag): replace indfdtag=indfdtag[1]

* in noalcuse_after groups with no indefinete events, tag events that are more than a year later than most recent/final alcohol use event
bysort patid noalcuse_after: g nalcuse_1year = (eeventdate - recent_alcuse) >= 365 & !mi(recent_alcuse) & indfdtag!=1

* the first event of a nalcuse_1year group
bysort patid noalcuse_after nalcuse_1year (eeventdate): g nalcuse_1year1st = _n==1 & nalcuse_1year == 1

Thanks,

Kareem
Comment

Robert Picard

Join Date: Mar 2014
Posts: 1536

24 Jun 2016, 08:08

Here's how I would approach the problem. You do exactly as before but remove code 11 from the list of codes used to identify alcohol events. At the end, you deemphasize code 11 in the group of non-alcohol events that are 1 year after the last alcohol event by sorting the observations and putting the "not known if drinking" observations at the end of the group. Your pick is then, as before, the first observation of the group:

Code:

* define an alcohol use event negatively
gen a_use = !inlist(cohortcodes, 0, 2, 11)

* date of most recent alcohol use event per patient
bysort patid: egen recent_aue = max(eeventdate / a_use)

* group events before and after recent_aue
by patid: gen nae_after = eeventdate > recent_aue

* in nae_after groups, tag events that are more than a year later
bysort patid nae_after: gen nae_1year = (eeventdate - recent_aue) > 365 & !mi(recent_aue)

* the first event of a nae_1year group; deemphasize code 11 ("not known if drinking")
gen unknown = cohortcodes == 11
bysort patid nae_after nae_1year (unknown eeventdate): gen nae_1year1 = _n == 1 & nae_1year == 1 & !unknown

Comment

Abdul-Kareem Abdul-Rahman

Join Date: Sep 2015

Posts: 24
#8

27 Jun 2016, 07:16

Thank you so much Robert (as well as this forum itself) for all your vital help. It has helped me along where previously I have been quite stuck.

Regards,

Kareem
Comment

Announcement