Identifying occurrence of different events within a time period

Eduardo Torre

Join Date: Jan 2016

Posts: 26
#1

Identifying occurrence of different events within a time period

01 Mar 2016, 09:03

Dear Statalist,

My longitudinal dataset is organized in the following way:

variables: ID, test (categorical 1/4), date (in %td format)

I am interested in flagging a person if within 6 months period all four different tests were performed. The problem is that in some cases a certain test has been performed several times, in other cases not at all. I will try to make it clear with the following example:

ID TEST DATE
1 1 11Jun2005
1 3 11Sep2008
1 1 11Sep2008
1 2 20Sep2008
1 4 20Nov2008
2 4 10Nov2009
2 4 13Dec2009
3 2 01Feb2011
3 2 01Mar2011
3 2 03Mar2011
3 4 03Mar2011
3 1 03Mar2011
3 1 05May2011

If you look at the example the first person has all the four test performed within a 6 months period of time (3,1,2,4. Row from 2 to 5) and, therefore, I should flag it positively. On the contrary, the third person doesn't have the test n.3 in that specific time interval, therefore, I cannot flag it.
I have tried to think of a way to do this on Stata but the fact that tests have been performed randomly and without a specific order is making it quite complicated for me.

Any idea how shall I sort this out?

Thanks,

E.
Tags: None

Robert Picard

Join Date: Mar 2014
Posts: 1536

01 Mar 2016, 10:09

I think this is a tricky problem to solve in Stata because you are looking for cases where ALL four tests have been performed within the time period. Also, you have multiple tests on a single day which makes it impossible to define the data as a panel and use tsegen (from SSC).

One approach is to create indicator variables for these tests. Then you can sum these indicators for observations within 6 month of the current observation. This would typically require looping over each observation. For example

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(ID TEST) str9 DATE int ndate
1 1 "11Jun2005" 16598
1 3 "11Sep2008" 17786
1 1 "11Sep2008" 17786
1 2 "20Sep2008" 17795
1 4 "20Nov2008" 17856
2 4 "10Nov2009" 18211
2 4 "13Dec2009" 18244
3 2 "01Feb2011" 18659
3 2 "01Mar2011" 18687
3 2 "03Mar2011" 18689
3 4 "03Mar2011" 18689
3 1 "03Mar2011" 18689
3 1 "05May2011" 18752
end
format %td ndate

* create indicators for each test
tab TEST, gen(test)

* do it by observation
gen flag1 = 0
forvalues i = 1/`=_N' {
    local n 0
    forvalues j = 1/4 {
        sum test`j' if inrange(ndate, ndate[`i'] - 182, ndate[`i']) & ID == ID[`i'], meanonly
        local n = `n' + (r(sum) > 0)
    }
    
    replace flag1 = 1 if `n' == 4 & _n == `i'
}

which generates

Code:

. list ID TEST ndate test* flag1, sepby(ID) noobs

  +---------------------------------------------------------------+
  | ID   TEST       ndate   test1   test2   test3   test4   flag1 |
  |---------------------------------------------------------------|
  |  1      1   11jun2005       1       0       0       0       0 |
  |  1      3   11sep2008       0       0       1       0       0 |
  |  1      1   11sep2008       1       0       0       0       0 |
  |  1      2   20sep2008       0       1       0       0       0 |
  |  1      4   20nov2008       0       0       0       1       1 |
  |---------------------------------------------------------------|
  |  2      4   10nov2009       0       0       0       1       0 |
  |  2      4   13dec2009       0       0       0       1       0 |
  |---------------------------------------------------------------|
  |  3      2   01feb2011       0       1       0       0       0 |
  |  3      2   01mar2011       0       1       0       0       0 |
  |  3      2   03mar2011       0       1       0       0       0 |
  |  3      4   03mar2011       0       0       0       1       0 |
  |  3      1   03mar2011       1       0       0       0       0 |
  |  3      1   05may2011       1       0       0       0       0 |
  +---------------------------------------------------------------+

I'm putting the final touches to a new program called rangestat that will make it easy to calculate statistics over observations that are within a range of value for a key variable. Here's an example of how it would be used for the problem at hand

Code:

. * define 6 month interval from the date of the current obs
. gen lowdate = ndate - 182

. gen highdate = ndate

. 
. * sum indicator variables for all obs within the 6 month range
. rangestat (sum) test*, interval(ndate lowdate highdate) by(ID)

. 
. * identify an obs with all tests done within the 6 month range
. gen flag2 = test1_sum & test2_sum & test3_sum & test4_sum

. 
. list ID TEST ndate test* flag*, sepby(ID) noobs

  +-------------------------------------------------------------------------------------------------------------------+
  | ID   TEST       ndate   test1   test2   test3   test4   test1_~m   test2_~m   test3_~m   test4_~m   flag1   flag2 |
  |-------------------------------------------------------------------------------------------------------------------|
  |  1      1   11jun2005       1       0       0       0          1          0          0          0       0       0 |
  |  1      3   11sep2008       0       0       1       0          1          0          1          0       0       0 |
  |  1      1   11sep2008       1       0       0       0          1          0          1          0       0       0 |
  |  1      2   20sep2008       0       1       0       0          1          1          1          0       0       0 |
  |  1      4   20nov2008       0       0       0       1          1          1          1          1       1       1 |
  |-------------------------------------------------------------------------------------------------------------------|
  |  2      4   10nov2009       0       0       0       1          0          0          0          1       0       0 |
  |  2      4   13dec2009       0       0       0       1          0          0          0          2       0       0 |
  |-------------------------------------------------------------------------------------------------------------------|
  |  3      2   01feb2011       0       1       0       0          0          1          0          0       0       0 |
  |  3      2   01mar2011       0       1       0       0          0          2          0          0       0       0 |
  |  3      2   03mar2011       0       1       0       0          1          3          0          1       0       0 |
  |  3      4   03mar2011       0       0       0       1          1          3          0          1       0       0 |
  |  3      1   03mar2011       1       0       0       0          1          3          0          1       0       0 |
  |  3      1   05may2011       1       0       0       0          2          3          0          1       0       0 |
  +-------------------------------------------------------------------------------------------------------------------+

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#3

01 Mar 2016, 10:13

There are a few aspects of your question that are unclear to me.

1. You don't say whether by 6 months you mean 183 days, or whether you mean within 6 calendar months, which is slightly different and more variable from date to date. I'll assume 183 days.

2. You say you want to flag a person if they have had all four tests within 6 months, but, for person #1, he/she does not actually achieve that status until his/her last observation on 20 Nov 2008. So I will not flag his/her earlier observations, only observations for which all four tests have been carried out within the 6 months preceding that date. If you want to convert that to a person-level flag for it ever having occurred, you can just apply -egen, min() by(id)- to my result.

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input byte(id test) int date 1 1 16598 1 3 17786 1 1 17786 1 2 17795 1 4 17856 2 4 18211 2 4 18244 3 2 18659 3 2 18687 3 2 18689 3 4 18689 3 1 18689 3 1 18752 end format %td date local 6_months 183 // DAYS IN 6 MONTHS forvalues t = 1/4 { by id (date), sort: gen test`t'_count = sum(test == `t') by id test`t'_count (date), sort: /// gen test`t'_past_6_mos = date-date[1] <= `6_months' & test`t'_count > 0 } egen all_4_tests_6_mos = rowmin(test*_past_6_mos) list id test date all_4_tests_6_mos

The logic is this: at each observation count up the number of times the person has had each test so far. Then, a new occurrence of this test takes place when the count increases. So now compare each date to the date when the count last changed. If the difference is less than or equal to 6 months, then the test has occurred within the past 6 months. So now we have it for each test separately. Taking the minimum of those separate results will give us a 1 only if all four of them are 1, and 0 otherwise.

The code requires that date be a Stata numeric date, not a string that is human-readable as a date. If your dates are strings, you need to convert them using the -daily()- function.

For future reference, please post example data using -dataex-, as I have done above. If you don't have -dataex- installed, run -ssc install dataex-. Then read the simple instructions in -help dataex-. It saves a lot of time and trouble for those who want to experiment with your data.

Note added: crossed in cyberspace with Robert Picard's solution, which is a different path to the same destination.
2 likes
Comment
Robert Picard

Join Date: Mar 2014

Posts: 1536
#4

01 Mar 2016, 10:28

Pretty slick Clyde!
Comment

Announcement