Panel data : How to select or tag only consecutive values of variables that satisfy a benchmark

Saurabh Chavan

Join Date: Jun 2015
Posts: 25

Panel data : How to select or tag only consecutive values of variables that satisfy a benchmark

12 May 2016, 14:00

I have data that look like this.

id	testname	result	resultdatetime
5	Some test	40	12/5/12 11:54
5	Some test	39	4/1/13 9:24
5	Some test	25	12/20/13 9:10
5	Some test	100	7/10/14 16:39
5	Some test	40	8/13/14 15:40
8	Some test	40	2/28/13 10:59
8	Some test	540	8/19/13 13:32
8	Some test	40	6/19/14 8:27
8	Some test	40	7/14/14 16:33
8	Some test	40	2/10/15 8:44
39	Some test	40	4/13/14 3:30
39	Some test	40	5/11/14 8:10
39	Some test	40	8/10/14 8:20
39	Some test	77869	2/10/15 10:48
57	Some test	40	7/19/12 17:00
57	Some test	10563	12/9/13 17:04
57	Some test	40	5/24/14 4:39
64	Some test	40	7/2/13 11:10
64	Some test	1128	12/2/13 16:36
64	Some test	981	12/13/13 10:12
64	Some test	435	1/13/14 10:02
64	Some test	1439	1/27/14 15:15
64	Some test	2755	4/14/14 17:05
64	Some test	199	6/2/14 16:56
64	Some test	49090	8/27/14 15:46
64	Some test	40	11/3/14 15:12
64	Some test	1041	2/12/15 16:28

If any particular id has results <= 40 for six or more consecutive months, I want to know that. For example, for id==5 , result <=40 from 12/5/2012 to 12/20/2013, which is a year (>=6months)
It is not sufficient to just know which ids have results like that but I also need to have that period marked/tagged in some way.
I declared the data as an xtset but could not successfully use the commands within to fully benefit from that.

Tags: None

Nick Cox

Join Date: Mar 2014
Posts: 35486

13 May 2016, 03:30

Saurabh: Please take some time to read and act on http://www.statalist.org/forums/help#stata so that you can present examples using dataex (SSC). Your data example is helpful, but requires some minutes of surgery to adapt to being manageable.

I note that you have been pointed to the FAQ Advice twice in previous threads.

The detail of time of day as well as daily date seems spurious precision when you are interested in 6 month intervals. I have compromised by working with daily dates and arbitrarily using 183 days as an interval.

xtset is unlikely to help much here as your data are irregularly spaced in time, so none of the standard operators will help much.

My main suggestion is that rangestat (SSC) can help you make progress here. If you set up an indicator for results <= 40 then it is easy to count how observations there are within a fixed time interval and also to sum the indicator within the same interval. Clearly if the count and the sum are equal, then all measurements in the interval qualify.

Code:

clear
input id result str42 resultdatetime
5        40    "12/5/12 11:54"
5        39    "4/1/13 9:24"
5        25    "12/20/13 9:10"
5        100    "7/10/14 16:39"
5        40    "8/13/14 15:40"
8        40    "2/28/13 10:59"
8        540    "8/19/13 13:32"
8        40    "6/19/14 8:27"
8        40    "7/14/14 16:33"
8        40    "2/10/15 8:44"
39        40    "4/13/14 3:30"
39        40    "5/11/14 8:10"
39        40    "8/10/14 8:20"
39        77869    "2/10/15 10:48"
57        40    "7/19/12 17:00"
57        10563    "12/9/13 17:04"
57        40    "5/24/14 4:39"
64        40    "7/2/13 11:10"
64        1128    "12/2/13 16:36"
64        981    "12/13/13 10:12"
64        435    "1/13/14 10:02"
64        1439    "1/27/14 15:15"
64        2755    "4/14/14 17:05"
64        199    "6/2/14 16:56"
64        49090    "8/27/14 15:46"
64        40    "11/3/14 15:12"
64        1041    "2/12/15 16:28"
end

gen ddate = daily(word(resultdatetime, 1), "MDY", 2050)
gen lowres = result <= 40
rangestat (sum) sum=lowres (count) count=lowres, by(id) interval(ddate 0 183)

list, sepby(id)

     +-------------------------------------------------------------+
     | id   result   resultdatetime   ddate   lowres   sum   count |
     |-------------------------------------------------------------|
  1. |  5       40    12/5/12 11:54   19332        1     2       2 |
  2. |  5       39      4/1/13 9:24   19449        1     1       1 |
  3. |  5       25    12/20/13 9:10   19712        1     1       1 |
  4. |  5      100    7/10/14 16:39   19914        0     1       2 |
  5. |  5       40    8/13/14 15:40   19948        1     1       1 |
     |-------------------------------------------------------------|
  6. |  8       40    2/28/13 10:59   19417        1     1       2 |
  7. |  8      540    8/19/13 13:32   19589        0     0       1 |
  8. |  8       40     6/19/14 8:27   19893        1     2       2 |
  9. |  8       40    7/14/14 16:33   19918        1     1       1 |
 10. |  8       40     2/10/15 8:44   20129        1     1       1 |
     |-------------------------------------------------------------|
 11. | 39       40     4/13/14 3:30   19826        1     3       3 |
 12. | 39       40     5/11/14 8:10   19854        1     2       2 |
 13. | 39       40     8/10/14 8:20   19945        1     1       1 |
 14. | 39    77869    2/10/15 10:48   20129        0     0       1 |
     |-------------------------------------------------------------|
 15. | 57       40    7/19/12 17:00   19193        1     1       1 |
 16. | 57    10563    12/9/13 17:04   19701        0     1       2 |
 17. | 57       40     5/24/14 4:39   19867        1     1       1 |
     |-------------------------------------------------------------|
 18. | 64       40     7/2/13 11:10   19541        1     1       3 |
 19. | 64     1128    12/2/13 16:36   19694        0     0       6 |
 20. | 64      981   12/13/13 10:12   19705        0     0       5 |
 21. | 64      435    1/13/14 10:02   19736        0     0       4 |
 22. | 64     1439    1/27/14 15:15   19750        0     0       3 |
 23. | 64     2755    4/14/14 17:05   19827        0     0       3 |
 24. | 64      199     6/2/14 16:56   19876        0     1       3 |
 25. | 64    49090    8/27/14 15:46   19962        0     1       3 |
 26. | 64       40    11/3/14 15:12   20030        1     1       2 |
 27. | 64     1041    2/12/15 16:28   20131        0     0       1 |
     +-------------------------------------------------------------+

For the announcement of rangestat see http://www.statalist.org/forums/foru...s-within-range and for other examples search the forum.

Last edited by Nick Cox; 13 May 2016, 03:34.

Comment

Nick Cox

Join Date: Mar 2014
Posts: 35486

13 May 2016, 04:40

Another approach is to look for spells of low results and work out their lengths. The spell identification requires an xtset or tsset, but that can just be in terms of a sequence identifier. The lengths of the spells in days can be then be calculated directly.

For this code to work you need tsspell from SSC.

Code:

clear
input id result str42 resultdatetime
5        40    "12/5/12 11:54"
5        39    "4/1/13 9:24"
5        25    "12/20/13 9:10"
5        100    "7/10/14 16:39"
5        40    "8/13/14 15:40"
8        40    "2/28/13 10:59"
8        540    "8/19/13 13:32"
8        40    "6/19/14 8:27"
8        40    "7/14/14 16:33"
8        40    "2/10/15 8:44"
39        40    "4/13/14 3:30"
39        40    "5/11/14 8:10"
39        40    "8/10/14 8:20"
39        77869    "2/10/15 10:48"
57        40    "7/19/12 17:00"
57        10563    "12/9/13 17:04"
57        40    "5/24/14 4:39"
64        40    "7/2/13 11:10"
64        1128    "12/2/13 16:36"
64        981    "12/13/13 10:12"
64        435    "1/13/14 10:02"
64        1439    "1/27/14 15:15"
64        2755    "4/14/14 17:05"
64        199    "6/2/14 16:56"
64        49090    "8/27/14 15:46"
64        40    "11/3/14 15:12"
64        1041    "2/12/15 16:28"
end

gen ddate = daily(word(resultdatetime, 1), "MDY", 2050)
bysort id (ddate) : gen pseudot = _n
xtset id pseudot
tsspell, pcond(result <= 40)
bysort id _spell (_seq) : gen duration = ddate[_N] - ddate[1] + 1 if _spell
sort id pseudot
list, sepby(id)

     +----------------------------------------------------------------------------------+
     | id   result   resultdatetime   ddate   pseudot   _seq   _spell   _end   duration |
     |----------------------------------------------------------------------------------|
  1. |  5       40    12/5/12 11:54   19332         1      1        1      0        381 |
  2. |  5       39      4/1/13 9:24   19449         2      2        1      0        381 |
  3. |  5       25    12/20/13 9:10   19712         3      3        1      1        381 |
  4. |  5      100    7/10/14 16:39   19914         4      0        0      0          . |
  5. |  5       40    8/13/14 15:40   19948         5      1        2      1          1 |
     |----------------------------------------------------------------------------------|
  6. |  8       40    2/28/13 10:59   19417         1      1        1      1          1 |
  7. |  8      540    8/19/13 13:32   19589         2      0        0      0          . |
  8. |  8       40     6/19/14 8:27   19893         3      1        2      0        237 |
  9. |  8       40    7/14/14 16:33   19918         4      2        2      0        237 |
 10. |  8       40     2/10/15 8:44   20129         5      3        2      1        237 |
     |----------------------------------------------------------------------------------|
 11. | 39       40     4/13/14 3:30   19826         1      1        1      0        120 |
 12. | 39       40     5/11/14 8:10   19854         2      2        1      0        120 |
 13. | 39       40     8/10/14 8:20   19945         3      3        1      1        120 |
 14. | 39    77869    2/10/15 10:48   20129         4      0        0      0          . |
     |----------------------------------------------------------------------------------|
 15. | 57       40    7/19/12 17:00   19193         1      1        1      1          1 |
 16. | 57    10563    12/9/13 17:04   19701         2      0        0      0          . |
 17. | 57       40     5/24/14 4:39   19867         3      1        2      1          1 |
     |----------------------------------------------------------------------------------|
 18. | 64       40     7/2/13 11:10   19541         1      1        1      1          1 |
 19. | 64     1128    12/2/13 16:36   19694         2      0        0      0          . |
 20. | 64      981   12/13/13 10:12   19705         3      0        0      0          . |
 21. | 64      435    1/13/14 10:02   19736         4      0        0      0          . |
 22. | 64     1439    1/27/14 15:15   19750         5      0        0      0          . |
 23. | 64     2755    4/14/14 17:05   19827         6      0        0      0          . |
 24. | 64      199     6/2/14 16:56   19876         7      0        0      0          . |
 25. | 64    49090    8/27/14 15:46   19962         8      0        0      0          . |
 26. | 64       40    11/3/14 15:12   20030         9      1        2      1          1 |
 27. | 64     1041    2/12/15 16:28   20131        10      0        0      0          . |
     +----------------------------------------------------------------------------------+

Comment

Saurabh Chavan

Join Date: Jun 2015

Posts: 25
#4

13 May 2016, 12:31

Nick,
Thank you for the solutions. I will definitely keep in mind the correct presentation of a query example using the tools available.
The first solution does not work because for id==57, sum==count for observations 1 and 3 but it is not something I am looking for since that does not qualify for an uninterrupted period of six months or more with the low results (i.e. <=40)
The second solution seems to take care of that. Counting the spells of a certain value etc. being satisfied (<=40 in my case) and then using the spell result/count to whatever advantage (at least six months of a spell in my case.)
As for the date precisions being spurious; if someone has a low result on say Dec 16, 2001 and the next low result is less than six months away from THIS date, then that person does not qualify. But if it is on say, June 20, 2002 then it is AFTER six months and this person qualifies for having <=40 for at least six months. Maybe I did not understand in what sense you said they were of spurious precision. Can you please explain how?
On a related note, the time was needed to generate the pseudot for the _n, since many patients had two of the same tests on the same day, in which case, timevar would not be unique within each panel. That is why I kept them, just in case, xtset or tsset are a part of the solution.
Towards the final steps of my exercise, I believe I can simply drop the obserrvationswhere duration==. or duration<183 and then mark the last date of any spell. Hopefully, I will have only one spell per person but if not then I have to select the latest spell and delete all the earlier ones. And then finally merge these data with survey result data for the same patients, the final goal being finding out how many people with certain selected answers to the survey had such spell of six months or more with low results, with the survey date within those six months. A long way to Mars yet, but your solution has definitely launched me in the correct trajectory into space.
Thank you again.
Best,
Saurabh
Comment
Saurabh Chavan

Join Date: Jun 2015

Posts: 25
#5

13 May 2016, 13:35

One other question, does pcond accept multiple conditions? I realised that <=40 is not my only condition.
Test A, <=75,
Test B, <=400,
Test C and Test D, <=40
Simply put, can I use a result condition and a test name condition (using strmatch) and separate them by | (or) to put multiple such pairs of conditions within pcond?
something like
tsspell, pcond ( ( result<=75 & strmatch(testname,"Test A")==1 ) | ( result<=400 & strmatch(testname, "Test B")==1 ) | ( result<=40 & ( strmatch(testname,"Test C")==1 | strmatch(testname,"Test D")==1 ) ) )
Does that work?
I mean, the code executed but my question to you as the author is, will it execute as intended for these purposes?
Thanks.
Comment

Announcement

Panel data : How to select or tag only consecutive values of variables that satisfy a benchmark

Comment

Comment

Comment

Comment