Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Panel data : How to select or tag only consecutive values of variables that satisfy a benchmark

    I have data that look like this.
    id testname result resultdatetime
    5 Some test 40 12/5/12 11:54
    5 Some test 39 4/1/13 9:24
    5 Some test 25 12/20/13 9:10
    5 Some test 100 7/10/14 16:39
    5 Some test 40 8/13/14 15:40
    8 Some test 40 2/28/13 10:59
    8 Some test 540 8/19/13 13:32
    8 Some test 40 6/19/14 8:27
    8 Some test 40 7/14/14 16:33
    8 Some test 40 2/10/15 8:44
    39 Some test 40 4/13/14 3:30
    39 Some test 40 5/11/14 8:10
    39 Some test 40 8/10/14 8:20
    39 Some test 77869 2/10/15 10:48
    57 Some test 40 7/19/12 17:00
    57 Some test 10563 12/9/13 17:04
    57 Some test 40 5/24/14 4:39
    64 Some test 40 7/2/13 11:10
    64 Some test 1128 12/2/13 16:36
    64 Some test 981 12/13/13 10:12
    64 Some test 435 1/13/14 10:02
    64 Some test 1439 1/27/14 15:15
    64 Some test 2755 4/14/14 17:05
    64 Some test 199 6/2/14 16:56
    64 Some test 49090 8/27/14 15:46
    64 Some test 40 11/3/14 15:12
    64 Some test 1041 2/12/15 16:28
    If any particular id has results <= 40 for six or more consecutive months, I want to know that. For example, for id==5 , result <=40 from 12/5/2012 to 12/20/2013, which is a year (>=6months)
    It is not sufficient to just know which ids have results like that but I also need to have that period marked/tagged in some way.
    I declared the data as an xtset but could not successfully use the commands within to fully benefit from that.

  • #2
    Saurabh: Please take some time to read and act on http://www.statalist.org/forums/help#stata so that you can present examples using dataex (SSC). Your data example is helpful, but requires some minutes of surgery to adapt to being manageable.

    I note that you have been pointed to the FAQ Advice twice in previous threads.

    The detail of time of day as well as daily date seems spurious precision when you are interested in 6 month intervals. I have compromised by working with daily dates and arbitrarily using 183 days as an interval.

    xtset is unlikely to help much here as your data are irregularly spaced in time, so none of the standard operators will help much.

    My main suggestion is that rangestat (SSC) can help you make progress here. If you set up an indicator for results <= 40 then it is easy to count how observations there are within a fixed time interval and also to sum the indicator within the same interval. Clearly if the count and the sum are equal, then all measurements in the interval qualify.

    Code:
    clear
    input id result str42 resultdatetime
    5        40    "12/5/12 11:54"
    5        39    "4/1/13 9:24"
    5        25    "12/20/13 9:10"
    5        100    "7/10/14 16:39"
    5        40    "8/13/14 15:40"
    8        40    "2/28/13 10:59"
    8        540    "8/19/13 13:32"
    8        40    "6/19/14 8:27"
    8        40    "7/14/14 16:33"
    8        40    "2/10/15 8:44"
    39        40    "4/13/14 3:30"
    39        40    "5/11/14 8:10"
    39        40    "8/10/14 8:20"
    39        77869    "2/10/15 10:48"
    57        40    "7/19/12 17:00"
    57        10563    "12/9/13 17:04"
    57        40    "5/24/14 4:39"
    64        40    "7/2/13 11:10"
    64        1128    "12/2/13 16:36"
    64        981    "12/13/13 10:12"
    64        435    "1/13/14 10:02"
    64        1439    "1/27/14 15:15"
    64        2755    "4/14/14 17:05"
    64        199    "6/2/14 16:56"
    64        49090    "8/27/14 15:46"
    64        40    "11/3/14 15:12"
    64        1041    "2/12/15 16:28"
    end
    
    gen ddate = daily(word(resultdatetime, 1), "MDY", 2050)
    gen lowres = result <= 40
    rangestat (sum) sum=lowres (count) count=lowres, by(id) interval(ddate 0 183)
    
    list, sepby(id)
    
         +-------------------------------------------------------------+
         | id   result   resultdatetime   ddate   lowres   sum   count |
         |-------------------------------------------------------------|
      1. |  5       40    12/5/12 11:54   19332        1     2       2 |
      2. |  5       39      4/1/13 9:24   19449        1     1       1 |
      3. |  5       25    12/20/13 9:10   19712        1     1       1 |
      4. |  5      100    7/10/14 16:39   19914        0     1       2 |
      5. |  5       40    8/13/14 15:40   19948        1     1       1 |
         |-------------------------------------------------------------|
      6. |  8       40    2/28/13 10:59   19417        1     1       2 |
      7. |  8      540    8/19/13 13:32   19589        0     0       1 |
      8. |  8       40     6/19/14 8:27   19893        1     2       2 |
      9. |  8       40    7/14/14 16:33   19918        1     1       1 |
     10. |  8       40     2/10/15 8:44   20129        1     1       1 |
         |-------------------------------------------------------------|
     11. | 39       40     4/13/14 3:30   19826        1     3       3 |
     12. | 39       40     5/11/14 8:10   19854        1     2       2 |
     13. | 39       40     8/10/14 8:20   19945        1     1       1 |
     14. | 39    77869    2/10/15 10:48   20129        0     0       1 |
         |-------------------------------------------------------------|
     15. | 57       40    7/19/12 17:00   19193        1     1       1 |
     16. | 57    10563    12/9/13 17:04   19701        0     1       2 |
     17. | 57       40     5/24/14 4:39   19867        1     1       1 |
         |-------------------------------------------------------------|
     18. | 64       40     7/2/13 11:10   19541        1     1       3 |
     19. | 64     1128    12/2/13 16:36   19694        0     0       6 |
     20. | 64      981   12/13/13 10:12   19705        0     0       5 |
     21. | 64      435    1/13/14 10:02   19736        0     0       4 |
     22. | 64     1439    1/27/14 15:15   19750        0     0       3 |
     23. | 64     2755    4/14/14 17:05   19827        0     0       3 |
     24. | 64      199     6/2/14 16:56   19876        0     1       3 |
     25. | 64    49090    8/27/14 15:46   19962        0     1       3 |
     26. | 64       40    11/3/14 15:12   20030        1     1       2 |
     27. | 64     1041    2/12/15 16:28   20131        0     0       1 |
         +-------------------------------------------------------------+
    For the announcement of rangestat see http://www.statalist.org/forums/foru...s-within-range and for other examples search the forum.
    Last edited by Nick Cox; 13 May 2016, 03:34.

    Comment


    • #3
      Another approach is to look for spells of low results and work out their lengths. The spell identification requires an xtset or tsset, but that can just be in terms of a sequence identifier. The lengths of the spells in days can be then be calculated directly.

      For this code to work you need tsspell from SSC.

      Code:
      clear
      input id result str42 resultdatetime
      5        40    "12/5/12 11:54"
      5        39    "4/1/13 9:24"
      5        25    "12/20/13 9:10"
      5        100    "7/10/14 16:39"
      5        40    "8/13/14 15:40"
      8        40    "2/28/13 10:59"
      8        540    "8/19/13 13:32"
      8        40    "6/19/14 8:27"
      8        40    "7/14/14 16:33"
      8        40    "2/10/15 8:44"
      39        40    "4/13/14 3:30"
      39        40    "5/11/14 8:10"
      39        40    "8/10/14 8:20"
      39        77869    "2/10/15 10:48"
      57        40    "7/19/12 17:00"
      57        10563    "12/9/13 17:04"
      57        40    "5/24/14 4:39"
      64        40    "7/2/13 11:10"
      64        1128    "12/2/13 16:36"
      64        981    "12/13/13 10:12"
      64        435    "1/13/14 10:02"
      64        1439    "1/27/14 15:15"
      64        2755    "4/14/14 17:05"
      64        199    "6/2/14 16:56"
      64        49090    "8/27/14 15:46"
      64        40    "11/3/14 15:12"
      64        1041    "2/12/15 16:28"
      end
      
      gen ddate = daily(word(resultdatetime, 1), "MDY", 2050)
      bysort id (ddate) : gen pseudot = _n
      xtset id pseudot
      tsspell, pcond(result <= 40)
      bysort id _spell (_seq) : gen duration = ddate[_N] - ddate[1] + 1 if _spell
      sort id pseudot
      list, sepby(id)
      
           +----------------------------------------------------------------------------------+
           | id   result   resultdatetime   ddate   pseudot   _seq   _spell   _end   duration |
           |----------------------------------------------------------------------------------|
        1. |  5       40    12/5/12 11:54   19332         1      1        1      0        381 |
        2. |  5       39      4/1/13 9:24   19449         2      2        1      0        381 |
        3. |  5       25    12/20/13 9:10   19712         3      3        1      1        381 |
        4. |  5      100    7/10/14 16:39   19914         4      0        0      0          . |
        5. |  5       40    8/13/14 15:40   19948         5      1        2      1          1 |
           |----------------------------------------------------------------------------------|
        6. |  8       40    2/28/13 10:59   19417         1      1        1      1          1 |
        7. |  8      540    8/19/13 13:32   19589         2      0        0      0          . |
        8. |  8       40     6/19/14 8:27   19893         3      1        2      0        237 |
        9. |  8       40    7/14/14 16:33   19918         4      2        2      0        237 |
       10. |  8       40     2/10/15 8:44   20129         5      3        2      1        237 |
           |----------------------------------------------------------------------------------|
       11. | 39       40     4/13/14 3:30   19826         1      1        1      0        120 |
       12. | 39       40     5/11/14 8:10   19854         2      2        1      0        120 |
       13. | 39       40     8/10/14 8:20   19945         3      3        1      1        120 |
       14. | 39    77869    2/10/15 10:48   20129         4      0        0      0          . |
           |----------------------------------------------------------------------------------|
       15. | 57       40    7/19/12 17:00   19193         1      1        1      1          1 |
       16. | 57    10563    12/9/13 17:04   19701         2      0        0      0          . |
       17. | 57       40     5/24/14 4:39   19867         3      1        2      1          1 |
           |----------------------------------------------------------------------------------|
       18. | 64       40     7/2/13 11:10   19541         1      1        1      1          1 |
       19. | 64     1128    12/2/13 16:36   19694         2      0        0      0          . |
       20. | 64      981   12/13/13 10:12   19705         3      0        0      0          . |
       21. | 64      435    1/13/14 10:02   19736         4      0        0      0          . |
       22. | 64     1439    1/27/14 15:15   19750         5      0        0      0          . |
       23. | 64     2755    4/14/14 17:05   19827         6      0        0      0          . |
       24. | 64      199     6/2/14 16:56   19876         7      0        0      0          . |
       25. | 64    49090    8/27/14 15:46   19962         8      0        0      0          . |
       26. | 64       40    11/3/14 15:12   20030         9      1        2      1          1 |
       27. | 64     1041    2/12/15 16:28   20131        10      0        0      0          . |
           +----------------------------------------------------------------------------------+

      Comment


      • #4
        Nick,
        Thank you for the solutions. I will definitely keep in mind the correct presentation of a query example using the tools available.
        The first solution does not work because for id==57, sum==count for observations 1 and 3 but it is not something I am looking for since that does not qualify for an uninterrupted period of six months or more with the low results (i.e. <=40)
        The second solution seems to take care of that. Counting the spells of a certain value etc. being satisfied (<=40 in my case) and then using the spell result/count to whatever advantage (at least six months of a spell in my case.)
        As for the date precisions being spurious; if someone has a low result on say Dec 16, 2001 and the next low result is less than six months away from THIS date, then that person does not qualify. But if it is on say, June 20, 2002 then it is AFTER six months and this person qualifies for having <=40 for at least six months. Maybe I did not understand in what sense you said they were of spurious precision. Can you please explain how?
        On a related note, the time was needed to generate the pseudot for the _n, since many patients had two of the same tests on the same day, in which case, timevar would not be unique within each panel. That is why I kept them, just in case, xtset or tsset are a part of the solution.
        Towards the final steps of my exercise, I believe I can simply drop the obserrvationswhere duration==. or duration<183 and then mark the last date of any spell. Hopefully, I will have only one spell per person but if not then I have to select the latest spell and delete all the earlier ones. And then finally merge these data with survey result data for the same patients, the final goal being finding out how many people with certain selected answers to the survey had such spell of six months or more with low results, with the survey date within those six months. A long way to Mars yet, but your solution has definitely launched me in the correct trajectory into space.
        Thank you again.
        Best,
        Saurabh

        Comment


        • #5
          One other question, does pcond accept multiple conditions? I realised that <=40 is not my only condition.
          Test A, <=75,
          Test B, <=400,
          Test C and Test D, <=40
          Simply put, can I use a result condition and a test name condition (using strmatch) and separate them by | (or) to put multiple such pairs of conditions within pcond?
          something like
          tsspell, pcond ( ( result<=75 & strmatch(testname,"Test A")==1 ) | ( result<=400 & strmatch(testname, "Test B")==1 ) | ( result<=40 & ( strmatch(testname,"Test C")==1 | strmatch(testname,"Test D")==1 ) ) )
          Does that work?
          I mean, the code executed but my question to you as the author is, will it execute as intended for these purposes?
          Thanks.

          Comment

          Working...
          X