Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Identifying gaps in dates of a certain size in a panel dataset with repeated datetime variables

    Hi all - back with more panel & date woes. I have a dataset with many, often repeated datetimes within individuals. For most individuals, these datetimes are mostly runs of observations logged on either the same day or the day before or after, so a pretty continuous string of dates. But, some individuals have gaps between dates of 30 or more days. Most often when these gaps occur, people have an observation or two on the earliest dates logged for them in the dataset, and then the big gap, and then the continuous run of dates. I basically want to identify and drop any dates that don't occur within the relatively continuous run.

    The trick is that, like I mentioned, there are sometimes duplicate datetimes within IDs that I need to keep because they represent different actions. If I didn't have those, I might have been able to tsset and used tsspell using a +/-2 days rule to group spells within IDs. But, I can't do that since I have repeated datetime values that I want to retain. That also means I can't use the lead/lag operators I normally could in timeseries data, too.

    Any ideas about how I might solve this?

    Here's how the data look:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int participant_id float(recorded_date datemin datemax day daymax daygap daygapmax)
    4049 20859 20859 23342    1 2484    . 2404
    4049 23263 20859 23342 2405 2484 2404 2404
    4049 23265 20859 23342 2407 2484    2 2404
    4049 23287 20859 23342 2429 2484   22 2404
    4049 23288 20859 23342 2430 2484    1 2404
    4049 23289 20859 23342 2431 2484    1 2404
    4049 23290 20859 23342 2432 2484    1 2404
    4049 23291 20859 23342 2433 2484    1 2404
    4049 23292 20859 23342 2434 2484    1 2404
    4049 23293 20859 23342 2435 2484    1 2404
    3065 22732 22732 23318    1  587    .  522
    3065 23254 22732 23318  523  587  522  522
    3065 23255 22732 23318  524  587    1  522
    3065 23256 22732 23318  525  587    1  522
    3065 23257 22732 23318  526  587    1  522
    3065 23258 22732 23318  527  587    1  522
    3065 23259 22732 23318  528  587    1  522
    3065 23260 22732 23318  529  587    1  522
    3065 23261 22732 23318  530  587    1  522
    3065 23262 22732 23318  531  587    1  522
    3046 22911 22911 23342    1  432    .  364
    3046 23275 22911 23342  365  432  364  364
    3046 23276 22911 23342  366  432    1  364
    3046 23276 22911 23342  366  432    0  364
    3046 23276 22911 23342  366  432    0  364
    3046 23276 22911 23342  366  432    0  364
    3046 23277 22911 23342  367  432    0  364
    3046 23277 22911 23342  367  432    1  364
    3046 23277 22911 23342  367  432    0  364
    3046 23277 22911 23342  367  432    0  364
    3319 22919 22919 23317    1  399    .  365
    3319 23284 22919 23317  366  399  365  365
    3319 23286 22919 23317  368  399    2  365
    3319 23286 22919 23317  368  399    0  365
    3319 23286 22919 23317  368  399    0  365
    3319 23286 22919 23317  368  399    0  365
    3319 23286 22919 23317  368  399    0  365
    3319 23286 22919 23317  368  399    0  365
    3319 23286 22919 23317  368  399    0  365
    3319 23286 22919 23317  368  399    0  365
     964 22990 22990 23340    1  351    .  240
     964 22990 22990 23340    1  351    0  240
     964 23230 22990 23340  241  351  240  240
     964 23230 22990 23340  241  351    0  240
     964 23230 22990 23340  241  351    0  240
     964 23230 22990 23340  241  351    0  240
     964 23230 22990 23340  241  351    0  240
     964 23230 22990 23340  241  351    0  240
     964 23230 22990 23340  241  351    0  240
     964 23230 22990 23340  241  351    0  240
    end
    format %tdNN/DD/YY recorded_date
    format %tdNN/DD/YY datemin
    format %tdNN/DD/YY datemax
    Here's what I'm trying to achieve:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int participant_id float(recorded_date datemin datemax day daymax daygap daygapmax drop)
    4049 20859 20859 23342    1 2484    . 2404 1
    4049 23263 20859 23342 2405 2484 2404 2404 0
    4049 23265 20859 23342 2407 2484    2 2404 0
    4049 23287 20859 23342 2429 2484   22 2404 0
    4049 23288 20859 23342 2430 2484    1 2404 0
    4049 23289 20859 23342 2431 2484    1 2404 0
    4049 23290 20859 23342 2432 2484    1 2404 0
    4049 23291 20859 23342 2433 2484    1 2404 0
    4049 23292 20859 23342 2434 2484    1 2404 0
    4049 23293 20859 23342 2435 2484    1 2404 0
    3065 22732 22732 23318    1  587    .  522 1
    3065 23254 22732 23318  523  587  522  522 0
    3065 23255 22732 23318  524  587    1  522 0
    3065 23256 22732 23318  525  587    1  522 0
    3065 23257 22732 23318  526  587    1  522 0
    3065 23258 22732 23318  527  587    1  522 0
    3065 23259 22732 23318  528  587    1  522 0
    3065 23260 22732 23318  529  587    1  522 0
    3065 23261 22732 23318  530  587    1  522 0
    3065 23262 22732 23318  531  587    1  522 0
    3046 22911 22911 23342    1  432    .  364 1
    3046 23275 22911 23342  365  432  364  364 0
    3046 23276 22911 23342  366  432    1  364 0
    3046 23276 22911 23342  366  432    0  364 0
    3046 23276 22911 23342  366  432    0  364 0
    3046 23276 22911 23342  366  432    0  364 0
    3046 23277 22911 23342  367  432    0  364 0
    3046 23277 22911 23342  367  432    1  364 0
    3046 23277 22911 23342  367  432    0  364 0
    3046 23277 22911 23342  367  432    0  364 0
    3319 22919 22919 23317    1  399    .  365 1
    3319 23284 22919 23317  366  399  365  365 0
    3319 23286 22919 23317  368  399    2  365 0
    3319 23286 22919 23317  368  399    0  365 0
    3319 23286 22919 23317  368  399    0  365 0
    3319 23286 22919 23317  368  399    0  365 0
    3319 23286 22919 23317  368  399    0  365 0
    3319 23286 22919 23317  368  399    0  365 0
    3319 23286 22919 23317  368  399    0  365 0
    3319 23286 22919 23317  368  399    0  365 0
     964 22990 22990 23340    1  351    .  240 1
     964 22990 22990 23340    1  351    0  240 1
     964 23230 22990 23340  241  351  240  240 0
     964 23230 22990 23340  241  351    0  240 0
     964 23230 22990 23340  241  351    0  240 0
     964 23230 22990 23340  241  351    0  240 0
     964 23230 22990 23340  241  351    0  240 0
     964 23230 22990 23340  241  351    0  240 0
     964 23230 22990 23340  241  351    0  240 0
     964 23230 22990 23340  241  351    0  240 0
    end
    format %tdNN/DD/YY recorded_date
    format %tdNN/DD/YY datemin
    format %tdNN/DD/YY datemax
    Thank you for any help!

  • #2
    Just change the rules. Count observations in sequences and look for gaps of 30 or more days.

    tsspell is from SSC.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int participant_id float(recorded_date datemin datemax day daymax daygap daygapmax drop)
    4049 20859 20859 23342    1 2484    . 2404 1
    4049 23263 20859 23342 2405 2484 2404 2404 0
    4049 23265 20859 23342 2407 2484    2 2404 0
    4049 23287 20859 23342 2429 2484   22 2404 0
    4049 23288 20859 23342 2430 2484    1 2404 0
    4049 23289 20859 23342 2431 2484    1 2404 0
    4049 23290 20859 23342 2432 2484    1 2404 0
    4049 23291 20859 23342 2433 2484    1 2404 0
    4049 23292 20859 23342 2434 2484    1 2404 0
    4049 23293 20859 23342 2435 2484    1 2404 0
    3065 22732 22732 23318    1  587    .  522 1
    3065 23254 22732 23318  523  587  522  522 0
    3065 23255 22732 23318  524  587    1  522 0
    3065 23256 22732 23318  525  587    1  522 0
    3065 23257 22732 23318  526  587    1  522 0
    3065 23258 22732 23318  527  587    1  522 0
    3065 23259 22732 23318  528  587    1  522 0
    3065 23260 22732 23318  529  587    1  522 0
    3065 23261 22732 23318  530  587    1  522 0
    3065 23262 22732 23318  531  587    1  522 0
    3046 22911 22911 23342    1  432    .  364 1
    3046 23275 22911 23342  365  432  364  364 0
    3046 23276 22911 23342  366  432    1  364 0
    3046 23276 22911 23342  366  432    0  364 0
    3046 23276 22911 23342  366  432    0  364 0
    3046 23276 22911 23342  366  432    0  364 0
    3046 23277 22911 23342  367  432    0  364 0
    3046 23277 22911 23342  367  432    1  364 0
    3046 23277 22911 23342  367  432    0  364 0
    3046 23277 22911 23342  367  432    0  364 0
    3319 22919 22919 23317    1  399    .  365 1
    3319 23284 22919 23317  366  399  365  365 0
    3319 23286 22919 23317  368  399    2  365 0
    3319 23286 22919 23317  368  399    0  365 0
    3319 23286 22919 23317  368  399    0  365 0
    3319 23286 22919 23317  368  399    0  365 0
    3319 23286 22919 23317  368  399    0  365 0
    3319 23286 22919 23317  368  399    0  365 0
    3319 23286 22919 23317  368  399    0  365 0
    3319 23286 22919 23317  368  399    0  365 0
     964 22990 22990 23340    1  351    .  240 1
     964 22990 22990 23340    1  351    0  240 1
     964 23230 22990 23340  241  351  240  240 0
     964 23230 22990 23340  241  351    0  240 0
     964 23230 22990 23340  241  351    0  240 0
     964 23230 22990 23340  241  351    0  240 0
     964 23230 22990 23340  241  351    0  240 0
     964 23230 22990 23340  241  351    0  240 0
     964 23230 22990 23340  241  351    0  240 0
     964 23230 22990 23340  241  351    0  240 0
    end
    format %tdNN/DD/YY recorded_date
    format %tdNN/DD/YY datemin
    format %tdNN/DD/YY datemax
    
    bysort participant_id (recorded_date) : gen pstime = _n 
    tsset participant_id pstime 
    
    tsspell, fcond(D.recorded_date >= 30) 
    
    list participant_id recorded_date _*, sepby(participant_id _spell) 
    
         +--------------------------------------------+
         | partic~d   record~e   _spell   _seq   _end |
         |--------------------------------------------|
      1. |      964   12/11/22        1      1      0 |
      2. |      964   12/11/22        1      2      1 |
         |--------------------------------------------|
      3. |      964   08/08/23        2      1      0 |
      4. |      964   08/08/23        2      2      0 |
      5. |      964   08/08/23        2      3      0 |
      6. |      964   08/08/23        2      4      0 |
      7. |      964   08/08/23        2      5      0 |
      8. |      964   08/08/23        2      6      0 |
      9. |      964   08/08/23        2      7      0 |
     10. |      964   08/08/23        2      8      1 |
         |--------------------------------------------|
     11. |     3046   09/23/22        1      1      1 |
         |--------------------------------------------|
     12. |     3046   09/22/23        2      1      0 |
     13. |     3046   09/23/23        2      2      0 |
     14. |     3046   09/23/23        2      3      0 |
     15. |     3046   09/23/23        2      4      0 |
     16. |     3046   09/23/23        2      5      0 |
     17. |     3046   09/24/23        2      6      0 |
     18. |     3046   09/24/23        2      7      0 |
     19. |     3046   09/24/23        2      8      0 |
     20. |     3046   09/24/23        2      9      1 |
         |--------------------------------------------|
     21. |     3065   03/28/22        1      1      1 |
         |--------------------------------------------|
     22. |     3065   09/01/23        2      1      0 |
     23. |     3065   09/02/23        2      2      0 |
     24. |     3065   09/03/23        2      3      0 |
     25. |     3065   09/04/23        2      4      0 |
     26. |     3065   09/05/23        2      5      0 |
     27. |     3065   09/06/23        2      6      0 |
     28. |     3065   09/07/23        2      7      0 |
     29. |     3065   09/08/23        2      8      0 |
     30. |     3065   09/09/23        2      9      1 |
         |--------------------------------------------|
     31. |     3319   10/01/22        1      1      1 |
         |--------------------------------------------|
     32. |     3319   10/01/23        2      1      0 |
     33. |     3319   10/03/23        2      2      0 |
     34. |     3319   10/03/23        2      3      0 |
     35. |     3319   10/03/23        2      4      0 |
     36. |     3319   10/03/23        2      5      0 |
     37. |     3319   10/03/23        2      6      0 |
     38. |     3319   10/03/23        2      7      0 |
     39. |     3319   10/03/23        2      8      0 |
     40. |     3319   10/03/23        2      9      1 |
         |--------------------------------------------|
     41. |     4049   02/09/17        1      1      1 |
         |--------------------------------------------|
     42. |     4049   09/10/23        2      1      0 |
     43. |     4049   09/12/23        2      2      0 |
     44. |     4049   10/04/23        2      3      0 |
     45. |     4049   10/05/23        2      4      0 |
     46. |     4049   10/06/23        2      5      0 |
     47. |     4049   10/07/23        2      6      0 |
     48. |     4049   10/08/23        2      7      0 |
     49. |     4049   10/09/23        2      8      0 |
     50. |     4049   10/10/23        2      9      1 |
         +--------------------------------------------+

    Comment


    • #3
      Works beautifully, as usual. Thank you!

      Comment

      Working...
      X