Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating difference in dates between rows and deletiing/ keeping an observation based on the result

    Hello,

    In my dataset I have variable called Pseudo_ID, and this is a unique key related to a specific individual. I also have an admission and discharge date to hospital. Over 95% of cases, a person appears just once in the dataset. A minority appears twice i.e. same pseudo id but difference admission and discharge dates, and one observation appears three times. In most cases these are likely to be data quality issues, as it is implausible for someone to be discharged from hospital and then re-admitted if the interval between the first discharge date and following admission date is less than 10 months. Most of these observations the interval is in days not months. However, before deleting an observation, and keeping the one with the most recent discharge date - I need to check that the interval is less than 10 months.

    I am not sure how to do this because I don't know how to calculate the difference in dates when I am looking between rows.

    The sample data below contains only those observations where Pseudo_ID appears more than once. In the first of these examples the psedu_ID is p100000010813, and we have two sets admission and discharge dates:

    Admission_Date Discharge_Date
    02mar2022 04mar2022
    15apr2021 15apr2021

    The interval between discharge date 04mar2022 and admission date 15apr2021 is greater than or equal to 10 months - so I would keep both observations. However, for observations for pseduo_id p100000084816 the difference is in only a couple of days, so I would reject the observation with the earliest discharge date and keep the observation with latest discharge date.

    So my main concern is how to calculate the difference between the rows by pseudo_id. It does not matter if the difference is positive or negative, for instance the difference 04mar2022 and 15apr2021 is 323 or -323 days depending on which way you do the calculation, its the absolute value that matters I think.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str13 Pseudo_ID int(Admission_Date Discharge_Date) str17 Total_Cost_Exc_MFF
    "p100000010813" 22706 22708 "2499" 
    "p100000010813" 22385 22385 "2499" 
    "p100000046849" 22366 22371 "3335" 
    "p100000046849" 22712 22725 "6646" 
    "p100000084652" 22402 22402 "4117" 
    "p100000084652" 22721 22723 "3335" 
    "p100000084816" 22635 22635 "3335" 
    "p100000084816" 22617 22620 "3335" 
    "p100000199404" 22714 22718 "4832" 
    "p100000199404" 22379 22386 "4832" 
    "p100000219463" 22373 22374 "3335" 
    "p100000219463" 22726 22727 "2499" 
    "p100000329612" 22705 22707 "2051" 
    "p100000329612" 22370 22377 "4832" 
    "p100000457046" 22625 22630 "3335" 
    "p100000457046" 22595 22629 "16579"
    "p100000574309" 22412 22412 "2051" 
    "p100000574309" 22721 22722 "4832" 
    "p100000650152" 22421 22422 "2051" 
    "p100000650152" 22424 22427 "2499" 
    "p100000656961" 22378 22385 "6085" 
    "p100000656961" 22663 22665 "6085" 
    "p100000663172" 22555 22556 "3335" 
    "p100000663172" 22554 22554 "3335" 
    "p100000733492" 22595 22601 "6085" 
    "p100000733492" 22568 22572 "6085" 
    "p100000737591" 22642 22658 "7504" 
    "p100000737591" 22404 22408 "3335" 
    "p100000769543" 22728 22729 "2051" 
    "p100000769543" 22727 22727 "2051" 
    "p100000812041" 22594 22600 "6085" 
    "p100000812041" 22612 22615 "4832" 
    "p100000838265" 22532 22532 "3335" 
    "p100000838265" 22532 22538 "3335" 
    "p100000867504" 22682 22683 "3335" 
    "p100000867504" 22687 22692 "3335" 
    "p100000869842" 22728 22728 "2499" 
    "p100000869842" 22561 22562 "2499" 
    "p100000910635" 22692 22697 "4832" 
    "p100000910635" 22662 22665 "4832" 
    "p100000942297" 22484 22484 "2499" 
    "p100000942297" 22467 22467 "2499" 
    "p100000964420" 22460 22461 "2499" 
    "p100000964420" 22464 22465 "2499" 
    "p100001025212" 22432 22434 "4832" 
    "p100001025212" 22427 22427 "4832" 
    "p100001056424" 22393 22396 "2499" 
    "p100001056424" 22610 22611 "2499" 
    "p100001075008" 22586 22590 "3335" 
    "p100001075008" 22601 22601 "2499" 
    "p100001092661" 22438 22439 "4832" 
    "p100001092661" 22723 22730 "6085" 
    "p100001151503" 22691 22692 "3335" 
    "p100001151503" 22375 22379 "4832" 
    "p100001176761" 22378 22379 "2499" 
    "p100001176761" 22699 22701 "3335" 
    "p100001247817" 22379 22381 "2499" 
    "p100001247817" 22723 22726 "3335" 
    "p100001325630" 22729 22730 "3335" 
    "p100001325630" 22408 22414 "4832" 
    "p100001483238" 22395 22408 "6085" 
    "p100001483238" 22408 22413 "6085" 
    "p100001499999" 22694 22695 "3335" 
    "p100001499999" 22707 22708 "3335" 
    "p100001619664" 22502 22502 "3335" 
    "p100001619664" 22500 22502 "4117" 
    "p100001741725" 22537 22538 "6085" 
    "p100001741725" 22574 22575 "2499" 
    "p100001755388" 22406 22407 "2499" 
    "p100001755388" 22729 22733 "3335" 
    "p100001840035" 22383 22384 "4117" 
    "p100001840035" 22726 22727 "4832" 
    "p100001879563" 22727 22729 "6085" 
    "p100001879563" 22408 22412 "3335" 
    "p100001959998" 22732 22735 "4117" 
    "p100001959998" 22404 22407 "4117" 
    "p100001976225" 22656 22661 "6085" 
    "p100001976225" 22391 22394 "2051" 
    "p100002008886" 22378 22380 "2499" 
    "p100002008886" 22724 22725 "2051" 
    "p100002052988" 22712 22713 "4832" 
    "p100002052988" 22369 22371 "2499" 
    "p100002100178" 22724 22726 "2499" 
    "p100002100178" 22402 22405 "2499" 
    "p100002200017" 22678 22678 "4117" 
    "p100002200017" 22698 22699 "4117" 
    "p100002303518" 22449 22450 "4832" 
    "p100002303518" 22450 22451 "4117" 
    "p100002320251" 22622 22638 "7504" 
    "p100002320251" 22638 22644 "6085" 
    "p100002325351" 22432 22436 "6085" 
    "p100002325351" 22439 22442 "4832" 
    "p100002331450" 22391 22395 "3335" 
    "p100002331450" 22719 22721 "4117" 
    "p100002351772" 22720 22728 "3918" 
    "p100002351772" 22381 22384 "4117" 
    "p100002466555" 22699 22701 "4117" 
    "p100002466555" 22693 22695 "3335" 
    "p100002473310" 22647 22650 "4117" 
    "p100002473310" 22645 22646 "3335" 
    end
    format %td Admission_Date
    format %td Discharge_Date

  • #2
    Here is some technique with your data example (thanks):

    Code:
    bysort Pseudo_ID (Admission_Date) : gen gap = Admission_Date - Discharge_Date[_n-1]
    by Pseudo_ID : egen smallest = min(gap)
    list if smallest < 300, sepby(Pseudo_ID)
    
         +-------------------------------------------------------------------+
         |     Pseudo_ID   Admissi~e   Dischar~e   Total_~F   gap   smallest |
         |-------------------------------------------------------------------|
      7. | p100000084816   03dec2021   06dec2021       3335     .         15 |
      8. | p100000084816   21dec2021   21dec2021       3335    15         15 |
         |-------------------------------------------------------------------|
     15. | p100000457046   11nov2021   15dec2021      16579     .         -4 |
     16. | p100000457046   11dec2021   16dec2021       3335    -4         -4 |
         |-------------------------------------------------------------------|
     19. | p100000650152   21may2021   22may2021       2051     .          2 |
     20. | p100000650152   24may2021   27may2021       2499     2          2 |
         |-------------------------------------------------------------------|
     21. | p100000656961   08apr2021   15apr2021       6085     .        278 |
     22. | p100000656961   18jan2022   20jan2022       6085   278        278 |
         |-------------------------------------------------------------------|
     23. | p100000663172   01oct2021   01oct2021       3335     .          1 |
     24. | p100000663172   02oct2021   03oct2021       3335     1          1 |
         |-------------------------------------------------------------------|
     25. | p100000733492   15oct2021   19oct2021       6085     .         23 |
     26. | p100000733492   11nov2021   17nov2021       6085    23         23 |
         |-------------------------------------------------------------------|
     27. | p100000737591   04may2021   08may2021       3335     .        234 |
     28. | p100000737591   28dec2021   13jan2022       7504   234        234 |
         |-------------------------------------------------------------------|
     29. | p100000769543   23mar2022   23mar2022       2051     .          1 |
     30. | p100000769543   24mar2022   25mar2022       2051     1          1 |
         |-------------------------------------------------------------------|
     31. | p100000812041   10nov2021   16nov2021       6085     .         12 |
     32. | p100000812041   28nov2021   01dec2021       4832    12         12 |
         |-------------------------------------------------------------------|
     33. | p100000838265   09sep2021   09sep2021       3335     .          0 |
     34. | p100000838265   09sep2021   15sep2021       3335     0          0 |
         |-------------------------------------------------------------------|
     35. | p100000867504   06feb2022   07feb2022       3335     .          4 |
     36. | p100000867504   11feb2022   16feb2022       3335     4          4 |
         |-------------------------------------------------------------------|
     37. | p100000869842   08oct2021   09oct2021       2499     .        166 |
     38. | p100000869842   24mar2022   24mar2022       2499   166        166 |
         |-------------------------------------------------------------------|
     39. | p100000910635   17jan2022   20jan2022       4832     .         27 |
     40. | p100000910635   16feb2022   21feb2022       4832    27         27 |
         |-------------------------------------------------------------------|
     41. | p100000942297   06jul2021   06jul2021       2499     .         17 |
     42. | p100000942297   23jul2021   23jul2021       2499    17         17 |
         |-------------------------------------------------------------------|
     43. | p100000964420   29jun2021   30jun2021       2499     .          3 |
     44. | p100000964420   03jul2021   04jul2021       2499     3          3 |
         |-------------------------------------------------------------------|
     45. | p100001025212   27may2021   27may2021       4832     .          5 |
     46. | p100001025212   01jun2021   03jun2021       4832     5          5 |
         |-------------------------------------------------------------------|
     47. | p100001056424   23apr2021   26apr2021       2499     .        214 |
     48. | p100001056424   26nov2021   27nov2021       2499   214        214 |
         |-------------------------------------------------------------------|
     49. | p100001075008   02nov2021   06nov2021       3335     .         11 |
     50. | p100001075008   17nov2021   17nov2021       2499    11         11 |
         |-------------------------------------------------------------------|
     51. | p100001092661   07jun2021   08jun2021       4832     .        284 |
     52. | p100001092661   19mar2022   26mar2022       6085   284        284 |
         |-------------------------------------------------------------------|
     61. | p100001483238   25apr2021   08may2021       6085     .          0 |
     62. | p100001483238   08may2021   13may2021       6085     0          0 |
         |-------------------------------------------------------------------|
     63. | p100001499999   18feb2022   19feb2022       3335     .         12 |
     64. | p100001499999   03mar2022   04mar2022       3335    12         12 |
         |-------------------------------------------------------------------|
     65. | p100001619664   08aug2021   10aug2021       4117     .          0 |
     66. | p100001619664   10aug2021   10aug2021       3335     0          0 |
         |-------------------------------------------------------------------|
     67. | p100001741725   14sep2021   15sep2021       6085     .         36 |
     68. | p100001741725   21oct2021   22oct2021       2499    36         36 |
         |-------------------------------------------------------------------|
     77. | p100001976225   21apr2021   24apr2021       2051     .        262 |
     78. | p100001976225   11jan2022   16jan2022       6085   262        262 |
         |-------------------------------------------------------------------|
     85. | p100002200017   02feb2022   02feb2022       4117     .         20 |
     86. | p100002200017   22feb2022   23feb2022       4117    20         20 |
         |-------------------------------------------------------------------|
     87. | p100002303518   18jun2021   19jun2021       4832     .          0 |
     88. | p100002303518   19jun2021   20jun2021       4117     0          0 |
         |-------------------------------------------------------------------|
     89. | p100002320251   08dec2021   24dec2021       7504     .          0 |
     90. | p100002320251   24dec2021   30dec2021       6085     0          0 |
         |-------------------------------------------------------------------|
     91. | p100002325351   01jun2021   05jun2021       6085     .          3 |
     92. | p100002325351   08jun2021   11jun2021       4832     3          3 |
         |-------------------------------------------------------------------|
     97. | p100002466555   17feb2022   19feb2022       3335     .          4 |
     98. | p100002466555   23feb2022   25feb2022       4117     4          4 |
         |-------------------------------------------------------------------|
     99. | p100002473310   31dec2021   01jan2022       3335     .          1 |
    100. | p100002473310   02jan2022   05jan2022       4117     1          1 |
         +-------------------------------------------------------------------+

    Comment


    • #3
      Dear Nick, thanks a lot for this useful example. How would calculate the time difference for a dataset where ID can span several rows for some persons? So some persons might have two entry rows (like the example above) but others have many more and I want to calculate the difference between the first admission and the final discharge. Thank you!

      Comment


      • #4
        Code:
        by Pseudo_ID (Admission_Date), sort: gen wanted = Discharge_Date[_N] - Admission_date[1]
        Or, if you are using version 17 or later you can do it as
        Code:
        by Pseudo_ID (Admission_Date), sort: gen wanted = datediff(Admission_date[1], Discharge_Date[_N], "d")

        Comment

        Working...
        X