Hello,
In my dataset I have variable called Pseudo_ID, and this is a unique key related to a specific individual. I also have an admission and discharge date to hospital. Over 95% of cases, a person appears just once in the dataset. A minority appears twice i.e. same pseudo id but difference admission and discharge dates, and one observation appears three times. In most cases these are likely to be data quality issues, as it is implausible for someone to be discharged from hospital and then re-admitted if the interval between the first discharge date and following admission date is less than 10 months. Most of these observations the interval is in days not months. However, before deleting an observation, and keeping the one with the most recent discharge date - I need to check that the interval is less than 10 months.
I am not sure how to do this because I don't know how to calculate the difference in dates when I am looking between rows.
The sample data below contains only those observations where Pseudo_ID appears more than once. In the first of these examples the psedu_ID is p100000010813, and we have two sets admission and discharge dates:
Admission_Date Discharge_Date
02mar2022 04mar2022
15apr2021 15apr2021
The interval between discharge date 04mar2022 and admission date 15apr2021 is greater than or equal to 10 months - so I would keep both observations. However, for observations for pseduo_id p100000084816 the difference is in only a couple of days, so I would reject the observation with the earliest discharge date and keep the observation with latest discharge date.
So my main concern is how to calculate the difference between the rows by pseudo_id. It does not matter if the difference is positive or negative, for instance the difference 04mar2022 and 15apr2021 is 323 or -323 days depending on which way you do the calculation, its the absolute value that matters I think.
In my dataset I have variable called Pseudo_ID, and this is a unique key related to a specific individual. I also have an admission and discharge date to hospital. Over 95% of cases, a person appears just once in the dataset. A minority appears twice i.e. same pseudo id but difference admission and discharge dates, and one observation appears three times. In most cases these are likely to be data quality issues, as it is implausible for someone to be discharged from hospital and then re-admitted if the interval between the first discharge date and following admission date is less than 10 months. Most of these observations the interval is in days not months. However, before deleting an observation, and keeping the one with the most recent discharge date - I need to check that the interval is less than 10 months.
I am not sure how to do this because I don't know how to calculate the difference in dates when I am looking between rows.
The sample data below contains only those observations where Pseudo_ID appears more than once. In the first of these examples the psedu_ID is p100000010813, and we have two sets admission and discharge dates:
Admission_Date Discharge_Date
02mar2022 04mar2022
15apr2021 15apr2021
The interval between discharge date 04mar2022 and admission date 15apr2021 is greater than or equal to 10 months - so I would keep both observations. However, for observations for pseduo_id p100000084816 the difference is in only a couple of days, so I would reject the observation with the earliest discharge date and keep the observation with latest discharge date.
So my main concern is how to calculate the difference between the rows by pseudo_id. It does not matter if the difference is positive or negative, for instance the difference 04mar2022 and 15apr2021 is 323 or -323 days depending on which way you do the calculation, its the absolute value that matters I think.
Code:
* Example generated by -dataex-. For more info, type help dataex clear input str13 Pseudo_ID int(Admission_Date Discharge_Date) str17 Total_Cost_Exc_MFF "p100000010813" 22706 22708 "2499" "p100000010813" 22385 22385 "2499" "p100000046849" 22366 22371 "3335" "p100000046849" 22712 22725 "6646" "p100000084652" 22402 22402 "4117" "p100000084652" 22721 22723 "3335" "p100000084816" 22635 22635 "3335" "p100000084816" 22617 22620 "3335" "p100000199404" 22714 22718 "4832" "p100000199404" 22379 22386 "4832" "p100000219463" 22373 22374 "3335" "p100000219463" 22726 22727 "2499" "p100000329612" 22705 22707 "2051" "p100000329612" 22370 22377 "4832" "p100000457046" 22625 22630 "3335" "p100000457046" 22595 22629 "16579" "p100000574309" 22412 22412 "2051" "p100000574309" 22721 22722 "4832" "p100000650152" 22421 22422 "2051" "p100000650152" 22424 22427 "2499" "p100000656961" 22378 22385 "6085" "p100000656961" 22663 22665 "6085" "p100000663172" 22555 22556 "3335" "p100000663172" 22554 22554 "3335" "p100000733492" 22595 22601 "6085" "p100000733492" 22568 22572 "6085" "p100000737591" 22642 22658 "7504" "p100000737591" 22404 22408 "3335" "p100000769543" 22728 22729 "2051" "p100000769543" 22727 22727 "2051" "p100000812041" 22594 22600 "6085" "p100000812041" 22612 22615 "4832" "p100000838265" 22532 22532 "3335" "p100000838265" 22532 22538 "3335" "p100000867504" 22682 22683 "3335" "p100000867504" 22687 22692 "3335" "p100000869842" 22728 22728 "2499" "p100000869842" 22561 22562 "2499" "p100000910635" 22692 22697 "4832" "p100000910635" 22662 22665 "4832" "p100000942297" 22484 22484 "2499" "p100000942297" 22467 22467 "2499" "p100000964420" 22460 22461 "2499" "p100000964420" 22464 22465 "2499" "p100001025212" 22432 22434 "4832" "p100001025212" 22427 22427 "4832" "p100001056424" 22393 22396 "2499" "p100001056424" 22610 22611 "2499" "p100001075008" 22586 22590 "3335" "p100001075008" 22601 22601 "2499" "p100001092661" 22438 22439 "4832" "p100001092661" 22723 22730 "6085" "p100001151503" 22691 22692 "3335" "p100001151503" 22375 22379 "4832" "p100001176761" 22378 22379 "2499" "p100001176761" 22699 22701 "3335" "p100001247817" 22379 22381 "2499" "p100001247817" 22723 22726 "3335" "p100001325630" 22729 22730 "3335" "p100001325630" 22408 22414 "4832" "p100001483238" 22395 22408 "6085" "p100001483238" 22408 22413 "6085" "p100001499999" 22694 22695 "3335" "p100001499999" 22707 22708 "3335" "p100001619664" 22502 22502 "3335" "p100001619664" 22500 22502 "4117" "p100001741725" 22537 22538 "6085" "p100001741725" 22574 22575 "2499" "p100001755388" 22406 22407 "2499" "p100001755388" 22729 22733 "3335" "p100001840035" 22383 22384 "4117" "p100001840035" 22726 22727 "4832" "p100001879563" 22727 22729 "6085" "p100001879563" 22408 22412 "3335" "p100001959998" 22732 22735 "4117" "p100001959998" 22404 22407 "4117" "p100001976225" 22656 22661 "6085" "p100001976225" 22391 22394 "2051" "p100002008886" 22378 22380 "2499" "p100002008886" 22724 22725 "2051" "p100002052988" 22712 22713 "4832" "p100002052988" 22369 22371 "2499" "p100002100178" 22724 22726 "2499" "p100002100178" 22402 22405 "2499" "p100002200017" 22678 22678 "4117" "p100002200017" 22698 22699 "4117" "p100002303518" 22449 22450 "4832" "p100002303518" 22450 22451 "4117" "p100002320251" 22622 22638 "7504" "p100002320251" 22638 22644 "6085" "p100002325351" 22432 22436 "6085" "p100002325351" 22439 22442 "4832" "p100002331450" 22391 22395 "3335" "p100002331450" 22719 22721 "4117" "p100002351772" 22720 22728 "3918" "p100002351772" 22381 22384 "4117" "p100002466555" 22699 22701 "4117" "p100002466555" 22693 22695 "3335" "p100002473310" 22647 22650 "4117" "p100002473310" 22645 22646 "3335" end format %td Admission_Date format %td Discharge_Date
Comment