Identifying observations by relation to time to another observation, within sets of ids

jemima spencer

Join Date: Jan 2022

Posts: 38
#1

Identifying observations by relation to time to another observation, within sets of ids

14 Sep 2023, 07:27

Hi,

Could I possibly have some help identifying observations that lie closest in time to another observation for the same individual?

I have a large dataset in long format. Within this I have repeated observations for individuals (individual identified by "patid"). One observation for each individual refers to the date of an MI. The others are all kidney function readings on different dates. What I would like to do, is to a) identify the most recent kidney function reading for each individual, prior to the date of their MI, and b) to determine how many days prior to the MI this reading was taken.

I was wondering whether this might be better in wide format, but some individuals have >150 kidney function readings so STATA rather gives up on this idea.

(Apologies if this has been discussed elsewhere - I don't know what command(s) I'm looking for, so rather hard to search for advice on it!)

Dataex example:

input double patid float(dup obsdate currentMI CKD)
2920018 1 19652 1 0
2920018 2 22102 0 0

14820018 1 19428 0 1
14820018 2 19444 0 1
14820018 3 19514 1 0
14820018 4 19523 0 1
14820018 5 19795 0 1
14820018 6 19795 0 0
14820018 7 19816 0 1
14820018 8 19844 0 1
14820018 9 19845 0 1
14820018 10 20166 0 1
14820018 11 20255 0 1
14820018 12 20527 0 1
14820018 13 20769 0 1
14820018 14 20886 0 1
14820018 15 20958 0 1
14820018 16 21083 0 1
14820018 17 21111 0 1
14820018 18 21175 0 1
14820018 19 21256 0 1
14820018 20 21532 0 1
14820018 21 21672 0 1
14820018 22 22042 0 1
14820018 23 22043 0 1
14820018 24 22069 0 1

For example, in the dataex above, the first patid has only two observation dates (one for the date of the MI and the other for the date of the kidney function reading (CKD)), whereas the 2nd observation has 23 kidney function readings.

Thanks in advance for any help received.

JEmima
Tags: None

Nick Cox

Join Date: Mar 2014
Posts: 35775

14 Sep 2023, 09:47

I guess that you guess that it's obvious that MI means myocardial infarction. On the contrary, it is obvious that MI means multiple imputation!

More seriously, this may help. See also https://www.stata.com/support/faqs/d...issing-values/

(I agree that wide layout is far less suitable here in Stata, or indeed generally.)

Code:

clear 
input double patid float(dup obsdate currentMI CKD)
2920018 1 19652 1 0
2920018 2 22102 0 0
14820018 1 19428 0 1
14820018 2 19444 0 1
14820018 3 19514 1 0
14820018 4 19523 0 1
14820018 5 19795 0 1
14820018 6 19795 0 0
14820018 7 19816 0 1
14820018 8 19844 0 1
14820018 9 19845 0 1
14820018 10 20166 0 1
14820018 11 20255 0 1
14820018 12 20527 0 1
14820018 13 20769 0 1
14820018 14 20886 0 1
14820018 15 20958 0 1
14820018 16 21083 0 1
14820018 17 21111 0 1
14820018 18 21175 0 1
14820018 19 21256 0 1
14820018 20 21532 0 1
14820018 21 21672 0 1
14820018 22 22042 0 1
14820018 23 22043 0 1
14820018 24 22069 0 1
end 

gen lastCKD = obsdate if CKD 
bysort patid (obsdate) : replace lastCKD = lastCKD[_n-1] if missing(lastCKD)
gen wanted = obsdate - lastCKD if currentMI 

list, sepby(patid)


    +--------------------------------------------------------------+
     |    patid   dup   obsdate   curren~I   CKD   lastCKD   wanted |
     |--------------------------------------------------------------|
  1. |  2920018     1     19652          1     0         .        . |
  2. |  2920018     2     22102          0     0         .        . |
     |--------------------------------------------------------------|
  3. | 14820018     1     19428          0     1     19428        . |
  4. | 14820018     2     19444          0     1     19444        . |
  5. | 14820018     3     19514          1     0     19444       70 |
  6. | 14820018     4     19523          0     1     19523        . |
  7. | 14820018     5     19795          0     1     19795        . |
  8. | 14820018     6     19795          0     0     19795        . |
  9. | 14820018     7     19816          0     1     19816        . |
 10. | 14820018     8     19844          0     1     19844        . |
 11. | 14820018     9     19845          0     1     19845        . |
 12. | 14820018    10     20166          0     1     20166        . |
 13. | 14820018    11     20255          0     1     20255        . |
 14. | 14820018    12     20527          0     1     20527        . |
 15. | 14820018    13     20769          0     1     20769        . |
 16. | 14820018    14     20886          0     1     20886        . |
 17. | 14820018    15     20958          0     1     20958        . |
 18. | 14820018    16     21083          0     1     21083        . |
 19. | 14820018    17     21111          0     1     21111        . |
 20. | 14820018    18     21175          0     1     21175        . |
 21. | 14820018    19     21256          0     1     21256        . |
 22. | 14820018    20     21532          0     1     21532        . |
 23. | 14820018    21     21672          0     1     21672        . |
 24. | 14820018    22     22042          0     1     22042        . |
 25. | 14820018    23     22043          0     1     22043        . |
 26. | 14820018    24     22069          0     1     22069        . |
     +--------------------------------------------------------------+

Comment

jemima spencer

Join Date: Jan 2022

Posts: 38
#3

15 Sep 2023, 01:54

That (of course!) worked perfectly, thank you SO much. I'd been working on my attempt at that for days (or maybe it just felt that way!).

Sorry about MI - a clinician for too long

Jemima
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35775
#4

15 Sep 2023, 06:06

Most of us have much reason to be grateful to clinicians!

See also https://journals.sagepub.com/doi/pdf...867X1101100309 for another discussion.
Comment

Announcement

Identifying observations by relation to time to another observation, within sets of ids

Comment

Comment

Comment