  • Flagging observations that don't exist in subsequent observations

    Hello, I am trying to flag observations whose marker is not followed in subsequent visits. I apologize, as this is similar to a previous question titled "Identifying unique markers". However, this question concerns one specific point that was not fully explained in the previous thread. I did my best to explain the problem, and I thank you for reading.


    Below is the example code.


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int PATIENT str5 READER long VISIT int MARKER_ID byte problem
    100 "BOB"   1 231 0
    100 "BOB"   1 234 0
    100 "BOB"   2 231 0
    100 "BOB"   2 234 0
    100 "BOB"   3 231 0
    100 "BOB"   3 234 0
    100 "BOB"   3 235 0
    100 "BOB"   4 231 0
    100 "BOB"   4 234 0
    100 "BOB"   4 235 0
    100 "BOB"   5 231 0
    100 "BOB"   5 234 0
    100 "BOB"   5 235 0
    100 "BILL"  1 693 0
    100 "BILL"  1 699 0
    100 "BILL"  2 702 0
    100 "BILL"  2 693 0
    100 "BILL"  2 699 0
    100 "BILL"  3 699 0
    100 "BILL"  3 693 0
    100 "BILL"  3 702 1
    100 "BILL"  4 699 0
    100 "BILL"  4 693 0
    100 "BILL"  5 693 0
    100 "BILL"  5 699 0
    100 "BILL"  5 702 0
    101 "DANNY" 1 703 0
    101 "DANNY" 1 701 1
    101 "DANNY" 2 703 0
    101 "DANNY" 4 703 0
    101 "RONNY" 1 703 0
    101 "RONNY" 1 701 1
    101 "RONNY" 2 703 0
    101 "RONNY" 4 701 0
    101 "RONNY" 4 703 0
    end

    Within each PATIENT READER group, every MARKER_ID recorded at a VISIT must also appear at the next VISIT for that group (i.e. if it is in VISIT 1, it has to be followed all the way to the last VISIT). VISIT numbers do not have to run consecutively, e.g. 1-4 or 1-7; there can be gaps. If a MARKER_ID is not followed in the subsequent VISIT, the observation should be flagged. In my case, problem = 1 when the subsequent VISIT does not contain that particular MARKER_ID.

    Example: When looking at PATIENT = 100 and READER = BOB, we see that each MARKER_ID is followed in the following VISIT, and a MARKER_ID that first appears later is not flagged either (235 first appears in VISIT 3 and is followed from there). This is the ideal case.

    When looking at PATIENT = 100 and READER = BILL, we see that MARKER_ID = 702 is not in VISIT = 4 and it is flagged in VISIT = 3.

    When looking at PATIENT = 101 and READER = DANNY, we see that MARKER_ID = 701 is not in VISIT = 2 or 4 and it is flagged in VISIT = 1.

    When looking at PATIENT = 101 and READER = RONNY, we see that MARKER_ID = 701 is not in VISIT = 2 and it is flagged in VISIT = 1.

    flagged in this case means problem = 1


    I hope my explanation makes perfect sense. Feel free to ask any questions and I will try to be as concise as possible. Thank you.

  • #2
    So within PATIENT READER by-groups, each distinct VISIT code can include more than one MARKER_ID code. I'll assume that a higher VISIT code indicates a later visit. The first thing to do is to generate a new visit identifier, vis_group, that respects the order of VISIT but numbers the visits consecutively (1, 2, 3, ...) within each by-group, so that gaps in VISIT disappear. The value of vis_group on the last observation of a PATIENT READER by-group is then the overall number of visits (nvisits). Next, sort the observations to form PATIENT READER MARKER_ID by-groups (within each by-group, observations are ordered by vis_group). An observation is flagged when the next observation in its by-group is not at the immediately following visit. Since the last observation within the by-group does not have a subsequent observation, you have to target it separately and check that its vis_group is equal to the overall number of visits.
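
    To see how a running sum renumbers visits that have gaps, here is a minimal sketch on a tiny made-up dataset (one reader, visits 1, 2 and 4; the values are for illustration only). The comparison VISIT != VISIT[_n-1] is 1 whenever the visit code changes, including on the first observation, where VISIT[_n-1] is missing.

    ```stata
    * toy data, made up for illustration: visits 1, 2 and 4 (no visit 3)
    clear
    input byte VISIT int MARKER_ID
    1 701
    1 703
    2 703
    4 703
    end

    sort VISIT MARKER_ID

    * the running sum increases by 1 each time VISIT changes,
    * so visit codes 1, 2, 4 become vis_group 1, 2, 3
    gen vis_group = sum(VISIT != VISIT[_n-1])

    list, sepby(VISIT)
    ```

    With real data the same expression is run under a by PATIENT READER: prefix, so the numbering restarts in every by-group.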

    Code:
    * validate assumptions about the data
    isid PATIENT READER VISIT MARKER_ID, sort
    
    * create a truly sequential visit identifier
    by PATIENT READER: gen vis_group = sum(VISIT != VISIT[_n-1])
    
    * for each patient reader by-group, the overall number of visits
    by PATIENT READER: gen nvisits = vis_group[_N]
    
    * flag non-continuous visits; the id of last visit must match nvisits
    sort PATIENT READER MARKER_ID vis_group
    by PATIENT READER MARKER_ID: gen wanted = !mi(vis_group[_n+1]) & ///
        vis_group + 1 != vis_group[_n+1]
    by PATIENT READER MARKER_ID: replace wanted = 1 if _n == _N  & ///
        vis_group != nvisits
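
    One way to check the result is to compare wanted with the hand-coded problem variable. The sketch below is only an illustration: it reuses the PATIENT 101 rows from the example data in #1, repeats the steps above, and then uses assert, which stops with an error if any observation disagrees. Run it from a do-file, since /// line continuation is not accepted interactively.

    ```stata
    * PATIENT 101 rows copied from the example data in #1
    clear
    input int PATIENT str5 READER long VISIT int MARKER_ID byte problem
    101 "DANNY" 1 703 0
    101 "DANNY" 1 701 1
    101 "DANNY" 2 703 0
    101 "DANNY" 4 703 0
    101 "RONNY" 1 703 0
    101 "RONNY" 1 701 1
    101 "RONNY" 2 703 0
    101 "RONNY" 4 701 0
    101 "RONNY" 4 703 0
    end

    * same steps as in the code above
    isid PATIENT READER VISIT MARKER_ID, sort
    by PATIENT READER: gen vis_group = sum(VISIT != VISIT[_n-1])
    by PATIENT READER: gen nvisits = vis_group[_N]
    sort PATIENT READER MARKER_ID vis_group
    by PATIENT READER MARKER_ID: gen wanted = !mi(vis_group[_n+1]) & ///
        vis_group + 1 != vis_group[_n+1]
    by PATIENT READER MARKER_ID: replace wanted = 1 if _n == _N & ///
        vis_group != nvisits

    * errors out if the computed flag differs from the hand-coded one
    assert wanted == problem

    * show the flagged observations
    list PATIENT READER VISIT MARKER_ID wanted if wanted, noobs
    ```

    On the full example data the same assert can be run directly after the code above, without re-entering the data.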


    • #3
      Perfect, thank you Robert. I'll definitely start using isid and mi() in my code.
