Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Multiple observations per id. How to filter observations if within a number of days of episode

    Hi! I'm working with a large data set that has multiple observations per id. Basically, there's one observation per collected sample. I'm trying to filter by "patients" that sent a sample within a number of days (before and/or after) of a positive result. Roughly my data looks like this:
    id age sample date sample site result
    1 1 10/14/2022 0 0
    1 1 10/25/2022 1 0
    1 1 11/1/2022 1 1
    1 1 11/8/2022 1 0
    2 9 09/27/2022 1 0
    2 9 10/15/2022 1 1
    2 9 10/24/2022 0 0
    3 4 9/22/2022 1 1
    3 4 10/05/2022 1 0
    Could you help me figure out how to get this data?

  • #2
    Originally posted by Maria Deza View Post
    I'm trying to filter by "patients" that sent a sample within a number of days (before and/or after) of a positive result.
    Within a number of days is ambiguous. How many days? Also, the solution will depend on whether your sample date variable is a proper date variable (i.e., one recognized by Stata). For this reason, you should post back with a dataex example which should be easier to generate than the table you posted. You can copy and paste the result of

    Code:
    sort id
    dataex in 1/20

    Comment


    • #3
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte(id age) float sample_date byte(samplesite result)
      1 1 22932 0 0
      1 1 22943 1 0
      1 1 22950 1 1
      1 1 22957 1 0
      2 9 22915 1 0
      2 9 22933 1 1
      2 9 22942 0 0
      3 4 22910 1 1
      3 4 22923 1 0
      end
      format %tdNN/DD/CCYY sample_date
      
      assert !missing(result, sample_date)
      
      local a_number_of_days 10
      
      gen timely = 1
      rangestat (count) timely , by(id) ///
          interval(sample_date -`a_number_of_days' `a_number_of_days') excludeself
      replace timely = 0 if missing(timely)
      by id (sample_date), sort: egen wanted = max(timely_count > 0 & result)
      Note: -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer. It is available from SSC.

      You do not say how many days is "a number of days." I chose to illustrate the code using 10 because some of your id's meet that criterion and one does not. Evidently, change that line of code according to what you actually had in mind.

      The code also assumes that sample_date is in fact a true Stata internal format date variable. Because you posted a tableau instead of using -dataex- to show your example data, there is no metadata available and this is a guess on my part. If I guessed wrong, the code will not run until you convert it. To avoid wasting your time and mine with guesswork in the future, please use the -dataex- command to do so, as I have in this response. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

      Added: Crossed with #2.


      Comment


      • #4
        (Unsurprising similarities to #1 and #2)

        Please note the advice to use dataex -- especially whenever dates are part of the data. The date variable you gave needs conversion before it is any use.

        I don't fully follow this. In particular, it seems that all your patients qualify, as all patients have samplesite 1 when result is 1. So does before or after exclude the same day?

        I used rangestat from SSC. The interval given is just notional as I can see nothing in #1 about how many days you allow.


        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input byte(d age) str10 sampledate byte(samplesite result)
        1 1 "10/14/2022" 0 0
        1 1 "10/25/2022" 1 0
        1 1 "11/1/2022"  1 1
        1 1 "11/8/2022"  1 0
        2 9 "09/27/2022" 1 0
        2 9 "10/15/2022" 1 1
        2 9 "10/24/2022" 0 0
        3 4 "9/22/2022"  1 1
        3 4 "10/05/2022" 1 0
        end
        
        gen date = daily(sampledate, "MDY")
        format date %td 
        
        rangestat (sum) count=samplesite, int(date -3 3) by(d)
        replace count = 0 if result == 0 
        
        egen wanted = max(count > 0), by(d) 
        
        list, sepby(d)
        
             +-----------------------------------------------------------------------+
             | d   age   sampledate   samp~ite   result        date   count   wanted |
             |-----------------------------------------------------------------------|
          1. | 1     1   10/14/2022          0        0   14oct2022       0        1 |
          2. | 1     1   10/25/2022          1        0   25oct2022       0        1 |
          3. | 1     1    11/1/2022          1        1   01nov2022       1        1 |
          4. | 1     1    11/8/2022          1        0   08nov2022       0        1 |
             |-----------------------------------------------------------------------|
          5. | 2     9   09/27/2022          1        0   27sep2022       0        1 |
          6. | 2     9   10/15/2022          1        1   15oct2022       1        1 |
          7. | 2     9   10/24/2022          0        0   24oct2022       0        1 |
             |-----------------------------------------------------------------------|
          8. | 3     4    9/22/2022          1        1   22sep2022       1        1 |
          9. | 3     4   10/05/2022          1        0   05oct2022       0        1 |
             +-----------------------------------------------------------------------+

        Comment


        • #5
          So does before or after exclude the same day?
          In my response in #3, I assumed that the same day is excluded, hence the -excludeself- option in my -rangestat- command.. The question doesn't even make sense otherwise.

          That said, we do sometimes get questions here that don't make sense, in which case the response needs to be to point that out and help refine the question to one that does make sense. So probably I shouldn't have made that assumption, or at least should have made clear that I was making that assumption..

          Comment

          Working...
          X