  • Removing dates within a certain threshold in long format

    Hi There,

    I have a dataset containing the start and end dates of events for a number of subjects in long format (see the data extract below). For each patient, I'd like to start with the first recorded event, remove any events occurring within 14 days of it, and then repeat the process from the next remaining event (i.e. the first one more than 14 days after the anchor), and so on.

    The data initially looks like this:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str13 subjectid double(event_start event_end)
    "AA100" 20804 20818
    "AA100" 20805 20819
    "AA100" 20807 20821
    "AA100" 20809 20823
    "AA100" 20810 20824
    "AA100" 20812 20826
    "AA100" 20813 20827
    "AA100" 20855 20869
    "AA100" 20856 20870
    "AA100" 20857 20871
    "AA100" 21172 21186
    "AA100" 21175 21189
    "AA100" 21236 21250
    "AA100" 21237 21251
    "AA100" 21238 21252
    "AA100" 21287 21301
    end
    format %td event_start
    format %td event_end
    And I would like it to eventually look like this:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str13 subjectid double(event_start event_end)
    "AA100" 20804 20818
    "AA100" 20855 20869
    "AA100" 21172 21186
    "AA100" 21236 21250
    "AA100" 21287 21301
    end
    format %td event_start
    format %td event_end
    I can get this to work for the first date and its 14-day period, but I am having trouble moving the anchor point to the next available date outside the initial 14-day period in order to repeat the process.

    I hope that makes sense, and thank you in advance.

    Kind Regards,

    Rob.

  • #2
    When you say you want to remove everything within 14 days of an event, do you mean within 14 days of when that event started or when it ended?

    On the assumption you mean within 14 days of when it ended, the following will serve:

    Code:
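    capture program drop one_subjectid
    program define one_subjectid
        // runby calls this once per subjectid, with that subject's data sorted by event_start
        local index 1
        while `index' < _N {
            // find the first later event that starts more than 14 days after the anchor's end
            local current = `index' + 1
            while `current' <= _N & event_start[`current'] <= event_end[`index'] + 14 {
                local ++current
            }
            // drop the events in between; skip when there are none so the range is never reversed
            if `current' > `index' + 1 {
                drop in `=`index'+1'/`=`current'-1'
            }
            local ++index
        }
        exit
    end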
    
    sort subjectid event_start event_end
    runby one_subjectid, by(subjectid) verbose
    
    list, noobs clean
    Note: -runby- is written by Robert Picard and me, and is available from SSC.
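
    Since -runby- is not part of official Stata, it has to be installed once before the code above will run. A minimal sketch, assuming an internet connection is available:

    ```stata
    * one-time installation of the community-contributed runby command from SSC
    ssc install runby
    ```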
    Last edited by Clyde Schechter; 18 Nov 2021, 09:44.



    • #3
      Another solution:

      Code:
      bys subjectid: gen after14 = event_end + 14 if _n == 1
      bys subjectid: replace after14 = cond(event_end<=after14[_n-1], after14[_n-1], event_end+14) if _n > 1
      bys subjectid after14 (event_end): keep if _n == 1
      drop after14
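
      Note that the code above compares each event_end with the cutoff of the current window (the anchor's event_end + 14). If the 14 days are instead meant to run from the anchor's start date, the same -by- approach could be adapted roughly as follows. This is only a sketch: the variable name anchor is made up for illustration, and treating a start date exactly 14 days after the anchor as still inside the window is an assumption.

      ```stata
      * Sketch: window anchored on event_start rather than event_end
      sort subjectid event_start event_end

      * the first event of each subject is the first anchor
      by subjectid: gen double anchor = event_start if _n == 1

      * -replace- works through the observations in order, so anchor[_n-1] is already
      * filled in: carry the anchor forward while the event starts within 14 days of it,
      * otherwise this event becomes the new anchor
      by subjectid: replace anchor = cond(event_start <= anchor[_n-1] + 14, anchor[_n-1], event_start) if _n > 1

      * keep only the first event of each window
      bysort subjectid anchor (event_start event_end): keep if _n == 1
      drop anchor
      ```

      With the example data in #1, where every event lasts exactly 14 days, this should keep the same five observations as the desired output.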



      • #4
        Fei Wang's solution in #3 is better than mine. Use that. In general, it is best to avoid looping over observations, as my code does, and to rely instead on -by- when possible. I didn't perceive the possibility of a -by- based solution here, but he found one.



        • #5
          Thank you both for your responses and solutions. I wanted to avoid looping over observations if possible, as you suggest, Clyde, so Fei's solution is perfect.

          Kind Regards,

          Rob.
