  • Identifying events within a certain time period

    Hello. I have a simplified data set (see below) with variables subject_id, encounter_date and encounter_num. I am interested in counting repeat encounters that happen >1 but <15 days after the first encounter. In addition, I would like to carry out the same process for subsequent encounters, as long as they were >30 days from the prior index encounter. For subject 33204 listed below, I would like to use encounter number 1 as the initial (index) event, then count encounters 4 and 5 (because they occurred >1 and <15 days after encounter 1). Next, I would like to identify and use encounter number 8 as the new index event (because it occurred >30 days after the prior index encounter) and count encounter 9 (which occurred >1 and <15 days after encounter 8). Essentially, I'm trying to count the number of events that occur in a specific time period and repeat this process every 30 days.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double subject_id float(encounter_date encounter_num)
    33204 18265  1
    33204 18265  2
    33204 18265  3
    33204 18269  4
    33204 18275  5
    33204 18288  6
    33204 18288  7
    33204 18302  8
    33204 18311  9
    33204 18324 10
    33204 18337 11
    33204 18393 12
    33204 18394 13
    33204 18419 14
    33204 18420 15
    33204 18428 16
    33204 18456 17
    33204 18526 18
    33204 18530 19
    33204 18532 20
    33204 18541 21
    33204 18570 22
    33204 18599 23
    33204 18632 24
    33204 18655 25
    33204 18686 26
    33205 18265  1
    33205 18266  2
    33205 18272  3
    33205 18293  4
    33205 18331  5
    33205 18348  6
    33205 18393  7
    33205 18444  8
    33205 18543  9
    33205 18637 10
    33205 18687 11
    33205 18693 12
    33205 18694 13
    33205 18725 14
    33205 18892 15
    33205 19005 16
    end
    format %td encounter_date

  • #2
    Your question is unclear. In what way do you want to "use" or "identify" these later encounters. Do you want to create a new variable that tells you how many of them there are? Do you want to create a new variable (or variables) that identify their encounter numbers? Do you want to drop all the other encounters from the data set? Something else? Putting it concretely, for the example data you show, what would the results you want look like in Stata?



    • #3
      Good questions, Clyde. I'd like to create a new variable indicating whether the encounter had another encounter in the preceding 30 days (yes or no) and, if no, a count of the subsequent encounters that occurred >1 but <15 days later. So the data below would look like:

      33204 18265 1 0 2
      33204 18265 2 1
      33204 18265 3 1
      33204 18269 4 1
      33204 18275 5 1
      33204 18288 6 1
      33204 18288 7 1
      33204 18302 8 1
      33204 18311 9 1
      33204 18324 10 1
      33204 18337 11 1
      33204 18393 12 0 0
      33204 18394 13 1
      33204 18419 14 1
      33204 18420 15 1
      33204 18428 16 1
      33204 18456 17 1
      33204 18526 18 0 2
      33204 18530 19 1
      33204 18532 20 1
      33204 18541 21 1



      • #4
        Sorry, the formatting got messed up in the last post. This is what I want to generate: a new variable (var4) indicating whether the encounter had another encounter in the preceding 30 days (yes or no) and, if no, a second variable (var5) counting how many encounters occurred >1 but <15 days after that encounter.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input double subject_id float(encounter_date encounter_num var4 var5)
        33204 18265  1 0 2
        33204 18265  2 1 .
        33204 18265  3 1 .
        33204 18269  4 1 .
        33204 18275  5 1 .
        33204 18288  6 1 .
        33204 18288  7 1 .
        33204 18302  8 1 .
        33204 18311  9 1 .
        33204 18324 10 1 .
        33204 18337 11 1 .
        33204 18393 12 0 0
        33204 18394 13 1 .
        33204 18419 14 1 .
        33204 18420 15 1 .
        33204 18428 16 1 .
        33204 18456 17 1 .
        33204 18526 18 0 2
        33204 18530 19 1 .
        33204 18532 20 1 .
        33204 18541 21 1 .
        33205 18265  1 . .
        33205 18266  2 . .
        33205 18272  3 . .
        33205 18293  4 . .
        33205 18331  5 . .
        33205 18348  6 . .
        33205 18393  7 . .
        33205 18444  8 . .
        33205 18543  9 . .
        33205 18637 10 . .
        33205 18687 11 . .
        33205 18693 12 . .
        33205 18694 13 . .
        33205 18725 14 . .
        33205 18892 15 . .
        33205 19005 16 . .
        end
        format %td encounter_date



        • #5
          OK, this is pretty complicated, but I think I have it.

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input double subject_id float(encounter_date encounter_num)
          33204 18265  1
          33204 18265  2
          33204 18265  3
          33204 18269  4
          33204 18275  5
          33204 18288  6
          33204 18288  7
          33204 18302  8
          33204 18311  9
          33204 18324 10
          33204 18337 11
          33204 18393 12
          33204 18394 13
          33204 18419 14
          33204 18420 15
          33204 18428 16
          33204 18456 17
          33204 18526 18
          33204 18530 19
          33204 18532 20
          33204 18541 21
          33205 18265  1
          33205 18266  2
          33205 18272  3
          33205 18293  4
          33205 18331  5
          33205 18348  6
          33205 18393  7
          33205 18444  8
          33205 18543  9
          33205 18637 10
          33205 18687 11
          33205 18693 12
          33205 18694 13
          33205 18725 14
          33205 18892 15
          33205 19005 16
          end
          format %td encounter_date
          
          by subject_id (encounter_num), sort: assert encounter_date >= encounter_date[_n-1] if _n > 1
          by subject_id (encounter_num): assert encounter_num[1] == 1
          by subject_id (encounter_num): assert encounter_num == encounter_num[_n-1]+1 if _n > 1
          
          
          capture program drop mark_blocks
          program define mark_blocks
              rangestat (max) next_block = encounter_num, interval(encounter_date 0 30)
              local start = 1
              local block_num = 1
              gen block_num = .
              while `start' <= _N {
                  local end = next_block[`start']
                  if missing(`end') {
                      local end = _N
                  }
                  replace block_num = `block_num' in `start'/`end'
                  local ++block_num
                  local start = `end' + 1
              }
              exit
          end
          
          runby mark_blocks, by(subject_id) verbose
          
          by subject_id block_num (encounter_num), sort: egen var5 ///
              = total(inrange(encounter_date-encounter_date[1], 1, 15))
          by subject_id block_num (encounter_num): gen byte var4 = (_n > 1), before(var5)
          by subject_id block_num (encounter_num): replace var5 = . if var4 | _N == 1

          -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer and is available from SSC.
          -runby- is written by Robert Picard and me, and is also available from SSC.

          Note: The three assert commands immediately following the loading of data are just there to verify that encounter_num and encounter_date sort the same way, and that encounter_num is, within patients, a consecutive numbering of observations starting from 1. If any of these -assert- commands fails, execution will break--the data are not suitable for use with this code.

          I would be delighted if somebody comes up with a simpler solution.

          Added: You probably want to drop the -verbose- option from that -runby- command. I put it in there so I could see what was going on while I developed the code, but it's going to generate a lot of useless output cluttering up your log, now that it's working properly.
          Last edited by Clyde Schechter; 13 May 2021, 18:32.



          • #6
            Hi Clyde, thanks for your help. When I run the line:
            runby mark_blocks, by (subject_id)

            All of my variables disappear. Any idea why this would happen?

            I also get this output:

            number of by-groups = 191906
            by-groups with errors = 191906
            by-groups with no data = 0
            observations processed = 698453
            observations saved = 0



            • #7
              Well, it says that all the by-groups have errors, which is why everything is disappearing. So the question is where the errors are coming from. The code works with the example data you provided, so I can only infer that your actual data is different in some material way. Please post an example with actual data that reproduces this problem and I will try to fix it.



              • #8
                I have 40 variables in my data set, and this seems to be too big for dataex. Is there another command I should use?



                • #9
                  Here's another approach. Use the code the way I originally wrote it, with the -verbose- option. And, for brevity, just run it on a subset of your data, say the first 10 subject_id's. You will see error messages spit out while -runby- executes program mark_blocks. Post those error messages here and we may be able to figure it out just from that.
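
                  A minimal sketch of such a subset run, assuming program mark_blocks from #5 is already defined in the current session. The variable sid is a hypothetical helper introduced only for this illustration, and -preserve-/-restore- are there so the full data set comes back afterwards even if every by-group fails:

                  ```stata
                  * Sketch only: run mark_blocks on the first 10 subjects to surface error messages
                  preserve
                  * sid numbers the distinct subject_id values 1, 2, 3, ... (hypothetical helper)
                  egen long sid = group(subject_id)
                  keep if sid <= 10
                  drop sid
                  * -verbose- shows the output and errors from each by-group
                  runby mark_blocks, by(subject_id) verbose
                  * bring back the full data set, whatever -runby- left in memory
                  restore
                  ```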



                  • #10
                    I still find your description not clear enough. If I understand it correctly, the code below should help.
                    Code:
                    rangestat (count) v4 = encounter_num, interval(encounter_date -30 -1) by(subject_id)
                    bys subject_id encounter_date (encounter_num): replace v4 = (_n != 1) | (v4 != .)
                    
                    rangestat (count) v5 = encounter_num, interval(encounter_date 2 14) by(subject_id)
                    replace v5 = cond(v4,.,cond(v5 ==.,0,v5))
                    Note that Professor Clyde and I must understand what you need differently. The output of my code is quite different from Professor Clyde's in #5 (see some examples below). A clarification of what you want is, thus, still needed.

                    id 33204, date 09 Feb 2010 (obs 8) should not be qualified to be a target, since there is an encounter on 26 Jan 2010, just 14 days before.

                    id 33204, date 21 Sep 2010 (obs 18): the counting outcome is only 2 (including encounters on 25 Sep and 27 Sep).

                    id 33205, date 11 May 2010 (obs 28) should be picked up, since the most recent encounter before that is on 27 Mar 2010, i.e. 45 days before.



                    • #11
                      Romalpa Akzo: I, too, initially found the question unclear. But in #4, O.P. shows the results he is looking for. The code in #5, using the example data in #5, does match what is asked for in #4. (FWIW, before I saw #3, my first thought was exactly the code you show in #10, but it produces different results.)



                      • #12
                        Many thanks, Professor Clyde, I see your point now. I also note that the O.P.'s description at the start of #4 seems different from the results he shows in the same post. I still do not understand the mechanism behind the examples that I mentioned in #10. Could you kindly explain your logic a bit more?



                        • #13
                          I think the basic idea is that O.P. wants to start with the first observation for each subject_id and examine 30 days from there. All the observations in that block, except the first, are designated as var4 = 1, and the first is var4 = 0 because it begins a 30 day block of time. Then, within that 30 day block of time, he wants to identify those observations that occur from 1 to 15 days after the initial one, setting var5 to a count of those. With the first 30 days taken care of, those observations are to be put aside, and a new 30 day period begins with the first observation following that 30 day period (if any). That new 30 day period is then to be handled just as the first one was. Once that is done, those observations are to be put aside, and a new 30 day block begins with the next observation (if any), and so on.

                          This marking out of the data into 30 day blocks is what the program mark_blocks does, and -runby- simply iterates it over subject_id. The code after -runby- works within those 30 day blocks using simple -by- prefixed commands to calculate var4 and var5 within each block.
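
                          To see these blocks in the example data, something like the following could be run after the code in #5. It is only a sketch: days_from_start is a hypothetical helper variable introduced here for illustration, and block_num, var4 and var5 are assumed to exist already from #5.

                          ```stata
                          * Sketch only: days elapsed since the first encounter of each 30-day block
                          by subject_id block_num (encounter_num), sort: ///
                              gen days_from_start = encounter_date - encounter_date[1]
                          * one subject, with a separator line between consecutive blocks
                          list encounter_date encounter_num block_num days_from_start var4 var5 ///
                              if subject_id == 33204, sepby(block_num) noobs
                          ```

                          Within each block, the first row should show days_from_start of 0 and var4 of 0, and no row should show days_from_start above 30.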



                          • #14
                            Many thanks, Professor Clyde. Your logic is clear to me now and it makes the puzzle interesting. However, I notice it still differs from the O.P.'s output in #4, at least for the case of obs 16 (id 33204, date 15 Jun 2010). Thus, a clarification from the O.P. is still needed.

                            Below is my attempt to solve the (interesting) puzzle following your description.
                            Code:
                            gen b = encounter_date
                            bys subject_id (encounter_date encounter_num): replace b = b[_n-1] if b < b[_n-1] + 31 & _n>1
                            bys subject_id b (encounter_date encounter_num): gen v4 = _n>1
                            
                            rangestat (count) v5 = encounter_num, interval(encounter_date 2 14) by(subject_id)
                            replace v5 = cond(v4,.,cond(v5 ==.,0,v5))
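
                            As a rough cross-check (a sketch only, assuming the code in #5 has already been run so that block_num is in memory alongside b from the code above), the two ways of marking 30-day blocks can be compared. If they agree, both -assert- commands pass silently:

                            ```stata
                            * Sketch only: every observation sharing a block start date b
                            * should carry the same block_num from #5 ...
                            by subject_id b, sort: assert block_num == block_num[1]
                            * ... and every observation in a block_num should share one b
                            by subject_id block_num, sort: assert b == b[1]
                            ```

                            This compares only the block boundaries. The counts themselves are not expected to be identical, because #5 counts encounters 1 to 15 days after the block start while the code above counts 2 to 14 days.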



                            • #15
                              Brilliant solution!

                              And you are right, in #4, encounter_num 16 for id 33204 does not follow the generalization that he, in other respects, seems to want. I suspect he made a mistake when he worked that out by hand.
