Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating a new variable for overlapping time intervals

    Hi Statalist,

    I have been lost in coding since I started to deal with this issue and here is my problem. In the example data below, I give you the IDs of individuals and the code of action they take during the day(irrelevant ones are removed). Action's being equal to 111 means that that person is at work. Time is basically the amount of time devoted to that activity. Start and End in fact represents the hours on which a particular activity is initiated and finished, yet they are stated in terms of minutes; for example, 240 means 4:00 AM.

    Here is what I would like to do. I want to calculate the how much of work time of these two individuals overlap. For this particular example, the answer is 230. Therefore, I want a create a column where there are 230s as long as FamilyID, which shows that those individuals belong to the same family, equals 1.

    (The order of the variables: FamilyID, ID, Action,Time, Start, End)

    Code:
    FamilyID ID Action Time Start End
    1 159   . 180  240  420
    1 159   .  10  420  430
    1 159   .  20  430  450
    1 159   .  30  450  480
    1 159   .  60  480  540
    1 159 111 210  540  750
    1 159   .  30  750  780
    1 159 111 180  780  960
    1 159   .  60  960 1020
    1 159 111 180 1020 1200
    1 159   .  60 1200 1260
    1 159   .  30 1260 1290
    1 159   .  20 1290 1310
    1 159   .  10 1310 1320
    1 159   .  60 1320 1380
    1 159   .  60 1380 1440
    1 160   . 180  240  420
    1 160   . 120  420  540
    1 160   .  10  540  550
    1 160   .  10  550  560
    1 160   .  40  560  600
    1 160   .  60  600  660
    1 160   .  20  660  680
    1 160   . 100  680  780
    1 160   . 120  780  900
    1 160 111  60  900  960
    1 160   .  30  960  990
    1 160 111 140  990 1130
    1 160   .  10 1130 1140
    1 160 111  60 1140 1200
    1 160   .  60 1200 1260
    1 160   .  30 1260 1290
    1 160   .  10 1290 1300
    1 160   .  20 1300 1320
    1 160   .  60 1320 1380
    1 160   .  20 1380 1400
    1 160   .  40 1400 1440

  • #2
    Well, the following code, I believe works. But it produces different results from what you ask, but I believe your count is incorrect. For example, minutes 900 through 960 are minutes of overlap. I believe you are counting them as 60 minutes, but in fact they are 61 minutes. There are three such intervals in your example, so my count comes up 3 minutes larger than yours. In fact, your Start and End values double count all of the boundary minutes. You might want to first revise your data so that, for example, 240 to 420 is replaced by 240 to 419, etc. If you do that and apply my code, your count and mine will agree.

    Code:
    clear
    input FamilyID ID Action Time Start End
    1 159 . 180 240 420
    1 159 . 10 420 430
    1 159 . 20 430 450
    1 159 . 30 450 480
    1 159 . 60 480 540
    1 159 111 210 540 750
    1 159 . 30 750 780
    1 159 111 180 780 960
    1 159 . 60 960 1020
    1 159 111 180 1020 1200
    1 159 . 60 1200 1260
    1 159 . 30 1260 1290
    1 159 . 20 1290 1310
    1 159 . 10 1310 1320
    1 159 . 60 1320 1380
    1 159 . 60 1380 1440
    1 160 . 180 240 420
    1 160 . 120 420 540
    1 160 . 10 540 550
    1 160 . 10 550 560
    1 160 . 40 560 600
    1 160 . 60 600 660
    1 160 . 20 660 680
    1 160 . 100 680 780
    1 160 . 120 780 900
    1 160 111 60 900 960
    1 160 . 30 960 990
    1 160 111 140 990 1130
    1 160 . 10 1130 1140
    1 160 111 60 1140 1200
    1 160 . 60 1200 1260
    1 160 . 30 1260 1290
    1 160 . 10 1290 1300
    1 160 . 20 1300 1320
    1 160 . 60 1320 1380
    1 160 . 20 1380 1400
    1 160 . 40 1400 1440
    end
    
    // SAVE A COPY OF THE ORIGINAL DATA
    tempfile copy
    save `copy'
    
    // ELIMINATE NON WORK OBSERVATIONS
    keep if Action == 111
    
    // VERIFY ONLY TWO PERSONS PER FAMILY
    by FamilyID (ID), sort: assert inlist(ID, ID[1], ID[_N])
    
    // VERIFY NO OVERLAP OF INTERVALS WITHIN PERSON
    by FamilyID ID (Start), sort: assert Start >= End[_n-1] if _n > 1
    
    // EXPAND EACH OBSERVATION TO ITS NUMBER OF MINUTES
    expand End - Start + 1
    by FamilyID ID Start, sort: gen minute = Start + _n - 1
    
    // IDENTIFY MINUTES THAT ARE OVERLAPPED
    by FamilyID minute, sort: gen overlap = _N - 1
    
    // AND ADD THEM UP
    by FamilyID: replace overlap = sum(overlap)
    by FamilyID: keep if _n == _N
    // REVERSE DOUBLE-COUNTING
    replace overlap = overlap/2
    
    // MERGE BACK TO ORIGINAL DATA
    merge 1:m FamilyID using `copy', keep(match using) nogenerate
    drop minute

    Comment


    • #3
      Here's another approach based on a technique described in

      SJ-13-1 dm0068 . . . . . Stata tip 114: Expand paired dates to pairs of dates
      . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
      Q1/13 SJ 13(1):217--219 (no commands)
      tip on using expand to deal with paired dates

      http://www.stata-journal.com/article...article=dm0068
      that does not require expansion by the minute. The following example includes only work observations, a date variable just in case, and a unique identifier for each action. I also added a couple of observation of a third individual.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long actionid float(date FamilyID ID Start End)
      1 20903 1 159  540  750
      2 20903 1 159  780  960
      3 20903 1 159 1020 1200
      4 20903 1 160  900  960
      5 20903 1 160  990 1130
      6 20903 1 160 1140 1200
      7 20903 1 170  749  751
      8 20903 1 170 1150 1160
      end
      format %td date
      
      * convert time interval to long form and create action toggle switch
      expand 2
      bysort actionid: gen atime = cond(_n==1, Start, End)
      by actionid: gen onoff = cond(_n==1, 1, -1)
      
      * order by time within a family on a given day;
      * put on events before off events;
      * use a running sum to track events through time
      * note the start of the next time period
      gsort FamilyID date atime -onoff actionid
      gen runsum = sum(onoff)
      by FamilyID date: gen nextp = atime[_n+1]
      
      * the status at the end of the period reflects all on/off events
      by FamilyID date atime: gen status = runsum[_N]
      
      * identify spells of overlap
      gen overlap = status > 1
      gen spell = sum(overlap != overlap[_n-1])
      
      * for each spell with an overlap, calculate length of spell for one observation
      bysort FamilyID date spell (atime actionid): gen spell_overlap = nextp[_N] - atime[1] + 1 if overlap & _n == 1
      
      * add overlaps within a group of FamilyID date
      by FamilyID date: egen wanted = total(spell_overlap)
      
      * reduce to one observation per FamilyID date
      by FamilyID date: keep if _n == 1
      keep FamilyID date wanted

      Comment


      • #4
        Firstly sorry for the late response and thank you so much, Clyde and Robert. You guys are amazing. Clyde's method has worked well for me, yet I still tried Robert's too. However, I could not manage to apply it; I think the additional variables he used confused me a little. Thanks a lot.

        Comment

        Working...
        X