Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to subtract dates in stata within groups?


    I am working on large data set and stuck on the following point.

    I would like to calculate time(hours) where group==1. For example, it will be 4 hours for (05/08/2013 22:00 to 05/08/2013 18:00). If this condition meet more than one time within ID, I want to sum this and count how many times this happen.
    Code:
    input id str20 date group
    1 "05/08/2013 14:50" 0
    1 "05/08/2013 17:20" 0
    1 "05/08/2013 18:00" 1
    1 "05/08/2013 20:00" 1
    1 "05/08/2013 22:00" 1
    1 "05/08/2013 23:00" 0
    2 "01/07/2015 05:59" 0
    2 "01/07/2015 14:30" 0
    2 "01/07/2015 16:00" 1
    2 "01/07/2015 19:00" 0
    2 "01/07/2015 21:00" 0
    2 "02/07/2015 02:00" 1
    2 "02/07/2015 06:50" 1
    2 "02/07/2015 13:01" 1
    2 "02/07/2015 20:00" 0
    end
    Any hint, please
    Last edited by Muhammad Faisal; 04 May 2018, 17:06.

  • #2
    I'm not 100% certain I understand what you want. But I think this is it:

    Code:
    clear
    input id str20 date group
    1 "05/08/2013 14:50" 0
    1 "05/08/2013 17:20" 0
    1 "05/08/2013 18:00" 1
    1 "05/08/2013 20:00" 1
    1 "05/08/2013 22:00" 1
    1 "05/08/2013 23:00" 0
    2 "01/07/2015 05:59" 0
    2 "01/07/2015 14:30" 0
    2 "01/07/2015 16:00" 1
    2 "01/07/2015 19:00" 0
    2 "01/07/2015 21:00" 0
    2 "02/07/2015 02:00" 1
    2 "02/07/2015 06:50" 1
    2 "02/07/2015 13:01" 1
    2 "02/07/2015 20:00" 0
    
    end
    
    //    CREATE STATA INTERNAL FORMAT CLOCK VARIABLES
    gen double date_time = clock(date, "MDYhm")
    format date_time %tc
    assert missing(date_time) == missing(date)
    
    by id (date_time), sort: gen spell_start = (group != group[_n-1])
    by id (date_time), sort: gen spell = sum(spell_start)
    replace spell = 0 if group == 0
    by id spell (date_time), sort: gen duration = ///
        (date_time[_N]-date_time[1])/msofminutes(60) if spell
        
    by id (date_time), sort: egen num_of_spells = total(cond(spell, spell_start, .))
    by id (date_time), sort: egen total_duration = total(cond(spell_start, duration, .))
    At the end, the variable num_of_spells will tell you how many times each id enters group 1, and total_duration will be the total time spent in group 1 across all of those times.

    Added: I can't really tell if your dates are MM/DD/CCYY or DD/MM/CCYY as the particular numbers shown are consistent with either. This code assumes MM/DD/CCYY. Change the second argument in the -clock()- function accordingly if I guessed wrong.

    Comment


    • #3
      Thank you Clyde for your quick response!

      This solution is not far from what I am aiming to compute. I think that I have not explained the problem well.

      Let me explain it again.
      I have three number of spells. I would like to add one extra point in the calculation of duration for each spell. I am expecting following results:

      For ID=1, number of spell = 1, and time_duration would be 5 hours as I would like to add one extra point (05/08/2013 18:00 to 05/08/2013 23:00).
      For ID=2, number of spell = 2, and time_duration would be 3 hours as I would like to add one extra point (01/07/2015 16:00 to 01/07/2015 19:00).
      For ID=2, number of spell = 3 and time_duration would be 18 hours as I would like to add one extra point (02/07/2015 02:00 to 02/07/2015 20:00).



      PS:My date format is DD/MM/YYYY hh:mm, however it does not matter in this code.

      Last edited by Muhammad Faisal; 05 May 2018, 05:17.

      Comment


      • #4
        So, what you mean to do, I take it, is include the time elapsed between the final observation that is part of the spell and the observation that follows it. This modified code should do what you ask:

        Code:
        //    CREATE STATA INTERNAL FORMAT CLOCK VARIABLES
        gen double date_time = clock(date, "DMYhm")
        format date_time %tc
        assert missing(date_time) == missing(date)
        
        by id (date_time), sort: gen double next_time = date_time[_n+1]
        format next_time %tc
        
        by id (date_time): gen spell_start = (group != group[_n-1])
        by id (date_time): gen spell = sum(spell_start)
        replace spell = 0 if group == 0
        by id spell (date_time), sort: gen duration = ///
            (next_time[_N]-date_time[1])/msofminutes(60) if spell
            
        by id (date_time), sort: egen num_of_spells = total(cond(spell, spell_start, .))
        by id (date_time), sort: egen total_duration = total(cond(spell_start, duration, .))
        Note: I've also fixed the date formatting to DD/MM/YYYY.

        Comment

        Working...
        X