Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculate all possible unique ranges of value

    I'm attempting to build on a previous question (https://www.statalist.org/forums/for...ly-survey-data), where I'm essentially hoping to figure out executing an intermediary step in that process (which would produce some new variables/information of interest).

    In my previous question, the resulting code produces a dataset that calculates the unique total hours for each person-dayofwk grouping. What I've now been working on is generating new variables that contain the actual start and stop times for each person-dayofwk grouping. The main challenge in my initial coding attempts has been that it's possible for people to have non-overlapping hours worked across a given day of the week, which necessitates creating second or third sets (and in the real data, even more than that) of unique start-stop variables because we don't want to include any intervening hours. E.g., if a person was available on a given day from 9-12, and then again from 17-21, two new sets of variables would be created: uniquestart1=9, uniquestop1=12, uniquestart2=17, uniquestop2=21. We would not want to capture the intervening 12-17 hour range.

    To give a sense of how this would start, here's some initial code that creates the initial unique start and stop time based on the first day_count for each person-dayofwk grouping.

    Code:
    sort resp_id dayofwk start stop
    by resp_id dayofwk: gen uniquestart1 = start[1]
    by resp_id dayofwk: gen uniquestop1 = stop[1]



    Some example data:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str5 resp_id int date byte(dayofwk day_count) double(start stop)
    "ID200" 21849 0 1    0     3
    "ID200" 21842 0 2    0     3
    "ID200" 21856 0 3    0     3
    "ID200" 21835 0 4    0   3.5
    "ID200" 21856 0 5   18 23.99
    "ID200" 21835 0 6   19 23.99
    "ID200" 21857 1 1    0     3
    "ID200" 21836 1 2    0     3
    "ID200" 21837 2 1   19 23.99
    "ID200" 21838 3 1    0     3
    "ID200" 21852 3 2   18 23.99
    "ID200" 21845 3 3   18 23.99
    "ID200" 21846 4 1    0     3
    "ID200" 21853 4 2    0     3
    "ID200" 21832 4 3   17 23.99
    "ID200" 21839 4 4   19 23.99
    "ID200" 21846 4 5   19 23.99
    "ID200" 21847 5 1    0     3
    "ID200" 21833 5 2    0     3
    "ID200" 21840 5 3    0     3
    "ID200" 21833 5 4   18 23.99
    "ID200" 21840 5 5 18.5 23.99
    "ID200" 21854 5 6   19 23.99
    "ID200" 21847 5 7   19 23.99
    "ID200" 21848 6 1    0     3
    "ID200" 21834 6 2    0     3
    "ID200" 21841 6 3    0     3
    "ID200" 21855 6 4    0     3
    "ID200" 21848 6 5    7 23.99
    "ID200" 21834 6 6   17 23.99
    "ID200" 21855 6 7   18 23.99
    "ID200" 21841 6 8   19 23.99
    "ID300" 21793 0 1    6    14
    "ID300" 21800 0 2    6    14
    "ID300" 21807 0 3    9    15
    "ID300" 21822 1 1    6    14
    "ID300" 21808 1 2    6    14
    "ID300" 21815 1 3    6    14
    "ID300" 21809 2 1    6    14
    "ID300" 21796 3 1    5    14
    "ID300" 21810 3 2    6    14
    "ID300" 21803 3 3    6    16
    "ID300" 21817 3 4    6    16
    "ID300" 21797 4 1    7    17
    "ID300" 21804 4 2    9    17
    "ID300" 21818 4 3    9    17
    "ID300" 21811 4 4    9    21
    "ID300" 21798 5 1    6    14
    "ID300" 21819 5 2    6    14
    "ID300" 21805 5 3    9    17
    "ID300" 21799 6 1    6    12
    "ID300" 21820 6 2   14    16
    "ID300" 21806 6 3   20 23.99
    end
    format %tddd-Mon-YY date



    Here's how I think the final data would look, where essentially the final/highest day_count within each person-dayofwk grouping would contain the final unique start/stop values and you could just keep only those rows.
    resp_id date dayofwk day_count start stop uniquestart1 uniquestop1 uniquestart2 uniquestop2 uniquestart3 uniquestop3
    ID200 27-Oct-19 0 1 0 3 0 3
    ID200 20-Oct-19 0 2 0 3 0 3
    ID200 3-Nov-19 0 3 0 3 0 3
    ID200 13-Oct-19 0 4 0 3.5 0 3.5
    ID200 3-Nov-19 0 5 18 23.99 0 3.5 18 23.99
    ID200 13-Oct-19 0 6 19 23.99 0 3 18 23.99
    ID200 4-Nov-19 1 1 0 3 0 3
    ID200 14-Oct-19 1 2 0 3 0 3
    ID200 15-Oct-19 2 1 19 23.99 19 23.99
    ID200 16-Oct-19 3 1 0 3 0 3
    ID200 30-Oct-19 3 2 18 23.99 0 3 18 23.99
    ID200 23-Oct-19 3 3 18 23.99 0 3 18 23.99
    ID200 24-Oct-19 4 1 0 3 0 3
    ID200 31-Oct-19 4 2 0 3 0 3
    ID200 10-Oct-19 4 3 17 23.99 0 3 17 23.99
    ID200 17-Oct-19 4 4 19 23.99 0 3 17 23.99
    ID200 24-Oct-19 4 5 19 23.99 0 3 17 23.99
    ID200 25-Oct-19 5 1 0 3 0 3
    ID200 11-Oct-19 5 2 0 3 0 3
    ID200 18-Oct-19 5 3 0 3 0 3
    ID200 11-Oct-19 5 4 18 23.99 0 3 18 23.99
    ID200 18-Oct-19 5 5 18.5 23.99 0 3 18 23.99
    ID200 1-Nov-19 5 6 19 23.99 0 3 18 23.99
    ID200 25-Oct-19 5 7 19 23.99 0 3 18 23.99
    ID200 26-Oct-19 6 1 0 3 0 3
    ID200 12-Oct-19 6 2 0 3 0 3
    ID200 19-Oct-19 6 3 0 3 0 3
    ID200 2-Nov-19 6 4 0 3 0 3
    ID200 26-Oct-19 6 5 7 23.99 0 3 7 23.99
    ID200 12-Oct-19 6 6 17 23.99 0 3 7 23.99
    ID200 2-Nov-19 6 7 18 23.99 0 3 7 23.99
    ID200 19-Oct-19 6 8 19 23.99 0 3 7 23.99
    ID300 1-Sep-19 0 1 6 14 6 14
    ID300 8-Sep-19 0 2 6 14 6 14
    ID300 15-Sep-19 0 3 9 15 6 15
    ID300 30-Sep-19 1 1 6 14 6 14
    ID300 16-Sep-19 1 2 6 14 6 14
    ID300 23-Sep-19 1 3 6 14 6 14
    ID300 17-Sep-19 2 1 6 14 6 14
    ID300 4-Sep-19 3 1 5 14 5 14
    ID300 18-Sep-19 3 2 6 14 5 14
    ID300 11-Sep-19 3 3 6 16 5 16
    ID300 25-Sep-19 3 4 6 16 5 16
    ID300 5-Sep-19 4 1 7 17 7 17
    ID300 12-Sep-19 4 2 9 17 7 17
    ID300 26-Sep-19 4 3 9 17 7 17
    ID300 19-Sep-19 4 4 9 21 7 21
    ID300 6-Sep-19 5 1 6 14 6 14
    ID300 27-Sep-19 5 2 6 14 6 14
    ID300 13-Sep-19 5 3 9 17 6 17
    ID300 7-Sep-19 6 1 6 12 6 12
    ID300 28-Sep-19 6 2 14 16 6 12 14 16
    ID300 14-Sep-19 6 3 20 23.99 6 12 14 16 20 23.99

  • #2
    So you want the total number of hours available? That's just the sum of all the differences between stop/start times.

    Comment


    • #3
      George Ford not the sum, that was the interest in the question I linked to that Clyde Schechter helped with.

      I'm now hoping to create new variables that capture those actual unique start/stop times (this would be uniquestart1, uniquestop1, uniquestart2, uniquestop2, etc. in the table which I generated by hand as an example).

      Comment

      Working...
      X