Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating a variable that calculates gaps in between items

    Hi All,

    In the attached example dataset, I have workers (worker_id) who deliver orders from stores (store_id). Each worker can deliver orders from multiple stores. I want to create a new variable "gaps" that calculates how many orders were picked in between 2 orders that were picked from the same store. For example, in the sample data provided below, worker no. 59 picks an order from store no. 5330. After that the worker picks 2 orders from different stores (5941 and 163) and then again picks from store 5330. So, the variable "gap" is 2 for the 4th item in the same data. Since for the 1st 3 items we do not know how long were the gaps before this, those are denoted as missing (.)
    The result for each column should be something like: . , . , . , 2 , . , 2 , 4 , 0 , 0 , and so on. Gaps are shown as 0 if the previous order was picked from the same store.
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long worker_id int(store_id date_pickstart) str11 courier_store
    59 5330 22237 "59-5330"
    59 5941 22237 "59-5941"
    59  163 22238 "59-163" 
    59 5330 22238 "59-5330"
    59 5051 22238 "59-5051"
    59  163 22239 "59-163" 
    59 5941 22239 "59-5941"
    59 5941 22239 "59-5941"
    59 5941 22240 "59-5941"
    59  163 22240 "59-163" 
    59 5330 22240 "59-5330"
    59 5941 22240 "59-5941"
    59 5330 22241 "59-5330"
    59 5941 22242 "59-5941"
    59 5941 22243 "59-5941"
    59 5284 22243 "59-5284"
    59 5941 22243 "59-5941"
    59 5941 22244 "59-5941"
    59 5941 22244 "59-5941"
    59  163 22246 "59-163" 
    end
    format %td date_pickstart
    Thank you!

  • #2
    Consider:

    Code:
    gen `c(obs_t)' obs_num = _n
    bysort courier_store (date_pickstart obs_num): gen gap = obs_num - obs_num[_n-1] - 1
    sort obs_num
    drop obs_num
    which produces:

    Code:
    . list, noobs sep(0) abbrev(15)
    
      +-------------------------------------------------------------+
      | worker_id   store_id   date_pickstart   courier_store   gap |
      |-------------------------------------------------------------|
      |        59       5330        18nov2020         59-5330     . |
      |        59       5941        18nov2020         59-5941     . |
      |        59        163        19nov2020          59-163     . |
      |        59       5330        19nov2020         59-5330     2 |
      |        59       5051        19nov2020         59-5051     . |
      |        59        163        20nov2020          59-163     2 |
      |        59       5941        20nov2020         59-5941     4 |
      |        59       5941        20nov2020         59-5941     0 |
      |        59       5941        21nov2020         59-5941     0 |
      |        59        163        21nov2020          59-163     3 |
      |        59       5330        21nov2020         59-5330     6 |
      |        59       5941        21nov2020         59-5941     2 |
      |        59       5330        22nov2020         59-5330     1 |
      |        59       5941        23nov2020         59-5941     1 |
      |        59       5941        24nov2020         59-5941     0 |
      |        59       5284        24nov2020         59-5284     . |
      |        59       5941        24nov2020         59-5941     1 |
      |        59       5941        25nov2020         59-5941     0 |
      |        59       5941        25nov2020         59-5941     0 |
      |        59        163        27nov2020          59-163     9 |
      +-------------------------------------------------------------+

    Comment


    • #3
      Thanks for the data example. Does this help?

      Given orders on the same daily date. daily date is not enough to determine sequence.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input long worker_id int(store_id date_pickstart) str11 courier_store
      59 5330 22237 "59-5330"
      59 5941 22237 "59-5941"
      59  163 22238 "59-163"
      59 5330 22238 "59-5330"
      59 5051 22238 "59-5051"
      59  163 22239 "59-163"
      59 5941 22239 "59-5941"
      59 5941 22239 "59-5941"
      59 5941 22240 "59-5941"
      59  163 22240 "59-163"
      59 5330 22240 "59-5330"
      59 5941 22240 "59-5941"
      59 5330 22241 "59-5330"
      59 5941 22242 "59-5941"
      59 5941 22243 "59-5941"
      59 5284 22243 "59-5284"
      59 5941 22243 "59-5941"
      59 5941 22244 "59-5941"
      59 5941 22244 "59-5941"
      59  163 22246 "59-163"
      end
      format %td date_pickstart
      
      gen long seq = _n
      
      bysort worker_id store_id (seq) : gen gap = seq - seq[_n-1] - 1
      
      sort worker_id seq
      
      list worker_id store_id seq gap, sepby(worker_id store_id)
      Code:
           +---------------------------------+
           | worker~d   store_id   seq   gap |
           |---------------------------------|
        1. |       59       5330     1     . |
           |---------------------------------|
        2. |       59       5941     2     . |
           |---------------------------------|
        3. |       59        163     3     . |
           |---------------------------------|
        4. |       59       5330     4     2 |
           |---------------------------------|
        5. |       59       5051     5     . |
           |---------------------------------|
        6. |       59        163     6     2 |
           |---------------------------------|
        7. |       59       5941     7     4 |
        8. |       59       5941     8     0 |
        9. |       59       5941     9     0 |
           |---------------------------------|
       10. |       59        163    10     3 |
           |---------------------------------|
       11. |       59       5330    11     6 |
           |---------------------------------|
       12. |       59       5941    12     2 |
           |---------------------------------|
       13. |       59       5330    13     1 |
           |---------------------------------|
       14. |       59       5941    14     1 |
       15. |       59       5941    15     0 |
           |---------------------------------|
       16. |       59       5284    16     . |
           |---------------------------------|
       17. |       59       5941    17     1 |
       18. |       59       5941    18     0 |
       19. |       59       5941    19     0 |
           |---------------------------------|
       20. |       59        163    20     9 |
           +---------------------------------+

      Comment


      • #4
        Thank you Hemanshu Kumar and Nick Cox! Yes, both approaches work perfectly!

        Comment


        • #5
          Thanks for the report. The two answers are actually the same idea.

          Comment

          Working...
          X