Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generate a count variable that counts the number of times an event has occurred

    I have a dataset (example attached below) of orders a shopper has delivered to a customer after shopping from a certain store. I have the order_id (which is unique), a courier_id (which is the id of the shopper, that repeats itself since the same shopper can deliver multiple orders), a date_delivered (which is the date on which the delivery was made), and a store_id (which is the id of the store, which repeats itself as different shoppers might have shopped from the same store). I want to create a new variable "familiarity" that indicates how familiar a shopper is to a particular store, which indicates how many previous orders has a particular shopper shopped from the same store. I believe I need to use the "count" function that counts the number of times a shopper shopped from a particular store till that delivery date. I would appreciate any help/assistance with the code regarding this. Thanks!
    Example of the data:
    order_id store_id courier_id date_delivered
    1001 11 101 10may2021
    1002 10 102 11may2021
    1003 12 103 12may2021
    1004 12 104 12may2021
    1005 12 103 13may2021
    1006 13 102 13may2021
    The result I want is as follows:
    order_id store_id courier_id date_delivered familiarity
    1001 11 101 10may2021 1
    1002 10 102 11may2021 1
    1003 12 103 12may2021 1
    1004 12 104 12may2021 1
    1005 12 103 13may2021 2
    1006 13 102 13may2021 1
    So, the courier id 103 has shopped in store id 12 for 1 time till delivery date of 12may, but 2 times till the delivery date of 13may. How should I create this? Also, if sorting is required, should the orders be sorted by delivery_date?

    Thanks again for any help regarding this!

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int order_id byte store_id int courier_id str9 date_delivered
    1001 11 101 "10may2021"
    1002 10 102 "11may2021"
    1003 12 103 "12may2021"
    1004 12 104 "12may2021"
    1006 13 102 "13may2021"
    1005 12 103 "13may2021"
    end
    
    by courier_id store_id (date_delivered), sort: gen familiarity = _n
    sort date courier_id
    In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.


    Comment


    • #3
      Thank you Clyde for the code. It works perfectly! And thanks for the tip regarding dataex. That's definitely useful!

      Comment

      Working...
      X