Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Count number of events by id and for a specific time window

    I want to generate a variable which sums up the number of events for a specific company in the past 10 years. I don't want to include in this count the current event date (i.e. if there is no prior event for a specific company/date, it will be assigned a value of 0 for that company/date)

    What I have is "company id" (Column 1) and "event date" (Column 2) and I want to generate is "# events in the last 10 years" Column 3 as below:
    company id event date # events in the last 10 years
    Y8547N 15Jan07 0
    Y8547N 13Apr07 1
    Y8547N 29May07 2
    Y8547N 15Jun07 3
    Y8547N 03Aug07 4
    Y8547N 03Sep09 5
    Y8547N 09Nov09 6
    Y8547N 23Mar11 7
    Y8547N 15Jun12 8
    Y8547N 15Jun12 8
    Y8547N 11Apr13 10
    Y8547N 29Jul13 11
    Y8547N 19May15 12
    Y8547N 23Dec16 13
    Y8547N 25Jan17 13
    Y85484 27Mar12 0
    Y85484 12Nov13 1
    Y8548U 15Jan07 0
    Y8548U 26Jan07 1
    Y8548U 27Mar07 2
    Y8548U 27Mar07 2
    Y8548U 14Dec07 4

  • #2
    Thanks for the example. Following the precise advice at FAQ Advice would have made it a little easier to work with your data.

    rangestat (SSC) can help here.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str6 companyid float eventdate
    "Y8547N" 17181
    "Y8547N" 17269
    "Y8547N" 17315
    "Y8547N" 17332
    "Y8547N" 17381
    "Y8547N" 18143
    "Y8547N" 18210
    "Y8547N" 18709
    "Y8547N" 19159
    "Y8547N" 19159
    "Y8547N" 19459
    "Y8547N" 19568
    "Y8547N" 20227
    "Y8547N" 20811
    "Y8547N" 20844
    "Y85484" 19079
    "Y85484" 19674
    "Y8548U" 17181
    "Y8548U" 17192
    "Y8548U" 17252
    "Y8548U" 17252
    "Y8548U" 17514
    end
    format %td eventdate
    
    * ssc inst rangestat 
    rangestat (count) eventdate, int(eventdate -3652 -1) by(company)
    replace eventdate_count = 0 if missing(eventdate_count)

    Comment


    • #3
      Just found the solution in a preivous forum post! Sorry.

      Originally posted by Clyde Schechter View Post
      So, of course, Date needs to be a Stata internal date variable in order for anything good to come of it. I assume you have it that way.

      Code:
      rangestat (sum) Dummy, interval(Date, -29, 0) by(CompanyID) describe
      Note: This calculation uses a window that includes the current date and goes back 30 days from there. If you want the preceding 30 days but not including the current one, change -29, 0 to -30, -1.

      To get the -rangestat- command, run -ssc install rangestat-. It is a new command designed for purposes like this by Robert Picard, Roberto Ferrer, and Nick Cox.

      In the future, you are requested to post example data using the -dataex- command so that those who would like to help you can easily and faithfully reproduce the data situation you actually face. Listings of the type shown in #1 are not helpful: it takes longer to clean them up and put them into Stata than to solve your problem! Please see FAQ #12 for additional advice on the best way to post code and results on this forum.

      Comment


      • #4
        Thank you.

        Originally posted by Nick Cox View Post
        Thanks for the example. Following the precise advice at FAQ Advice would have made it a little easier to work with your data.

        rangestat (SSC) can help here.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str6 companyid float eventdate
        "Y8547N" 17181
        "Y8547N" 17269
        "Y8547N" 17315
        "Y8547N" 17332
        "Y8547N" 17381
        "Y8547N" 18143
        "Y8547N" 18210
        "Y8547N" 18709
        "Y8547N" 19159
        "Y8547N" 19159
        "Y8547N" 19459
        "Y8547N" 19568
        "Y8547N" 20227
        "Y8547N" 20811
        "Y8547N" 20844
        "Y85484" 19079
        "Y85484" 19674
        "Y8548U" 17181
        "Y8548U" 17192
        "Y8548U" 17252
        "Y8548U" 17252
        "Y8548U" 17514
        end
        format %td eventdate
        
        * ssc inst rangestat
        rangestat (count) eventdate, int(eventdate -3652 -1) by(company)
        replace eventdate_count = 0 if missing(eventdate_count)

        Comment


        • #5
          Hello,
          I have a related question & would appreciate any help!

          For every monthyear-id combination, I want to calculate for a given id, how many times did that id appear in the id2 column.
          For example, for monthyear 530, id 3217 appeared in the id2 column 1 time.
          For monthyear 535, id 4217 appeared in the id2 column 5 times (for that month).

          I tried:
          Code:
          program myprog
              qui su id if id==id2
              gen obs=r(N)
          end
          
          rangerun myprog, int(monthyear 0 0) by(id)
          But of course this just counted for which rows, id==id2


          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input double(id monthyear id2)
          4217 525    .
          3217 526 4217
          4217 526    .
          3217 528 4217
          4217 528    .
          3217 529    .
          4217 529 4217
          3217 530 4217
          4217 530 3217
          4217 530 4217
          4217 533    .
          4217 533 4217
          3217 535 4217
          4217 535 4217
          4217 535 4217
          4217 535 4217
          4217 535 4217
          3217 536 4217
          3217 536 4217
          4217 536 4217
          3217 537 4217
          3217 537 4217
          4217 685    .
          end
          format %tm monthyear
          Last edited by Harry Kaj; 13 Oct 2019, 10:07. Reason: figured it out by using countmatch!

          Comment

          Working...
          X