Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating cumulative sum of per group

    Hello everyone, this is my first post on here so my apologies if the format is not up to the usual standards.

    I have a data set covering a time span between 1998 and 2016 with about 65.000 observations that may be divided into 15 groups (Dutch political parties) The aspects that are relevant for this question are listed below:

    Code:
    clear
    input byte order str9 date str8 time byte partyid str10 sponsor byte(success cumsuccess)
    1 "26aug1998" "08:11:41" 5 "CDA" 0 0
    2 "26aug1998" "08:12:12" 10 "GroenLinks" 0 0
    3 "26aug1998" "08:18:51" 5 "CDA" 0 0
    4 "26aug1998" "08:46:35" 10 "GroenLinks" 0 0
    5 "26aug1998" "09:00:21" 10 "GroenLinks" 0 0
    6 "08sep1998" "07:41:39" 5 "CDA" 1 1
    7 "08sep1998" "08:50:57" 5 "CDA" 0 1
    8 "08sep1998" "09:05:53" 24 "SP" 0 0
    9 "17sep1998" "07:39:19" 10 "GroenLinks" 0 0
    10 "17sep1998" "07:44:20" 25 "VVD" 0 0
    11 "17sep1998" "07:59:09" 7 "D66" 0 0
    12 "17sep1998" "08:03:12" 20 "PvdA" 0 0
    13 "17sep1998" "08:12:32" 10 "GroenLinks" 0 0
    14 "17sep1998" "08:12:55" 23 "SGP" 1 1
    15 "17sep1998" "08:14:15" 24 "SP" 0 0
    16 "17sep1998" "08:16:44" 5 "CDA" 0 1
    17 "17sep1998" "08:21:38" 20 "PvdA" 0 0
    18 "17sep1998" "08:22:40" 5 "CDA" 1 2
    19 "17sep1998" "08:24:20" 5 "CDA" 0 2
    20 "17sep1998" "08:27:57" 20 "PvdA" 1 1
    21 "17sep1998" "08:42:09" 10 "GroenLinks" 0 0
    22 "17sep1998" "08:44:11" 7 "D66" 0 0
    23 "17sep1998" "08:54:14" 20 "PvdA" 0 1
    24 "17sep1998" "08:55:03" 10 "GroenLinks" 0 0
    25 "17sep1998" "09:02:38" 24 "SP" 0 0
    26 "17sep1998" "09:08:47" 7 "D66" 0 0
    27 "22sep1998" "07:25:52" 24 "SP" 1 0
    28 "22sep1998" "09:37:04" 10 "GroenLinks" 1 1
    29 "29sep1998" "07:15:41" 20 "PvdA" 0 1
    30 "29sep1998" "07:22:35" 5 "CDA" 1 3
    end
    
    drop cumsuccess
    sort partyid order
    bys partyid:gen cumsuccess=sum(success)
    sort order
    What I want to do is create a variable that is the sum of the variable 'success' per 'partyid'(numeric reflection of sponsorparty).

    I manually added the variable 'cumsuccess' in the table above manually to better illustrate what I mean. As you can see, the variable 'success' is just 0's and 1's to illustrate rejection (0) or success (1). The numbers in cumsuccess should reflect the cumulative sum of success per party. Which is why 'CDA' is at 3 and other parties are not.

    I hope this makes sense. Please tell me if I could make some things more clear. Thank you!
    Last edited by Toon Zegers; 26 May 2019, 09:24.

  • #2
    is your "party id" really a time???? I don't think so and therefor the code below is untested; please read the FAQ and learn about, e.g., -dataex- so you can provide example data that people can use; meanwhile, making some assumptions, try this:
    Code:
    egen cumsuccess=total(total), by(party_id)

    Comment


    • #3
      Hi Rich! Thank you for your reply. The formatting is a bit unclear. But, for instance for the first row, partyid = 5 and sponsorparty = CDA. I don't really get why it 'splits' the date/time variable either.

      With regards to the code you posted, that doesn't really work.

      I will take a further look though at dataex, thank you.

      EDIT: It should (hopefully) be better now and up to the usual dataex standards
      Last edited by Toon Zegers; 26 May 2019, 08:53.

      Comment


      • #4
        Hello Toon, you can post your data using -dataex- in the future here. For your question, here is a code for your reference:
        Code:
        clear
        input byte order str9 date str8 time byte partyid str10 sponsor byte(success cumsuccess)
         1 "26aug1998" "08:11:41"  5 "CDA"        0 0
         2 "26aug1998" "08:12:12" 10 "GroenLinks" 0 0
         3 "26aug1998" "08:18:51"  5 "CDA"        0 0
         4 "26aug1998" "08:46:35" 10 "GroenLinks" 0 0
         5 "26aug1998" "09:00:21" 10 "GroenLinks" 0 0
         6 "08sep1998" "07:41:39"  5 "CDA"        1 1
         7 "08sep1998" "08:50:57"  5 "CDA"        0 1
         8 "08sep1998" "09:05:53" 24 "SP"         0 0
         9 "17sep1998" "07:39:19" 10 "GroenLinks" 0 0
        10 "17sep1998" "07:44:20" 25 "VVD"        0 0
        11 "17sep1998" "07:59:09"  7 "D66"        0 0
        12 "17sep1998" "08:03:12" 20 "PvdA"       0 0
        13 "17sep1998" "08:12:32" 10 "GroenLinks" 0 0
        14 "17sep1998" "08:12:55" 23 "SGP"        1 1
        15 "17sep1998" "08:14:15" 24 "SP"         0 0
        16 "17sep1998" "08:16:44"  5 "CDA"        0 1
        17 "17sep1998" "08:21:38" 20 "PvdA"       0 0
        18 "17sep1998" "08:22:40"  5 "CDA"        1 2
        19 "17sep1998" "08:24:20"  5 "CDA"        0 2
        20 "17sep1998" "08:27:57" 20 "PvdA"       1 1
        21 "17sep1998" "08:42:09" 10 "GroenLinks" 0 0
        22 "17sep1998" "08:44:11"  7 "D66"        0 0
        23 "17sep1998" "08:54:14" 20 "PvdA"       0 1
        24 "17sep1998" "08:55:03" 10 "GroenLinks" 0 0
        25 "17sep1998" "09:02:38" 24 "SP"         0 0
        26 "17sep1998" "09:08:47"  7 "D66"        0 0
        27 "22sep1998" "07:25:52" 24 "SP"         1 0
        28 "22sep1998" "09:37:04" 10 "GroenLinks" 1 1
        29 "29sep1998" "07:15:41" 20 "PvdA"       0 1
        30 "29sep1998" "07:22:35"  5 "CDA"        1 3
        end
        
        drop cumsuccess
        sort partyid order
        bys partyid:gen cumsuccess=sum(success)
        sort order
        2B or not 2B, that's a question!

        Comment


        • #5
          Thank you Liu

          Comment

          Working...
          X