Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Find matching values and accordingly sum values

    Hello,

    I currently have an issue with my data. I have a dataset (see below) which includes an Event ID, columns for the year, month and day, a Country variable, a variable counting the number of casualties (nkill) and some other columns (related EH EI EJ).
    Some of the entries are "multiple Events" which are connected to some other entry. Hence, the related eventid is indicated in the related column, or if there is more than one related Event, further eventids are indicated in the columns EH, EI, EJ, .....
    Now what I would like Stata to do is the following:

    1. For each line in the dataset that is part of a multiple Event:
    make sure that EACH of the eventids in the related and further columns can be found in the dataset (in the first column eventid) --> if not ALL related Events can be found, the line should be deleted
    2. If ALL of the related Events can be found AND the date and Country are identical among ALL events, I would like Stata to sum up all of these events in one line, summing up the numbers for nkill for these related Events and additionally writing the Name of the City where nkill was highest. --> in the end I only want to Keep 1 line for a "multiple Event"

    I would be very happy about any Kind of help. I already tried around with fndmtch, ... but couldn't solve my Problem.

    Thanks a lot!



    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double eventid int iyear byte(imonth iday) str32 country_txt double(nkill related EH EI EJ)
    197000000001 1970 0  0 "Dominican Republic"  1            .            .            . .
    197000000002 1970 0  0 "Mexico"              0            .            .            . .
    197001000001 1970 1  0 "Philippines"         1            .            .            . .
    197001000002 1970 1  0 "Greece"              .            .            .            . .
    197001000003 1970 1  0 "Japan"               .            .            .            . .
    197001010002 1970 1  1 "United States"       0            .            .            . .
    197001020001 1970 1  2 "Uruguay"             0            .            .            . .
    197001020002 1970 1  2 "United States"       0            .            .            . .
    197001020003 1970 1  2 "United States"       0            .            .            . .
    197001030001 1970 1  3 "United States"       0            .            .            . .
    197001050001 1970 1  1 "United States"       0            .            .            . .
    197001060001 1970 1  6 "United States"       0            .            .            . .
    197001080001 1970 1  8 "Italy"               0            .            .            . .
    197001090001 1970 1  9 "United States"       0            .            .            . .
    197001090002 1970 1  9 "United States"       0            .            .            . .
    197001100001 1970 1 10 "East Germany (GDR)"  .            .            .            . .
    197001110001 1970 1 11 "Ethiopia"            1            .            .            . .
    197001120001 1970 1 12 "United States"       0            .            .            . .
    197001120002 1970 1 12 "United States"       0            .            .            . .
    197001130001 1970 1 13 "United States"       0            .            .            . .
    197001140001 1970 1 14 "United States"       0            .            .            . .
    197001150001 1970 1 15 "Uruguay"             0            .            .            . .
    197001190002 1970 1 19 "United States"       0            .            .            . .
    197001190003 1970 1 19 "United States"       0            .            .            . .
    197001190004 1970 1 19 "United States"       0            .            .            . .
    197001200001 1970 1 20 "Guatemala"           1            .            .            . .
    197001210001 1970 1 21 "Philippines"         0            .            .            . .
    197001220001 1970 1 22 "Venezuela"           0            .            .            . .
    197001220002 1970 1 22 "United States"       0            .            .            . .
    197001250001 1970 1 25 "United States"       0            .            .            . .
    197001250002 1970 1 25 "United States"       0            .            .            . .
    197001260001 1970 1 26 "United States"       0            .            .            . .
    197001260003 1970 1 26 "United States"       0            .            .            . .
    197001270002 1970 1 27 "United States"       0            .            .            . .
    197001280001 1970 1 28 "East Germany (GDR)"  .            .            .            . .
    197001280002 1970 1 28 "United States"       0            .            .            . .
    197001300001 1970 1 30 "United States"       0 197001300002            .            . .
    197001300002 1970 1 30 "United States"       0 197001300001            .            . .
    197001300003 1970 1 30 "United States"       0            .            .            . .
    197001310001 1970 1 31 "Philippines"         0            .            .            . .
    197001310002 1970 1 31 "United States"       0            .            .            . .
    197002010001 1970 2  1 "United States"       0            .            .            . .
    197002010002 1970 2  1 "United States"       0            .            .            . .
    197002020001 1970 2  2 "United States"       0            .            .            . .
    197002030001 1970 2  3 "United States"       0 197002030002            .            . .
    197002030002 1970 2  3 "United States"       0 197002030001            .            . .
    197002040001 1970 2  4 "United States"       0            .            .            . .
    197002060001 1970 2  6 "United States"       0            .            .            . .
    197002060002 1970 2  6 "United States"       0            .            .            . .
    197002060003 1970 2  6 "United States"       0            .            .            . .
    197002060004 1970 2  6 "United States"       0 197002060005 197002060006            . .
    197002060005 1970 2  6 "United States"       0 197002060004 197002060006            . .
    197002060006 1970 2  6 "United States"       0 197002060004 197002060005            . .
    197002060007 1970 2  6 "United States"       0            .            .            . .
    197002070003 1970 2  7 "United States"       0            .            .            . .
    197002080001 1970 2  8 "United States"       0 197002080002 197002090003            . .
    197002080002 1970 2  8 "United States"       0 197002080001 197002090003            . .
    197002090003 1970 2  9 "United States"       0 197002080001 197002080002            . .
    197002090004 1970 2  9 "United States"       0            .            .            . .
    197002100001 1970 2 10 "West Germany (FRG)"  1            .            .            . .
    197002110001 1970 2 11 "United States"       0            .            .            . .
    197002130001 1970 2 13 "United States"       0            .            .            . .
    197002130002 1970 2 13 "West Germany (FRG)"  7            .            .            . .
    197002130003 1970 2 13 "United States"       0            .            .            . .
    197002150001 1970 2 15 "United States"       0            .            .            . .
    197002150002 1970 2 15 "United States"       0            .            .            . .
    197002160001 1970 2 16 "United States"       1 197002160004            .            . .
    197002160002 1970 2 16 "United States"       0 197002160003            .            . .
    197002160003 1970 2 16 "United States"       0 197002160002            .            . .
    197002160004 1970 2 16 "United States"       0 197002160001            .            . .
    197002170001 1970 2 17 "United States"       0            .            .            . .
    197002170002 1970 2 17 "United States"       0            .            .            . .
    197002170003 1970 2 17 "United States"       0            .            .            . .
    197002170004 1970 2 17 "United States"       0            .            .            . .
    197002180002 1970 2 18 "United States"       0            .            .            . .
    197002180003 1970 2 18 "United States"       0            .            .            . .
    197002200001 1970 2 20 "United States"       0            .            .            . .
    197002200002 1970 2 20 "United States"       0            .            .            . .
    197002200003 1970 2 20 "United States"       0            .            .            . .
    197002210001 1970 2 21 "West Germany (FRG)"  0            .            .            . .
    197002210002 1970 2 21 "Switzerland"        47            .            .            . .
    197002210003 1970 2 21 "United States"       0            .            .            . .
    197002210004 1970 2 21 "United States"       0            .            .            . .
    197002210005 1970 2 21 "United States"       0            .            .            . .
    197002210006 1970 2 21 "United States"       0            .            .            . .
    197002220001 1970 2 22 "United States"       0            .            .            . .
    197002220002 1970 2 22 "United States"       0            .            .            . .
    197002230001 1970 2 23 "United States"       0 197002230002 197002230003 197002230004 .
    197002230002 1970 2 23 "United States"       0 197002230001 197002230003 197002230004 .
    197002230003 1970 2 23 "United States"       0 197002230001 197002230002 197002230004 .
    197002230004 1970 2 23 "United States"       0 197002230001 197002230002 197002230003 .
    197002230005 1970 2 23 "United States"       0            .            .            . .
    197002240002 1970 2 24 "United States"       0            .            .            . .
    197002240003 1970 2 24 "United States"       0            .            .            . .
    197002270001 1970 2 27 "United States"       0            .            .            . .
    197002280001 1970 2 28 "Jordan"              .            .            .            . .
    197003000001 1970 3  0 "Philippines"         0            .            .            . .
    197003010001 1970 3  1 "Italy"               0            .            .            . .
    197003010005 1970 3  1 "United States"       0            .            .            . .
    197003010006 1970 3  1 "United States"       0            .            .            . .
    end

  • #2
    1. For each line in the dataset that is part of a multiple Event:
    make sure that EACH of the eventids in the related and further columns can be found in the dataset (in the first column eventid) --> if not ALL related Events can be found, the line should be deleted
    I've found that the easiest way to do this is through the -merge- command. The idea is that you create three new datasets where you rename the EH/I/J variables to eventid and then merge each of these datasets to the original one. The _merge variable will then tell you if there is a match or not. It's a bit convoluted, but it usually works quite well.

    Comment


    • #3
      Thanks for the recommendation. However, in my entire dataset I have a lot more than only EH EI and EJ, so it would get extremely messy. Moreover, if I would do it via merge, if one Event is e.g. composed of 4 single Events, as a result I would get a fully merged Event for each of the 4 subevents instead of only obtaining ONE fully merged Event and deleting the others...

      Comment


      • #4
        You can install rowsort (by Nick Cox) by typing in Stata's Command window:
        Code:
        search rowsort
        and then click on the pr0046 link at the top to get an install link.

        With rowsort, you create new identifier variables, sorted in ascending order. Once in order, you can group observations that have the same ordered variables. Since you also want the date and country to match, you also throw that into the mix. A good group is one where there is as many non-missing ordered id variables as there are observations in the group. There's some ambiguity as to what to do with cases that do not follow the logic so I sidestep the problem.

        To reduce to one observation per group, you order the observations by number of nkill (and city to break ties) and drop all but the last observation in the group.

        Code:
        * assume that eventid uniquely identifies observations
        isid eventid, sort
        
        * type search rowsort and click on the pr0046 to install
        rowsort eventid related EH EI EJ, gen(rel_id1-rel_id5)
        format %12.0f rel_id*
        
        * number of codes in sorted rel_id variables
        egen nid = rownonmiss(rel_id*)
        
        * a good group is one where the number of observation == nid
        * AND the dates match AND countries match
        egen gid = group(rel_id* iyear imonth iday country_txt), missing
        bysort gid (eventid): gen is_good = nid == _N
        tab is_good
        list eventid-EJ if !is_good
        
        * make up a city name
        by gid: gen city = word(c(Weekdays),_n)
        
        * sort by number of nkill within group and keep the obs with the highest nkill
        sort gid nkill city
        by gid: egen tkill = total(nkill)
        by gid: drop if is_good & _n < _N

        Comment


        • #5
          Thank you so much, it works perfectly

          Comment

          Working...
          X