  • bysort based on multiple groups

    I want to calculate the total number of votes cast in each state in each year. The variable "totalvotes_dist" holds the total number of votes in each district of each state for a given year.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int year str20 state byte district str47 party long totalvotes_dist
    1976 "ALABAMA" 1 "DEMOCRAT"   157170
    1976 "ALABAMA" 1 "REPUBLICAN" 157170
    1976 "ALABAMA" 2 "DEMOCRAT"   156362
    1976 "ALABAMA" 2 "REPUBLICAN" 156362
    1976 "ALABAMA" 3 "DEMOCRAT"   108048
    1976 "ALABAMA" 4 "DEMOCRAT"   176022
    1976 "ALABAMA" 4 "REPUBLICAN" 176022
    1976 "ALABAMA" 5 "DEMOCRAT"   113560
    1976 "ALABAMA" 6 "DEMOCRAT"   162518
    1976 "ALABAMA" 6 "REPUBLICAN" 162518
    1976 "ALABAMA" 7 "DEMOCRAT"   110501
    1976 "ALASKA"  0 "DEMOCRAT"   118208
    1976 "ALASKA"  0 "REPUBLICAN" 118208
    1976 "ARIZONA" 1 "DEMOCRAT"   168119
    1976 "ARIZONA" 1 "REPUBLICAN" 168119
    1976 "ARIZONA" 2 "DEMOCRAT"   182128
    1976 "ARIZONA" 2 "REPUBLICAN" 182128
    1976 "ARIZONA" 3 "DEMOCRAT"   187165
    1976 "ARIZONA" 3 "REPUBLICAN" 187165
    1976 "ARIZONA" 4 "DEMOCRAT"   191590
    1976 "ARIZONA" 4 "REPUBLICAN" 191590
    end


    I tried "bysort state year: egen state_year_totalvotes = total(totalvotes_dist)" and "collapse (sum) totalvotes, by(year state district)", but both double-count totalvotes_dist.

  • #2
    No, it's not double-counting anything. The problem is that the data contain redundant values of totalvotes_dist. Whenever a state-district-year combination has more than one observation (which is the usual situation), every one of those observations carries the same value of totalvotes_dist. So you need to restrict the sum so that each district's value enters only once. I think the easiest way to do this would be:
    Code:
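    * flag exactly one observation per state-district-year combination
    egen tag = tag(state district year)
    * sum only the flagged observations; the others contribute missing, which total() ignores
    by state year, sort: egen wanted = total(cond(tag, totalvotes_dist, .))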
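
    As a rough check of the approach above, the following sketch runs it against the example data from #1 and compares the result with sums worked out by hand from the distinct district values listed there. It assumes those example data are already in memory. The variable names first_obs, check_total, and naive_total are made up for illustration and are not part of the original code.

```stata
* Sketch only: assumes the -dataex- example from #1 is in memory.
* first_obs, check_total, and naive_total are hypothetical names.

* Confirm that totalvotes_dist is constant within each state-district-year
bysort state district year (totalvotes_dist): assert totalvotes_dist[1] == totalvotes_dist[_N]

* Flag one observation per state-district-year and sum only those
egen first_obs = tag(state district year)
bysort state year: egen check_total = total(cond(first_obs, totalvotes_dist, .))

* Naive total, for comparison; districts with two observations are counted twice
bysort state year: egen naive_total = total(totalvotes_dist)

* Hand-computed sums of the distinct district values shown in #1
assert check_total == 984181 if state == "ALABAMA"
assert check_total == 118208 if state == "ALASKA"
assert check_total == 729002 if state == "ARIZONA"

* The naive Alabama total is inflated by districts 1, 2, 4, and 6
assert naive_total == 1636253 if state == "ALABAMA"
```

    The first assert is worth keeping in real use: if totalvotes_dist ever differs within a state-district-year, picking one tagged observation would silently choose an arbitrary value.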

    • #3
      Thank you Clyde! This was very helpful!
