  • Tabulations using a unique ID (removing duplicates)

    Hi everyone,

    I have vehicle crash data and want to count the distinct crashes by year. The problem is that some observations share the same collision report number, meaning they were involved in the same crash. First, I want to count the number of crashes by year where MCFlag=1 and MCDriver=1, counting each collision report number only once. Second, I want to tabulate the number of persons involved by gender where MCFlag=1 and MCDriver=1. Here the unit of analysis is the individual, so there is no need to remove duplicates. I have an idea how to do this second part; I just want to know whether there are other approaches that give the same results.

    I used the following code:
    tag=tag(collisionreportnumber)
    tab Gender if MCFlag==1 & MCdriver==1
    tab Year if tag==1 & MCFlag==1 & MCdriver==1

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(VechicleNumber CollisionReportNumber) str4 Region int Year str8 Month byte Fatalities str12 VehicleType str6 Gender byte(MCFlag MCdriver)
     1 12 "Ren"  2017 "January"  1 "Motorcycle"   "Male"   1 1
     2 12 "Ren"  2017 "february" 1 "Motorcycle"   "Female" 1 1
     3 13 "Ren"  2017 "march"    1 "Moped"        "Male"   1 1
     4 14 "Ren"  2018 "april"    3 "scooter bike" "Female" 1 1
     5 15 "Ren"  2018 "December" 0 "CDL"          "Male"   1 0
     6 13 "Ren"  2018 "November" 0 "CDL"          "Female" 1 1
     7 14 "Ren"  2018 "January"  0 "Motorcycle"   "Male"   1 1
     8 15 "Ren"  2019 "february" 1 "Motorcycle"   "Female" 1 1
     9 15 "Ren"  2019 "march"    1 "Moped"        "Male"   1 1
    10 13 "Ren"  2019 "april"    1 "scooter bike" "Female" 0 0
    11 13 "Ren"  2019 "December" 3 "CDL"          "Male"   0 0
    12 14 "Ren"  2020 "November" 0 "CDL"          "Female" 1 1
    13 15 "Ren"  2020 "January"  0 "Motorcycle"   "Male"   1 1
    14 13 "Ren"  2020 "february" 0 "Motorcycle"   "Female" 1 1
    15 14 "Blue" 2017 "march"    1 "Moped"        "Male"   1 1
    16 15 "Blue" 2017 "april"    1 "scooter bike" "Female" 1 1
    17 15 "Blue" 2017 "December" 1 "CDL"          "Male"   1 1
    18 12 "Blue" 2017 "November" 3 "CDL"          "Female" 1 1
    19 12 "Blue" 2018 "January"  0 "Motorcycle"   "Male"   1 1
    20 13 "Blue" 2018 "february" 0 "Motorcycle"   "Female" 1 1
    21 14 "Blue" 2018 "march"    0 "Moped"        "Male"   1 1
    22 15 "Blue" 2018 "april"    1 "scooter bike" "Female" 0 1
    23 13 "Blue" 2019 "December" 1 "CDL"          "Male"   0 1
    24 14 "Blue" 2019 "November" 1 "CDL"          "Female" 1 0
    25 15 "Blue" 2019 "January"  3 "Motorcycle"   "Male"   1 1
    26 15 "Blue" 2019 "february" 0 "Motorcycle"   "Female" 0 1
    27 13 "Blue" 2020 "march"    0 "Moped"        "Male"   1 0
    end
    ------------------ copy up to and including the previous line ------------------

    Listed 27 out of 27 observations
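A minimal sketch of both tabulations, assuming the variable names in the -dataex- example (Year with a capital Y and MCdriver with a lowercase d; Stata names are case-sensitive). To stay self-contained it re-enters six rows copied from that example. Two points matter: -egen- must come before tag(), and putting the -if- condition on the -egen- line tags one record per crash among the MCFlag==1 & MCdriver==1 records only. With the tag computed on all records and the condition applied later in -tab-, a crash could be missed whenever its tagged record happens to fail the condition while another record of the same crash passes it.

```stata
* Six rows copied from the -dataex- example (Ren region only),
* keeping just the variables needed for this sketch
clear
input byte CollisionReportNumber int Year str6 Gender byte(MCFlag MCdriver)
12 2017 "Male"   1 1
12 2017 "Female" 1 1
13 2017 "Male"   1 1
14 2018 "Female" 1 1
15 2018 "Male"   1 0
14 2018 "Male"   1 1
end

* Tag one record per collision report number, among the
* MCFlag==1 & MCdriver==1 records only; excluded records get tag==0
egen tag = tag(CollisionReportNumber) if MCFlag == 1 & MCdriver == 1

* Number of distinct crashes by year
tab Year if tag == 1

* Persons by gender: the unit is the individual, so no de-duplication
tab Gender if MCFlag == 1 & MCdriver == 1
```

In the full example the report numbers 12 to 15 recur across regions and years. If, in the real data, report numbers are only unique within a region or year (an assumption about how they are assigned), tagging on the combination, e.g. tag(CollisionReportNumber Region Year), would be the analogue.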

  • #2
    Well, "tag=tag(collisionreportnumber)" is not legal syntax, so that is not what you actually ran; maybe you forgot to include -egen- before the quoted part? An alternative would be to use the -duplicates tag- command; see:
    Code:
    h duplicates
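A sketch of the -duplicates- route suggested above, on six rows copied from the -dataex- example. -duplicates tag- stores in dup how many other records share each report number (0 means a single-record crash), and -duplicates drop- inside -preserve-/-restore- keeps one record per crash just long enough to count crashes by year. The -keep- line discards the non-matching records from memory, which is fine for this sketch but not for a working dataset you still need; the name dup is arbitrary.

```stata
* Six rows copied from the -dataex- example (Ren region only)
clear
input byte CollisionReportNumber int Year str6 Gender byte(MCFlag MCdriver)
12 2017 "Male"   1 1
12 2017 "Female" 1 1
13 2017 "Male"   1 1
14 2018 "Female" 1 1
15 2018 "Male"   1 0
14 2018 "Male"   1 1
end

* Work only with the motorcycle-driver records of interest
keep if MCFlag == 1 & MCdriver == 1

* dup = number of OTHER records with the same report number
duplicates tag CollisionReportNumber, generate(dup)
list CollisionReportNumber Year Gender dup

* Count distinct crashes by year on a temporary copy:
* keep the first record of each report number, tabulate, then restore
preserve
duplicates drop CollisionReportNumber, force
tab Year
restore
```

Both routes give the same crash counts here; -egen, tag()- avoids changing the data in memory, while -duplicates tag- also shows which crashes involve more than one record.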
