Tabulations using unique id (remove duplications)

kotey dzanie

16 Jun 2023, 14:04

Hi everyone,

I have vehicle crash data and I want to count the distinct crashes by year. The problem is that some observations share the same collision report number, meaning those vehicles were involved in the same crash. I want to count the number of crashes by year where MCFlag==1 and MCdriver==1, counting each collision report number only once. I then want to tabulate the number of persons involved by gender where MCFlag==1 and MCdriver==1. Here the unit of analysis is the individual, so there is no need to remove duplicates. I have an idea how to do this part; I just want to know whether other approaches can give the same results.
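One alternative to -egen, tag()- is the -by- prefix with _n. The sketch below assumes the -dataex- example from this post is in memory and uses its variable names exactly (Stata names are case-sensitive, and the example spells the second flag MCdriver). The variables sel and first are hypothetical helpers. Grouping on Year as well as CollisionReportNumber is an assumption, made because the same report number appears in more than one year in the example; if report numbers are only unique within a region, Region would need to be added to the by-list.

```stata
* Hypothetical helper: 1 if both conditions hold, 0 otherwise
generate byte sel = MCFlag == 1 & MCdriver == 1

* Mark the first row of each Year-by-report group, separately for
* selected and non-selected rows, so each crash is counted once per year
bysort sel Year CollisionReportNumber: generate byte first = _n == 1

* Distinct crashes per year among the selected observations
tab Year if sel == 1 & first == 1

* Persons by gender: the unit is the individual, so no deduplication
tab Gender if sel == 1
```

Putting sel first in the -bysort- list keeps a non-qualifying vehicle from "using up" the first position of a crash that also has qualifying vehicles.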

I used the following code
tag=tag(collisionreportnumber)

tab Gender if MCFlag==1 & MCdriver==1

tab Year if tag==1 & MCFlag==1 & MCdriver==1
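A working sketch of the tagging approach, assuming the -dataex- example below is in memory and using its variable names (Year, CollisionReportNumber, MCFlag, MCdriver). The conditions go inside -egen- so that the one tagged row per crash is a row that meets them; if every observation is tagged first, the tagged row of a crash may have MCFlag==0 or MCdriver==0, and a later -if- would then drop that crash entirely. Tagging on Year as well as CollisionReportNumber is an assumption based on the example, where report numbers recur across years.

```stata
* tag == 1 for exactly one observation per Year-report combination
* among observations with MCFlag == 1 & MCdriver == 1
egen tag = tag(Year CollisionReportNumber) if MCFlag == 1 & MCdriver == 1

* Distinct crashes per year
tab Year if tag == 1

* Persons by gender (individual level, no deduplication)
tab Gender if MCFlag == 1 & MCdriver == 1
```

With the example data this gives 4, 3, 1 and 3 crashes for 2017 through 2020 (11 in total), and 10 male and 9 female persons.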

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(VechicleNumber CollisionReportNumber) str4 Region int Year str8 Month byte Fatalities str12 VehicleType str6 Gender byte(MCFlag MCdriver)
 1 12 "Ren"  2017 "January"  1 "Motorcycle"   "Male"   1 1
 2 12 "Ren"  2017 "february" 1 "Motorcycle"   "Female" 1 1
 3 13 "Ren"  2017 "march"    1 "Moped"        "Male"   1 1
 4 14 "Ren"  2018 "april"    3 "scooter bike" "Female" 1 1
 5 15 "Ren"  2018 "December" 0 "CDL"          "Male"   1 0
 6 13 "Ren"  2018 "November" 0 "CDL"          "Female" 1 1
 7 14 "Ren"  2018 "January"  0 "Motorcycle"   "Male"   1 1
 8 15 "Ren"  2019 "february" 1 "Motorcycle"   "Female" 1 1
 9 15 "Ren"  2019 "march"    1 "Moped"        "Male"   1 1
10 13 "Ren"  2019 "april"    1 "scooter bike" "Female" 0 0
11 13 "Ren"  2019 "December" 3 "CDL"          "Male"   0 0
12 14 "Ren"  2020 "November" 0 "CDL"          "Female" 1 1
13 15 "Ren"  2020 "January"  0 "Motorcycle"   "Male"   1 1
14 13 "Ren"  2020 "february" 0 "Motorcycle"   "Female" 1 1
15 14 "Blue" 2017 "march"    1 "Moped"        "Male"   1 1
16 15 "Blue" 2017 "april"    1 "scooter bike" "Female" 1 1
17 15 "Blue" 2017 "December" 1 "CDL"          "Male"   1 1
18 12 "Blue" 2017 "November" 3 "CDL"          "Female" 1 1
19 12 "Blue" 2018 "January"  0 "Motorcycle"   "Male"   1 1
20 13 "Blue" 2018 "february" 0 "Motorcycle"   "Female" 1 1
21 14 "Blue" 2018 "march"    0 "Moped"        "Male"   1 1
22 15 "Blue" 2018 "april"    1 "scooter bike" "Female" 0 1
23 13 "Blue" 2019 "December" 1 "CDL"          "Male"   0 1
24 14 "Blue" 2019 "November" 1 "CDL"          "Female" 1 0
25 15 "Blue" 2019 "January"  3 "Motorcycle"   "Male"   1 1
26 15 "Blue" 2019 "february" 0 "Motorcycle"   "Female" 0 1
27 13 "Blue" 2020 "march"    0 "Moped"        "Male"   1 0
end

Rich Goldstein

16 Jun 2023, 14:43

Well, "tag=tag(collisionreportnumber)" is not legal syntax, so that is not what you actually ran; maybe you forgot to include -egen- before the quoted part? An alternative would be to use the -duplicates tag- command; see

Code:

h duplicates
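A sketch of how the -duplicates- family could be used here, assuming the example data is in memory. Note that -duplicates tag- does not pick one row per crash: it stores, for each observation, how many other observations share the same values (dup is a hypothetical variable name), which is useful for seeing which crashes involved several vehicles. To count crashes, -duplicates drop- keeps one row per Year and report number; -preserve- and -restore- bring the dropped rows back afterwards. Using Year together with CollisionReportNumber is again an assumption based on the example.

```stata
preserve

* Restrict to the observations of interest
keep if MCFlag == 1 & MCdriver == 1

* dup = number of other observations with the same Year and report number
* (0 means only one selected vehicle in that crash)
duplicates tag Year CollisionReportNumber, generate(dup)
sort Year CollisionReportNumber
list Year CollisionReportNumber dup, sepby(Year) noobs

* Keep one observation per Year-report combination, then count crashes
duplicates drop Year CollisionReportNumber, force
tab Year

* Return to the full data set
restore
```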