I have a dataset of 98,104 observations. Each observation represents an athletic event, either a practice or competition, where athletes played and therefore had the opportunity to be injured. The variable record_uid_exp is a unique ID number for that athletic event. The variable num_athletes indicates the number of athletes participating in that event and therefore at risk of injury. The variable injury_code indicates the type of injury--if any--which occurred. As most events did not result in an injury, injury_code is recorded as a missing value in most cases. In my dataset, there are 1,454 observations or events where an injury occurred and injury_code includes 36 different specific types of injuries. Various other variables are included such as sport, gender, timeloss (time lost to injury in days), season (pre-season, in-season, post-season) etc.
I'd like to calculate the injury rate per 1,000 athlete-exposures in a variety of formats. Specifically, I'd like to calculate the injury rate by specific injury type and the total injury rate by sport and by gender, and to compare injury rates across variables such as practice vs. competition, preseason vs. in-season, and similar.
My current approach, shown below, gives what I believe are the correct rates, but it seems an awkward and inflexible way to do the job. I know there must be a better way that will let me quickly compare injury rates across the subgroups and variables I'm interested in. I've looked into this extensively but haven't been able to figure out the correct approach. Any help pointing me in the right direction is much appreciated.
I've used the collapse command to calculate the rate like this:
Code:
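* One row per injury type; events with no injury form their own missing-code row
collapse (count) numinjuries=num_athletes (sum) numexposures=num_athletes, by(injury_code)
* Total athlete-exposures across all events, injured or not
quietly summarize numexposures
local exposures = r(sum)
display `exposures'
* Injuries per 1,000 athlete-exposures (the value on the missing-code row is not an injury rate)
generate float injury_rate = (1000*numinjuries)/`exposures'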
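To illustrate the kind of subgroup comparison I'm after, here is a rough sketch of the same idea by sport. It is only a sketch under assumptions: it must be run on the original event-level data (before the collapse above), it assumes each injured event records exactly one injury, and the variable injured is a helper name I made up for this illustration.

```stata
* Sketch only: injury rate per 1,000 athlete-exposures by sport.
* Assumes event-level data and one injury per injured event.
preserve
* Hypothetical helper: 1 if the event has an injury recorded, 0 otherwise
generate byte injured = !missing(injury_code)
* Sum injuries and athlete-exposures within each sport
collapse (sum) numinjuries=injured numexposures=num_athletes, by(sport)
generate float injury_rate = 1000*numinjuries/numexposures
list sport numinjuries numexposures injury_rate, noobs
restore
```

Swapping sport for gender, season, or event_type in by() would presumably give the other breakdowns, but each one still needs its own collapse, which is the inflexibility I'd like to avoid.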
I've included an example of what my dataset looks like below.
Best regards,
Andrew Ross
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str14 record_uid_exp long(injury_code sport gender) int timeloss long(season event_type)
"A3096000102008" . 9 0 . 3 1
"A3667000123173" . 11 0 . 3 2
"A3451000085824" . 11 0 . 3 2
"A4262000004288" . 25 0 . 2 2
"A1071000117282" . 14 0 . 3 1
"A2658000000029" . 9 0 . 3 2
"A3667000105158" . 11 0 . 2 2
"A3667000123180" . 11 0 . 3 1
"A1710000003654" . 4 0 . 3 1
"A3667000123164" . 11 0 . 3 2
"A4669000068557" . 6 0 . 2 2
"A3597000000665" . 6 0 . 3 1
"A4505000079656" . 2 0 . 2 2
"A1710000003272" 28 6 0 0 3 1
"A3778000111107" . 9 0 . 3 1
"A3249000083298" . 11 0 . 3 2
"A3667000123156" . 11 0 . 3 2
"A3667000123154" . 11 0 . 3 2
"A2295000015119" . 2 0 . 3 2
"A3667000123152" . 11 0 . 3 2
"A3667000123151" . 11 0 . 3 2
"A2749000001948" . 22 0 . 3 2
"A2831000000633" . 1 0 . 2 2
"A2295000016258" . 2 0 . 3 1
"A3667000123144" . 11 0 . 2 2
"A4577000003690" . 6 0 . 2 2
"A3667000123142" . 11 0 . 2 2
"A2206000011934" . 6 0 . 2 2
"A1972000105373" . 9 0 . 3 1
"A1353000002569" . 6 0 . 2 2
"A3249000068811" . 14 0 . 3 1
"A3667000123137" . 11 0 . 2 2
"A3667000123136" . 11 0 . 2 2
"A1402000069478" . 2 0 . 2 2
"A3889010006907" . 18 0 . 2 2
"A2617000003344" . 2 0 . 3 1
"A3936010000417" . 9 0 . 2 2
"A4419000009010" . 20 0 . 2 2
"A3667000123191" . 11 0 . 3 1
"A4669000116622" . 4 0 . 2 2
"A4262000004742" . 2 0 . 3 2
"A3667000119239" . 14 0 . 3 1
"A3667000119238" . 14 0 . 3 1
"A3984000010437" . 18 0 . 3 2
"A2429000113168" . 11 0 . 3 2
"A1353000008759" . 22 0 . 3 2
"A2735000108846" . 9 0 . 3 1
"A1071000122878" . 9 0 . 3 2
"A1760000098559" . 9 0 . 3 2
"A4262000013072" . 1 0 . 3 1
"A4262000012166" . 18 0 . 3 1
"A2206000005762" . 2 0 . 3 2
"A3778000074642" . 9 0 . 3 1
"A3984000011745" . 16 0 . 3 2
"A3710000076179" . 2 0 . 3 2
"A3667000123197" . 11 0 . 3 2
"A3667000123198" . 11 0 . 1 2
"A2132010012714" . 25 0 . 1 2
"A4419000003443" . 1 0 . 3 2
"A3597000001878" . 2 0 . 3 2
"A3161000097861" . 2 0 . 2 2
"A3889010008479" . 16 0 . 3 2
"A3096000077896" . 9 0 . 3 2
"A3889010003073" . 14 0 . 2 2
"A2749000002251" . 6 0 . 2 2
"A3667000126202" . 2 0 . 3 2
"A3667000126203" . 2 0 . 3 1
"A3778000073554" . 9 0 . 3 2
"A2132010012699" . 20 0 . 3 1
"A4349000124494" . 9 0 . 3 1
"A3984000009606" . 25 0 . 2 2
"A2841000131512" . 9 0 . 1 2
"A3667000116810" . 2 0 . 3 2
"A2068010004918" . 14 0 . 2 2
"A2206000009921" . 22 0 . 3 2
"A4577000003638" . 14 0 . 2 2
"A3249000071893" . 14 0 . 3 2
"A2617000002557" . 14 0 . 3 2
"A3667000126221" . 2 0 . 3 2
"A3667000077303" . 2 0 . 2 2
"A2295000010052" . 9 0 . 3 1
"A2295000013151" . 22 0 . 3 2
"A2206000014457" 18 6 0 1 3 2
"A3967000124459" 5 9 0 20 3 1
"A2295000008230" . 14 0 . 1 2
"A4262000004920" . 25 0 . 3 1
"A2206000012333" . 22 0 . 3 2
"A2295000015903" . 18 0 . 3 1
"A2841000130203" . 9 0 . 3 2
"A4559000001894" . 1 0 . 3 2
"A3096000109666" . 2 0 . 3 1
"A3667000116851" . 2 0 . 2 2
"A4419000003585" . 25 0 . 3 1
"A3096000123869" . 9 0 . 3 2
"A3889010009328" . 22 0 . 3 2
"A2429000071181" . 6 0 . 3 2
"A1353000012223" . 25 0 . 1 2
"A2295000010815" . 18 0 . 3 2
"A3667000116842" . 2 0 . 2 2
"A1353000012177" . 22 0 . 2 2
end
label values injury_code injury_code
label def injury_code 5 "Adductor (Groin) Strain", modify
label def injury_code 18 "Hip Flexor Strain", modify
label def injury_code 28 "Iliopsoas/Sartorius Strain", modify
label values sport sport1
label def sport1 1 "Men's Baseball", modify
label def sport1 2 "Men's Basketball", modify
label def sport1 4 "Men's Crosscountry", modify
label def sport1 6 "Men's Football", modify
label def sport1 9 "Men's Ice Hockey", modify
label def sport1 11 "Men's Lacrosse", modify
label def sport1 14 "Men's Soccer", modify
label def sport1 16 "Men's Swimming Diving", modify
label def sport1 18 "Men's Tennis", modify
label def sport1 20 "Men's Track Indoor", modify
label def sport1 22 "Men's Track Outdoor", modify
label def sport1 25 "Men's Wrestling", modify
label values gender gender
label def gender 0 "Male", modify
label values season season1
label def season1 1 "Postseason", modify
label def season1 2 "Preseason", modify
label def season1 3 "Regular season", modify
label values event_type event_type1
label def event_type1 1 "Competition (Game)", modify
label def event_type1 2 "Scheduled team practice", modify