Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating injury rates by athlete exposure

    I have a dataset of 98,104 observations. Each observation represents an athletic event, either a practice or competition, where athletes played and therefore had the opportunity to be injured. The variable record_uid_exp is a unique ID number for that athletic event. The variable num_athletes indicates the number of athletes participating in that event and therefore at risk of injury. The variable injury_code indicates the type of injury--if any--which occurred. As most events did not result in an injury, injury_code is recorded as a missing value in most cases. In my dataset, there are 1,454 observations or events where an injury occurred and injury_code includes 36 different specific types of injuries. Various other variables are included such as sport, gender, timeloss (time lost to injury in days), season (pre-season, in-season, post-season) etc.

    I'd like to calculate the injury rate per 1000 athlete-exposures in a variety of formats. Specifically, I'd like to calculate injury rate by specific injury type, total injury rate by sport and by gender, and compare injury rates by variables such as practice vs competition, preseason versus in season, and similar.

    I've used the collapse command to calculate the rate like this:
    Code:
    collapse (count) numinjuries=num_athletes (sum) numexposures=num_athletes, by(injury_code)
    quietly summarize numexposures
    local exposures r(sum)
    display `exposures'
    generate float injury_rate = (1000*numinjuries)/`exposures'
    I believe these calculations are correct but this seems like an awkward and inflexible way to perform this job. I know there must be a better way to do it that will allow me to quickly compare the injury rates for the various subgroups that I'm interested in and compare injury rates by other variables. I've looked into this extensively but haven't been able to figure out the correct approach for the job. Any help to point me in the right direction is much appreciated.

    I've included an example of what my dataset looks like below.


    Best regards,
    Andrew Ross

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str14 record_uid_exp long(injury_code sport gender) int timeloss long(season event_type)
    "A3096000102008"  .  9 0  . 3 1
    "A3667000123173"  . 11 0  . 3 2
    "A3451000085824"  . 11 0  . 3 2
    "A4262000004288"  . 25 0  . 2 2
    "A1071000117282"  . 14 0  . 3 1
    "A2658000000029"  .  9 0  . 3 2
    "A3667000105158"  . 11 0  . 2 2
    "A3667000123180"  . 11 0  . 3 1
    "A1710000003654"  .  4 0  . 3 1
    "A3667000123164"  . 11 0  . 3 2
    "A4669000068557"  .  6 0  . 2 2
    "A3597000000665"  .  6 0  . 3 1
    "A4505000079656"  .  2 0  . 2 2
    "A1710000003272" 28  6 0  0 3 1
    "A3778000111107"  .  9 0  . 3 1
    "A3249000083298"  . 11 0  . 3 2
    "A3667000123156"  . 11 0  . 3 2
    "A3667000123154"  . 11 0  . 3 2
    "A2295000015119"  .  2 0  . 3 2
    "A3667000123152"  . 11 0  . 3 2
    "A3667000123151"  . 11 0  . 3 2
    "A2749000001948"  . 22 0  . 3 2
    "A2831000000633"  .  1 0  . 2 2
    "A2295000016258"  .  2 0  . 3 1
    "A3667000123144"  . 11 0  . 2 2
    "A4577000003690"  .  6 0  . 2 2
    "A3667000123142"  . 11 0  . 2 2
    "A2206000011934"  .  6 0  . 2 2
    "A1972000105373"  .  9 0  . 3 1
    "A1353000002569"  .  6 0  . 2 2
    "A3249000068811"  . 14 0  . 3 1
    "A3667000123137"  . 11 0  . 2 2
    "A3667000123136"  . 11 0  . 2 2
    "A1402000069478"  .  2 0  . 2 2
    "A3889010006907"  . 18 0  . 2 2
    "A2617000003344"  .  2 0  . 3 1
    "A3936010000417"  .  9 0  . 2 2
    "A4419000009010"  . 20 0  . 2 2
    "A3667000123191"  . 11 0  . 3 1
    "A4669000116622"  .  4 0  . 2 2
    "A4262000004742"  .  2 0  . 3 2
    "A3667000119239"  . 14 0  . 3 1
    "A3667000119238"  . 14 0  . 3 1
    "A3984000010437"  . 18 0  . 3 2
    "A2429000113168"  . 11 0  . 3 2
    "A1353000008759"  . 22 0  . 3 2
    "A2735000108846"  .  9 0  . 3 1
    "A1071000122878"  .  9 0  . 3 2
    "A1760000098559"  .  9 0  . 3 2
    "A4262000013072"  .  1 0  . 3 1
    "A4262000012166"  . 18 0  . 3 1
    "A2206000005762"  .  2 0  . 3 2
    "A3778000074642"  .  9 0  . 3 1
    "A3984000011745"  . 16 0  . 3 2
    "A3710000076179"  .  2 0  . 3 2
    "A3667000123197"  . 11 0  . 3 2
    "A3667000123198"  . 11 0  . 1 2
    "A2132010012714"  . 25 0  . 1 2
    "A4419000003443"  .  1 0  . 3 2
    "A3597000001878"  .  2 0  . 3 2
    "A3161000097861"  .  2 0  . 2 2
    "A3889010008479"  . 16 0  . 3 2
    "A3096000077896"  .  9 0  . 3 2
    "A3889010003073"  . 14 0  . 2 2
    "A2749000002251"  .  6 0  . 2 2
    "A3667000126202"  .  2 0  . 3 2
    "A3667000126203"  .  2 0  . 3 1
    "A3778000073554"  .  9 0  . 3 2
    "A2132010012699"  . 20 0  . 3 1
    "A4349000124494"  .  9 0  . 3 1
    "A3984000009606"  . 25 0  . 2 2
    "A2841000131512"  .  9 0  . 1 2
    "A3667000116810"  .  2 0  . 3 2
    "A2068010004918"  . 14 0  . 2 2
    "A2206000009921"  . 22 0  . 3 2
    "A4577000003638"  . 14 0  . 2 2
    "A3249000071893"  . 14 0  . 3 2
    "A2617000002557"  . 14 0  . 3 2
    "A3667000126221"  .  2 0  . 3 2
    "A3667000077303"  .  2 0  . 2 2
    "A2295000010052"  .  9 0  . 3 1
    "A2295000013151"  . 22 0  . 3 2
    "A2206000014457" 18  6 0  1 3 2
    "A3967000124459"  5  9 0 20 3 1
    "A2295000008230"  . 14 0  . 1 2
    "A4262000004920"  . 25 0  . 3 1
    "A2206000012333"  . 22 0  . 3 2
    "A2295000015903"  . 18 0  . 3 1
    "A2841000130203"  .  9 0  . 3 2
    "A4559000001894"  .  1 0  . 3 2
    "A3096000109666"  .  2 0  . 3 1
    "A3667000116851"  .  2 0  . 2 2
    "A4419000003585"  . 25 0  . 3 1
    "A3096000123869"  .  9 0  . 3 2
    "A3889010009328"  . 22 0  . 3 2
    "A2429000071181"  .  6 0  . 3 2
    "A1353000012223"  . 25 0  . 1 2
    "A2295000010815"  . 18 0  . 3 2
    "A3667000116842"  .  2 0  . 2 2
    "A1353000012177"  . 22 0  . 2 2
    end
    label values injury_code injury_code
    label def injury_code 5 "Adductor (Groin) Strain", modify
    label def injury_code 18 "Hip Flexor Strain", modify
    label def injury_code 28 "Iliopsoas/Sartorius Strain", modify
    label values sport sport1
    label def sport1 1 "Men's Baseball", modify
    label def sport1 2 "Men's Basketball", modify
    label def sport1 4 "Men's Crosscountry", modify
    label def sport1 6 "Men's Football", modify
    label def sport1 9 "Men's Ice Hockey", modify
    label def sport1 11 "Men's Lacrosse", modify
    label def sport1 14 "Men's Soccer", modify
    label def sport1 16 "Men's Swimming Diving", modify
    label def sport1 18 "Men's Tennis", modify
    label def sport1 20 "Men's Track Indoor", modify
    label def sport1 22 "Men's Track Outdoor", modify
    label def sport1 25 "Men's Wrestling", modify
    label values gender gender
    label def gender 0 "Male", modify
    label values season season1
    label def season1 1 "Postseason", modify
    label def season1 2 "Preseason", modify
    label def season1 3 "Regular season", modify
    label values event_type event_type1
    label def event_type1 1 "Competition (Game)", modify
    label def event_type1 2 "Scheduled team practice", modify

  • #2
    Well, your data example is neither as described, nor compatible with the code you show as it does not include the key variables.

    In any case, I think your proposed code is incorrect. I think for the number of injuries you do not want (count) num_athletes, because that will count as 1 even when nobody was injured (unless num_athletes is missing when there are no injuries--but if that is true, then (sum) num_athletes will not give the correct denominator for the rate.) Your numerator for the rate should be a (sum) of a variable that is equal to the number of athletes injured in any given event (and, in particular, 0 when nobody is injured).

    All of that said, -collapse-ing to get a numerator and denominator and then calculating the ratio is the way to get these rates. I do it all the time.

    Comment


    • #3
      Thank you for the reply. I had a feeling I wouldn't get all the information right on the first try. I've put a new dataset example below with the num_athletes variable included.

      It sounds like collapse is the way to go. The way the dataset is organized injury_code is "." (missing) if the sporting event had no injuries (the vast majority of observations). If an injury occurred, injury_code is populated with the type of injury. So there is no variable to indicate the number of injuries at a given event. If multiple injuries occurred at one event, they are given separate entries and the event id code (record_uid_exp) is duplicated for that entry.

      So as far as I can tell the denominator would be sum(num_athletes) and the numerator the count of injuries, either any injury (injury_code!=".") or for a specific injury.

      In the collapse command that I included before, I was trying to produce a summary of the rate sorted by each individual injury type. The , by(injury_code) option seems to make this possible but then you cannot use count(injury_code) as Stata gives me this:

      Code:
      collapse (count)injury_code (sum)num_athletes, by(injury_code)
      injury_code may not be both target and by()
      r(198);
      The code I used in my previous post seems to offer a work around and gives me this data output (truncated):
      Code:
      . list injury_code numinjuries numexposures in 1/10, table
      
           +--------------------------------------------------------------------+
           |                                  injury_code   numinj~s   numexp~s |
           |--------------------------------------------------------------------|
        1. |                         Abductor muscle tear         35       1123 |
        2. |                   Adductor (Groin) Contusion          5         94 |
        3. |         Adductor (Groin) Myositis Ossificans          1         15 |
        4. |                       Adductor (Groin) Spasm         14        609 |
        5. |                      Adductor (Groin) Strain        320      13320 |
           |--------------------------------------------------------------------|
        6. |                          Adductor Tendonitis         17        404 |
        7. |               Apophysitis of The Iliac Crest          1         20 |
        8. |                                   Coccydynia          3         77 |
        9. |        External Rotators (Piriformis) Strain         10        368 |
       10. | Greater Trochanteric Bursitis (Snapping Hip)         18        559 |
           +--------------------------------------------------------------------+
      numinjuries seems to supply the correct injury count when I check it manually. numexposures is not the correct denominator but summed across all observations (including when no one was injured i.e. injury_code=".") seems to provide the correct denominator which I can then manually put in as the denominator when I generate a new variable to calculate the rate.

      My apologies, I know I'm probably not explaining myself very well. If any of this makes sense, my questions are:

      Is collapse the best way to get these rates, particularly when I want to look at multiple sub-groups? (So far, sounds like yes)
      Is there a way to easily get 95% CI intervals for the rates?
      Am I using the collapse command in the simplest, most efficient way or am I making it more complicated than it needs to be?

      many thanks,
      Andrew Ross


      Here is the dataset example with the key variables included:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str14 record_uid_exp int num_athletes long(injury_code sport gender season event_type)
      "A3096000102008"  21  .  9 0 3 1
      "A3667000123173"  43  . 11 0 3 2
      "A3451000085824"  40  . 11 0 3 2
      "A4262000004288"  50  . 25 0 2 2
      "A1071000117282"  28  . 14 0 3 1
      "A2658000000029"  24  .  9 0 3 2
      "A3667000105158"  47  . 11 0 2 2
      "A3667000123180"  23  . 11 0 3 1
      "A1710000003654"   7  .  4 0 3 1
      "A3667000123164"  44  . 11 0 3 2
      "A4669000068557"  91  .  6 0 2 2
      "A3597000000665"  46  .  6 0 3 1
      "A4505000079656"  17  .  2 0 2 2
      "A1710000003272"  89 28  6 0 3 1
      "A3778000111107"  19  .  9 0 3 1
      "A3249000083298"  30  . 11 0 3 2
      "A3667000123156"  44  . 11 0 3 2
      "A3667000123154"  44  . 11 0 3 2
      "A2295000015119"  15  .  2 0 3 2
      "A3667000123152"  44  . 11 0 3 2
      "A3667000123151"  44  . 11 0 3 2
      "A2749000001948"  47  . 22 0 3 2
      "A2831000000633"  24  .  1 0 2 2
      "A2295000016258"   9  .  2 0 3 1
      "A3667000123144"  43  . 11 0 2 2
      "A4577000003690" 103  .  6 0 2 2
      "A3667000123142"  44  . 11 0 2 2
      "A2206000011934" 102  .  6 0 2 2
      "A1972000105373"  22  .  9 0 3 1
      "A1353000002569" 115  .  6 0 2 2
      "A3249000068811"  18  . 14 0 3 1
      "A3667000123137"  44  . 11 0 2 2
      "A3667000123136"  45  . 11 0 2 2
      "A1402000069478"  15  .  2 0 2 2
      "A3889010006907"  12  . 18 0 2 2
      "A2617000003344"   7  .  2 0 3 1
      "A3936010000417"  25  .  9 0 2 2
      "A4419000009010"   7  . 20 0 2 2
      "A3667000123191"  42  . 11 0 3 1
      "A4669000116622"  14  .  4 0 2 2
      "A4262000004742"  15  .  2 0 3 2
      "A3667000119239"  21  . 14 0 3 1
      "A3667000119238"  16  . 14 0 3 1
      "A3984000010437"   9  . 18 0 3 2
      "A2429000113168"  47  . 11 0 3 2
      "A1353000008759"  47  . 22 0 3 2
      "A2735000108846"  21  .  9 0 3 1
      "A1071000122878"  33  .  9 0 3 2
      "A1760000098559"  27  .  9 0 3 2
      "A4262000013072"  15  .  1 0 3 1
      "A4262000012166"  10  . 18 0 3 1
      "A2206000005762"  15  .  2 0 3 2
      "A3778000074642"  20  .  9 0 3 1
      "A3984000011745"  34  . 16 0 3 2
      "A3710000076179"  15  .  2 0 3 2
      "A3667000123197"  45  . 11 0 3 2
      "A3667000123198"  45  . 11 0 1 2
      "A2132010012714"  22  . 25 0 1 2
      "A4419000003443"  40  .  1 0 3 2
      "A3597000001878"  14  .  2 0 3 2
      "A3161000097861"  16  .  2 0 2 2
      "A3889010008479"  14  . 16 0 3 2
      "A3096000077896"  25  .  9 0 3 2
      "A3889010003073"  27  . 14 0 2 2
      "A2749000002251" 100  .  6 0 2 2
      "A3667000126202"  18  .  2 0 3 2
      "A3667000126203"  12  .  2 0 3 1
      "A3778000073554"  26  .  9 0 3 2
      "A2132010012699"  36  . 20 0 3 1
      "A4349000124494"  20  .  9 0 3 1
      "A3984000009606"  25  . 25 0 2 2
      "A2841000131512"  28  .  9 0 1 2
      "A3667000116810"  10  .  2 0 3 2
      "A2068010004918"  20  . 14 0 2 2
      "A2206000009921"  30  . 22 0 3 2
      "A4577000003638"  24  . 14 0 2 2
      "A3249000071893"  21  . 14 0 3 2
      "A2617000002557"  22  . 14 0 3 2
      "A3667000126221"  11  .  2 0 3 2
      "A3667000077303"  18  .  2 0 2 2
      "A2295000010052"  21  .  9 0 3 1
      "A2295000013151"  59  . 22 0 3 2
      "A2206000014457" 107 18  6 0 3 2
      "A3967000124459"  18  5  9 0 3 1
      "A2295000008230"  27  . 14 0 1 2
      "A4262000004920"  10  . 25 0 3 1
      "A2206000012333"  14  . 22 0 3 2
      "A2295000015903"  11  . 18 0 3 1
      "A2841000130203"  28  .  9 0 3 2
      "A4559000001894"  38  .  1 0 3 2
      "A3096000109666"  13  .  2 0 3 1
      "A3667000116851"  12  .  2 0 2 2
      "A4419000003585"  10  . 25 0 3 1
      "A3096000123869"  27  .  9 0 3 2
      "A3889010009328"  20  . 22 0 3 2
      "A2429000071181"  95  .  6 0 3 2
      "A1353000012223"   6  . 25 0 1 2
      "A2295000010815"  11  . 18 0 3 2
      "A3667000116842"  18  .  2 0 2 2
      "A1353000012177"  34  . 22 0 2 2
      end
      label values injury_code injury_code
      label def injury_code 5 "Adductor (Groin) Strain", modify
      label def injury_code 18 "Hip Flexor Strain", modify
      label def injury_code 28 "Iliopsoas/Sartorius Strain", modify
      label values sport sport1
      label def sport1 1 "Men's Baseball", modify
      label def sport1 2 "Men's Basketball", modify
      label def sport1 4 "Men's Crosscountry", modify
      label def sport1 6 "Men's Football", modify
      label def sport1 9 "Men's Ice Hockey", modify
      label def sport1 11 "Men's Lacrosse", modify
      label def sport1 14 "Men's Soccer", modify
      label def sport1 16 "Men's Swimming Diving", modify
      label def sport1 18 "Men's Tennis", modify
      label def sport1 20 "Men's Track Indoor", modify
      label def sport1 22 "Men's Track Outdoor", modify
      label def sport1 25 "Men's Wrestling", modify
      label values gender gender
      label def gender 0 "Male", modify
      label values season season1
      label def season1 1 "Postseason", modify
      label def season1 2 "Preseason", modify
      label def season1 3 "Regular season", modify
      label values event_type event_type1
      label def event_type1 1 "Competition (Game)", modify
      label def event_type1 2 "Scheduled team practice", modify


      Comment


      • #4
        OK. So because of the way multiple injuries in one event is handled, it's a little more complicated. Your example data doesn't illustrate this problem, so this code, although designed to handle it, has not been rigorously tested for it. The trick is to go in two steps. First you total up the injuries per event and reduce to one observation per event. Then you collapse by your groupings. The code here illustrates calculating rates by gender and sport. The modification to other groupings should be fairly obvious:

        Code:
        //    ADD UP NUMBER OF INJURIES PER EVENT
        by record_uid_exp, sort: egen n_injuries = count(injury_code)
        
        //    VERIFY NUMBER OF ATHLETES AT RISK IS CONSTANT PER EVENT
        by record_uid_exp (num_athletes), sort: assert num_athletes[_N] == num_athletes[1]
        
        //    REDUCE TO 1 OBSERVATION PER EVENT
        by record_uid_exp: keep if _n == 1
        
        //    NOW COLLAPSE TO CALCULATE INJURY RATE NUMERATOR AND DENOMINATOR
        //    BY GROUPING (E.G. GENDER & SPORT)
        collapse (sum) n_injuries num_athletes, by(gender sport)
        
        //    CALCULATE RATE AND CONFIDENCE INTERVAL
        capture program drop one_ci
        program define one_ci
            version 15: cii proportion `=num_athletes[1]' `=n_injuries[1]'
            gen rate = 100*`r(proportion)'
            gen lb = 100*`r(lb)'
            gen ub = 100*`r(ub)'
            exit
        end
        
        runby one_ci, by(gender sport) status
        Stata does not have a command that calculates confidence intervals per observation, so to get around that we write a little program to do it and then iterate that over all observations using -runby-. -runby- is written by Robert Picard and me, and is available from SSC. Note that the logic of the above code requires that the -by()- grouping in the -collapse- command and in the -runby- command have to be the same.

        As for wanting to group by injury_code, apart from what -collapse- allows (you could fake it by just cloning injury_code and using the clone in one place and the original in the other and -collapse- wouldn't be any the wiser), it doesn't make sense to do that in this way because the numerators calculated in this code are, regardless, sums of all types of injuries, so the results would be wrong. To do that requires a different logic. You would instead have to loop over the different injury codes and calculate separate numerators for each one. Then after the -collapse- you can -reshape long- to get separate observations per injury code per group, and then calculate ci's from there:

        Code:
        //    ADD UP NUMBER OF INJURIES PER EVENT, SEPARATELY BY TYPE
        levelsof injury_code, local(codes)
        foreach c of local codes {
            by record_uid_exp, sort: egen n_injuries_`c' = total(injury_code == `c')
        }
        
        //    VERIFY NUMBER OF ATHLETES AT RISK IS CONSTANT PER EVENT
        by record_uid_exp (num_athletes), sort: assert num_athletes[_N] == num_athletes[1]
        
        //    REDUCE TO 1 OBSERVATION PER EVENT
        by record_uid_exp: keep if _n == 1
        
        //    NOW COLLAPSE TO CALCULATE INJURY RATE NUMERATOR AND DENOMINATOR
        //    BY GROUPING (E.G. GENDER & SPORT)
        collapse (sum) n_injuries_* num_athletes, by(gender sport)
        
        //    GO LONG BY CODE
        reshape long n_injuries_, i(gender sport) j(injury_code)
        label values injury_code injury_code
        
        //    PROGRAM ONE_CI IS THE SAME AS BEFORE, SO
        //    NO NEED TO REDEFINE IT HERE IF ALL OF THIS IS
        //    IN A SINGLE DO-FILE.  IF NOT, COPY THE DEFINITION
        //    SHOWN ABOVE TO HERE.
        
        runby one_ci, by(gender sport injury_code) status
        Note: Evidently the first block of code (which calculates overall injury rates) destroys the original data, so to use the second block of code you have to re-load the original data.

        Comment


        • #5
          Thanks for such a detailed response. I am sure that with this to point me in the right direction I'll be able to adapt it to my needs.

          best regards,
          Andrew Ross

          Comment

          Working...
          X