Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Collapse monthly totals into yearly totals in a meaningful way when some of the months have missing data

    Hi folks,

    I'm having trouble figuring out a way to collapse monthly totals into yearly totals in a meaningful way when some of the months have missing data. Ultimately, I want to calculate the ratio of newborn deaths per 1,000 deliveries. Deaths are relatively rare and there is little point in analyzing month by month. Another major issue is that data are not missing at random. They were missing far more frequently in the first two years of a five-year study as you can see from the below extract of the dataset.

    In the dataset below, I have variables, year, month, total # of deliveries/month and total # of newborn deaths within 24 hrs/month. What makes sense to me is to sum alldel over each year but only for those months where nd24hrs~=. So for the first year, would get 51+34+40+39+45 = 209 deliveries for the year and 1 newborn death for a rate of 4.8 deaths/1,000 del/yr.

    The only way I can see going forward is to create a copy of alldel (e.g., alldeltmp) and doing something like: bys fullyr:replace alldeltmp==. if nd24hrs==. I think I could then use the gen sum() function to get running sums of both alldeltmp & nd24hrs so that I can calculate the ratio. I think that will work but it seems like a kludge and I will have to do this for about 15 variables.

    Is there an easier or more elegant way to deal with these missing values?

    Thanks for any pointers and sorry for the long post.
    John
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float fullyr int(month alldel nd24hrs)
    1  7  45 .
    1  8  34 .
    1  9  46 .
    1 10  45 .
    1 11  23 .
    1 12  19 .
    1  1  51 0
    1  2  14 .
    1  3  34 0
    1  4  40 0
    1  5  39 1
    1  6  45 0
    2  7  35 0
    2  8  60 .
    2  9  55 0
    2 10  68 0
    2 11  58 0
    2 12  60 0
    2  1  67 0
    2  2  68 0
    2  3  70 0
    2  4  66 .
    2  5  53 .
    2  6  60 .
    3  7  75 0
    3  8  79 0
    3  9  88 0
    3 10  63 0
    3 11  97 .
    3 12  84 0
    3  5  96 0
    3  6 112 0
    4  7 100 1
    4  8 125 3
    4  9  85 0
    4 10 102 0
    4 11 104 1
    4 12 135 0
    4  1 125 3
    4  3 128 0
    4  4 144 1
    4  5 135 0
    4  6 151 1
    5  7 149 0
    5  8 175 0
    5  9 168 2
    5 10 196 2
    5 11 175 1
    5 12 171 1
    4  2 137 0
    end
    label values fullyr fullyr
    label def fullyr 1 "2014-2015", modify
    label def fullyr 2 "2015-2016", modify
    label def fullyr 3 "2016-2017", modify
    label def fullyr 4 "2017-2018", modify
    label def fullyr 5 "2018-2019", modify
    label values month month
    label def month 1 "January", modify
    label def month 2 "February", modify
    label def month 3 "March", modify
    label def month 4 "April", modify
    label def month 5 "May", modify
    label def month 6 "June", modify
    label def month 7 "July", modify
    label def month 8 "August", modify
    label def month 9 "September", modify
    label def month 10 "October", modify
    label def month 11 "November", modify
    label def month 12 "December", modify

  • #2
    A simpler way to accomplish the same thing is:

    Code:
    assert !missing(alldel)
    drop if missing(nd24hrs)
    collapse (sum) alldel nd24hurs, by(fullyr)
    This code requires that variable alldel never be missing. It totals up the deliveries and deaths within 24 hours for those months where the latter is non-missing.

    Comment


    • #3
      Thanks so much for the help Clyde; that's an elegant use of the drop command and a good reminder to use 'assert'. I ultimately used "collapse" but first I had to generate 7 different denominators (the sum of alldel for those months that weren't missing for that particular variable). Perhaps I could have collapsed each in turn and then merged the alldel that was created in the collapse back into the full dataset until all 7 were created and then did a final collapse but I'm not sure that would have worked. I ended up calculating all 7 variable-specific alldel denominators and then doing a single collapse. Bit of a brute force approach but it worked!

      Thanks again; I really appreciate how you and others on Statalist are so willing to help.

      Comment

      Working...
      X