Hi folks,
I'm having trouble figuring out a way to collapse monthly totals into yearly totals in a meaningful way when some of the months have missing data. Ultimately, I want to calculate the ratio of newborn deaths per 1,000 deliveries. Deaths are relatively rare and there is little point in analyzing month by month. Another major issue is that data are not missing at random. They were missing far more frequently in the first two years of a five-year study as you can see from the below extract of the dataset.
In the dataset below, I have variables, year, month, total # of deliveries/month and total # of newborn deaths within 24 hrs/month. What makes sense to me is to sum alldel over each year but only for those months where nd24hrs~=. So for the first year, would get 51+34+40+39+45 = 209 deliveries for the year and 1 newborn death for a rate of 4.8 deaths/1,000 del/yr.
The only way I can see going forward is to create a copy of alldel (e.g., alldeltmp) and doing something like: bys fullyr:replace alldeltmp==. if nd24hrs==. I think I could then use the gen sum() function to get running sums of both alldeltmp & nd24hrs so that I can calculate the ratio. I think that will work but it seems like a kludge and I will have to do this for about 15 variables.
Is there an easier or more elegant way to deal with these missing values?
Thanks for any pointers and sorry for the long post.
John
I'm having trouble figuring out a way to collapse monthly totals into yearly totals in a meaningful way when some of the months have missing data. Ultimately, I want to calculate the ratio of newborn deaths per 1,000 deliveries. Deaths are relatively rare and there is little point in analyzing month by month. Another major issue is that data are not missing at random. They were missing far more frequently in the first two years of a five-year study as you can see from the below extract of the dataset.
In the dataset below, I have variables, year, month, total # of deliveries/month and total # of newborn deaths within 24 hrs/month. What makes sense to me is to sum alldel over each year but only for those months where nd24hrs~=. So for the first year, would get 51+34+40+39+45 = 209 deliveries for the year and 1 newborn death for a rate of 4.8 deaths/1,000 del/yr.
The only way I can see going forward is to create a copy of alldel (e.g., alldeltmp) and doing something like: bys fullyr:replace alldeltmp==. if nd24hrs==. I think I could then use the gen sum() function to get running sums of both alldeltmp & nd24hrs so that I can calculate the ratio. I think that will work but it seems like a kludge and I will have to do this for about 15 variables.
Is there an easier or more elegant way to deal with these missing values?
Thanks for any pointers and sorry for the long post.
John
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input float fullyr int(month alldel nd24hrs) 1 7 45 . 1 8 34 . 1 9 46 . 1 10 45 . 1 11 23 . 1 12 19 . 1 1 51 0 1 2 14 . 1 3 34 0 1 4 40 0 1 5 39 1 1 6 45 0 2 7 35 0 2 8 60 . 2 9 55 0 2 10 68 0 2 11 58 0 2 12 60 0 2 1 67 0 2 2 68 0 2 3 70 0 2 4 66 . 2 5 53 . 2 6 60 . 3 7 75 0 3 8 79 0 3 9 88 0 3 10 63 0 3 11 97 . 3 12 84 0 3 5 96 0 3 6 112 0 4 7 100 1 4 8 125 3 4 9 85 0 4 10 102 0 4 11 104 1 4 12 135 0 4 1 125 3 4 3 128 0 4 4 144 1 4 5 135 0 4 6 151 1 5 7 149 0 5 8 175 0 5 9 168 2 5 10 196 2 5 11 175 1 5 12 171 1 4 2 137 0 end label values fullyr fullyr label def fullyr 1 "2014-2015", modify label def fullyr 2 "2015-2016", modify label def fullyr 3 "2016-2017", modify label def fullyr 4 "2017-2018", modify label def fullyr 5 "2018-2019", modify label values month month label def month 1 "January", modify label def month 2 "February", modify label def month 3 "March", modify label def month 4 "April", modify label def month 5 "May", modify label def month 6 "June", modify label def month 7 "July", modify label def month 8 "August", modify label def month 9 "September", modify label def month 10 "October", modify label def month 11 "November", modify label def month 12 "December", modify
Comment