Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating Monthly and Yearly Rates from Individual level daily data.

    Hi Everyone,

    I am stuck at a problem and seek your expertise.

    I have an individual level data of pregnant women giving birth at several different districts located in different states of India on a particular date.

    At the end of her hospital stay, every woman was asked if she received quality postnatal care (Yes==1 & No==0). I want to create two variables from this information
    1. Variable 1 (monthlyrate): Percentage of woman giving birth in a particular month in each district who received quality care.
    2. Variable 1 (yearlyrate): Percentage of woman giving birth in a particular year in each district who received quality care.
    I am not sure if I should use ‘FOREACH’ loop or the ‘egen’ command


    Below is the example of my dataset.

    dataex caseid date month year district45 float(state quality)

    ----------------------- copy starting from the next line -----------------------
    [CODE]
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str15 caseid long date byte month int year long district45 float quality state
    " 19240150 02" 18294 3 2014 172 0 3
    " 19201106 02" 18294 6 2016 525 1 6
    " 02097658 04" 18294 2 2010 505 0 7
    " 19006936 04" 18294 5 2016 590 1 2
    " 20046984 02" 18294 2 2010 166 1 4
    " 19192875 02" 18294 5 2014 254 1 4
    " 19091611 06" 18294 2 2010 246 1 1
    " 19095627 02" 18294 2 2010 587 0 12
    " 19104880 04" 18294 2 2010 155 1 15
    " 04019661 02" 18294 8 2018 168 1 2
    " 35005994 02" 18295 2 2010 336 0 3
    " 19124124 06" 18295 2 2010 425 0 6
    " 19213604 04" 18295 8 2014 26 1 9
    " 35003652 02" 18295 2 2010 432 1 8
    " 19097162 04" 18295 2 2010 66 0 3
    " 19226824 02" 18295 2 2010 587 0 6


    Regards
    Dr Pavan

  • #2
    I'm confused by your example data. There are many instances where the variables month and year disagree with the date variable. For your "monthly" and "yearly" variables, do you want to group by the month and year variables, respectively, or by the month and year of the date variable?

    Code:
    . list caseid date month year if month != month(date) | year != year(date)
    
         +-----------------------------------------+
         |       caseid        date   month   year |
         |-----------------------------------------|
      1. |  19240150 02   01feb2010       3   2014 |
      2. |  19201106 02   01feb2010       6   2016 |
      4. |  19006936 04   01feb2010       5   2016 |
      6. |  19192875 02   01feb2010       5   2014 |
     10. |  04019661 02   01feb2010       8   2018 |
         |-----------------------------------------|
     13. |  19213604 04   02feb2010       8   2014 |
         +-----------------------------------------+
    Added: Also, how do you identify a "district" in this data. I notice that in addition to a variable district45, there is also a variable called state, which might be a geographic unit. But I don't see the consistency that I would expect. That is, if districts are contained within states, then district 587 should not co-occur with both state 6 and state 12. And if it's the other way around and states are contained within districts, we should not see nearly every state associated with more than one district.

    So I have two hypotheses. One of them is that state has nothing to do with district and I should just ignore state and use district45. The other is that each state has its own district numbering system, which entails that the combination of district45 and state is needed to jointly identify distinct districts. Which one is it?
    Last edited by Clyde Schechter; 06 Oct 2023, 13:17.

    Comment


    • #3
      Hello Dr Clyde,

      I had mistakenly pasted the wrong dataset in my previous post.

      Below is an example from my dataset.


      I want to create two variables from this information. I want to calculate rates at the district level (variable district 45).
      1. Variable 1 (monthlyrate): Percentage of women giving birth in a particular month (e.g. Jan 2011, March 2015, October 2018, Feb 2020, and so on...) in each district who received quality care.
      2. Variable 1 (yearlyrate): Percentage of women giving birth in a particular year (e.g., 2010, 2012, 2013, 2014, and so on ) in each district who received quality care.


      . dataex caseid date month year district45 state quality

      ----------------------- copy starting from the next line -----------------------
      [CODE]
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str15 caseid long date byte month int year long district45 float(state quality)
      " 19240150 02" 18294 2 2010 172 8 0
      " 19201106 02" 18294 2 2010 525 8 0
      " 02097658 04" 18294 2 2010 505 5 0
      " 19006936 04" 18294 2 2010 590 8 0
      " 20046984 02" 18294 2 2010 166 4 1
      " 19192875 02" 18294 2 2010 254 8 0
      " 19091611 06" 18294 2 2010 246 8 1
      " 19095627 02" 18294 2 2010 587 8 0
      " 19104880 04" 18294 2 2010 155 8 1
      " 04019661 02" 18294 2 2010 168 3 0
      " 35005994 02" 18295 2 2010 336 6 0
      " 19124124 06" 18295 2 2010 425 8 0
      " 19213604 04" 18295 2 2010 26 8 0
      " 35003652 02" 18295 2 2010 432 6 1
      " 19097162 04" 18295 2 2010 66 8 0
      " 19226824 02" 18295 2 2010 587 8 0
      " 19095878 04" 18295 2 2010 425 8 0
      " 19122894 02" 18295 2 2010 172 8 0
      " 19116733 02" 18296 2 2010 237 8 0
      " 19014508 02" 18296 2 2010 237 8 0
      " 20064555 06" 18296 2 2010 449 4 0
      " 19153509 03" 18296 2 2010 408 8 0
      " 19133189 02" 18296 2 2010 26 8 0
      " 35081695 02" 18296 2 2010 518 6 0
      " 02042822 02" 18296 2 2010 693 5 0
      " 35064062 07" 18296 2 2010 55 6 0
      " 20010370 02" 18296 2 2010 263 4 0
      " 19186410 02" 18297 2 2010 231 8 0
      " 20007563 02" 18297 2 2010 450 4 0
      " 19101320 02" 18297 2 2010 123 8 0
      " 35016656 02" 18297 2 2010 403 6 0
      " 19083610 02" 18297 2 2010 590 8 1
      " 02065345 02" 18297 2 2010 22 5 0
      " 35034109 02" 18297 2 2010 336 6 0
      " 20074239 02" 18297 2 2010 511 4 1
      " 35050882 02" 18297 2 2010 60 6 1
      " 35039368 02" 18297 2 2010 514 6 1
      " 19083393 02" 18297 2 2010 571 8 0
      " 35009087 02" 18297 2 2010 235 6 0
      " 35079614 02" 18298 2 2010 490 6 1
      " 19066564 02" 18298 2 2010 231 8 0
      " 20016648 04" 18298 2 2010 88 4 0
      " 19085867 02" 18298 2 2010 601 8 0
      " 19056985 03" 18298 2 2010 277 8 1
      " 19158378 02" 18298 2 2010 425 8 1
      " 19029081 02" 18299 2 2010 326 8 0
      " 20009671 02" 18299 2 2010 116 4 0
      " 19033764 02" 18299 2 2010 590 8 0
      " 19086149 02" 18299 2 2010 523 8 1
      " 20023723 02" 18299 2 2010 454 4 0
      " 19057717 02" 18299 2 2010 408 8 0
      " 19068829 04" 18299 2 2010 551 8 0
      " 19094803 03" 18300 2 2010 319 8 0
      " 19097422 02" 18300 2 2010 159 8 1
      " 20079479 02" 18300 2 2010 202 4 0
      " 19160917 02" 18300 2 2010 587 8 0
      " 19120758 02" 18300 2 2010 41 8 0
      " 35037421 02" 18300 2 2010 432 6 0
      " 04031847 02" 18300 2 2010 232 3 1
      " 19080103 04" 18301 2 2010 26 8 0
      " 19203080 02" 18301 2 2010 326 8 0
      " 02007104 02" 18301 2 2010 228 5 1
      " 19102395 02" 18301 2 2010 523 8 1
      " 04041167 02" 18301 2 2010 163 3 1
      " 20092590 01" 18301 2 2010 202 4 1
      " 19050231 03" 18301 2 2010 123 8 0
      " 02075779 02" 18301 2 2010 182 5 0
      " 02095739 02" 18301 2 2010 712 5 1
      " 35031797 02" 18302 2 2010 94 6 1
      " 19194635 02" 18302 2 2010 147 8 1
      " 19090211 02" 18302 2 2010 227 8 0
      " 19059499 02" 18302 2 2010 601 8 0
      " 19020684 02" 18302 2 2010 26 8 0
      " 19151640 02" 18302 2 2010 319 8 0
      " 19159709 02" 18302 2 2010 601 8 0
      " 35012128 02" 18302 2 2010 60 6 1
      " 19240395 04" 18302 2 2010 231 8 0
      " 19106796 02" 18302 2 2010 26 8 0
      " 19165037 02" 18302 2 2010 26 8 0
      " 02007556 03" 18302 2 2010 693 5 0
      " 19081256 02" 18302 2 2010 246 8 1
      " 19207687 02" 18302 2 2010 227 8 1
      " 19223454 08" 18303 2 2010 147 8 0
      " 20066079 02" 18303 2 2010 88 4 0
      " 19122266 02" 18303 2 2010 11 8 0
      " 04028892 04" 18303 2 2010 657 3 0
      " 19251098 02" 18303 2 2010 408 8 1
      " 35079147 03" 18304 2 2010 490 6 0
      " 19090416 02" 18304 2 2010 673 8 0
      " 35065679 02" 18304 2 2010 403 6 0
      " 19165525 10" 18304 2 2010 453 8 0
      " 19165810 04" 18304 2 2010 590 8 1
      " 19129768 02" 18305 2 2010 325 8 0
      " 19237222 02" 18305 2 2010 485 8 0
      " 19077418 02" 18305 2 2010 231 8 0
      " 19001127 02" 18305 2 2010 172 8 0
      " 20103854 02" 18305 2 2010 511 4 0
      " 19083154 02" 18305 2 2010 580 8 0
      " 19087137 02" 18305 2 2010 523 8 0
      " 04087418 02" 18305 2 2010 445 3 0



      Comment


      • #4
        Code:
        assert month == month(date)
        assert year == year(date)
        
        //    CALCULATE RATES
        by district45 month year, sort: egen monthly_quality_rate = mean(quality)
        by district45 year, sort: egen yearly_quality_rate = mean(quality)
        
        //    CONVERT TO PERCENTAGE
        foreach v of varlist *_quality_rate {
            replace `v' = 100*`v'
        }
        Note: there is a bit of ambiguity in the definition of the monthly rate. If you are trying to capture seasonality, so that, for example, June 2010 and June 2011 results should be lumped together as "June," then the code would be different: remove year from the first -egen- command. This code assumes that for your purposes June 2010 and June 2011 are different months.

        Comment


        • #5
          Dr Clyde,


          Thank you very much. You saved me hours of hit-and-try coding. Once again thank you very much.

          बहुत धन्यवाद्
          नमस्ते

          Comment

          Working...
          X