Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • counting months

    I have daily data (one row per day (ignoring missing days))
    and want to count months before and after a given "index"
    date (which can be any day between 10/1/2012 and 6/30/2013)

    each person has one year of data before the index date and
    one year after the index date (actually, one year minus 1
    day as the index date counts as part of the second year)

    so, each person has data for a total of 24 months but these
    are not, of course, calendar months as the period can start
    on any day in a month

    I want to group the data into months (counting either from
    -12 to +12 (with the index date part of +1) or from 1 to
    24 (with the index date part of month 13))

    Ignoring missing days, each person has either 730 days
    (rows of data) or 731 (if 2/29/2012 is part of their
    first year)

    For example, if a person's index date is 15oct2012, then
    14oct2012 would be month -1 and 14sep2012 would be part of
    month -2, while 14nov2012 would be part of month 1 and
    15nov2012 would be part of month 2 (etc.)

    the date variable for days is a Stata date with a format
    of %td


  • #2
    There are 30.416667 days in a month (365/12) so some months will have 30 days and some months 31 days. If you are not working with calendar months, I guess you are free to decide how to perform the grouping. Here's one way

    Code:
    clear
    set seed 912815
    
    * generate 100 individuals with an index date between 10/1/2012 and 6/30/2013
    set obs 100
    gen id = _n
    gen indexd = mdy(10,1,2012) + int( (mdy(6,30,2013) - mdy(10,1,2012) + 1) * runiform())
    format %td indexd
    
    * create daily data for each individual over 2 years centered on index date
    expand 730
    bysort id: gen obsdate = indexd - 365 + _n - 1
    format %td obsdate
    
    * the observation day and year, relative to index date
    isid id obsdate, sort
    by id: gen obsday = obsdate - indexd + 366
    by id: gen obsyear = int(obsday / 366) + 1
    
    * split into months
    bysort id: gen obsmonth = int((obsday - 1) / (365 / 12)) + 1
    
    * double-check
    bysort id obsmonth (obsdate): assert _n == 1 if obsmonth == 13 & obsdate == indexd
    assert inrange(obsmonth,1,24)
    tab obsmonth

    Comment


    • #3
      Robert - thank you; I'll try this with the real data and report back

      Comment


      • #4
        Well, it didn't quite work, apparently because of the leap year in 2012:
        Code:
        . ta obsday if obsmonth==25
        
             obsday |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                731 |     16,852      100.00      100.00
        ------------+-----------------------------------
              Total |     16,852      100.00
        r; t=0.65 16:13:44
        
        . count if obsday==731
          16,852
        if this turns out to be the only problem it will be easy to fix, but I'ld better do some more checking

        thanks again

        Comment


        • #5
          It's a bit tricky to get this to work with leap days. I've reworked the example to take that into account. I assume that you can have missing values so everything is calculated as offsets to the index date. Just to be sure, I redo everything and skip a random number of observations and double-check that the month and day do not change.

          Code:
          clear
          set seed 912815
          
          * generate 100 individuals with an index date between 10/1/2012 and 6/30/2013
          set obs 100
          gen id = _n
          gen indexd = mdy(10,1,2012) + int( (mdy(6,30,2013) - mdy(10,1,2012) + 1) * runiform())
          format %td indexd
          
          * start and last date for window
          gen wstart = mdy(month(indexd),day(indexd),year(indexd)-1)
          gen wend = mdy(month(indexd),day(indexd),year(indexd)+1)
          format %td wstart wend
          gen wlen = wend - wstart
          gen daysinyr1 =  indexd - wstart
          gen daysinyr2 = wend - indexd
          
          * create daily data
          expand wlen
          bysort id: gen obsdate = wstart + _n - 1
          format %td obsdate
          
          * split into months
          gen obsmonth = -int((indexd - obsdate - 1) / (daysinyr1 / 12)) - 1
          replace obsmonth = int((obsdate - indexd) / (daysinyr2 / 12)) + 1 if obsdate >= indexd
          
          * the day in the year
          gen obsday = obsdate - wstart + 1
          replace obsday = obsdate - indexd + 1 if obsdate >= indexd
          
          tab obsmonth
          
          tempfile f
          save "`f'"
          
          clear
          set seed 912815
          
          * generate 100 individuals with an index date between 10/1/2012 and 6/30/2013
          set obs 100
          gen id = _n
          gen indexd = mdy(10,1,2012) + int( (mdy(6,30,2013) - mdy(10,1,2012) + 1) * runiform())
          format %td indexd
          
          * start and last date for window
          gen wstart = mdy(month(indexd),day(indexd),year(indexd)-1)
          gen wend = mdy(month(indexd),day(indexd),year(indexd)+1)
          format %td wstart wend
          gen wlen = wend - wstart
          gen daysinyr1 =  indexd - wstart
          gen daysinyr2 = wend - indexd
          
          * create daily data
          expand wlen
          bysort id: gen obsdate = wstart + _n - 1
          format %td obsdate
          
          * drop some random obs
          drop if runiform() < .1
          
          * split into months
          gen obsmonth = -int((indexd - obsdate - 1) / (daysinyr1 / 12)) - 1
          replace obsmonth = int((obsdate - indexd) / (daysinyr2 / 12)) + 1 if obsdate >= indexd
          
          * the day in the year
          gen obsday = obsdate - wstart + 1
          replace obsday = obsdate - indexd + 1 if obsdate >= indexd
          
          
          rename (obsmonth obsday) =0
          merge 1:1 id indexd obsdate using "`f'", keep(match) nogen
          
          assert obsmonth0 == obsmonth
          assert obsday0 == obsday
          
          tab obsmonth

          Comment


          • #6
            Robert - thank you very much

            Comment

            Working...
            X