  • Incidence rate over calendar months

    I have a dataset that includes people diagnosed with cancer from 1 January 2021 to 31 December 2024. I want to calculate the incidence rate by calendar month during the follow-up period, in person-years. This is a sample of my data:

    id cancer diagnosis_date enter_date end_date follow_years
    1 0 07/11/2020 07/11/2020 07/11/2020 0.80
    2 0 07/01/2010 07/01/2010 07/01/2010 0.60
    3 1 07/03/1999 07/03/1999 07/03/1999 0.70
    4 1 07/05/1988 07/05/1988 07/05/1988 0.60
    5 0 07/07/1977 07/07/1977 07/07/1977 0.55
    6 0 07/09/1966 07/09/1966 07/09/1966 0.50
    7 1 07/11/1955 07/11/1955 07/11/1955 0.45
    8 1 07/01/1945 07/01/1945 07/01/1945 0.40
    9 1 07/032021 07/03/1934 07/03/1934 0.35
    10 0 07/032021 07/05/1923 07/05/1923 0.30

    Data is one row per patient.

    id: is patient ID
    cancer: 1 if diagnosed with cancer, 0 if not.
    diagnosis_date: is date of diagnosis of cancer
    enter_date: when they enter the study
    end_date: when they leave the study
    follow_years: is follow-up period in years, calculated by subtracting enter_date from end_date and dividing by 365.


    I then used the following code to calculate incidence rates by calendar month in person-years. However, I am not getting the right results. I am getting one row of results, with calendar month as 1960m1 and incidence rate as 23657. I would be very grateful for help with this.


    *Declaring the survival data

    stset follow, failure(cancer==1) id(id)

    *Generating monthly cut points (January 2021 to Dec 2024)

    local month_start = ym(2021,01)
    local month_end = ym(2024,12)

    local cutpoints
    forvalues m = `month_start'/`month_end' {
    local cutpoints = `cutpoints' `=dofm(`m')'
    }

    *Splitting survival time by calendar month

    stsplit calmonth, at(`cutpoints')
    gen cal_month = mofd(_t0)
    format cal_month %tm

    *Calculating incidence rate per 100,000 person-years
    strate cal_month, per(100000)

    Thank you very much.

  • #2
    Please repost your example data: there is something wrong with it. The example data you show has diagnosis_date, enter_date, and end_date all equal to each other in every observation except the last two. And in those two, the diagnosis date is simply invalid.

    When posting your example data, please use the -dataex- command. If you are running version 16 or later, or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
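
    As a minimal sketch of the request above, assuming the variable names that appear elsewhere in this thread (id, cancer, diagnosis_date, enter_date, end_date, follow) and an arbitrary choice of the first 20 observations, the call could look like this:

    ```stata
    * List the first 20 observations of the relevant variables in a form
    * that can be pasted directly into a forum post.
    dataex id cancer diagnosis_date enter_date end_date follow in 1/20
    ```

    The output is a short block of -input- code that recreates those observations, including storage types and display formats.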


    • #3
      Thanks very much.

      Please find attached updated information.


      I have a dataset that includes people diagnosed with cancer from 1 January 2021 to 31 December 2024. I want to calculate the incidence rate by calendar month during the follow-up period, in person-years. This is a sample of my data:

      copy starting from the next line ------------ ----------
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input int id float(cancer diagnosis_date enter_date end_date follow)
      1 1 22803 22803 23064    .714579
      2 0 22866 22866 23158   .7994524
      3 0 23287 23287 23741  1.2429843
      4 1 23098 23098 23507   1.119781
      5 0 23120 23120 23539  1.1471595
      6 0 22584 22584 22736   .4161533
      7 1 22323 22323 22344  .05749487
      618 0 23287 23287 23741  1.2429843
      8 1 22966 22966 23309   .9390828
      9 1 22583 22583 22734   .4134155
      10 0 22287 22287 22290 .008213553
      11 1 22300 22300 22310 .027378507
      12 1 22895 22895 23202   .8405202
      13 0 23181 23181 23631   1.232033
      14 1 23588 23588 23741   .4188912
      15 1 22881 22881 23181   .8213552
      16 1 22662 22662 22852  .52019167
      17 0 22320 22320 22340  .05475701
      18 0 22439 22439 22518   .2162902
      19 1 22816 22816 23084    .733744
      20 1 22287 22287 22290 .008213553
      21 1 23654 23654 23741    .238193
      22 0 23290 23290 23741  1.2347707
      23 0 23622 23622 23741   .3258042
      24 0 22894 22894 23200   .8377823
      25 0 23455 23455 23741   .7830253
      26 1 23155 23155 23592  1.1964408
      27 0 22286 22286 22289 .008213553
      28 1 22994 22994 23350   .9746749
      30 1 23569 23569 23741   .4709103
      31 1 23645 23645 23741  .26283368
      32 0 23720 23720 23741  .05749487
      33 1 22911 22911 23226    .862423
      34 1 22468 22468 22561  .25462013
      35 0 23551 23551 23741  .52019167
      36 1 23407 23407 23741   .9144422
      37 1 23303 23303 23741  1.1991787
      38 0 23222 23222 23692    1.28679
      39 1 23209 23209 23673  1.2703627
      40 1 23738 23738 23741 .008213553
      41 1 22897 22897 23205    .843258
      42 1 22458 22458 22546  .24093087
      43 1 22450 22450 22535   .2327173
      44 0 22901 22901 23211   .8487337
      45 1 22532 22532 22658   .3449692
      46 0 22431 22431 22506   .2053388
      47 1 22630 22630 22805   .4791239
      48 0 23706 23706 23741  .09582478
      49 1 22789 22789 23043   .6954141
      50 0 23233 23233 23709   1.303217
      51 1 23055 23055 23442  1.0595483
      52 1 22807 22807 23070   .7200547
      53 1 23563 23563 23741   .4873374
      54 0 22400 22400 22459   .1615332
      55 0 23004 23004 23366    .991102
      56 1 22781 22781 23031   .6844627
      57 1 22971 22971 23316   .9445585
      58 1 23297 23297 23741  1.2156057
      59 1 22389 22389 22443  .14784394
      60 0 22876 22876 23173   .8131417
      61 0 22568 22568 22712   .3942505
      62 1 23248 23248 23731   1.322382
      63 1 23562 23562 23741   .4900753
      64 1 22565 22565 22707   .3887748
      65 0 22573 22573 22719   .3997262
      66 0 22997 22997 23355   .9801506
      67 1 23652 23652 23741   .2436687
      68 1 22881 22881 23181   .8213552
      69 0 23450 23450 23741   .7967146
      70 0 22531 22531 22656   .3422314
      71 1 22962 22962 23302   .9308693
      72 0 23651 23651 23741  .24640657
      73 1 22869 22869 23163   .8049281
      74 1 23718 23718 23741  .06297057
      75 1 22486 22486 22589  .28199863
      76 0 22923 22923 23244   .8788501
      77 1 23588 23588 23741   .4188912
      78 1 23268 23268 23741  1.2950034
      80 0 23692 23692 23741  .13415469
      81 1 22882 22882 23183   .8240931
      83 1 22606 22606 22769   .4462697
      84 1 23344 23344 23741  1.0869268
      85 1 23015 23015 23382  1.0047913
      86 1 22353 22353 22389  .09856263
      87 1 22819 22819 23088   .7364818
      88 1 22807 22807 23070   .7200547
      89 1 23359 23359 23741   1.045859
      90 1 23630 23630 23741   .3039014
      91 1 22295 22295 22302 .019164955
      92 0 22504 22504 22616   .3066393
      94 1 23572 23572 23741   .4626968
      95 0 22466 22466 22559  .25462013
      96 0 22633 22633 22809   .4818617
      97 1 22927 22927 23250   .8843258
      98 0 23542 23542 23741   .5448323
      99 0 23508 23508 23741   .6379192
      100 1 22662 22662 22853   .5229295
      101 1 23586 23586 23741   .4243669
      102 1 23532 23532 23741   .5722108
      103 1 22419 22419 22488   .1889117
      end
      format %td diagnosis_date
      format %td enter_date
      format %td end_date
      copy up to and including the previous line ------ -----------


      Data is one row per patient.

      id: is patient ID
      cancer: 1 if diagnosed with cancer, 0 if not.
      diagnosis_date: is date of diagnosis of cancer
      enter_date: when they enter the study
      end_date: when they leave the study
      follow_years: is follow-up period in years, calculated by subtracting enter_date from end_date and dividing by 365.


      I then used the following code to calculate incidence rates by calendar month in person-years. However, I am not getting the right results. I am getting one row of results, with calendar month as 1960m1 and incidence rate as 23657. I would be very grateful for help with this.


      *Declaring the survival data

      stset follow, failure(cancer==1) id(id)

      *Generating monthly cut points (January 2021 to Dec 2024)

      local month_start = ym(2021,01)
      local month_end = ym(2024,12)

      local cutpoints
      forvalues m = `month_start'/`month_end' {
      local cutpoints = `cutpoints' `=dofm(`m')'
      }

      *Splitting survival time by calendar month

      stsplit calmonth, at(`cutpoints')
      gen cal_month = mofd(_t0)
      format cal_month %tm

      *Calculating incidence rate per 100,000 person-years
      strate cal_month, per(100000)

      Thank you very much.


      • #4
        There are several problems here. The most fundamental one is that you have -stset- your data in terms of the variable follow, which is denominated in years. But your -stsplit- cutoffs are the dates at the beginning of each month, and these are numbers like 23700. So none of these cutoffs falls within the range of the follow variable, whose values are on the order of 1 or less. The cutoffs have to be on the same time scale as the failure time variable.
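
        A quick way to see the mismatch, sketched here on the assumption that the example data from #3 are in memory, is to display the first and last cutoffs next to a summary of follow:

        ```stata
        * Cutoffs built in the original code are Stata daily dates (days since 01jan1960).
        display dofm(ym(2021, 1))     // 22281, i.e. 01jan2021
        display dofm(ym(2024, 12))    // 23711, i.e. 01dec2024

        * The analysis time used in the original -stset- is in years.
        summarize follow
        ```

        The cutoffs are in the tens of thousands, while follow is a fraction of a year or a little over one, so -stsplit- has nothing to split.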

        Now, that is a bit of a tall order here, because your participants start at different times, so there is no uniform set of cutoffs that you can come up with on the dimension of the follow variable. You have to revise your -stset- command to use the actual dates of origin and failure. But it is not possible to do that from the data you have provided. In all of your observations we have diagnosis_date == enter_date, and enter_date < end_date. But diagnosis is the failure event! So your data do not provide any possible origin date (which must precede the failure date because, by definition, it is the date at which the participant first becomes at risk for the failure event).

        Your enter_date and end_date variables are said to be when the person enters and leaves the study. But if the study design is such that the person enters the study upon diagnosis, then it is literally impossible to calculate an incidence rate for that diagnosis with that study design. To get an incidence rate, you must start with people who have not had the failure event, some of whom then develop it during the study observation period. But you don't have that, or at least it is not in the data you are showing.
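
        For illustration only, here is a rough sketch of what a date-based setup could look like in a different dataset, one where enter_date is the date each person first becomes at risk and precedes any diagnosis. It will not give meaningful results on the example data in #3, where diagnosis_date equals enter_date. The names exit_date and calsplit are hypothetical, and the choice to keep analysis time in days (converting to person-years through per()) is just one option. Note also that the cutoff list is built without "=" after the macro name, because "=" asks Stata to evaluate the right-hand side as a single expression, which a space-separated list of numbers is not.

        ```stata
        * Assumption: enter_date < diagnosis_date for cases (not true of the posted data).
        * exit_date (hypothetical) is the diagnosis date for cases, end_date otherwise.
        generate exit_date = cond(cancer == 1, diagnosis_date, end_date)
        format exit_date %td

        * Analysis time is the calendar date itself, in days, with delayed entry.
        stset exit_date, failure(cancer == 1) enter(time enter_date) id(id)

        * Space-separated list of first-of-month daily dates, Jan 2021 to Dec 2024.
        local cutpoints
        forvalues m = `=ym(2021, 1)'/`=ym(2024, 12)' {
            local cutpoints `cutpoints' `=dofm(`m')'
        }

        * Split each person's follow-up at the start of every calendar month.
        stsplit calsplit, at(`cutpoints')
        generate cal_month = mofd(_t0)
        format cal_month %tm

        * Time is in days, so 100,000 person-years = 100000 * 365.25 person-days.
        strate cal_month, per(36525000)
        ```

        Because the cutoffs and the analysis time are now both daily dates, each split record falls within a single calendar month, and cal_month identifies that month.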


        • #5
          Dear Clyde, thanks very much. I generated the example data myself, as I am working in a secure environment and could not create a sample of the real data. However, the actual data are as you have stated. Many thanks and kind regards.
