Incidence rate over calendar months

Naa Naadu

Join Date: Aug 2021
Posts: 21

11 Aug 2025, 10:47

I have a dataset that includes people diagnosed with cancer between 1 January 2021 and 31 December 2024. I want to calculate the incidence rate, in person-years, for each calendar month of the follow-up period. This is a sample of my data:

id	cancer	diagnosis_date	enter_date	end_date	follow_years
1	0	07/11/2020	07/11/2020	07/11/2020	0.80
2	0	07/01/2010	07/01/2010	07/01/2010	0.60
3	1	07/03/1999	07/03/1999	07/03/1999	0.70
4	1	07/05/1988	07/05/1988	07/05/1988	0.60
5	0	07/07/1977	07/07/1977	07/07/1977	0.55
6	0	07/09/1966	07/09/1966	07/09/1966	0.50
7	1	07/11/1955	07/11/1955	07/11/1955	0.45
8	1	07/01/1945	07/01/1945	07/01/1945	0.40
9	1	07/032021	07/03/1934	07/03/1934	0.35
10	0	07/032021	07/05/1923	07/05/1923	0.30

Data is one row per patient.

id: patient ID
cancer: 1 if diagnosed with cancer, 0 if not
diagnosis_date: date of cancer diagnosis
enter_date: date the patient enters the study
end_date: date the patient leaves the study
follow_years: follow-up period in years, calculated by subtracting enter_date from end_date and dividing by 365

I then used the following code to calculate incidence rates by calendar month in person-years. However, I am not getting the right results: I get a single row of results, with calendar month 1960m1 and an incidence rate of 23657. I would be very grateful for help with this.

*Declaring the survival data

stset follow, failure(cancer==1) id(id)

*Generating monthly cut points (January 2021 to Dec 2024)

local month_start = ym(2021,01)
local month_end = ym(2024,12)

local cutpoints
forvalues m = `month_start'/`month_end' {
local cutpoints = `cutpoints' `=dofm(`m')'
}

*Splitting survival time by calendar month

stsplit calmonth, at(`cutpoints')
gen cal_month = mofd(_t0)
format cal_month %tm

*Calculating incidence rate per 100,000 person-years
strate cal_month, per(100000)

Thank you very much.

Clyde Schechter

Join Date: Apr 2014

Posts: 30169
#2

11 Aug 2025, 14:04

Please repost your example data: there is something wrong with it. The example data you show has diagnosis_date, enter_date, and end_date all equal to each other in every observation except the last two. And in those two, the diagnosis date is simply invalid.

When posting your example data, please use the -dataex- command. If you are running version 16 or later, or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

Naa Naadu

Join Date: Aug 2021
Posts: 21

11 Aug 2025, 15:41

Thanks very much.

Please find attached updated information.

I have a dataset that includes people diagnosed with cancer between 1 January 2021 and 31 December 2024. I want to calculate the incidence rate, in person-years, for each calendar month of the follow-up period. This is a sample of my data:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int id float(cancer diagnosis_date enter_date end_date follow)
1 1 22803 22803 23064    .714579
2 0 22866 22866 23158   .7994524
3 0 23287 23287 23741  1.2429843
4 1 23098 23098 23507   1.119781
5 0 23120 23120 23539  1.1471595
6 0 22584 22584 22736   .4161533
7 1 22323 22323 22344  .05749487
618 0 23287 23287 23741  1.2429843
8 1 22966 22966 23309   .9390828
9 1 22583 22583 22734   .4134155
10 0 22287 22287 22290 .008213553
11 1 22300 22300 22310 .027378507
12 1 22895 22895 23202   .8405202
13 0 23181 23181 23631   1.232033
14 1 23588 23588 23741   .4188912
15 1 22881 22881 23181   .8213552
16 1 22662 22662 22852  .52019167
17 0 22320 22320 22340  .05475701
18 0 22439 22439 22518   .2162902
19 1 22816 22816 23084    .733744
20 1 22287 22287 22290 .008213553
21 1 23654 23654 23741    .238193
22 0 23290 23290 23741  1.2347707
23 0 23622 23622 23741   .3258042
24 0 22894 22894 23200   .8377823
25 0 23455 23455 23741   .7830253
26 1 23155 23155 23592  1.1964408
27 0 22286 22286 22289 .008213553
28 1 22994 22994 23350   .9746749
30 1 23569 23569 23741   .4709103
31 1 23645 23645 23741  .26283368
32 0 23720 23720 23741  .05749487
33 1 22911 22911 23226    .862423
34 1 22468 22468 22561  .25462013
35 0 23551 23551 23741  .52019167
36 1 23407 23407 23741   .9144422
37 1 23303 23303 23741  1.1991787
38 0 23222 23222 23692    1.28679
39 1 23209 23209 23673  1.2703627
40 1 23738 23738 23741 .008213553
41 1 22897 22897 23205    .843258
42 1 22458 22458 22546  .24093087
43 1 22450 22450 22535   .2327173
44 0 22901 22901 23211   .8487337
45 1 22532 22532 22658   .3449692
46 0 22431 22431 22506   .2053388
47 1 22630 22630 22805   .4791239
48 0 23706 23706 23741  .09582478
49 1 22789 22789 23043   .6954141
50 0 23233 23233 23709   1.303217
51 1 23055 23055 23442  1.0595483
52 1 22807 22807 23070   .7200547
53 1 23563 23563 23741   .4873374
54 0 22400 22400 22459   .1615332
55 0 23004 23004 23366    .991102
56 1 22781 22781 23031   .6844627
57 1 22971 22971 23316   .9445585
58 1 23297 23297 23741  1.2156057
59 1 22389 22389 22443  .14784394
60 0 22876 22876 23173   .8131417
61 0 22568 22568 22712   .3942505
62 1 23248 23248 23731   1.322382
63 1 23562 23562 23741   .4900753
64 1 22565 22565 22707   .3887748
65 0 22573 22573 22719   .3997262
66 0 22997 22997 23355   .9801506
67 1 23652 23652 23741   .2436687
68 1 22881 22881 23181   .8213552
69 0 23450 23450 23741   .7967146
70 0 22531 22531 22656   .3422314
71 1 22962 22962 23302   .9308693
72 0 23651 23651 23741  .24640657
73 1 22869 22869 23163   .8049281
74 1 23718 23718 23741  .06297057
75 1 22486 22486 22589  .28199863
76 0 22923 22923 23244   .8788501
77 1 23588 23588 23741   .4188912
78 1 23268 23268 23741  1.2950034
80 0 23692 23692 23741  .13415469
81 1 22882 22882 23183   .8240931
83 1 22606 22606 22769   .4462697
84 1 23344 23344 23741  1.0869268
85 1 23015 23015 23382  1.0047913
86 1 22353 22353 22389  .09856263
87 1 22819 22819 23088   .7364818
88 1 22807 22807 23070   .7200547
89 1 23359 23359 23741   1.045859
90 1 23630 23630 23741   .3039014
91 1 22295 22295 22302 .019164955
92 0 22504 22504 22616   .3066393
94 1 23572 23572 23741   .4626968
95 0 22466 22466 22559  .25462013
96 0 22633 22633 22809   .4818617
97 1 22927 22927 23250   .8843258
98 0 23542 23542 23741   .5448323
99 0 23508 23508 23741   .6379192
100 1 22662 22662 22853   .5229295
101 1 23586 23586 23741   .4243669
102 1 23532 23532 23741   .5722108
103 1 22419 22419 22488   .1889117
end
format %td diagnosis_date
format %td enter_date
format %td end_date

Data is one row per patient.

id: patient ID
cancer: 1 if diagnosed with cancer, 0 if not
diagnosis_date: date of cancer diagnosis
enter_date: date the patient enters the study
end_date: date the patient leaves the study
follow: follow-up period in years, calculated by subtracting enter_date from end_date and dividing by 365

I then used the following code to calculate incidence rates by calendar month in person-years. However, I am not getting the right results: I get a single row of results, with calendar month 1960m1 and an incidence rate of 23657. I would be very grateful for help with this.

*Declaring the survival data

stset follow, failure(cancer==1) id(id)

*Generating monthly cut points (January 2021 to Dec 2024)

local month_start = ym(2021,01)
local month_end = ym(2024,12)

local cutpoints
forvalues m = `month_start'/`month_end' {
local cutpoints = `cutpoints' `=dofm(`m')'
}

*Splitting survival time by calendar month

stsplit calmonth, at(`cutpoints')
gen cal_month = mofd(_t0)
format cal_month %tm

*Calculating incidence rate per 100,000 person-years
strate cal_month, per(100000)

Thank you very much.

Clyde Schechter

Join Date: Apr 2014

Posts: 30169
#4

11 Aug 2025, 19:01

There are several problems here. The most fundamental one is that you have -stset- your data in terms of the variable follow, which is denominated in years. But your -stsplit- cutoffs are dates at the beginning of each month, and these are numbers like 23700. So none of these cutoffs falls within the range of the follow variable, whose values are of order of magnitude 1. The cutoffs have to be on the same time scale as the failure time variable.
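The mismatch in scale can be checked directly. This is only a quick sketch, and it assumes the -dataex- example from the earlier post is in memory:

```stata
* The cutoffs built with dofm() are daily dates (days since 01jan1960)
display dofm(ym(2021, 1))     // first cutoff: 22281
display dofm(ym(2024, 12))    // last cutoff: 23711

* follow is measured in years; in the posted example it stays below 1.5,
* so no cutoff can ever fall inside anyone's follow-up time
summarize follow
```

Because every observation ends long before the first cutoff, -stsplit- has nothing to split.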

Now, that is a bit of a tall order here, because your participants start at different times, and so there is no uniform set of cutoffs that you can come up with on the dimension of the follow variable. You have to revise your -stset- command to use the actual dates of origin and failure. But it is not possible to do that from the data you have provided. In all of your observations we have diagnosis_date == enter_date, and enter_date < end_date. But diagnosis is the failure event! So your data do not provide any possible origin date (which must precede the failure date because, by definition, it is the date at which the participant first becomes at risk for the failure event).

Your start and end date variables are said to be when the person enters and leaves the study. But if the study design is such that the person enters the study upon diagnosis, then it is literally impossible to calculate an incidence rate for that diagnosis with that study design. To get an incidence rate, you must start with people not having the failure event, and then some of the people develop it during the study observation period. But you don't have that, or at least it is not in the data you are showing.
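For illustration only, here is a sketch of how the analysis might look if the data did contain a valid at-risk start date. The variable risk_start is hypothetical: it does not exist in the posted data, and it is assumed to be a daily date that precedes diagnosis_date. The names exit_date and cal_start are also made up for this example.

```stata
* ASSUMPTION: risk_start (hypothetical, not in the posted data) is the daily
* date on which each participant first becomes at risk.

* Exit at diagnosis for cases, at end of follow-up for everyone else
generate exit_date = cond(cancer == 1, diagnosis_date, end_date)
format exit_date %td

* Analysis time is the calendar date itself (days since 01jan1960)
stset exit_date, failure(cancer == 1) enter(time risk_start) id(id)

* Monthly cutoffs as daily dates. There is no "=" after the macro name,
* so the local accumulates a list rather than evaluating an expression.
local cutpoints
forvalues m = `=ym(2021, 1)'/`=ym(2024, 12)' {
    local cutpoints `cutpoints' `=dofm(`m')'
}

* Split each record at the first day of every calendar month
stsplit cal_start, at(`cutpoints')
generate cal_month = mofd(cal_start)
format cal_month %tm

* Person-time is in days here, so rescale to 100,000 person-years
strate cal_month, per(`=100000*365.25')
```

With the data -stset- on the date scale, the cutoffs and the analysis time are in the same units, so -stsplit- can place each piece of follow-up in its calendar month. Any follow-up before the first cutoff would get cal_start = 0, which displays as 1960m1.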
Naa Naadu

Join Date: Aug 2021

Posts: 21
#5

12 Aug 2025, 04:22

Dear Clyde, Thanks very much. I generated the example data myself, as I am working in a secure environment and could not extract a sample of the real data. However, the actual data are as you have stated. Many thanks and kind regards.