I have data that include people diagnosed with cancer between 1 January 2021 and 31 December 2024. I want to calculate the incidence rate by calendar month over the follow-up period, expressed in person-years. A sample of my data is shown below.
Data is one row per patient.
id: patient ID
cancer: 1 if diagnosed with cancer, 0 if not
diagnosis_date: date of cancer diagnosis
enter_date: date the patient entered the study
end_date: date the patient left the study
follow_years: follow-up period in years, calculated by subtracting enter_date from end_date and dividing by 365
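For clarity, the follow_years calculation described above corresponds roughly to the line below. This is only a sketch: it assumes enter_date and end_date are already stored as numeric Stata daily dates (%td) rather than strings.

```stata
* Sketch only: assumes enter_date and end_date are numeric Stata daily dates (%td)
* Follow-up in years = days between entry and exit, divided by 365
gen follow_years = (end_date - enter_date) / 365
```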
I then used the following code to calculate incidence rates by calendar month in person-years. However, I am not getting the right results: I get a single row of results, with calendar month 1960m1 and an incidence rate of 23657. I would be very grateful for any help with this.
*Declaring the survival data
stset follow, failure(cancer==1) id(id)
*Generating monthly cut points (January 2021 to Dec 2024)
local month_start = ym(2021,01)
local month_end = ym(2024,12)
local cutpoints
forvalues m = `month_start'/`month_end' {
local cutpoints = `cutpoints' `=dofm(`m')'
}
*Splitting survival time by calendar month
stsplit calmonth, at(`cutpoints')
gen cal_month = mofd(_t0)
format cal_month %tm
*Calculating incidence rate per 100,000 person-years
strate cal_month, per(100000)
Thank you very much.
id | cancer | diagnosis_date | enter_date | end_date | follow_years |
1 | 0 | 07/11/2020 | 07/11/2020 | 07/11/2020 | 0.80 |
2 | 0 | 07/01/2010 | 07/01/2010 | 07/01/2010 | 0.60 |
3 | 1 | 07/03/1999 | 07/03/1999 | 07/03/1999 | 0.70 |
4 | 1 | 07/05/1988 | 07/05/1988 | 07/05/1988 | 0.60 |
5 | 0 | 07/07/1977 | 07/07/1977 | 07/07/1977 | 0.55 |
6 | 0 | 07/09/1966 | 07/09/1966 | 07/09/1966 | 0.50 |
7 | 1 | 07/11/1955 | 07/11/1955 | 07/11/1955 | 0.45 |
8 | 1 | 07/01/1945 | 07/01/1945 | 07/01/1945 | 0.40 |
9 | 1 | 07/03/2021 | 07/03/1934 | 07/03/1934 | 0.35 |
10 | 0 | 07/03/2021 | 07/05/1923 | 07/05/1923 | 0.30 |