Calculating person time and incidence rates

CEdward

Join Date: Nov 2014

Posts: 131
#1

28 Dec 2014, 13:50

Hi all,

I wanted to know how to calculate person time and incidence rates given data that is set up in a similar fashion:

ID 001 Hearttransplantdate 3/23/1992 Testdate 12/30/2003 Cardiovascular disease (CVD) 0
ID 001 Hearttransplantdate 3/23/1992 Testdate 5/2/2004 CVD 0
ID 001 Hearttransplantdate 3/23/1992 Testdate 5/13/2004 CVD 0
ID 001 Hearttransplantdate 3/23/1992 Testdate 7/19/2004 CVD 1
ID 002
ID 002
ID 003
ID 003
ID 003
.
.
.

My failure event would be cardiovascular disease (indicated by 1) or the last time they had a test date. My origin would be when they had the transplant date. I want to calculate the person-time in months.

Last edited by CEdward; 28 Dec 2014, 14:05.
Daniel Richards

Join Date: Dec 2014

Posts: 14
#2

28 Dec 2014, 14:10

Hi Jack,

Not sure how big your data is, but you might want to import it into Excel and have a play around with it before putting it into Stata. You could use pivot tables to get the min date for each transplant date and the max date for each Testdate by ID, then export to Stata after that. Otherwise, the Introduction to Survival Analysis book has chapters on how to set survival time etc. It's a good read; here is the link:

http://www.stata.com/bookstore/survi...-introduction/

Cheers,
Dan
CEdward

Join Date: Nov 2014

Posts: 131
#3

28 Dec 2014, 14:12

Thanks for the reply Daniel. Anybody else know how to do this?
Rich Goldstein

Join Date: Mar 2014

Posts: 4458
#4

28 Dec 2014, 14:41

The structure of the data shown in #1 above makes no sense to me. Do you really have a variable (name not shown) that is equal to "ID" in every observation? At any rate, look at the help for -snapspan-. When you re-post, please show us (following the instructions in the FAQ) what your data really look like.
CEdward

Join Date: Nov 2014

Posts: 131
#5

28 Dec 2014, 15:04

I am confused by what you are asking Rich. What do you mean by "do you really have variable (name not shown)...observation?".
CEdward

Join Date: Nov 2014

Posts: 131
#6

28 Dec 2014, 15:17

Originally posted by Rich Goldstein

The structure of the data shown in #1 above makes no sense to me. Do you really have a variable (name not shown) that is equal to "ID" in every observation? At any rate, look at the help for -snapspan-. When you re-post, please show us (following the instructions in the FAQ) what your data really look like.

Also, I don't think I should be using the snapspan command. It is true that I do not know the time span information and that I have measurements which differ at each test date. However, my data violate the assumption that these measurements (i.e. height/weight) are constant between two dates.

Last edited by CEdward; 28 Dec 2014, 15:21.

Svend Juul

Join Date: Apr 2014
Posts: 515
#7

28 Dec 2014, 15:58

Jack:
I believe that the confusion about the structure of your data comes from the fact that you intermingle variable names with variable values. Hopefully that is not the real structure of your data. Here is how I believe (or hope) that your first four observations look:

Code:

clear
 input id str10 s_htdate str10 s_testdate cvd
 1 "3/23/1992" "12/30/2003" 0
 1 "3/23/1992" "5/12/2004" 0
 1 "3/23/1992" "5/13/2004" 0
 1 "3/23/1992" "7/19/2004" 1
 end

.  list , clean
       id    s_htdate   s_testdate   cvd 
  1.    1   3/23/1992   12/30/2003     0 
  2.    1   3/23/1992    5/12/2004     0 
  3.    1   3/23/1992    5/13/2004     0 
  4.    1   3/23/1992    7/19/2004     1

I gave your dates an s_ prefix because I had to enter them as strings. But to proceed you need them as Stata dates. I use the less ambiguous %td format:

Code:

gen htdate = date(s_htdate,"MDY")
 gen testdate = date(s_testdate,"MDY")
 format htdate testdate %td
drop s_*

.  list , clean
       id   cvd      htdate    testdate 
  1.    1     0   23mar1992   30dec2003 
  2.    1     0   23mar1992   12may2004 
  3.    1     0   23mar1992   13may2004 
  4.    1     1   23mar1992   19jul2004

Now, first find the enddate = last testdate for each id. Next, replace the enddate by the testdate when cvd occurred:

Code:

sort id testdate
 by id: egen enddate = max(testdate)
 format enddate %td
 replace enddate = testdate if cvd==1

.  list, clean
       id   cvd      htdate    testdate     enddate 
  1.    1     0   23mar1992   30dec2003   19jul2004 
  2.    1     0   23mar1992   12may2004   19jul2004 
  3.    1     0   23mar1992   13may2004   19jul2004 
  4.    1     1   23mar1992   19jul2004   19jul2004
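
A note on the step above: the -replace- line only changes the observation in which cvd==1, so if a subject was tested again after CVD was first recorded, enddate would differ across that subject's rows. The following is a minimal sketch for that situation, not part of the original answer. It assumes cvd is coded 0/1 as in the example; eventdate, lastdate and enddate2 are hypothetical variable names.

```stata
* Sketch: one end date per id = first test date with cvd==1,
* or the last test date if CVD was never recorded.
* eventdate, lastdate and enddate2 are hypothetical names.
sort id testdate
by id: egen eventdate = min(cond(cvd == 1, testdate, .))
by id: egen lastdate  = max(testdate)
generate enddate2 = cond(missing(eventdate), lastdate, eventdate)
format eventdate lastdate enddate2 %td
```

In the four observations shown here the event falls on the last test date, so enddate2 and enddate should agree.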

I would definitely not encourage doing these things in Excel, as suggested in post #2.

Hope this helps


Rich Goldstein

Join Date: Mar 2014

Posts: 4458
#8

28 Dec 2014, 19:29

Jack,

While I expect that Svend is correct about the actual structure of your data, the only way to be sure is to show us, using the "code" delimiters (reminder: read the FAQ).

CEdward

Join Date: Nov 2014
Posts: 131
#9

05 Jan 2015, 01:07

Hi Svend, this is an excellent response to my post and it is exactly what I am after. But, how would I now determine the time in months between the enddate and the htdate?


Svend Juul

Join Date: Apr 2014

Posts: 515
#10

05 Jan 2015, 02:42

Find the time in days by:

Code:

generate pdays=enddate-htdate

With the time in days, any transformation to months, years, or millennia is trivial. For simple estimation of incidence rates, see:

Code:

help strate
help stptime
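
As a minimal sketch of the conversion mentioned above, assuming pdays has already been generated as shown: the average month length of 365.25/12 = 30.4375 days is an assumption, and pmonths and pyears are hypothetical variable names.

```stata
* Sketch: convert person-days to months and years.
* Assumes an average year of 365.25 days; pmonths and pyears are hypothetical names.
generate pmonths = pdays / (365.25 / 12)
generate pyears  = pdays / 365.25
```

If calendar months are wanted instead of average months, a different definition would be needed; this only rescales the day count.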
CEdward

Join Date: Nov 2014

Posts: 131

06 Jan 2015, 12:51

Also, I am not sure if I should snapspan my data. The reason is that measurements are taken intermittently, each with its own date, but -snapspan- makes me assume that those measurements stay constant between dates, which isn't the case for my data.
CEdward

Join Date: Nov 2014

Posts: 131
#11

06 Jan 2015, 19:43

Hi Svend, when we are using the snapspan command, is it possible to apply if qualifiers (e.g. Months if (Months !=.))?

Last edited by CEdward; 06 Jan 2015, 20:38.
Svend Juul

Join Date: Apr 2014

Posts: 515
#12

07 Jan 2015, 06:50

Jack,

According to the documentation you cannot use the -if- qualifier with -snapspan-. But I don't understand what you are trying to accomplish. What do you mean by "Months if (Months !=.)"?
CEdward

Join Date: Nov 2014

Posts: 131
#13

07 Jan 2015, 09:39

Perhaps you could tell me whether the data above (which are similar to mine) are an example of data for snapspan. For each id, the individual has measurements taken on various dates, and on those same dates the practitioner is trying to determine whether they have CVD. Would that be snapspan data?

Svend Juul

Join Date: Apr 2014
Posts: 515

#14

07 Jan 2015, 11:20

I see no point in using -snapspan- for these data. You can -stset- them as they are and go ahead with, for example, -sts graph- or -stcox-.

There are worries, however, related to the circumstances leading to testing. The event of interest (cvd) is determined by a test. The interpretation is different if tests are taken at predetermined points in time or if they are taken due to changes in the patient's condition.

Code:

.  stset testdate , failure(cvd==1) origin(htdate) id(id)

                id:  id
     failure event:  cvd == 1
obs. time interval:  (testdate[_n-1], testdate]
 exit on or before:  failure
    t for analysis:  (time-origin)
            origin:  time htdate

------------------------------------------------------------------------------
        4  total observations
        0  exclusions
------------------------------------------------------------------------------
        4  observations remaining, representing
        1  subject
        1  failure in single-failure-per-subject data
     4501  total analysis time at risk and under observation
                                              at risk from t =         0
                                   earliest observed entry t =         0
                                        last observed exit t =      4501

. list, clean

       id   cvd      htdate    testdate     enddate   _st   _d     _t    _t0 
  1.    1     0   23mar1992   30dec2003   19jul2004     1    0   4299      0 
  2.    1     0   23mar1992   12may2004   19jul2004     1    0   4433   4299 
  3.    1     0   23mar1992   13may2004   19jul2004     1    0   4434   4433 
  4.    1     1   23mar1992   19jul2004   19jul2004     1    1   4501   4434
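
To get person-time in months directly, one option is to rescale the analysis time when declaring the data. This is a sketch, not part of the output above; scale(30.4375) assumes an average month of 365.25/12 days, and the per(1000) multiplier is only an illustrative choice.

```stata
* Sketch: same -stset- as above, but with analysis time in months.
* scale(30.4375) assumes an average month of 365.25/12 days.
stset testdate, failure(cvd==1) origin(htdate) id(id) scale(30.4375)

* Person-time, number of failures, and incidence rate per 1,000 person-months.
stptime, per(1000)
```

-strate- accepts the same per() option if rates by covariate level are wanted.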
