stset multiple failure data

LarsFolkestad

Join Date: Sep 2014

Posts: 165
#1


22 Dec 2015, 15:53

Dear List
I have asked this question in different ways before, but it turns out I'm not sure what I am doing right or wrong. I think my problem is how I stset my multiple failure data.

I have the following variables:
id
age_at_start_obs
age_at_end_of_obs
age_at_event
event

I want to stset my data so that I can calculate the incidence rates per age band, e.g. with stptime or by splitting the data with stsplit.

A dummy dataset

Code:

input id age_at_start_obs age_at_end_of_obs age_at_event event
1 50 60 55 1
1 50 60 56 1
1 50 60 59 1
2 40 45 . 0
2 40 45 44 1
3 75 80 . 0
3 75 80 76 1
3 75 80 77 1
end

I am not sure how to best stset these data to achieve my goal.
I have tried the following

Code:

replace age_at_event=age_at_end_of_obs if age_at_event==.
stset age_at_event, id(id) fail(event) exit(age_at_end_of_obs) enter(age_at_start_obs)
stptime, at(35(1)81)

And this seems to give me the results that I want, but I am still wondering:
Is this the correct way to do it?

But when I stsplit my data:

Code:

stsplit years, every(1)
replace event=0 if event==.
tab years event

For id=1, I would expect this individual to have 10 lines in the split dataset, but there are only 9 (age_at_event=50-59).
Could anyone try to explain why that is?

Hope you can look through it and see if I'm off by a mile.

Thank you
Lars

Last edited by LarsFolkestad; 22 Dec 2015, 16:00. Reason: added stsplit command
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#2

23 Dec 2015, 13:18

Thanks for providing a small data set, but the FAQ also asks that you show results which illustrate your problem. Here are the results of your stsplit statement for id 1.

Code:

. list _t0 _t years event if id==1

     +--------------------------+
     | _t0   _t   years   event |
     |--------------------------|
  1. |  50   51      50       . |
  2. |  51   52      51       . |
  3. |  52   53      52       . |
  4. |  53   54      53       . |
  5. |  54   55      54       1 |
     |--------------------------|
  6. |  55   56      55       1 |
  7. |  56   57      56       . |
  8. |  57   58      57       . |
  9. |  58   59      58       1 |
     +--------------------------+

Nothing is missing. There are nine periods of followup for id = 1, ending at age 59. The value 58 of "years" in line 9 is the coded value for the interval \(58 < age \le 59\). There is no experience for id 1 in the interval \(59 < age \le 60\), so no observation is created for that interval. See pp. 395 & 399 of the Manual.
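The point can be checked directly from the st variables. The following is a minimal sketch, assuming the example data from post #1 and the documented enter(time ...) and exit(time ...) forms of the options; the names last_t, years_py and first are introduced here for illustration only.

```stata
* Sketch only: rebuild the example from post #1 and summarise follow-up per id
clear
input id age_at_start_obs age_at_end_of_obs age_at_event event
1 50 60 55 1
1 50 60 56 1
1 50 60 59 1
2 40 45 .  0
2 40 45 44 1
3 75 80 .  0
3 75 80 76 1
3 75 80 77 1
end
replace age_at_event = age_at_end_of_obs if missing(age_at_event)
stset age_at_event, id(id) failure(event) enter(time age_at_start_obs) exit(time age_at_end_of_obs)

* last_t   = age at which each subject's final record ends
* years_py = observed person-years, summed over the subject's records
egen double last_t   = max(_t), by(id)
egen double years_py = total(_t - _t0), by(id)
egen byte first = tag(id)
list id age_at_end_of_obs last_t years_py if first, noobs
```

In this example id 1 has no record that ends at age 60, so its last record ends at the final event (age 59), while ids 2 and 3 each have a non-event record running to age_at_end_of_obs.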

Last edited by Steve Samuels; 23 Dec 2015, 13:40.

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2
LarsFolkestad

Join Date: Sep 2014

Posts: 165
#3

24 Dec 2015, 03:23

Thank you Steve!
I just thought that, because I stset the data with both an enter and an exit 'date', and the exit was at age=60 for id=1, splitting the data would give me an age band from 59-60 with no event (in other words, the last year of observation).

Just for my peace of mind, can you see anything wrong with the setup/code I have used? The question being: how many events per person-year.

Thank you.
Lars
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#4

24 Dec 2015, 07:44

I don't see anything wrong with the stset command itself. If you have actual dates of entry, exit, and event (even calendar month), use those instead of age in stset. Then, let stptime do the age category analysis. This will give more exact PY calculations. You will need a lot of events per year to estimate single-year rates with good precision. The power exponential command can provide guidance about this.
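A date-based setup along these lines might look like the sketch below. All data and variable names (birth, entry, fuend, recend and their _d versions) are made up for illustration and are not from the original poster's dataset; the age bands are arbitrary.

```stata
* Sketch only: toy data with hypothetical variable names
clear
input byte id str10 birth str10 entry str10 fuend str10 recend byte event
1 "1940-03-15" "1990-01-01" "1999-12-31" "1995-06-30" 1
1 "1940-03-15" "1990-01-01" "1999-12-31" "1999-12-31" 0
2 "1950-07-01" "1992-01-01" "1996-12-31" "1996-12-31" 0
end
* convert the string dates to Stata daily dates
foreach v of varlist birth entry fuend recend {
    generate long `v'_d = date(`v', "YMD")
    format `v'_d %td
}
* origin() puts time zero at birth and scale(365.25) converts days to years,
* so analysis time _t is age in years
stset recend_d, id(id) failure(event==1) origin(time birth_d) enter(time entry_d) exit(time fuend_d) scale(365.25)
* person-time and rates by 5-year age band
stptime, at(45(5)65)
```

With dates, person-time within each age band is computed from exact entry and exit days rather than from ages rounded to whole years.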

LarsFolkestad

Join Date: Sep 2014

Posts: 165
#5

24 Dec 2015, 14:08

Thank you Steve!
X-mas saved.
Happy holidays to you.
Lars
LarsFolkestad

Join Date: Sep 2014

Posts: 165
#6

06 Feb 2016, 01:15

Sorry to bring this back up again, but after working with my dataset of 3500+ individuals with multiple failures, I seem to have come across something that I need to work around.

The problem can be exemplified by this data set

Code:

input id age_at_start_obs age_at_end_of_obs age_at_event event
1 50 60 55 1
1 50 60 56 1
1 50 60 57 1
2 40 45 44 1
3 75 80 76 1
3 75 80 77 1
end

Age is in years, and event = 1 if failure, 0 if no failure.

My research question is how many events there are per person-year.
If I simply subtract age_at_start_obs from age_at_end_of_obs, there would be 10 years for id 1, 5 years for id 2 and 5 years for id 3. This sums to a total of 20 years at risk for the entire group.
Now, if I stset the data and then split them:

Code:

stset age_at_event, fail(event==1) id(id) enter(time age_at_start_obs) exit(time age_at_end_of_obs)

The total analysis time at risk and under observation is 13 (id1: 5+1+1 years, id2: 4 years and id3: 2 years).

When I stsplit

Code:

stsplit years, every(1)

I create 7 new observations (periods).
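These figures can be reproduced with a short sketch. It rebuilds the example above, sums the observed person-years (_t - _t0), computes the hand-calculated total of age_at_end_of_obs minus age_at_start_obs, and counts the records after stsplit. The names py, first, potential, obs_py and potential_py are introduced here for illustration.

```stata
* Sketch only: check the totals quoted in this post
clear
input id age_at_start_obs age_at_end_of_obs age_at_event event
1 50 60 55 1
1 50 60 56 1
1 50 60 57 1
2 40 45 44 1
3 75 80 76 1
3 75 80 77 1
end
stset age_at_event, failure(event==1) id(id) enter(time age_at_start_obs) exit(time age_at_end_of_obs)

* observed person-years: sum of record lengths
generate double py = _t - _t0
quietly summarize py if _st
scalar obs_py = r(sum)

* hand calculation: end of observation minus start, once per id
egen byte first = tag(id)
generate double potential = age_at_end_of_obs - age_at_start_obs if first
quietly summarize potential
scalar potential_py = r(sum)

display "observed person-years: " obs_py "   end minus start: " potential_py

* one record per year of observed follow-up
stsplit years, every(1)
display "records after stsplit: " _N
```

The gap between the two totals is the time between each id's last event record and age_at_end_of_obs, which no record in this dataset covers.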

Now, is there a way for me to stset my multiple failure data that takes into account that there is time left in the dataset after the last failure for some individuals?

It is as if there is a line missing for each id, with no failures, covering the last stint of time.
The data structure is as shown in the example, but with a lot more observations per individual.

I hope you can understand and help me get the rest out of my data.

Lars
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#7

07 Feb 2016, 19:56

There is no way to stset your data to do what you want, and for a good reason: the denominator you want to create is not person-years. It is instead the total of potential observation time, a very different thing. A "rate" calculated this way does not estimate any well-defined parameter of a survival distribution.

Consider single-record data with an exponential distribution, in which the hazard function \(\lambda(t)\) is a constant \(\lambda\). Suppose \(D\) is the number of observed failures and \(t_i\) is the recorded failure time or censoring time, in years, for observation \(i\). Then \(T = \sum_i t_i\) is the total of person-years that Stata displays in stsum.

It is proved in every survival book that the maximum likelihood estimate of \(\lambda\) is:

\[
\widehat{\lambda} = \frac{D}{T}
\]
Notice, no effect of potential observation time after \(t_i\). What parameter of the exponential distribution will your "rate" estimate? There is none that I can name.
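For reference, the standard argument behind this estimate can be sketched as follows, writing \(d_i\) for the failure indicator of observation \(i\) (a symbol introduced here, so that \(D = \sum_i d_i\)):

\[
L(\lambda) = \prod_i \lambda^{d_i} e^{-\lambda t_i},
\qquad
\log L(\lambda) = D \log \lambda - \lambda T,
\qquad
\frac{d \log L}{d \lambda} = \frac{D}{\lambda} - T = 0
\;\Longrightarrow\;
\widehat{\lambda} = \frac{D}{T}
\]

Only the recorded times \(t_i\) enter the likelihood, which is why time after \(t_i\) plays no role in the estimate.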

The problem with a denominator that counts potential observation time is easily illustrated with a thought experiment. Suppose that potential age-at-end of observation for ID 3 in your example is 100 instead of 80. Even though the actual observed data for ID 3 doesn't change, your computed rate changes from 6/20 to 6/40.
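Written out with the numbers from the example in post #6 (6 events, 13 observed person-years, and age_at_end_of_obs minus age_at_start_obs of 10, 5 and 5 years), the comparison is roughly:

\[
\frac{6}{10 + 5 + 5} = \frac{6}{20} = 0.30
\quad\longrightarrow\quad
\frac{6}{10 + 5 + 25} = \frac{6}{40} = 0.15,
\qquad \text{while} \qquad
\frac{D}{T} = \frac{6}{13} \approx 0.46 \ \text{is unchanged.}
\]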

Last edited by Steve Samuels; 07 Feb 2016, 20:27.

LarsFolkestad

Join Date: Sep 2014

Posts: 165
#8

08 Feb 2016, 09:52

Thank you Steve for this elaborate answer.

Now let's make a different thought experiment: let's say that ID3 exits the study at age 100, and the latest event on record is at 77. ID4 enters at age 75 and exits at age 100, and has 1 failure at age 80 and 1 at age 90.

I want to calculate the incidence rate for events from 75-100 per 25 person-years: two ids complete this age span, adding up to 50 person-years, and there are 4 failures. In my mind this would be 2 per 25 person-years in the age span from 75-100, so from ages 75 to 100 you will have 2 events on average.

As they are now, id3 will contribute 2 failures in 2 years and id4 2 failures in 15 years of time in the data set, adding up to 17 person-years - but we know that the persons did not have any events after their last one, and we observed them for 23 and 10 years after the last recorded event. The incidence rate as counted now is ca. 6 per 25 years in the age span from 75-100.

If I add an observation in the example data for each id with age_at_event=age_at_end_of_obs
and event=0, then when I stset, run stptime and stsplit, I get the same as my calculated event rates per 25 person-years.

It is impossible to die from the event in this example, if that makes a difference.
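The step described above (one extra non-event record per id, ending at age_at_end_of_obs) might be coded as in the sketch below, here applied to the example data from post #6. The helper variables is_last and added are hypothetical names. The sketch only illustrates the data manipulation; whether the resulting denominator is appropriate is the question under discussion in this thread.

```stata
* Sketch only: add one non-event record per id ending at age_at_end_of_obs
clear
input id age_at_start_obs age_at_end_of_obs age_at_event event
1 50 60 55 1
1 50 60 56 1
1 50 60 57 1
2 40 45 44 1
3 75 80 76 1
3 75 80 77 1
end
* flag each id's last event record
bysort id (age_at_event): generate byte is_last = (_n == _N)
* duplicate it only when observation continues past that event
expand 2 if is_last & age_at_event < age_at_end_of_obs, generate(added)
* turn the duplicate into a non-event record ending at the end of observation
replace age_at_event = age_at_end_of_obs if added
replace event = 0 if added
drop is_last
stset age_at_event, failure(event==1) id(id) enter(time age_at_start_obs) exit(time age_at_end_of_obs)
* overall person-time, failures and rate
stptime
```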
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#9

08 Feb 2016, 16:05

You can calculate person-years in an age-interval for a person, but it is the actual time observed in the interval, not the length of the interval.

Let me put it another way. The only way an analyst can know the maximum age that a person could potentially have attained in a study is because they know the date that followup ceased and the age of the person on that date. If the date that followup ends is changed, for budgetary or other reasons, the maximum attainable age changes. Suppose a person entered a study on January 1, 1990 at age 50 and follow-up is planned to end on Dec 31, 1999. The maximum age the person can attain in the study is between 60 & 61. However, that person could die or leave the study in 1991. So saying, as you do, that the person has 10 person-years in the study is simply incorrect. 10 is the maximum person-years that the subject could have had in the study, not the actual.

As my arguments apparently don't convince you, I won't try further. This is my last post in the thread.

Last edited by Steve Samuels; 08 Feb 2016, 16:41.

LarsFolkestad

Join Date: Sep 2014

Posts: 165
#10

09 Feb 2016, 06:30

I fully respect that you sign off from this thread. Thank you for the advice given throughout.
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#11

09 Feb 2016, 09:17

You're welcome. I should in the first place have asked you the reference for your definition of "person-years".

Last edited by Steve Samuels; 09 Feb 2016, 09:42.
