Question about the stset options for multiple failure data

Nazli Uludere Aragon

Join Date: Dec 2016

Posts: 9
#1


10 May 2025, 12:42

Dear StataListers,

I am working with panel data on outcomes after landowners complete time in conservation programs. The nature of my data can be characterized as multiple failure, because a subset of landowners can enter/exit the program more than once, and fail more than once after exiting (failure = discontinuation of conservation action). The failures are ordered. The Stata manual and the following piece by Mario Cleves (https://www.stata.com/support/faqs/statistics/multiple-failure-time-data/) have been very helpful in setting up my data for analysis.

But can you help with the intuition behind the different "stset" options in Mario Cleves's piece? Specifically, sections "3.2.3 The conditional risk set model (time from entry)" and "3.2.4 The conditional risk set model (time from the previous event)" both discuss alternatives for working with ordered failure data, and I can set up my data both ways, though the second one makes more sense (resetting the clock). The difference is that in the first case the id() option is specified in stset, whereas in the latter id() is not specified; instead, the standard errors are clustered on the id variable. The Stata manual notes "Specifying id() never hurts". I think I understand why it is necessary to specify id() in the first case, but I am not 100% sure why it is omitted from stset in the second alternative. Is it redundant?

Thank you for your time.
Clyde Schechter

Join Date: Apr 2014

Posts: 30170
#2

10 May 2025, 15:02

In the second alternative, specifying -id()- is not redundant: it causes Stata to behave differently from what you want. When -id()- is specified, Stata's default behavior is to treat the group of observations with a common value of the id variable as belonging to the same person, but it also treats that person as exiting from the analysis at the first failure. So subsequent failures would never be recognized. You can override that behavior by, in addition to specifying -id()-, specifying -exit(time .)-, which tells Stata not to exit the person after any failure but to continue to track the person as far out in time as their data go. But in that mode, the subsequent failure times are reckoned from the time the person enters the study, so you are getting time from entry, not time from previous event. Those are the only two possibilities when -id()- is specified, and neither is what you want.
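
For concreteness, here is a minimal sketch of those two -id()- behaviors on made-up data. The variable names (id, episode, start, stop, event) and all values are hypothetical and not taken from the original poster's data set; each row is assumed to be one at-risk interval for a landowner, with start and stop measured from study entry.

```stata
* Hypothetical toy data: one row per at-risk interval,
* with start/stop measured from each subject's study entry
clear
input id episode start stop event
1 1 0  5 1
1 2 5  9 1
2 1 0  7 0
3 1 0  3 1
3 2 3 10 0
end

* id() alone: each subject exits at the first failure
stset stop, failure(event) id(id) time0(start)
list id episode start stop event _t0 _t _d _st, sepby(id)

* id() plus exit(time .): later intervals stay in the analysis,
* still on the time-from-entry clock
stset stop, failure(event) id(id) time0(start) exit(time .)
list id episode start stop event _t0 _t _d _st, sepby(id)
```

Comparing the two listings, the second intervals for ids 1 and 3 should be flagged _st = 0 under the default and retained under -exit(time .)-, with _t0 and _t still counted from entry.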

To get what you want, you have to not tell Stata to group observations on the -id()- variable, but rather to just treat every observation as a separate event, and the time variable must be set equal to the time since the previous event (or since entry, if there is no prior event). As a result, you will have "tricked" Stata into counting each event's time from the preceding event. That has one problem: the standard errors will be wrong because the within-person correlation has not been accounted for. But that is easily overcome by clustering the errors on the id variable.
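
A rough sketch of that gap-time setup, again on made-up data, might look like the following. The variable names (id, episode, start, stop, event, gap) and the covariate x1 are hypothetical; stratifying on the failure order (episode) is an assumption about how the conditional risk set model would be fit here, not something stated in this thread.

```stata
* Hypothetical toy data: one row per at-risk interval,
* with start/stop measured from each subject's study entry
clear
input id episode start stop event
1 1 0  5 1
1 2 5  9 1
2 1 0  7 0
3 1 0  3 1
3 2 3 10 0
end

* Gap time: the clock restarts at the beginning of each interval
generate gap = stop - start

* No id(): every row is treated as its own subject, at risk from time 0
stset gap, failure(event)
list id episode gap _t0 _t _d _st, sepby(id)

* On a real data set with covariates, the model could then be fit with
* person-level clustering, for example (x1 is a hypothetical covariate):
*   stcox x1, strata(episode) vce(cluster id)
```

The clustering on id in the last (commented) line is what restores the within-person correlation that dropping -id()- from -stset- leaves unaccounted for.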
Nazli Uludere Aragon

Join Date: Dec 2016

Posts: 9
#3

12 May 2025, 12:18

Hi Clyde, thank you. That makes perfect sense. You have explained it very clearly. I appreciate it! Nazli