Error in documentation for stset?

Harlan Sayles

Join Date: May 2016

Posts: 8
#1

12 Apr 2023, 15:01

I am using Stata version 17 with Windows 10 and I have been looking at the PDF documentation for the stset command, specifically the "if() versus if exp" example on pages 469-470. There are two stset commands in this section and I would expect both commands to "stset" two of the three records (i.e., _st == 1 for records where x1!=22). Based on the documentation, for the first command, I expect _t0 = 0 and 7 and _t = 7 and 11 for the two records, respectively, while for the second command, I expect _t0 = 0 and 9 and _t = 7 and 11 for the two records, respectively.

The code below produces the simple three-line dataset used in the example and runs the two stset commands. Both commands produce the same result, and neither is what I expected based on the text of the example. I tried adding "id(patno)" as an option to both commands to account for the fact that these records are from a single subject, but that result was not what I expected either (only the first record was "stset", i.e., _st == 1).

What commands would be necessary to produce the results described in the example, especially the "correct" result that is supposed to come from the second stset command?

Code:

clear
set obs 3
foreach var in patno t x1 x2 code {
    gen `var' = .
}
replace patno = 3
replace t = 7 in 1
replace t = 9 in 2
replace t = 11 in 3
replace x1 = 20 in 1
replace x1 = 22 in 2
replace x1 = 21 in 3
replace x2 = 5
replace code = 14 in 1
replace code = 23 in 2
replace code = 29 in 3
list
stset t if x1!=22, failure(code==14)
list
stset t, if(x1!=22) failure(code==14)
list
Clyde Schechter

Join Date: Apr 2014

Posts: 30075
#2

12 Apr 2023, 15:23

I think you are misunderstanding what is written in the documentation. It explains there that the difference between the ordinary -if- qualifier and the -if(exp)- option is that the former excludes observations from the entire -stset- process (as if they weren't even part of the data set), whereas the latter retains those observations for the purpose of calculating "derived variables." They don't explain what "derived variables" means, but I believe they are talking about the variables calculated with the -ever()-, -never()-, etc. options. Your -stset- commands don't involve any of those, so the two commands should, and do, produce identical results. The results they produce are the same as you would get if you had a -drop if x1 == 22- command before you ran the -stset-.

Only observations 1 and 3 are considered: observation 2 is, for present purposes, non-existent. In observation 1, the patient experiences a failure (code == 14) at time 7. So _t0 is 0, and _t = 7 and _d = 1. In observation 3, the patient does not fail by time t = 11, so _t0 = 0 and _t = 11 and _d = 0. All as expected.
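The equivalence with -drop- described above can be checked directly. The following is only a sketch: it rebuilds the three-record dataset from post #1 with -input- (my choice, not the original poster's code) and wraps the -drop- in -preserve-/-restore- so the full data come back afterwards.

```stata
* Sketch only: rebuild the three-record example from post #1
clear
input patno t x1 x2 code
3  7 20 5 14
3  9 22 5 23
3 11 21 5 29
end

* Version A: exclude the x1 == 22 record with the if qualifier
stset t if x1!=22, failure(code==14)
list t x1 code _st _d _t0 _t

* Version B: physically drop the record, then stset the rest
preserve
drop if x1 == 22
stset t, failure(code==14)
list t x1 code _st _d _t0 _t
restore
```

The two listings should agree on _d, _t0, and _t for the records with t = 7 and t = 11; in Version A the t = 9 record is still listed, but with _st == 0.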

If you add the -id(patno)- option to the command, then the handling of observation 3 is different: because patno 3 already failed at time 7, observation 3, which is for time 11, is disregarded. If you wish to model a process in which the same patient can fail multiple times, you need to specify the -exit(time .)- option to suppress this behavior.
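A minimal sketch of the -id()- behavior and the -exit(time .)- workaround, again rebuilding the data from post #1 with -input- (an assumption for brevity, not the original code):

```stata
* Sketch only: same three records as in post #1
clear
input patno t x1 x2 code
3  7 20 5 14
3  9 22 5 23
3 11 21 5 29
end

* With id(), records after the subject's first failure are excluded by default
stset t if x1!=22, id(patno) failure(code==14)
list patno t code _st _d _t0 _t

* exit(time .) keeps the subject under observation after a failure
stset t if x1!=22, id(patno) failure(code==14) exit(time .)
list patno t code _st _d _t0 _t
```

Comparing _st for the t = 11 record across the two listings shows whether that record is retained.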
Harlan Sayles

Join Date: May 2016

Posts: 8
#3

13 Apr 2023, 12:32

Clyde - thank you for the quick reply. I think I've figured out the error. They should have included the -id(patno)- option to identify that the observations are coming from the same person, but they also should have used code==29 (not 14) for the -failure- option. By using -id(patno)- and -failure(code==29)-, I get the result that is described in the text. The first -stset- command incorrectly assigns the start time for the second record to be 7 (with no gap) whereas the second -stset- command applies the proper value of 9 (with a gap).
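For anyone following along, the two commands described above can be run side by side as follows. This is a sketch that rebuilds the data with -input- rather than the -replace- statements in post #1; the _t0 values in the comments are the ones reported in this thread, not output copied from Stata.

```stata
* Sketch only: same three records as in post #1
clear
input patno t x1 x2 code
3  7 20 5 14
3  9 22 5 23
3 11 21 5 29
end

* if qualifier: the x1 == 22 record is ignored entirely,
* so the t = 11 record is reported to start at _t0 = 7 (no gap)
stset t if x1!=22, id(patno) failure(code==29)
list patno t x1 code _st _d _t0 _t

* if() option: the x1 == 22 record still shapes the time spans,
* so the t = 11 record is reported to start at _t0 = 9 (with a gap)
stset t, id(patno) if(x1!=22) failure(code==29)
list patno t x1 code _st _d _t0 _t
```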