STCOX regression with time-varying dummy variables -- please help!

Josh Guetzkow

Join Date: Mar 2015

Posts: 5
#1

STCOX regression with time-varying dummy variables -- please help!

02 Mar 2015, 13:06

Hi, I am at my wit’s end and hoping that somebody here can help with a problem I’m having getting a hazard model to run properly in STATA. The problem involves a time-varying independent dummy 'treatment' variable. Here are some relevant details:

The study is on the relationship between family visits in prison and prison violence. Essentially, we’re trying to look at whether risk of being violent in prison is lower (or higher) after family visits. We are not that interested in the timing between visits and violence, just changes in risk.

The data are prisoner-level data taken over a 2-year observation period (2009-20010). We have left and right censoring (some prisoners were already in prison when the observation period began, and some prisoners were still in prison when the observation period ended). The main problem we seem to be having is that many prisoners have multiple visits, some of them actually a lot, sometimes multiple visits in a week.The mean number of visits is 22, with a maximum of 121, minimum 0.

For every prisoner, we have dates for entry into prison, exit from prison, visits and violent events, plus a bunch of static controls. The data are organized by prisoner-event. So for each prisoner, we have a row for entry data and control variables, start of observation period, visit dates (if any), date of violent event, and exit from observation (either left prison or observation period ended).

So the data are structured as prisoner-event data, which looks like this (with value labels shown instead of numeric values):

IDvar timevar event_type age_at_entry crime_type visit_dummy
Prisoner1 1/1/09 entry 26 violent 0
Prisoner1 2/1/09 visit 26 violent 1
Prisoner1 3/1/09 visit 26 violent 1
Prisoner1 3/15/09 violence 26 violent 0
Prisoner1 6/1/09 release 26 violent 0

Prisoner2 2/16/10 entry 35 property 0
Prisoner2 3/8/10 visit 35 property 1
Prisoner2 9/1/10 release 35 property 0
.
.
There are 28,463 unique prisoners in the data and 366,477 observations (rows). There were 3440 violent events. Some prisoners have repeated risk for violence, but for now we are only including their first event. If we can get it to work then we might include subsequent events.

Here is the stset command and output:

stset timevar, failure(event_type==4) id(IDvar) origin(event_type ==1) enter(event_type ==5) exit(event_type ==6)

id: IDvar
failure event: event_type == 4
obs. time interval: (timevar[_n-1], timevar]
enter on or after: event_type ==5
exit on or before: event_type ==6
t for analysis: (time-origin)
origin: event_type ==1

----------------------------------------------------------------------
366477 total obs.
37241 multiple records at same instant PROBABLE ERROR
(timevar[_n-1]==timevar)
20475 obs. end on or before enter()
34 obs. begin on or after exit
----------------------------------------------------------------------
308727 obs. remaining, representing
28460 subjects
3440 failures in multiple failure-per-subject data
7862465 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 12710

Here are two cox regressions. The first one has two time-constant variables, age and offense type (as shown in data above). The second one adds the visit_dummy variable shown in the example data above (=1 when there is a visit, 0 otherwise).

. stcox age crime_type

failure _d: event_type== 4
analysis time _t: (timevar-origin)
origin: event_type ==1
enter on or after: event_type ==5
exit on or before: event_type ==6
id: IDNT_MAASAR

Iteration 0: log likelihood = -30421.854
Iteration 1: log likelihood = -30084.938
Iteration 2: log likelihood = -30077.562
Iteration 3: log likelihood = -30077.553
Refining estimates:
Iteration 0: log likelihood = -30077.553

Cox regression -- Breslow method for ties

No. of subjects = 28429 Number of obs = 308477
No. of failures = 3440
Time at risk = 7857475
LR chi2(2) = 688.60
Log likelihood = -30077.553 Prob > chi2 = 0.0000

----------------------------------------------------------------------
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------------+--------------------------------------------------
age | .9610433 .0017798 -21.46 0.000 .9575613 .964538
crime| 1.462947 .051784 10.75 0.000 1.364893 1.568046

So far, so good. Higher age associated with lower risk of violence, violent crime conviction associated with higher risk of violence.

Now here is the same model with the time-varying, frequently repeated visit dummy:

. stcox age crime_type visit_dummy

failure _d: event_type == 4
analysis time _t: (timevar-origin)
origin: event_type ==1
enter on or after: event_type ==5
exit on or before: event_type ==6
id: IDNT_MAASAR

Iteration 0: log likelihood = -30421.854
Iteration 1: log likelihood = -25230.114
Iteration 2: log likelihood = -25092.321
Iteration 3: log likelihood = -25049.136
Iteration 4: log likelihood = -25033.56
Iteration 5: log likelihood = -25027.871
Iteration 6: log likelihood = -25025.783
Iteration 7: log likelihood = -25025.015
Iteration 8: log likelihood = -25024.733
Iteration 9: log likelihood = -25024.629
Iteration 10: log likelihood = -25024.591
Iteration 11: log likelihood = -25024.577
Iteration 12: log likelihood = -25024.572
Iteration 13: log likelihood = -25024.57
Iteration 14: log likelihood = -25024.569
.
.
.
Iteration 50: log likelihood = -25024.569

Refining estimates:
Iteration 0: log likelihood = -25024.569

Cox regression -- Breslow method for ties

No. of subjects = 28429 Number of obs = 308477
No. of failures = 3440
Time at risk = 7857475
LR chi2(2) = 10794.57
Log likelihood = -25024.569 Prob > chi2 = 0.0000

-----------------------------------------------------------------------------
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------------+---------------------------------------------------------
age | .9489049 .0018013 -27.63 0.000 .9453809 .9524419
crime | 1.910263 .0677006 18.26 0.000 1.782076 2.047671
visit_dummy | 7.90e-24 . . . . .
-----------------------------------------------------------------------------

The hazard ratio coefficient for visit_dummy is tiny, but more problematically it doesn’t compute standard errors. Is it possible that there are simply too many visits for it to deal with? Or too many ties (as indicated by the “PROBABLY ERROR” warning I got in the stset command?).

I have looked everywhere for resources on how to deal with this. It seems that time-varying dummy variables are not frequently used in survival models. I have seen suggestions that data with time-varying variables should be separated into separate observations for each status change. But having a visit doesn't really represent a lasting change in status, so this doesn't seem relevant in this case. Also given the number of visits, doing so would make the dataset very large. I would essentially have to make the data a prisoner-day dataset (or I might get by with prisoner-week). But I thought that STATA’s flexibility with dealing with stdata would obviate this necessity. Or maybe the modeling strategy is wrong and I don't even need a survival model?.

I would have thought that this would be a fairly common modeling issue in medical studies, since it is analogous to patients being treated, sometimes repeatedly, with some medication at particular intervals (except in my case it is visits rather than medications). But I couldn’t find anything.

Any suggestions you wizards of stata have will be extremely appreciated!!

EDIT: Sorry this got posted twice. The first time, it told me there was an error so I re-posted. Can't see how to delete one of them.

Last edited by Josh Guetzkow; 02 Mar 2015, 13:59.
Tags: None
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

02 Mar 2015, 15:43

Hello, Josh,

There is an interesting article from the Stata Journal on categorical time-varying covariates (http://www.stata-journal.com/sjpdf.h...iclenum=st0066).

Hopefully it might be helpful.

Best,

Marcos

Best regards,

Marcos
Comment
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#3

02 Mar 2015, 18:25

I can't read your post with its current formatting Repost with data listings, Stata statements, and Stata results inside code delimiters. See FAQ Section 12 http://www.statalist.org/forums/help. In the meantime, you look at: http://hsphsun3.harvard.edu/cgi-bin/...ticle-805.html

Last edited by Steve Samuels; 02 Mar 2015, 18:33.

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#4

02 Mar 2015, 22:38

Josh:
I second all the previous comments, especially Steve's remark concerning the limited readabiity of your Stata session.
Anyway, have you tried to let Stata calculate the categorical variable for visit (i.e.: i.visit instead of visit_dummy) and run Cox regression with this predictor?

Kind regards,
Carlo
(Stata 19.0)
Comment
Josh Guetzkow

Join Date: Mar 2015

Posts: 5
#5

03 Mar 2015, 09:06

Hi, thank you for your responses. I am going to re-post my original message with better formatting and then at the bottom I will reply to your messages one-by-one. Apologies for the mess and I appreciate your patience. Oh, and I should note that I am using STATA SE 12.0

HERE IS THE ORIGINAL POST:

Hi, I am at my wit’s end and hoping that somebody here can help with a problem I’m having getting a hazard model to run properly in STATA. The problem involves a time-varying independent dummy 'treatment' variable. Here are some relevant details:

The study is on the relationship between family visits in prison and prison violence. Essentially, we’re trying to look at whether risk of being violent in prison is lower (or higher) after family visits. We are not that interested in the timing between visits and violence, just changes in risk.

The data are prisoner-level data taken over a 2-year observation period (2009-20010). We have left and right censoring (some prisoners were already in prison when the observation period began, and some prisoners were still in prison when the observation period ended). The main problem we seem to be having is that many prisoners have multiple visits, some of them actually a lot, sometimes multiple visits in a week.The mean number of visits is 22, with a maximum of 121, minimum 0.

For every prisoner, we have dates for entry into prison, exit from prison, visits and violent events, plus a bunch of static controls. The data are organized by prisoner-event. So for each prisoner, we have a row for entry data and control variables, start of observation period, visit dates (if any), date of violent event, and exit from observation (either left prison or observation period ended).

So the data are structured as prisoner-event data, which looks like this (with value labels shown instead of numeric values):

Code:

| IDvar timevar event_type age crime_type visit_dummy | |----------------------------------------------------------------------------| 13. | 727411174523 18 Oct 10 Origin 32.85753 0 0 | 14. | 727411174523 30 Oct 10 Visit 32.85753 0 1 | 15. | 727411174523 13 Nov 10 Visit 32.85753 0 1 | |----------------------------------------------------------------------------| 16. | 727411174523 23 Nov 10 Visit 32.85753 0 1 | 17. | 727411174523 14 Dec 10 Visit 32.85753 0 1 | 18. | 727411174523 18 Oct 10 Entry 32.85753 0 0 | 19. | 727411174523 31 Dec 10 Release 32.85753 0 0 | 20. | 775827518064 23 Mar 82 Origin 24.58356 1 0 | 21. | 775827518064 28 Feb 09 Visit 24.58356 1 1 | 22. | 775827518064 08 Aug 10 Visit 24.58356 1 1 | 23. | 775827518064 29 Oct 09 Violence 24.58356 1 0 | |----------------------------------------------------------------------------| 24. | 775827518064 01 Jan 09 Entry 24.58356 1 0 | 25. | 775827518064 31 Dec 10 Release 24.58356 1 0 |

There are 28,463 unique prisoners in the data and 366,477 observations (rows). There were 3440 violent events. Some prisoners have repeated risk for violence, but for now we are only including their first event. If we can get it to work then we might include subsequent events.

Here is the stset command and output:

Code:

stset timevar, failure(event_type==4) id(IDvar) origin(event_type ==1) enter(event_type ==5) exit(event_type ==6) id: IDvar failure event: event_type == 4 obs. time interval: (timevar[_n-1], timevar] enter on or after: event_type ==5 exit on or before: event_type ==6 t for analysis: (time-origin) origin: event_type ==1 ---------------------------------------------------------------------- 366477 total obs. 37241 multiple records at same instant PROBABLE ERROR (timevar[_n-1]==timevar) 20475 obs. end on or before enter() 34 obs. begin on or after exit ---------------------------------------------------------------------- 308727 obs. remaining, representing 28460 subjects 3440 failures in multiple failure-per-subject data 7862465 total analysis time at risk, at risk from t = 0 earliest observed entry t = 0 last observed exit t = 12710

Here are two cox regressions. The first one has two time-constant variables, age and offense type (as shown in data above). The second one adds the visit_dummy variable shown in the example data above (=1 when there is a visit, 0 otherwise).

Code:

. stcox newage newviolent failure _d: fromdataset == 4 analysis time _t: (timevar-origin) origin: fromdataset==1 enter on or after: fromdataset==5 exit on or before: fromdataset==6 id: IDNT_MAASAR Iteration 0: log likelihood = -30421.854 Iteration 1: log likelihood = -30084.938 Iteration 2: log likelihood = -30077.562 Iteration 3: log likelihood = -30077.553 Refining estimates: Iteration 0: log likelihood = -30077.553 Cox regression -- Breslow method for ties No. of subjects = 28429 Number of obs = 308477 No. of failures = 3440 Time at risk = 7857475 LR chi2(2) = 688.60 Log likelihood = -30077.553 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------------ _t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval] -------------------+---------------------------------------------------------------- newage | .9610433 .0017798 -21.46 0.000 .9575613 .964538 newviolent_offenxe | 1.462947 .051784 10.75 0.000 1.364893 1.568046 ------------------------------------------------------------------------------------

So far, so good. Higher age associated with lower risk of violence, violent crime conviction associated with higher risk of violence.

Now here is the same model with the time-varying, frequently repeated visit dummy:

Code:

. stcox newage newviolent newvisit failure _d: fromdataset == 4 analysis time _t: (timevar-origin) origin: fromdataset==1 enter on or after: fromdataset==5 exit on or before: fromdataset==6 id: IDNT_MAASAR Iteration 0: log likelihood = -30421.854 Iteration 1: log likelihood = -25230.114 Iteration 2: log likelihood = -25092.321 Iteration 3: log likelihood = -25049.136 Iteration 4: log likelihood = -25033.56 Iteration 5: log likelihood = -25027.871 Iteration 6: log likelihood = -25025.783 Iteration 7: log likelihood = -25025.015 Iteration 8: log likelihood = -25024.733 Iteration 9: log likelihood = -25024.629 Iteration 10: log likelihood = -25024.591 Iteration 11: log likelihood = -25024.577 Iteration 12: log likelihood = -25024.572 Iteration 13: log likelihood = -25024.57 Iteration 14: log likelihood = -25024.569 . . . Iteration 50: log likelihood = -25024.569 Refining estimates: Iteration 0: log likelihood = -25024.569 Cox regression -- Breslow method for ties No. of subjects = 28429 Number of obs = 308477 No. of failures = 3440 Time at risk = 7857475 LR chi2(2) = 10794.57 Log likelihood = -25024.569 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------------ _t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval] -------------------+---------------------------------------------------------------- newage | .9489049 .0018013 -27.63 0.000 .9453809 .9524419 newviolent_offenxe | 1.910263 .0677006 18.26 0.000 1.782076 2.047671 newvisit | 7.90e-24 . . . . . ------------------------------------------------------------------------------------

The hazard ratio coefficient for visit_dummy is tiny, but more problematically it doesn’t compute standard errors. Is it possible that there are simply too many visits for it to deal with? Or too many ties (as indicated by the “PROBABLY ERROR” warning I got in the stset command?).

I have looked everywhere for resources on how to deal with this. It seems that time-varying dummy variables are not frequently used in survival models. I have seen suggestions that data with time-varying variables should be separated into separate observations for each status change. But having a visit doesn't really represent a lasting change in status, so this doesn't seem relevant in this case. Also given the number of visits, doing so would make the dataset very large. I would essentially have to make the data a prisoner-day dataset (or I might get by with prisoner-week). But I thought that STATA’s flexibility with dealing with stdata would obviate this necessity. Or maybe the modeling strategy is wrong and I don't even need a survival model?.

I would have thought that this would be a fairly common modeling issue in medical studies, since it is analogous to patients being treated, sometimes repeatedly, with some medication at particular intervals (except in my case it is visits rather than medications). But I couldn’t find anything.

-----------------------------------------------

HERE ARE MY REPLIES TO YOUR COMMENTS:

Originally posted by Marcos Almeida View Post

There is an interesting article from the Stata Journal on categorical time-varying covariates .

Hi Marcos, thank you. I have read that article and actually alluded to the solution it gives in my query. The example given there is a surgery treatment variable, where you divide the observations by duplicating them with one as a 'pre-treatment' observation and the other as a 'post-treatment' status. This is what I referred to as a 'status change' in my message -- but I should have made myself clearer.

I am not sure this approach is appropriate for a number of reasons. To begin with, many prisoners have multiple visits. In that case, once they have a visit, then their status is post-visit and any subsequent visits would not be accounted for. I guess I could divide them up into weeks and months and return them to a 'pre-visit' status in weeks or months when they don't have a visit. But this assumes that I have some knowledge of how long the 'treatment' of a visit is supposed to have an affect. But how do I know that they can be returned to pre-visit status after a week or a month? Also, doing it that way would make the dataset much larger, though I'm willing if necessary. It just seems like a cumbersome solution that makes problematic assumptions -- I was hoping for something more elegant.

In principle I guess I could count each additional visit as a new type of status. Maybe that's how to deal with it? But in that case wouldn't I need a dummy variable for each new visit (which would end up being 121 dummy variables)...?

Originally posted by Steve Samuels View Post

See FAQ Section 12 http://www.statalist.org/forums/help. In the meantime, you look at: http://hsphsun3.harvard.edu/cgi-bin/...ticle-805.html

Thanks for the code link. I could not open the second link you pasted.

Originally posted by Carlo Lazzaro View Post

have you tried to let Stata calculate the categorical variable for visit (i.e.: i.visit instead of visit_dummy) and run Cox regression with this predictor?

So, given the way the data are structured, I'm not sure how I would do this. The data are not structured horizontally (single-record data), so there is no variable for 'visit' that stata can calculate. Instead, the data are multiple record, so essentially each event has its own observation and there is a variable with dates and a variable denoting what event occurred on that date, and they are tied together by the ID number. This is following examples #6-#10 in the STATA Survival Analysis Manual on pages 474-479, with our data being most parallel to the data given in example #10. In the example in the manual, they have a variable 'date' which is like our 'timevar' and a variable 'code' which is like out 'event_type'.

So I would need STATA to calculate a categorical variable out of the 'event_type' variable where I guess it would be equal to 1 when the event is a violent event and 0 otherwise. But that's what our violent event dummy variable does. But maybe I'm missing something?

Thank you everyone for your input. I hope with the nicer formatting it will be easier for you to see what's going on.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#6

03 Mar 2015, 10:07

Josh:
what follows might well be a late afternoon slip-up, but I would take a look at shared frailty model, Example 10, -stcox- entry, Stata 13.1 .pdf manual.

Kind regards,
Carlo
(Stata 19.0)
Comment
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#7

03 Mar 2015, 16:29

Thanks for reformatting. I apologize for the bad link; I should have checked it after posting. Here it is entered as plain text, without using the forum editor link feature.

http://hsphsun3.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.1310/Author/article-805.html

Unfortunately it doesn't address your question, now that I've read it.

Last edited by Steve Samuels; 03 Mar 2015, 16:41.

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2
Comment
Josh Guetzkow

Join Date: Mar 2015

Posts: 5
#8

04 Mar 2015, 06:08

Originally posted by Carlo Lazzaro View Post

what follows might well be a late afternoon slip-up, but I would take a look at shared frailty model Example 10 -stcox- entry Stata 13.1 .pdf manual.

Carlo, thank you. From what I understand, shared frailty models are akin to fixed and random effects models in standard regression. This from Roberto Guteirrez in the Stata Journal 2002, vol. 2, no. 1, p22: "Frailty models are the survival data analog to regression models, which account for heterogeneity and random effects. A frailty is a latent multiplicative effect on the hazard function and is assumed to have unit mean and variance θ, which is estimated along with the other model parameters. A frailty model is an heterogeneity model where the frailties are assumed to be individual- or spell-specific. A shared frailty model is a random effects model where the frailties are common (or shared) among groups of individuals or spells and are randomly distributed across groups."

So, I don't think it's relevant in my case.

In the meantime, I have gotten advice from elsewhere to model this as a discrete-time survival model with days as the unit of time. So in this approach, I would indeed transform the data into prisoner-day observations, with dummy variables for each type of event (visit or violence) that is equal to 1 only on the day of the event and use logistic regression.

With such data, I could generate different dummy variables corresponding to "had a visit in the last 7 days" or "had a visit in the last 30 days" and see if that affects risk of violence. Or I wonder if I could simply have a count variable that starts at zero and increases by 1 every day until they have a visit, at which point it resets to 0. This would act as a simple continuous "days since last visit (or entry into prison)" variable. Though one problem with that is it would confound exposure time in prison to time since last visit, especially for people who don't have a visit. I will have to give it more thought.

If you have any other ideas, I'd be happy to hear. Thanks again!!

Originally posted by Steven Samuels View Post

Unfortunately it doesn't address your question, now that I've read it.

Too bad. Thanks for your efforts!! (BTW the link still wouldn't open for me... but never mind).
Comment
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#9

04 Mar 2015, 06:54

(Sigh): try this one: http://www.stata.com/statalist/archi.../msg00323.html

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2
Comment

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17707

#10

04 Mar 2015, 07:26

Josh:
what follows may be quite unuseful, but I would take a look at it as a source of further inspiration, at least:

Code:

use http://www.stata-press.com/data/r13/catheter, clear
save "C:\Users\user\Desktop\Josh.dta"
g visit=1 in 1/40
replace visit=0 in 41/50
replace visit=2 if visit==.
rename infect crime
stset time, failure(crime) scale(1)
rename patient inmate
stcox age female visit , shared( inmate )

Kind regards,
Carlo
(Stata 19.0)

Comment

Steve Samuels

Join Date: Mar 2014
Posts: 1786

#11

04 Mar 2015, 07:40

Your visit indicator isn't doing the job. It just indicates that a visit took place on a date t. What is crucial is that you must define the time varying covariates at other times of follow-up. Below I create three new "experience" variables at each time: 1) anyv indicates whether there had been any visits up to that time; 2) nvisits, the count of visits up to that time; 3) tfrom_v is the time from the most recent visit, if any. For stcox it's only necessary to define these at all times of failure in the data. This is done with stsplit. In the example, I give visit dates a code of 7.

Code:

clear
input id    time  event
1     0        1
1     12       7
1     30       7
1     46       4
2     0        1
2     5        7
2     40       0
3     0        1
3     30       4
4     0        1
4     20       7
4     29       4
5     0        1
5     5        4
end

/* Create indicators */
gen visit = event==7
gen fail  = event == 4

/* Generate visit experience variables at time t */
bys id (time): gen nvisits = sum(visit)

gen anyv = nvisits>=1

gen tv = time if visit
bys id (time): gen tfrom_v = time-tv[_n-1]
label var tfrom_v "Time from most recent visit"
drop tv  // not needed further

stset time, fail(fail) id(id)

/* Get values for these variables at all failure times */
stsplit, at(failures)
sort id time
list, sepby(id)

stcox anyv, hr nolog


Log likelihood  =   -3.3620702                     Prob > chi2     =    0.2262

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        anyv |   .2372227   .2931778    -1.16   0.244     .0210463    2.673843
------------------------------------------------------------------------------

Last edited by Steve Samuels; 04 Mar 2015, 07:43.

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2

Comment

Steve Samuels

Join Date: Mar 2014

Posts: 1786
#12

05 Mar 2015, 09:01

I have to apologize. I left out a crucial snapspan statement in my example. The truth is that I started doing Cox analyses long before I took up Stata, and I usually create data sets by hand: generate time-dependent values at all followup times; merge with the failure times; and keep only the merged values. The process sometimes generates millions of observations, but this has never caused problems.

I discovered the error by printing out the risk sets generated by stsplit, something I should have done earlier. Risk sets are a crucial concept for Cox models; there is one per failure time and it consists of all individuals at risk at that failure time . stcox compares covariate values for failure(s) and non-failures in each risk set.

Here is the corrected do file. I've changed the example a bit and added a covariate "x", which is just the id times 100. The Manual entry for snapspan is a little murky. One must list variables whose values at a given time, apply up through that time. This includes the failure event (which ends the time interval) and baseline variables. All omitted variables apply to times after they occur: these are the time-varying covariates.

Code:

clear input id time event x 1 0 1 100 1 12 7 100 1 15 7 100 1 20 4 100 2 0 1 200 2 4 7 200 2 40 0 200 3 0 1 300 3 30 6 300 4 0 1 400 4 20 7 400 4 29 4 400 5 0 1 400 5 5 4 500 end list , sepby(id) gen visit = event==7 gen fail = event == 4 /* Create indicators */ /* Generate visit experience variables at time t */ bys id (time): gen nvisits = sum(visit) gen anyv = nvisits>=1 gen tv = time if visit bys id (time): gen tfrom_v = time-tv[_n-1] label var tfrom_v "Time from most recent visit" drop tv snapspan id time fail x, replace gen(t0) drop if t0==. order id t0 time list, sepby(id) stset time, fail(fail) id(id) stsplit, at(failures) riskset(riskset) order riskset id sort riskset id list riskset time id x nvisits anyv tfrom_v fail if riskset!=., sepby(riskset) stcox anyv stcox nvisits

Last edited by Steve Samuels; 05 Mar 2015, 09:32.

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2
Comment
Josh Guetzkow

Join Date: Mar 2015

Posts: 5
#13

05 Mar 2015, 09:37

Originally posted by Steve Samuels View Post

I have to apologize.

Steve, really, apologize? You've been really extremely super helpful, and I can't thank you enough for taking the time to write this up and spell it all out. It will probably take me some time to absorb and implement. But again, thank you!! Oh, and your link finally worked!

Originally posted by Carlo Lazzaro View Post

what follows may be quite unuseful, but I would take a look at it as a source of further inspiration, at least

Thank you for suggesting this exercise -- I'm sure it will be helpful.

Good day to you both!
Cheers,
Josh
Comment

Announcement