stcox & time-varying interactions

Anthony J. DeMattee

Join Date: Jun 2018
Posts: 7

26 Jun 2018, 10:34

Greetings Statalist,

I'm a long-time reader and first-time writer. Love this resource as a complement to the great stuff in Stata and elsewhere on the web.

I have a question regarding stcox in Stata 15.1. To simplify my question, I'm using the drugtr2 dataset and changing one of the factor variables to a simple binary variable.

Code:

webuse drugtr2, clear        //setup
stset time, failure(cured)    //declare survival-time data

gen d1=0                     //create binary drug1 variable
    replace d1=1 if drug1>0

My first question: is it appropriate to use binary variables with time-varying coefficients? If so, how does the interaction with time, e.g. (_t), change the interpretation of the binary variable?

I use the tvc() option to fit the data. I run three models with the same interaction. The first is hardcoded (interact = d1*age); the second uses the familiar # operator (i.d1#c.age); the third uses ## (i.d1##c.age).

Code:

gen interact = d1*age         //create interaction terms

*Call the model using tvc() option
stcox, estimate tvc(i.d1 c.age c.interact) nohr
estimates store hard_tvc

stcox, estimate tvc(i.d1 c.age i.d1#c.age) nohr
estimates store hash_tvc

stcox, estimate tvc(i.d1##c.age) nohr
estimates store dblhash_tvc

------------------------------------------------------------
                      (1)             (2)             (3)  
               Hard_tvc()         #_tvc()        ##_tvc()  
------------------------------------------------------------
tvc                                                        
0.d1                0.000           0.000           0.000  
                      (.)             (.)             (.)  

1.d1               -0.165          -0.165          -0.165  
                  (0.144)         (0.144)         (0.144)  

age                -0.017***       -0.017***       -0.017***
                  (0.004)         (0.004)         (0.004)  

interact            0.008                                  
                  (0.005)                                  

1.d1#c.age                          0.008           0.008  
                                  (0.005)         (0.005)  
------------------------------------------------------------
N                     857             857             857  
AIC               205.984         205.984         205.984  
BIC               220.244         220.244         220.244  
df_m                3.000           3.000           3.000  
------------------------------------------------------------
Standard errors in parentheses
Model 1: stcox, estimate tvc(i.d1 c.age c.interact) nohr
Model 2: stcox, estimate tvc(i.d1 c.age i.d1#c.age) nohr
Model 3: stcox, estimate tvc(i.d1##c.age) nohr
+ p<0.10, * p<0.05, ** p<0.01, *** p<0.001

My second question: is it appropriate to use interactions with time-varying coefficients? If so, should the interaction be with the time-invariant or the time-variant binary variable?

I stsplit the data following Stata's helpful documentation.

Code:

//now split the data following example in: 'help tvc_note'
*Step 1
generate id = _n            //generate subject identifier
streset, id(id)                
*Step 2
stsplit, at(failures)        //split data at failure times
*Step 3
generate age_tvc = age*(_t)    //generate variable-time interactions
generate d1_tvc = d1*(_t)
generate drug2_tvc = drug2*(_t)
generate int_tvc = interact*(_t)

But when I call the model I get an error message. This makes sense after seeing what happens to d1 when it is interacted with time.

Code:

*Call the model using stsplit data and variable-time interactions
. stcox i.d1_tvc c.age_tvc c.int_tvc, nohr

         failure _d:  cured
   analysis time _t:  time
                 id:  id
d1_tvc:  factor variables may not contain noninteger values

tab d1_tvc
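
The reason for the error can be checked with a minimal sketch. It assumes the same drugtr2 setup as above and stores the product as a double (my choice, so that the equality check is exact). Because d1 is 0 or 1, d1*_t equals _t on records with d1 == 1 and 0 otherwise, so it takes noninteger values whenever _t does and has to enter the model as a continuous term.

```stata
webuse drugtr2, clear
stset time, failure(cured)
gen d1 = 0
replace d1 = 1 if drug1 > 0

generate id = _n                   // subject identifier needed by stsplit
streset, id(id)
stsplit, at(failures)              // one record per subject per failure time

generate double d1_tvc = d1*_t     // double is an assumption, for exact comparison

* d1_tvc is _t where d1 == 1 and 0 elsewhere, i.e. a continuous variable
assert d1_tvc == cond(d1 == 1, _t, 0)
summarize d1_tvc if d1 == 1

* enter it with c. (or no prefix) rather than i.
stcox c.d1_tvc, nohr
```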

My final question, then: which model is the best representation of a time-varying interaction? Either of the two that replicate the tvc() output above, or one of the two that don't?

Code:

*Call the model using stsplit data and variable-time interactions
stcox c.d1_tvc c.age_tvc c.int_tvc, nohr
estimates store hard_split

stcox c.d1_tvc c.age_tvc i.d1#c.age_tvc, nohr
estimates store hash_split

stcox i.d1##c.age_tvc, nohr
estimates store NOd1_tvc

stcox c.d1_tvc i.d1##c.age_tvc, nohr
estimates store dblhash_split

----------------------------------------------------------------------------
                      (1)             (2)             (3)             (4)  
               Hard_split       #_stsplit        NO d1_tv      ##_stsplit  
----------------------------------------------------------------------------
d1_tvc             -0.165          -0.165                          -0.465*  
                  (0.144)         (0.144)                         (0.211)  

age_tvc            -0.017***       -0.017***       -0.013***       -0.020***
                  (0.004)         (0.004)         (0.003)         (0.005)  

int_tvc             0.008                                                  
                  (0.005)                                                  

1.d1#c.age~c                        0.008           0.000           0.012*  
                                  (0.005)         (0.003)         (0.006)  

0.d1                                                0.000           0.000  
                                                      (.)             (.)  

1.d1                                                0.698           2.236*  
                                                  (0.798)         (1.049)  
----------------------------------------------------------------------------
N                     857             857             857             857  
AIC               205.984         205.984         206.503         203.201  
BIC               220.244         220.244         220.763         222.215  
df_m                3.000           3.000           3.000           4.000  
----------------------------------------------------------------------------
Standard errors in parentheses
Model 1: stcox c.d1_tvc c.age_tvc c.int_tvc, nohr
Model 2: stcox c.d1_tvc c.age_tvc i.d1#c.age_tvc, nohr
Model 3: stcox i.d1##c.age_tvc, nohr
Model 4: stcox c.d1_tvc i.d1##c.age_tvc, nohr
+ p<0.10, * p<0.05, ** p<0.01, *** p<0.001

I look forward to your helpful guidance on this matter.

Kindly,
Anthony

Attached Files

Statalist2018.06.26.do (1.9 KB, 1 view)

Last edited by Anthony J. DeMattee; 26 Jun 2018, 10:38.

Tags: categorical, interaction, regression, stcox, time-varying coefficients

Steve Samuels

Join Date: Mar 2014

Posts: 1786
#2

03 Jul 2018, 16:38

To answer your questions:

1. It is appropriate to use binary variables as time-varying covariates. Note however that the covariates Z don't vary with time; rather the interaction variables U = Z * t vary with time. Other time-varying covariates are those found with multiple record data.

2. It is appropriate to use interactions with tvc variables. (These then become 3-way interactions.)

However, your model, although legal Stata syntax, violates a fundamental principle of regression analysis: if a model includes interactions, it must also include the corresponding main effects.

I notice that Example 4 for stcox violates this principle, but this is not the first time that a manual example displays bad practice.

To answer your secondary question: You must interact with a main effect first (following the principle) and you can optionally interact with the corresponding tvc variable.

3. Since I don't like any of your models, I'm not going to take a preference.
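
Point 1 can be written in symbols. This is a sketch for a single fixed binary covariate z with the default time function texp(_t); the symbols β (coefficient in the main equation) and γ (coefficient in the tvc equation) are my notation, not taken from the manual.

```latex
h(t \mid z) = h_0(t)\,\exp\{\beta z + \gamma\,(z \cdot t)\},
\qquad
\mathrm{HR}(t) = \frac{h(t \mid z = 1)}{h(t \mid z = 0)} = \exp(\beta + \gamma t)
```

The covariate z itself is constant; only the product z·t changes over time. In a model that lists z only in tvc(), β is fixed at 0, so the log hazard ratio is γt and is forced to equal zero at t = 0.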

Last edited by Steve Samuels; 03 Jul 2018, 16:41.

Steve Samuels
Statistical Consulting

Stata 14.2
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#3

10 Jul 2018, 15:43

I owe Anthony an apology. While the principle I elucidated (always include main effects before interactions) still stands, it is quite possible that a main effect can be zero and an interaction non-zero. This is a matter that can be tested.
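
One way to run such a test, as a sketch: fit the interaction with its main effects included and apply Wald tests. This assumes the drugtr2 setup from the first post, and that stcox labels its two equations main and tvc (the tvc label appears in the tables above; main is my assumption for the other one).

```stata
webuse drugtr2, clear
stset time, failure(cured)
gen d1 = 0
replace d1 = 1 if drug1 > 0

* main effects and interaction in the main list, same terms in tvc()
stcox i.d1##c.age, tvc(i.d1##c.age) nohr

* Wald test that the d1 main effect is zero
test [main]1.d1

* joint Wald test of all the time-varying terms
testparm i.d1##c.age, equation(tvc)
```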

More important, I regret my self-righteous tone. Anthony wasn't trying to show an ideal model, he was just asking a question about different ways of specifying interactions with tvc variables. This is an interesting question in itself.

To answer his last question, I will say that I don't much like any of the last four models, but if I had to choose one it would be model 1. However, I think that it's harder to write and read than any of the first three models. To me, it lacks the clarity that comes with using the tvc() option.

Last edited by Steve Samuels; 10 Jul 2018, 16:21.

Steve Samuels
Statistical Consulting

Stata 14.2
Anthony J. DeMattee

Join Date: Jun 2018

Posts: 7
#4

11 Jul 2018, 04:09

Dear Steve, Thank you for your replies.

I’m using Stata’s own data (i.e. drugtr2) for accessibility. Based on your responses I feel I should reword and reask my questions.
What would be your preferred way to model and code a two-way interaction in a model that fails the proportional hazard assumption? This is the motivation of my post.

How would you interpret an interaction between variables that have time-varying coefficients?
If using the tvc() option: the two-way interaction is really a three-way interaction once it has a time-varying coefficient, right?

If following the stsplit process (i.e. the long way of doing tvc): the binary variable is no longer binary after it is interacted with time.

Thank you again for sharing your insights and opinions.

Sincerely,
~ajd
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#5

11 Jul 2018, 17:02

1. My preferred way of dealing with models that fail the PH assumption would be to try the contributed command stpm2.
2. In the absence of the time interaction, the meaning of the two-way interaction (of X and Y, say) is the standard one: the HR for a difference in X varies with the value of Y, and vice versa. Now add in the time multiplier, and the interpretation is that the X-Y interactive effect varies with time. However, if you are going to have the two-way XY interaction vary with time, then you need to have the X-time and Y-time interactions in the model as well. So the model would have to include X, Y, and XY in the main predictor list, and then have all of those in the tvc list.
3. stsplit and tvc() have completely different aims, so I'm not sure what you are asking here.
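
The model described in point 2 might be coded as follows with the drugtr2 example, taking X = d1 and Y = age. This is a sketch, not a recommended final model: the equation names main and tvc are assumed from stcox output, and the age (50) and analysis time (10) used in lincom are hypothetical values that should be replaced with ones inside the range of the data.

```stata
webuse drugtr2, clear
stset time, failure(cured)
gen d1 = 0
replace d1 = 1 if drug1 > 0

* X, Y, and XY in the main list, and all of them again in tvc()
stcox i.d1##c.age, tvc(i.d1##c.age) nohr

* HR for d1 = 1 vs d1 = 0 at age 50 and time 10:
* exp( b_d1 + 50*b_d1xage + 10*(g_d1 + 50*g_d1xage) )
lincom [main]1.d1 + 50*[main]1.d1#c.age + 10*[tvc]1.d1 + 500*[tvc]1.d1#c.age, hr
```

Repeating the lincom line for several times and ages shows how the d1 hazard ratio moves with both, which is the sense in which the two-way interaction becomes a three-way interaction.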

Steve Samuels
Statistical Consulting

Stata 14.2

Ashley Sorensen

Join Date: Jan 2019
Posts: 3

10 Jan 2019, 16:14

Hi,

This thread is very helpful.

Following on Steve Samuels' point #2, I also have two time-varying variables that are interacted with each other (resulting in a three-way interaction).

I think I specified the stpm2 model correctly to have the X, Z and XZ variables included in the tvc() option. But my question is, what is the appropriate way to generate the predicted hazard ratios from this model? Does the predict command properly incorporate the interaction terms into the predicted hazard ratios?

See sample code below.

Code:

//---------------------//
// Load example data
//---------------------//

use "http://www.stata-press.com/data/fpsaus/ew_breast_ch7.dta", clear
stset survtime, failure(dead==1) exit(time 5) id(ident)

//---------------------//
// Binary variables
//---------------------//

generate x=0 if dep5==0
replace x=1 if dep5==1

generate z=0 if agediag<40
replace z=1 if agediag>=40

generate x_by_z=x*z

//---------------------//
// How can I incorporate x, z and x*z into the tvc() command, and have the predicted hazard ratios account for this? Is this code correct?
//---------------------//

stpm2 x z x_by_z, scale(hazard) df(3) tvc(x z x_by_z) dftvc(3) eform

predict hr_at0, hrnumerator(x 1 z 0 x_by_z 0) hrdenominator(x 0 z 0 x_by_z 0)
predict hr_at1, hrnumerator(x 1 z 1 x_by_z 1) hrdenominator(x 0 z 1 x_by_z 0)

twoway (line hr_at0 _t, sort) (line hr_at1 _t, sort), yscale(log)

drop hr*

//---------------------//
// But, why does this give the same graphs as a stratified model?
// Here, rather than interaction terms, the models are stratified. Does this mean that the interaction term is not allowed to be time-varying (since it's not included in the tvc() option)?
//---------------------//

stpm2 x if z==0, scale(hazard) df(3) tvc(x) dftvc(3) eform
predict hr_at0, hrnumerator(x 1) hrdenominator(x 0)

stpm2 x if z==1, scale(hazard) df(3) tvc(x) dftvc(3) eform
predict hr_at1, hrnumerator(x 1) hrdenominator(x 0)

twoway (line hr_at0 _t, sort) (line hr_at1 _t, sort), yscale(log)

drop hr*
