Greetings Statalist,
I'm a long-time reader and first-time writer. Love this resource as a complement to the great stuff in Stata and elsewhere on the web.
I have a question regarding stcox on v15.1. To simplify my question, I'm using the drugtr2 dataset and changing one of the factor variables to a simple binary variable.
Code:
webuse drugtr2, clear //setup
stset time, failure(cured) //declare survival-time data
gen d1=0 //create binary drug1 variable
replace d1=1 if drug1>0
My first question: is it appropriate to use binary variables with time-varying coefficients? If so, how does the interaction with time, e.g. (_t), change the interpretation of the binary variable?
I use the tvc() option to fit the data. I run three models with the same interaction. The first is hardcoded (interact = d1*age); the second uses the familiar # feature (i.d1#c.age); the third uses ## (i.d1##c.age).
Code:
gen interact = d1*age //create interaction terms
*Call the model using tvc() option
stcox, estimate tvc(i.d1 c.age c.interact) nohr
estimates store hard_tvc
stcox, estimate tvc(i.d1 c.age i.d1#c.age) nohr
estimates store hash_tvc
stcox, estimate tvc(i.d1##c.age) nohr
estimates store dblhash_tvc
------------------------------------------------------------
                      (1)             (2)             (3)
               Hard_tvc()         #_tvc()        ##_tvc()
------------------------------------------------------------
tvc
0.d1                0.000           0.000           0.000
                      (.)             (.)             (.)
1.d1               -0.165          -0.165          -0.165
                  (0.144)         (0.144)         (0.144)
age                -0.017***       -0.017***       -0.017***
                  (0.004)         (0.004)         (0.004)
interact            0.008
                  (0.005)
1.d1#c.age                          0.008           0.008
                                  (0.005)         (0.005)
------------------------------------------------------------
N                     857             857             857
AIC               205.984         205.984         205.984
BIC               220.244         220.244         220.244
df_m                3.000           3.000           3.000
------------------------------------------------------------
Standard errors in parentheses
Model 1: stcox, estimate tvc(i.d1 c.age c.interact) nohr
Model 2: stcox, estimate tvc(i.d1 c.age i.d1#c.age) nohr
Model 3: stcox, estimate tvc(i.d1##c.age) nohr
+ p<0.10, * p<0.05, ** p<0.01, *** p<0.001
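To make the first question concrete, here is how I read the three tvc() models above. This is only my own sketch, assuming the default texp(_t); the gammas and h_0 are my notation, not anything stcox prints:

```latex
% Sketch of the hazard implied by tvc(i.d1 c.age i.d1#c.age) with texp(_t);
% \gamma_1, \gamma_2, \gamma_3 and h_0 are my own (assumed) notation.
h(t \mid d1, age) = h_0(t)\,\exp\!\bigl\{ t\,(\gamma_1\, d1 + \gamma_2\, age + \gamma_3\, d1 \cdot age) \bigr\}
\qquad\Longrightarrow\qquad
\log \mathrm{HR}_{d1 = 1 \text{ vs } 0}(t, age) = t\,(\gamma_1 + \gamma_3\, age)
```

If that reading is right, the estimates above (-0.165 for 1.d1 and 0.008 for the interaction) describe how the log hazard ratio for d1 changes per unit of _t, rather than a constant effect of d1.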
My second question: is it appropriate to use interactions with time-varying coefficients? If so, should the interaction be with the time-invariant or the time-varying binary variable?
I stsplit the data following Stata's helpful documentation.
Code:
//now split the data following example in: 'help tvc_note'
*Step 1
generate id = _n //generate subject identifier
streset, id(id)
*Step 2
stsplit, at(failures) //split data at failure times
*Step 3
generate age_tvc = age*(_t) //generate variable-time interactions
generate d1_tvc = d1*(_t)
generate drug2_tvc = drug2*(_t)
generate int_tvc = interact*(_t)
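But when I call the model I get an error message. This makes sense after seeing what happens to d1 when it is interacted with time.
Code:
*Call the model using stsplit data and variable-time interactions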
. stcox i.d1_tvc c.age_tvc c.int_tvc, nohr
failure _d: cured
analysis time _t: time
id: id
d1_tvc: factor variables may not contain noninteger values
tab d1_tvc
My final question, then: which model is the best representation of a time-varying interaction? Either of the two that replicate the tvc() output above, or one of the two that don't?
Code:
*Call the model using stsplit data and variable-time interactions
stcox c.d1_tvc c.age_tvc c.int_tvc, nohr
estimates store hard_split
stcox c.d1_tvc c.age_tvc i.d1#c.age_tvc, nohr
estimates store hash_split
stcox i.d1##c.age_tvc, nohr
estimates store NOd1_tvc
stcox c.d1_tvc i.d1##c.age_tvc, nohr
estimates store dblhash_split
----------------------------------------------------------------------------
                      (1)             (2)             (3)             (4)
               Hard_split       #_stsplit        NO d1_tv      ##_stsplit
----------------------------------------------------------------------------
d1_tvc             -0.165          -0.165                          -0.465*
                  (0.144)         (0.144)                         (0.211)
age_tvc            -0.017***       -0.017***       -0.013***       -0.020***
                  (0.004)         (0.004)         (0.003)         (0.005)
int_tvc             0.008
                  (0.005)
1.d1#c.age~c                        0.008           0.000           0.012*
                                  (0.005)         (0.003)         (0.006)
0.d1                                                0.000           0.000
                                                      (.)             (.)
1.d1                                                0.698           2.236*
                                                  (0.798)         (1.049)
----------------------------------------------------------------------------
N                     857             857             857             857
AIC               205.984         205.984         206.503         203.201
BIC               220.244         220.244         220.763         222.215
df_m                3.000           3.000           3.000           4.000
----------------------------------------------------------------------------
Standard errors in parentheses
Model 1: stcox c.d1_tvc c.age_tvc c.int_tvc, nohr
Model 2: stcox c.d1_tvc c.age_tvc i.d1#c.age_tvc, nohr
Model 3: stcox i.d1##c.age_tvc, nohr
Model 4: stcox c.d1_tvc i.d1##c.age_tvc, nohr
+ p<0.10, * p<0.05, ** p<0.01, *** p<0.001
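Writing out the linear predictors of the four split-data models, as I understand the factor-variable expansion, may make the final question easier to answer. Again, this is my own notation (the betas are hypothetical labels), not Stata output:

```latex
% Log relative hazard for each stsplit specification; t stands for _t.
% \beta_0 ... \beta_3 are assumed labels, not names Stata reports.
\begin{aligned}
\text{Models 1 and 2:}\quad & \beta_1\, d1\, t + \beta_2\, age\, t + \beta_3\, d1 \cdot age\, t \\
\text{Model 3:}\quad & \beta_0\, d1 + \beta_2\, age\, t + \beta_3\, d1 \cdot age\, t \\
\text{Model 4:}\quad & \beta_0\, d1 + \beta_1\, d1\, t + \beta_2\, age\, t + \beta_3\, d1 \cdot age\, t
\end{aligned}
```

If I have this right, Models 1 and 2 have no time-invariant d1 effect, Model 3 has no d1-by-time term, and Model 4 has both, which would be consistent with its df_m of 4.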
Kindly,
Anthony
