Greetings Statalist,
I'm a long-time reader and first-time writer. Love this resource as a complement to the great stuff in Stata and elsewhere on the web.
I have a question regarding stcox in Stata 15.1. To simplify my question, I'm using the drugtr2 dataset and changing one of the factor variables (drug1) to a simple binary variable.
Code:
webuse drugtr2, clear          // setup
stset time, failure(cured)     // declare survival-time data
gen d1 = 0                     // create binary drug1 variable
replace d1 = 1 if drug1 > 0
My first question: is it appropriate to use binary variables with time-varying coefficients? If so, how does the interaction with time (e.g., _t) change the interpretation of the binary variable?
I use the tvc() option to fit the data. I run three models with the same interaction: the first is hardcoded (interact = d1*age), the second uses the familiar # operator (i.d1#c.age), and the third uses ## (i.d1##c.age).
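As I understand the documentation, tvc() multiplies each listed variable by a function of analysis time, which is _t by default (texp() changes this). Because I list no main-effect variables, I believe the log relative hazard in all three specifications is the following, where the gamma symbols are my own notation rather than Stata's:

```latex
\log \frac{h(t \mid d_1, \mathrm{age})}{h_0(t)}
  = t \, \bigl( \gamma_1 d_1 + \gamma_2 \, \mathrm{age} + \gamma_3 \, d_1 \cdot \mathrm{age} \bigr)
```

If that reading is right, the log hazard ratio for d1 = 1 versus d1 = 0 at a given age is t × (gamma_1 + gamma_3 × age), so it is zero at t = 0 and changes linearly with time.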
Code:
gen interact = d1*age   // create interaction term

* Call the model using the tvc() option
stcox, estimate tvc(i.d1 c.age c.interact) nohr
estimates store hard_tvc
stcox, estimate tvc(i.d1 c.age i.d1#c.age) nohr
estimates store hash_tvc
stcox, estimate tvc(i.d1##c.age) nohr
estimates store dblhash_tvc

------------------------------------------------------------
                      (1)             (2)             (3)
               Hard_tvc()         #_tvc()        ##_tvc()
------------------------------------------------------------
tvc
0.d1                0.000           0.000           0.000
                      (.)             (.)             (.)

1.d1               -0.165          -0.165          -0.165
                  (0.144)         (0.144)         (0.144)

age             -0.017***       -0.017***       -0.017***
                  (0.004)         (0.004)         (0.004)

interact            0.008
                  (0.005)

1.d1#c.age                          0.008           0.008
                                  (0.005)         (0.005)
------------------------------------------------------------
N                     857             857             857
AIC               205.984         205.984         205.984
BIC               220.244         220.244         220.244
df_m                3.000           3.000           3.000
------------------------------------------------------------
Standard errors in parentheses
Model 1: stcox, estimate tvc(i.d1 c.age c.interact) nohr
Model 2: stcox, estimate tvc(i.d1 c.age i.d1#c.age) nohr
Model 3: stcox, estimate tvc(i.d1##c.age) nohr
+ p<0.10, * p<0.05, ** p<0.01, *** p<0.001
My second question: is it appropriate to use interactions with time-varying coefficients? If so, should the interaction be with the time-invariant or the time-varying binary variable?
I stsplit the data following Stata's helpful documentation.
Code:
// now split the data following the example in 'help tvc_note'
* Step 1
generate id = _n                   // generate subject identifier
streset, id(id)
* Step 2
stsplit, at(failures)              // split data at failure times
* Step 3
generate age_tvc = age*(_t)        // generate variable-time interactions
generate d1_tvc = d1*(_t)
generate drug2_tvc = drug2*(_t)
generate int_tvc = interact*(_t)
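My understanding of why this should reproduce tvc(): the Cox partial likelihood only evaluates covariates at the observed failure times, and after stsplit, at(failures) every record in the risk set at failure time t_j has _t = t_j. In my own notation (not Stata's), with i indexing subjects and t_j the end of the j-th split interval, the generated variables are:

```latex
\mathrm{age\_tvc}_{ij} = \mathrm{age}_i \, t_j, \qquad
\mathrm{d1\_tvc}_{ij} = d_{1i} \, t_j, \qquad
\mathrm{int\_tvc}_{ij} = d_{1i} \, \mathrm{age}_i \, t_j
```

Since t_j is generally not an integer, d1_tvc takes noninteger values even though d1 itself is binary.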
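But when I call the model I get an error message. This makes sense after seeing what happens to d1 when it is interacted with time.
Code:
* Call the model using stsplit data and variable-time interactions
. stcox i.d1_tvc c.age_tvc c.int_tvc, nohr

         failure _d:  cured
   analysis time _t:  time
                 id:  id
d1_tvc:  factor variables may not contain noninteger values

tab d1_tvc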
My final question, then: which model best represents a time-varying interaction? Either of the two that replicate the tvc() output above, or one of the two that don't?
Code:
* Call the model using stsplit data and variable-time interactions
stcox c.d1_tvc c.age_tvc c.int_tvc, nohr
estimates store hard_split
stcox c.d1_tvc c.age_tvc i.d1#c.age_tvc, nohr
estimates store hash_split
stcox i.d1##c.age_tvc, nohr
estimates store NOd1_tvc
stcox c.d1_tvc i.d1##c.age_tvc, nohr
estimates store dblhash_split

----------------------------------------------------------------------------
                      (1)             (2)             (3)             (4)
               Hard_split       #_stsplit        NO d1_tv      ##_stsplit
----------------------------------------------------------------------------
d1_tvc             -0.165          -0.165                         -0.465*
                  (0.144)         (0.144)                         (0.211)

age_tvc         -0.017***       -0.017***       -0.013***       -0.020***
                  (0.004)         (0.004)         (0.003)         (0.005)

int_tvc             0.008
                  (0.005)

1.d1#c.age~c                        0.008           0.000          0.012*
                                  (0.005)         (0.003)         (0.006)

0.d1                                                0.000           0.000
                                                      (.)             (.)

1.d1                                                0.698          2.236*
                                                  (0.798)         (1.049)
----------------------------------------------------------------------------
N                     857             857             857             857
AIC               205.984         205.984         206.503         203.201
BIC               220.244         220.244         220.763         222.215
df_m                3.000           3.000           3.000           4.000
----------------------------------------------------------------------------
Standard errors in parentheses
Model 1: stcox c.d1_tvc c.age_tvc c.int_tvc, nohr
Model 2: stcox c.d1_tvc c.age_tvc i.d1#c.age_tvc, nohr
Model 3: stcox i.d1##c.age_tvc, nohr
Model 4: stcox c.d1_tvc i.d1##c.age_tvc, nohr
+ p<0.10, * p<0.05, ** p<0.01, *** p<0.001
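To make the comparison concrete, this is how I read the linear predictor (log relative hazard) of each split-data model, given that i.d1##c.age_tvc expands to 1.d1, age_tvc, and 1.d1#c.age_tvc. The beta and gamma symbols are my own notation, and t stands for _t:

```latex
\begin{aligned}
\text{Models 1, 2:}\quad & t \, \bigl( \gamma_1 d_1 + \gamma_2 \, \mathrm{age} + \gamma_3 \, d_1 \cdot \mathrm{age} \bigr) \\
\text{Model 3:}\quad & \beta_1 d_1 + t \, \bigl( \gamma_2 \, \mathrm{age} + \gamma_3 \, d_1 \cdot \mathrm{age} \bigr) \\
\text{Model 4:}\quad & \beta_1 d_1 + t \, \bigl( \gamma_1 d_1 + \gamma_2 \, \mathrm{age} + \gamma_3 \, d_1 \cdot \mathrm{age} \bigr)
\end{aligned}
```

On this reading, Model 3 treats d1 as a time-invariant main effect and drops the d1 × t term, while Model 4 includes both, which would match its extra degree of freedom (df_m = 4).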
I look forward to your helpful guidance on this matter.
Kindly,
Anthony