
  • stcox & time-varying interactions

    Greetings Statalist,

    I'm a long-time reader and first-time writer. Love this resource as a complement to the great stuff in Stata and elsewhere on the web.

    I have a question regarding stcox on v15.1. To simplify my question, I'm using the drugtr2 dataset and recoding the drug1 variable into a simple binary variable.
    Code:
    webuse drugtr2, clear        //setup
    stset time, failure(cured)    //declare survival-time data
    
    gen d1=0                     //create binary drug1 variable
        replace d1=1 if drug1>0


    My first question: is it appropriate to give binary variables time-varying coefficients? If so, how does the interaction with time, e.g. (_t), change the interpretation of the binary variable?

    I use the tvc() option to fit the data. I run three models with the same interaction. The first is hardcoded (interact = d1*age); the second uses the familiar # operator (i.d1#c.age); the third uses ## (i.d1##c.age).

    Code:
    gen interact = d1*age         //create interaction terms
    
    *Call the model using tvc() option
    stcox, estimate tvc(i.d1 c.age c.interact) nohr
    estimates store hard_tvc
    
    stcox, estimate tvc(i.d1 c.age i.d1#c.age) nohr
    estimates store hash_tvc
    
    stcox, estimate tvc(i.d1##c.age) nohr
    estimates store dblhash_tvc
    
    ------------------------------------------------------------
                          (1)             (2)             (3)  
                   Hard_tvc()         #_tvc()        ##_tvc()  
    ------------------------------------------------------------
    tvc                                                        
    0.d1                0.000           0.000           0.000  
                          (.)             (.)             (.)  
    
    1.d1               -0.165          -0.165          -0.165  
                      (0.144)         (0.144)         (0.144)  
    
    age                -0.017***       -0.017***       -0.017***
                      (0.004)         (0.004)         (0.004)  
    
    interact            0.008                                  
                      (0.005)                                  
    
    1.d1#c.age                          0.008           0.008  
                                      (0.005)         (0.005)  
    ------------------------------------------------------------
    N                     857             857             857  
    AIC               205.984         205.984         205.984  
    BIC               220.244         220.244         220.244  
    df_m                3.000           3.000           3.000  
    ------------------------------------------------------------
    Standard errors in parentheses
    Model 1: stcox, estimate tvc(i.d1 c.age c.interact) nohr
    Model 2: stcox, estimate tvc(i.d1 c.age i.d1#c.age) nohr
    Model 3: stcox, estimate tvc(i.d1##c.age) nohr
    + p<0.10, * p<0.05, ** p<0.01, *** p<0.001


    My second question: is it appropriate to use interactions with time-varying coefficients? If so, should the interaction be with the time-invariant or the time-varying version of the binary variable?

    I stsplit the data following Stata's helpful documentation.
    Code:
    //now split the data following example in: 'help tvc_note'
    *Step 1
    generate id = _n            //generate subject identifier
    streset, id(id)                
    *Step 2
    stsplit, at(failures)        //split data at failure times
    *Step 3
    generate age_tvc = age*(_t)    //generate variable-time interactions
    generate d1_tvc = d1*(_t)
    generate drug2_tvc = drug2*(_t)
    generate int_tvc = interact*(_t)

    But when I call the model I get an error message. This makes sense after seeing what happens to d1 when it is interacted with time.
    Code:
    *Call the model using stsplit data and variable-time interactions
    stcox i.d1_tvc c.age_tvc c.int_tvc, nohr
    
             failure _d:  cured
       analysis time _t:  time
                     id:  id
    d1_tvc:  factor variables may not contain noninteger values
    
    tab d1_tvc


    My final question, then: which model is the best representation of a time-varying interaction? One of the two that replicate the tvc() output above, or one of the two that don't?

    Code:
    *Call the model using stsplit data and variable-time interactions
    stcox c.d1_tvc c.age_tvc c.int_tvc, nohr
    estimates store hard_split
    
    stcox c.d1_tvc c.age_tvc i.d1#c.age_tvc, nohr
    estimates store hash_split
    
    stcox i.d1##c.age_tvc, nohr
    estimates store NOd1_tvc
    
    stcox c.d1_tvc i.d1##c.age_tvc, nohr
    estimates store dblhash_split
    
    ----------------------------------------------------------------------------
                          (1)             (2)             (3)             (4)  
                   Hard_split       #_stsplit        NO d1_tv      ##_stsplit  
    ----------------------------------------------------------------------------
    d1_tvc             -0.165          -0.165                          -0.465*  
                      (0.144)         (0.144)                         (0.211)  
    
    age_tvc            -0.017***       -0.017***       -0.013***       -0.020***
                      (0.004)         (0.004)         (0.003)         (0.005)  
    
    int_tvc             0.008                                                  
                      (0.005)                                                  
    
    1.d1#c.age~c                        0.008           0.000           0.012*  
                                      (0.005)         (0.003)         (0.006)  
    
    0.d1                                                0.000           0.000  
                                                          (.)             (.)  
    
    1.d1                                                0.698           2.236*  
                                                      (0.798)         (1.049)  
    ----------------------------------------------------------------------------
    N                     857             857             857             857  
    AIC               205.984         205.984         206.503         203.201  
    BIC               220.244         220.244         220.763         222.215  
    df_m                3.000           3.000           3.000           4.000  
    ----------------------------------------------------------------------------
    Standard errors in parentheses
    Model 1: stcox c.d1_tvc c.age_tvc c.int_tvc, nohr
    Model 2: stcox c.d1_tvc c.age_tvc i.d1#c.age_tvc, nohr
    Model 3: stcox i.d1##c.age_tvc, nohr
    Model 4: stcox c.d1_tvc i.d1##c.age_tvc, nohr
    + p<0.10, * p<0.05, ** p<0.01, *** p<0.001

    I look forward to your helpful guidance on this matter.

    Kindly,
    Anthony
    Last edited by Anthony J. DeMattee; 26 Jun 2018, 10:38.

  • #2
    To answer your questions:

    1. It is appropriate to use binary variables as time-varying covariates. Note however that the covariates Z don't vary with time; rather the interaction variables U = Z * t vary with time. Other time-varying covariates are those found with multiple record data.
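
    As a rough sketch of what this means for the fitted model, assuming the default texp(_t) so that the multiplier is analysis time itself (the symbols \beta and \gamma are labels chosen here for illustration, not names from the stcox output):

    ```latex
    h(t \mid Z) = h_0(t)\,\exp\!\bigl(\beta Z + \gamma\,(Z \cdot t)\bigr),
    \qquad
    \mathrm{HR}_{Z=1 \text{ vs } Z=0}(t) = \exp(\beta + \gamma t)
    ```

    For a binary Z, \beta is the log hazard ratio at t = 0 and \gamma is the change in the log hazard ratio per unit of analysis time. A model that lists Z only in tvc() fixes \beta = 0.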

    2. It is appropriate to use interactions with tvc variables. (These then become 3-way interactions.)

    However your model, although legal Stata syntax, violates a fundamental principle of regression analysis: if a model includes interactions, it must also include the corresponding main effects.

    I notice that Example 4 for stcox violates this principle, but this is not the first time that a manual example displays bad practice.

    To answer your secondary question: You must interact with a main effect first (following the principle) and you can optionally interact with the corresponding tvc variable.

    3. Since I don't like any of your models, I'm not going to state a preference.
    Last edited by Steve Samuels; 03 Jul 2018, 16:41.
    Steve Samuels
    Statistical Consulting
    [email protected]

    Stata 14.2



    • #3
      I owe Anthony an apology. While the principle I elucidated, to always include main effects before interactions, still stands, it is quite possible for a main effect to be zero and an interaction non-zero. This is a matter that can be tested.
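
      One way such a test might look, as an untested sketch that reuses the d1 and interact variables from the original post: fit the model with and without the main effects and compare the two fits with a likelihood-ratio test. The stored-estimate names with_main and tvc_only are arbitrary, and a Wald test would be a reasonable alternative.

      ```stata
      * Setup, as in the original post
      webuse drugtr2, clear
      stset time, failure(cured)
      generate d1 = drug1 > 0          // binary version of drug1 (missing drug1 would also be coded 1, as in the original)
      generate interact = d1*age       // hard-coded d1 x age interaction

      * Model with main effects plus the same terms interacted with _t
      stcox d1 age interact, tvc(d1 age interact) nohr
      estimates store with_main

      * Model with the time interactions only (no main effects)
      stcox, estimate tvc(d1 age interact) nohr
      estimates store tvc_only

      * Likelihood-ratio test of the three main-effect coefficients jointly equal to zero
      lrtest with_main tvc_only
      ```

      A small p-value would suggest that the main effects cannot be dropped; a large one would be consistent with the tvc-only specification.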

      More important, I regret my self-righteous tone. Anthony wasn't trying to show an ideal model, he was just asking a question about different ways of specifying interactions with tvc variables. This is an interesting question in itself.

      To answer his last question, I will say that I don't much like any of the last four models, but if I had to choose one it would be model 1. However I think that it's harder to write and read than any of the first three models. To me, it lacks the clarity that comes with using the tvc() option.
      Last edited by Steve Samuels; 10 Jul 2018, 16:21.
      Steve Samuels
      Statistical Consulting
      [email protected]

      Stata 14.2



      • #4
        Dear Steve, Thank you for your replies.

        I’m using Stata’s own data (i.e. drugtr2) for accessibility. Based on your responses I feel I should reword and reask my questions.
        • What would be your preferred way to model and code a two-way interaction in a model that fails the proportional hazard assumption? This is the motivation of my post.
        • How would you interpret an interaction with variables using time-varying coefficients?
          • If using the tvc() option: the two-way interaction is really a three-way interaction if it uses time-varying coefficients, right?
          • If following the stsplit process (i.e. the long way of doing tvc): the binary variable is no longer binary after it is interacted with time.
        Thank you again for sharing your insights and opinions.

        Sincerely,
        ~ajd



        • #5
          1. My preferred way of dealing with models that fail the PH assumption would be to try the contributed command stpm2.
          2. In the absence of the time interaction, the meaning of the two-way interaction (of X & Y, say) is the standard one: the HR for a difference in X varies with the value of Y, and vice versa. Now add in the time multiplier, and the interpretation is that the X-Y interactive effect varies with time. However, if you are going to have the two-way XY interaction vary with time, then you need to have the X-time interaction and the Y-time interaction in the model as well. So the model would have to include X, Y, and XY in the main predictor list, and then have all of those in the tvc list.
          3. stsplit and tvc() have completely different aims, so I'm not sure what you are asking here.
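
          To make point 2 concrete, here is a minimal, untested sketch on the drugtr2 data from the original post, taking X = d1 and Y = age. It uses the hard-coded product variable (interact) from post #1; nothing about the data or the fit is guaranteed here.

          ```stata
          * Setup, as in the original post
          webuse drugtr2, clear
          stset time, failure(cured)
          generate d1 = drug1 > 0          // X: binary version of drug1 (missing drug1 would also be coded 1)
          generate interact = d1*age       // XY: hard-coded d1 x age product

          * X, Y, and XY in the main predictor list, and the same three terms in tvc()
          * With the default texp(_t), each tvc() term is multiplied by analysis time _t
          stcox d1 age interact, tvc(d1 age interact) nohr
          ```

          In this specification the coefficient on interact in the tvc equation describes how the d1-by-age interaction changes per unit of analysis time, while the corresponding main-equation coefficient describes it at _t = 0.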
          Steve Samuels
          Statistical Consulting
          [email protected]

          Stata 14.2



          • #6
            Hi,

            This thread is very helpful.

            Following on Steve Samuels' point #2, I also have two time-varying variables that are interacted with each other (resulting in a three-way interaction).

            I think I specified the stpm2 model correctly to have the X, Z and XZ variables included in the tvc() option. But my question is, what is the appropriate way to generate the predicted hazard ratios from this model? Does the predict command properly incorporate the interaction terms into the predicted hazard ratios?

            See sample code below.

            Code:
            //---------------------//
            // Load example data
            //---------------------//
            
            use "http://www.stata-press.com/data/fpsaus/ew_breast_ch7.dta", clear
            stset survtime, failure(dead==1) exit(time 5) id(ident)
            
            //---------------------//
            // Binary variables
            //---------------------//
            
            generate x=0 if dep5==0
            replace x=1 if dep5==1
            
            generate z=0 if agediag<40
            replace z=1 if agediag>=40
            
            generate x_by_z=x*z
            
            //---------------------//
            // How can I incorporate x, z and x*z into the tvc() option, and have the predicted hazard ratios account for this? Is this code correct?
            //---------------------//
            
            stpm2 x z x_by_z, scale(hazard) df(3) tvc(x z x_by_z) dftvc(3) eform
            
            predict hr_at0, hrnumerator(x 1 z 0 x_by_z 0) hrdenominator(x 0 z 0 x_by_z 0)
            predict hr_at1, hrnumerator(x 1 z 1 x_by_z 1) hrdenominator(x 0 z 1 x_by_z 0)
            
            twoway (line hr_at0 _t, sort) (line hr_at1 _t, sort), yscale(log)
            
            drop hr*
            
            //---------------------//
            // But, why does this give the same graphs as a stratified model?
            // Here, rather than interaction terms, the models are stratified. Does this mean that the interaction term is not allowed to be time-varying (since it's not included in the tvc() option)?
            //---------------------//
            
            stpm2 x if z==0, scale(hazard) df(3) tvc(x) dftvc(3) eform
            predict hr_at0, hrnumerator(x 1) hrdenominator(x 0)
            
            stpm2 x if z==1, scale(hazard) df(3) tvc(x) dftvc(3) eform
            predict hr_at1, hrnumerator(x 1) hrdenominator(x 0)
            
            twoway (line hr_at0 _t, sort) (line hr_at1 _t, sort), yscale(log)
            
            drop hr*
