lincom - code works but how does Stata know that I want the HR for only one categorical variable

Denise Vella

Join Date: Aug 2022
Posts: 187

21 Jul 2023, 05:04

Question:
I am following this book to learn survival analysis: https://link.springer.com/book/10.10...-1-4419-6646-9

My question is about the code given by the author. An excerpt of the data is shown below.

clinic is a categorical variable coded 1 or 2. prison is a categorical variable coded 0 or 1.

Code:

stcox prison dose clin_pr clin_do, strata(clinic)  // this gives the HRs for both clinics combined

// The book suggests the following to get the HR for prison = 1 vs prison = 0 in clinic == 2

// My question:
// How does Stata know that it needs to output the HR for clinic == 2 in the code below?

lincom prison + 2*clin_pr, hr

* The interaction terms were generated beforehand as:
gen clin_pr = clinic*prison
gen clin_do = clinic*dose

The HR is then the same as the one from the model below. I understand why this model reports the HR for clinic == 2 only, but I don't understand why the result given by lincom is equivalent to it.
stcox prison dose if clinic ==2

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input double(id clinic status survt prison dose) byte(_st _d) int _t byte _t0 float(clin_pr clin_do) byte(_est_A _est_LRTEST_0)
 1 1 1 428 0 50 1 1 428 0 0 50 1 1
 2 1 1 275 1 55 1 1 275 0 1 55 1 1
 3 1 1 262 0 55 1 1 262 0 0 55 1 1
 4 1 1 183 0 30 1 1 183 0 0 30 1 1
 5 1 1 259 1 65 1 1 259 0 1 65 1 1
 6 1 1 714 0 55 1 1 714 0 0 55 1 1
 7 1 1 438 1 65 1 1 438 0 1 65 1 1
 8 1 0 796 1 60 1 0 796 0 1 60 1 1
 9 1 1 892 0 50 1 1 892 0 0 50 1 1
10 1 1 393 1 65 1 1 393 0 1 65 1 1
11 1 0 161 1 80 1 0 161 0 1 80 1 1
12 1 1 836 1 60 1 1 836 0 1 60 1 1
13 1 1 523 0 55 1 1 523 0 0 55 1 1
14 1 1 612 0 70 1 1 612 0 0 70 1 1
15 1 1 212 1 60 1 1 212 0 1 60 1 1
16 1 1 399 1 60 1 1 399 0 1 60 1 1
17 1 1 771 1 75 1 1 771 0 1 75 1 1
18 1 1 514 1 80 1 1 514 0 1 80 1 1
19 1 1 512 0 80 1 1 512 0 0 80 1 1
21 1 1 624 1 80 1 1 624 0 1 80 1 1
end


Leonardo Guizzetti

Join Date: Jul 2016
Posts: 2405

21 Jul 2023, 08:45

Stata doesn't know what you want by using lincom - it is only faithfully doing the algebra you specify.

That said, the book seems to specify prison as coded 0/1 and clinic as coded 1/2, yet your data example suggests that clinic may have been recoded to 0/1. If so, the correct command to get the HR for prison=1 vs prison=0 within clinic=1 is

Code:

lincom prison + clin_pr, hr  // note no factor of 2 here for clin_pr

To understand what the coefficients mean, go back to first principles. Each coefficient describes the association for a one-unit increase in its variable. The interaction between clinic, coded 1/2, and prison, coded 0/1, creates a new variable, clin_pr, which equals 0 when prison=0 and equals the value of clinic (1 or 2) when prison=1. The log HR for prison at a given value of clinic is therefore _b[prison] + clinic*_b[clin_pr]. On its own, _b[prison] is the prison effect at clinic=0 (which doesn't exist in the data, but is mathematically implied). The coefficient for clin_pr is the change in that effect for a one-unit increase in clinic, so you add it once to get to clinic=1 and multiply it by 2 to get from clinic=0 to clinic=2.
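As a minimal sketch of that algebra, assuming clinic is coded 1/2 and the interaction variables clin_pr and clin_do from the original post already exist, the prison HR within each clinic can be requested with lincom and cross-checked by hand from the stored coefficients:

```stata
* Assumes clin_pr = clinic*prison and clin_do = clinic*dose already exist
stcox prison dose clin_pr clin_do, strata(clinic) nolog

* HR for prison 1 vs 0 within clinic == 1: _b[prison] + 1*_b[clin_pr]
lincom prison + 1*clin_pr, hr

* HR for prison 1 vs 0 within clinic == 2: _b[prison] + 2*_b[clin_pr]
lincom prison + 2*clin_pr, hr

* Same arithmetic by hand; _b[] holds coefficients on the log-hazard scale
display exp(_b[prison] + 1*_b[clin_pr])
display exp(_b[prison] + 2*_b[clin_pr])
```

Using the hazard ratios reported for this model in this thread (2.966201 for prison and .5572334 for clin_pr), the two results should come out at roughly 1.65 and 0.92.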

That said, it is better to let Stata manage the factor notation and interaction generation for you. In a typical setting without stratification, you could specify your two variables as main effects plus their interaction as

Code:

i.clinic##i.prison // for example
i.clinic i.prison i.clinic#i.prison  // Stata expands the line above to this equivalent form

Edit: OK, I tried this with the data. You can stratify on clinic, but then you have to be very careful about which covariates you include in the model. This might be a situation where manually constructed interaction terms are a little easier, but if you go with factor notation, you still need to include the main effects of both variables and let Stata omit the one for clinic (since it is aliased with the stratum). Compare the first two models below and convince yourself they are the same fit. The third model, which leaves out the clinic main effect, is incorrect (different coefficients and standard errors, some of them missing entirely).

Code:

. stcox prison dose clin_pr clin_do, strat(clinic) nolog

        Failure _d: status==1
  Analysis time _t: survt
       ID variable: id

Stratified Cox regression with Breslow method for ties
Strata variable: clinic

No. of subjects =    238                                Number of obs =    238
No. of failures =    150
Time at risk    = 95,812
                                                        LR chi2(4)    =  35.81
Log likelihood = -596.77891                             Prob > chi2   = 0.0000

------------------------------------------------------------------------------
          _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      prison |   2.966201   1.597644     2.02   0.044     1.032119    8.524553
        dose |   .9657948   .0191197    -1.76   0.079     .9290385    1.004005
     clin_pr |   .5572334   .2385678    -1.37   0.172     .2407759    1.289619
     clin_do |   .9989383   .0145535    -0.07   0.942     .9708174    1.027874
------------------------------------------------------------------------------

. stcox dose i.clinic##i.prison clin_do, strat(clinic) nolog

No. of subjects =    238                                Number of obs =    238
No. of failures =    150
Time at risk    = 95,812
                                                        LR chi2(4)    =  35.81
Log likelihood = -596.77891                             Prob > chi2   = 0.0000

-------------------------------------------------------------------------------
           _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
--------------+----------------------------------------------------------------
         dose |   .9657948   .0191197    -1.76   0.079     .9290385    1.004005
     2.clinic |          1  (omitted)
     1.prison |   1.652866   .3118812     2.66   0.008     1.141888      2.3925
              |
clinic#prison |
         2 1  |   .5572334   .2385678    -1.37   0.172     .2407759    1.289619
              |
      clin_do |   .9989383   .0145535    -0.07   0.942     .9708174    1.027874
-------------------------------------------------------------------------------

. stcox i.prison dose i.clinic#i.prison clin_do, strat(clinic) nolog
No. of subjects =    238                                Number of obs =    238
No. of failures =    150
Time at risk    = 95,812
                                                        LR chi2(4)    =  35.81
Log likelihood = -596.77891                             Prob > chi2   = 0.0000

-------------------------------------------------------------------------------
           _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
--------------+----------------------------------------------------------------
     1.prison |   1.652866   .3118812     2.66   0.008     1.141888      2.3925
         dose |   .9657948   .0191197    -1.76   0.079     .9290385    1.004005
              |
clinic#prison |
         2 0  |   1.731136   .7411495     1.28   0.200     .7480093    4.006409
         2 1  |   .9646467          .        .       .            .           .
              |
      clin_do |   .9989383   .0145535    -0.07   0.942     .9708174    1.027874
-------------------------------------------------------------------------------
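If you go the factor-variable route, a sketch of the matching lincom calls after the second model might look like the following. The coefficient names follow the labelling in that model's output (1.prison and the 2 1 level of clinic#prison); treat this as an illustration rather than verified output.

```stata
stcox dose i.clinic##i.prison clin_do, strata(clinic) nolog

* 1.prison is the prison effect in the base clinic (clinic == 1)
lincom 1.prison, hr

* Adding the interaction term gives the prison effect in clinic == 2
lincom 1.prison + 2.clinic#1.prison, hr
```

Because factor notation uses clinic == 1 as the base level, no multiplier of 2 is needed here: the interaction term is an indicator for clinic == 2 and prison == 1, not the product clinic*prison.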

Last edited by Leonardo Guizzetti; 21 Jul 2023, 09:06.


Denise Vella

Join Date: Aug 2022

Posts: 187
#3

21 Jul 2023, 15:53

This is hard to understand! Back to reading again.
I'll read up on factor notation.
I didn't know Stata could generate its own interaction terms!
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2405
#4

21 Jul 2023, 19:33

The algebra is explained on page 544 of the 3rd edition of this book, directly below the code you cited. There's nothing special about Cox regression when it comes to understanding interactions. They work the same way in all regression models, so you could start with linear regression, say, if that helps you understand.
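A minimal linear-regression sketch of the same idea, using the auto dataset that ships with Stata. The variable name for_mpg is made up for this illustration and mirrors the hand-made clin_pr:

```stata
sysuse auto, clear

* Hand-made interaction, analogous to clin_pr = clinic*prison
generate for_mpg = foreign*mpg
regress price mpg foreign for_mpg

* Slope of mpg for domestic cars (foreign == 0)
lincom mpg

* Slope of mpg for foreign cars (foreign == 1): _b[mpg] + 1*_b[for_mpg]
lincom mpg + 1*for_mpg

* Cross-check: the same slope from a model fit on foreign cars only
regress price mpg if foreign == 1
```

The point estimate from the second lincom should match the mpg coefficient in the subgroup regression. The standard errors can differ, because the pooled model estimates one residual variance for both groups.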
Denise Vella

Join Date: Aug 2022

Posts: 187
#5

22 Jul 2023, 03:21

I think I’ve understood it. The author gives the algebraic equation above, and here is my attempt to explain it (I'm not sure I've understood it correctly).

B1 = coefficient for prison (the HR compares prison = 1 in the numerator with prison = 0 in the denominator)

B3 = coefficient for clin_pr. Since we want the value for clinic = 2, clin_pr is either clinic (2) x prison (1) or clinic (2) x prison (0)

If clinic had a value of 3 or 0 and I wanted to estimate the HR for prison where clinic =2

The code would be

lincom prison+3*clin_pr,hr

Also, based on your experience and on what I understood from post #2 (correct me if I’m wrong):

Do you recommend generating the interaction terms manually, as the author does on p. 541 of the third edition?

Or letting Stata generate them:

Code:

i.clinic##i.prison // for example

This is not mentioned in the book, perhaps to keep things simple for newbies like myself.
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2405
#6

22 Jul 2023, 07:06

Your understanding of the algebra is correct, with one small exception (which might have been a typo). The bit originally highlighted in red ("clinic =2" in the quote below) should be clinic=3, if clinic can only take values of 3 or 0. Then the lincom command is correct.

If clinic had a value of 3 or 0 and I wanted to estimate the HR for prison where clinic =2

The code would be

lincom prison+3*clin_pr,hr

As for why factor variables aren't mentioned in their book, I cannot say. Factor variable notation has been in Stata for quite some time. Users of Stata are not required to use it, but it makes working with categorical variables much easier ~99% of the time, especially when interactions are involved. In this specific instance though, it seems (to me) about the same level of difficulty, so it's really your choice which approach makes more sense. Since you are learning from the book, you may as well copy what they are doing to aid your understanding. As you become more familiar with Stata and the models you are fitting, you can introduce factor variable notation into your models.
