  • lincom - code works but how does Stata know that I want the HR for only one categorical variable


    Question:
    I am following this book to learn survival analysis: https://link.springer.com/book/10.10...-1-4419-6646-9

    My question is about the code given by the author. The data are shown below.

    clinic is a categorical variable coded 1 or 2. prison is a categorical variable coded 0 or 1.

    Code:
    stcox prison dose clin_pr clin_do, strata(clinic)  // this gives the HR for the clinics combined
    
    // The book suggests the following to get the HR for prison 1 vs prison 0 for clinic == 2
    
    // My question:
    // How does Stata know that it needs to output the HR for clinic == 2 in the code given below?
    
    lincom prison+2*clin_pr, hr
    
    
    * The interaction terms were defined previously as:
    * generate interaction terms
    gen clin_pr=clinic*prison
    gen clin_do=clinic*dose
    * The HR is then the same as the one from the command below. Here I understand why the HR
    * applies only to clinic == 2, but I don't understand why the lincom result is equivalent to it.
    stcox prison dose if clinic ==2



    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double(id clinic status survt prison dose) byte(_st _d) int _t byte _t0 float(clin_pr clin_do) byte(_est_A _est_LRTEST_0)
     1 1 1 428 0 50 1 1 428 0 0 50 1 1
     2 1 1 275 1 55 1 1 275 0 1 55 1 1
     3 1 1 262 0 55 1 1 262 0 0 55 1 1
     4 1 1 183 0 30 1 1 183 0 0 30 1 1
     5 1 1 259 1 65 1 1 259 0 1 65 1 1
     6 1 1 714 0 55 1 1 714 0 0 55 1 1
     7 1 1 438 1 65 1 1 438 0 1 65 1 1
     8 1 0 796 1 60 1 0 796 0 1 60 1 1
     9 1 1 892 0 50 1 1 892 0 0 50 1 1
    10 1 1 393 1 65 1 1 393 0 1 65 1 1
    11 1 0 161 1 80 1 0 161 0 1 80 1 1
    12 1 1 836 1 60 1 1 836 0 1 60 1 1
    13 1 1 523 0 55 1 1 523 0 0 55 1 1
    14 1 1 612 0 70 1 1 612 0 0 70 1 1
    15 1 1 212 1 60 1 1 212 0 1 60 1 1
    16 1 1 399 1 60 1 1 399 0 1 60 1 1
    17 1 1 771 1 75 1 1 771 0 1 75 1 1
    18 1 1 514 1 80 1 1 514 0 1 80 1 1
    19 1 1 512 0 80 1 1 512 0 0 80 1 1
    21 1 1 624 1 80 1 1 624 0 1 80 1 1
    end


  • #2
    Stata doesn't know what you want by using lincom - it is only faithfully doing the algebra you specify.

    That said, the book seems to specify prison as coded 0/1 and clinic as coded 1/2, yet your code example suggests that clinic has been recoded to 0/1. In that case, the correct command to get the HR for prison=1 vs prison=0 within clinic=1 is

    Code:
    lincom prison + clin_pr, hr  // note no factor of 2 here for clin_pr
    To understand what the coefficients mean, go back to first principles. Each coefficient describes the association for a one-unit increase in its variable. The interaction between clinic, coded 1/2, and prison, coded 0/1, creates a new variable, clin_pr, which equals 0 when prison=0 and equals clinic (1 or 2) when prison=1. The log HR for prison at a given clinic value is therefore _b[prison] + clinic*_b[clin_pr]. On its own, -_b[prison]- is the log HR at clinic=0 (which doesn't exist, but is mathematically implied), and the coefficient for clin_pr is the change in that log HR for each one-unit step in clinic. For clinic=1 you add it once; to get from clinic=0 to clinic=2, you need to multiply it by 2.
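
    As a rough sketch of that algebra (assuming clinic is coded 1/2 and the manually coded model from the question is the one being fitted), the prison HR can be requested at each clinic value, and the point estimates can be reproduced by hand from the stored coefficients:

    Code:
    * Sketch only: assumes clin_pr = clinic*prison with clinic coded 1/2
    stcox prison dose clin_pr clin_do, strata(clinic)

    * log HR for prison (1 vs 0) at clinic value c is _b[prison] + c*_b[clin_pr]
    lincom prison + 1*clin_pr, hr   // clinic == 1
    lincom prison + 2*clin_pr, hr   // clinic == 2

    * the same point estimates computed directly from the coefficients
    display exp(_b[prison] + 1*_b[clin_pr])
    display exp(_b[prison] + 2*_b[clin_pr])
    With the output posted in this thread, the clinic == 1 line should reproduce 2.966201 x .5572334, which is about 1.652866, the value reported for 1.prison in the factor-variable model.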

    That said, it is better to let Stata manage the factor notation and interaction generation for you. In a typical setting without stratification, you could specify your two variables as main effects plus their interaction as

    Code:
    i.clinic##i.prison // for example
    i.clinic i.prison i.clinic#i.prison  // Stata expands the above to this equivalent form
    Edit: OK, I tried this with the data. You can stratify on clinic, but then you'll have to be very careful about what to include as covariates in your model. This might be a situation where manually constructed interaction terms are a little easier, but if you go with factor notation, you still need to include the main effects of both variables and let Stata omit those for clinic (since they are aliased with the stratum). See the first two models below and convince yourself they are the same; the last model is incorrect (different standard errors and coefficients, or estimates omitted entirely).

    Code:
    . stcox prison dose clin_pr clin_do, strat(clinic) nolog
    
            Failure _d: status==1
      Analysis time _t: survt
           ID variable: id
    
    Stratified Cox regression with Breslow method for ties
    Strata variable: clinic
    
    No. of subjects =    238                                Number of obs =    238
    No. of failures =    150
    Time at risk    = 95,812
                                                            LR chi2(4)    =  35.81
    Log likelihood = -596.77891                             Prob > chi2   = 0.0000
    
    ------------------------------------------------------------------------------
              _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          prison |   2.966201   1.597644     2.02   0.044     1.032119    8.524553
            dose |   .9657948   .0191197    -1.76   0.079     .9290385    1.004005
         clin_pr |   .5572334   .2385678    -1.37   0.172     .2407759    1.289619
         clin_do |   .9989383   .0145535    -0.07   0.942     .9708174    1.027874
    ------------------------------------------------------------------------------
    
    . stcox dose i.clinic##i.prison clin_do, strat(clinic) nolog
    
    No. of subjects =    238                                Number of obs =    238
    No. of failures =    150
    Time at risk    = 95,812
                                                            LR chi2(4)    =  35.81
    Log likelihood = -596.77891                             Prob > chi2   = 0.0000
    
    -------------------------------------------------------------------------------
               _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
    --------------+----------------------------------------------------------------
             dose |   .9657948   .0191197    -1.76   0.079     .9290385    1.004005
         2.clinic |          1  (omitted)
         1.prison |   1.652866   .3118812     2.66   0.008     1.141888      2.3925
                  |
    clinic#prison |
             2 1  |   .5572334   .2385678    -1.37   0.172     .2407759    1.289619
                  |
          clin_do |   .9989383   .0145535    -0.07   0.942     .9708174    1.027874
    -------------------------------------------------------------------------------
    
    . stcox i.prison dose i.clinic#i.prison clin_do, strat(clinic) nolog
    No. of subjects =    238                                Number of obs =    238
    No. of failures =    150
    Time at risk    = 95,812
                                                            LR chi2(4)    =  35.81
    Log likelihood = -596.77891                             Prob > chi2   = 0.0000
    
    -------------------------------------------------------------------------------
               _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
    --------------+----------------------------------------------------------------
         1.prison |   1.652866   .3118812     2.66   0.008     1.141888      2.3925
             dose |   .9657948   .0191197    -1.76   0.079     .9290385    1.004005
                  |
    clinic#prison |
             2 0  |   1.731136   .7411495     1.28   0.200     .7480093    4.006409
             2 1  |   .9646467          .        .       .            .           .
                  |
          clin_do |   .9989383   .0145535    -0.07   0.942     .9708174    1.027874
    -------------------------------------------------------------------------------
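
    For the factor-variable version (the second model above), a possible way to get the prison HR within each clinic is sketched below. It assumes clinic 1 is the base level, as in the output shown, so that 1.prison is the prison effect in clinic 1 and the interaction term carries the extra shift for clinic 2.

    Code:
    * Sketch only: same model as the second fit above, using factor notation
    stcox dose i.clinic##i.prison clin_do, strata(clinic)

    lincom 1.prison, hr                       // prison 1 vs 0 within clinic == 1
    lincom 1.prison + 2.clinic#1.prison, hr   // prison 1 vs 0 within clinic == 2
    The second lincom should agree with -lincom prison+2*clin_pr, hr- after the manually coded model, since (_b[prison] + _b[clin_pr]) + _b[clin_pr] is the same sum written differently.
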
    Last edited by Leonardo Guizzetti; 21 Jul 2023, 09:06.


    • #3
      This is hard to understand! Back to reading again.
      I'll read up on factor notation.
      I didn't know Stata could generate its own interaction terms!


      • #4
        The algebra is explained on page 544 of the 3rd edition of this book, directly below the code you cited. There's nothing special about Cox regression when it comes to understanding interactions; they work the same way in all regression models, so you could start with linear regression, say, if that helps you understand.
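
        A small linear-regression sketch of the same idea, using the auto dataset that ships with Stata (the dataset and the variable name for_mpg are only illustrative choices, not from the book):

        Code:
        * Sketch only: a manually built interaction in a linear model
        sysuse auto, clear
        generate for_mpg = foreign*mpg
        regress price mpg foreign for_mpg

        * difference in price, foreign vs domestic, at chosen values of mpg
        lincom foreign + 20*for_mpg   // at mpg == 20
        lincom foreign + 30*for_mpg   // at mpg == 30
        Just as with prison + 2*clin_pr, the multiplier is the value of the other variable at which you want the effect.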


        • #5
          [Attached image: 6FE006EC-9659-4274-83E3-5B4D3DA64219.jpeg]

          I think I've understood it. The author gives the algebra above, and I've attempted to explain it here (I'm not sure I understood it correctly).

          B1 = prison (therefore numerator = 1, denominator = 0)

          B3 = clin_pr. Since we want the value for clinic = 2, this would be either clinic (2) x prison (1) or clinic (2) x prison (0)

          If clinic had a value of 3 or 0 and I wanted to estimate the HR for prison where clinic =2

          The code would be

          lincom prison+3*clin_pr, hr

          Also, from your experience, and from what I understood in post #2 (correct me if I'm wrong):

          Do you recommend generating the interaction terms manually, as the author does on p. 541 of the third edition?

          Or letting Stata generate them:

          Code:
          i.clinic##i.prison // for example

          This is not mentioned in his book, perhaps to keep things simple for newbies like myself.


          • #6
            Your understanding of the algebra is correct, with one small exception (which might have been a typo). In the passage quoted below, "clinic =2" should be clinic=3, if clinic can only take values of 3 or 0. Then the lincom command is correct.

            If clinic had a value of 3 or 0 and I wanted to estimate the HR for prison where clinic =2

            The code would be

            lincom prison+3*clin_pr, hr
            As for why factor variables aren't mentioned in their book, I cannot say. Factor variable notation has been in Stata for quite some time. Users of Stata are not required to use it, but it makes working with categorical variables much easier ~99% of the time, especially when interactions are involved. In this specific instance, though, it seems (to me) about the same level of difficulty, so it's really your choice which approach makes more sense. Since you are learning from the book, you may as well copy what they are doing to aid your understanding. As you become more familiar with Stata and the models you are fitting, you can introduce factor variable notation into your models.
