  • Interaction term in survival analysis (streg)

    Hello,


    This is my first question here. I would like to ask how to interpret an interaction term in a survival regression.

    I am running a survival analysis with an exponential model.
    I would like to include an interaction between my main variable (work1, time-varying) and calendar year to see how the effect of 'work1' varies over time.

    'work1' is a categorical variable with 3 categories: employed (reference), never employed, and previously employed.
    'calyear' is also a categorical variable, with 7 categories: 1980-1984 (reference), 1985-1989, and so on.


    So I used the command:

    streg i.work1 i.calyear work1#calyear workex workex2 i.educ2 i.agegroup i.marital3, dist(exponential) robust

    and got the following result (I cropped the output for the remaining variables because it was too long):

    [Attached image: result.PNG]


    So, I have two questions regarding this result.

    The first question is how to interpret the hazard ratio of each category of the interaction term.

    Since the interaction categories that involve either reference category are omitted,
    it is not clear to me what the hazard ratios mean.

    For example, what does the coefficient for 'never employed#1985-1989' mean?
    It might be a relative risk, but compared to what?


    My second question:
    my main interest is in how the hazard ratio for each category of 'work1' changes over time.
    So I am wondering whether there is a way to get a hazard ratio for every category of the interaction term.

    I tried (1) the margins command after running the regression, but found that 'margins' is not suitable for what I want in the survival analysis context,

    and (2) including only the interaction term, without the main effects, to make Stata show all the categories, using this command:

    streg i.work1#i.calyear workex workex2 i.educ2 i.agegroup i.marital3, dist(exponential) robust

    but excluding the main effects themselves might not be appropriate.


    If anyone has an answer, I would be very grateful if you could share it.
    Thank you so much for your attention!

    I will be looking forward to hearing from you!

  • #2
    Why do you say -margins- is not suitable here? If you run
    Code:
    margins work1#calyear, hr
    you will get the hazard ratios calculated for every combination of work1 and calyear.

    You could also write a loop to calculate these using -lincom, eform- to get these same results, but that's a lot more work and it's really easy to make mistakes. So -margins- is your friend.

    As for the "hazard ratios" shown for the interaction terms in the -streg- output: they are not actually hazard ratios. They are ratios of hazard ratios. For example, for never employed # 1985-1989, the hazard ratio relative to the double-base category (employed, 1980-1984) is 0.6884133 * the hazard ratio for 1985-1989 vs 1980-1984 * the hazard ratio for never employed vs employed. That is, the combined effect of never employed in 1985-1989 is, in the hazard ratio metric, only 0.6884133 times the product of the hazard ratio for 1985-1989 alone and the hazard ratio for never employed alone.
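
    The "ratio of hazard ratios" reading can be written out as a short derivation. This is only a sketch in notation that is not in the thread: w indexes the work1 category, c the calendar period, "base" the reference levels, and x'θ stands for the remaining covariates. All base-level coefficients are fixed at zero.

    ```latex
    \begin{align*}
    \log h(t \mid w, c) &= \beta_0 + \beta_w + \gamma_c + \delta_{wc} + x'\theta \\
    \frac{h(t \mid w, c)}{h(t \mid \text{base}, \text{base})}
      &= e^{\beta_w}\, e^{\gamma_c}\, e^{\delta_{wc}} \\
    e^{\delta_{wc}}
      &= \frac{h(t \mid w, c) \,/\, h(t \mid \text{base}, c)}
              {h(t \mid w, \text{base}) \,/\, h(t \mid \text{base}, \text{base})}
    \end{align*}
    ```

    The second line is the three-way product described above; the third line shows that the exponentiated interaction coefficient compares the work1 hazard ratio in period c with the work1 hazard ratio in the reference period.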



    • #3
      I believe the following code will give you what you want.
      Code:
      streg i.calyear i.work1#i.calyear workex workex2 i.educ2 i.agegroup i.marital3, dist(exponential) robust


      This is a reparameterisation of your model, so that you now get estimated hazard ratios for work (each category compared to the reference) at each level of calendar period. You could get the same estimates using lincom.

      Adding to Clyde's description: the estimates that look like main effects in your output are the estimated hazard ratios at the reference level of the other term in the interaction. The first row of the output tells us, for example, that the hazard ratio for "never employed" versus "employed" is 1.477, and this applies to 1980-84 (the reference level). If you want the corresponding hazard ratio for 1985-1989, it is 1.477*0.688413. My code above should give you the same figure directly. Or you can get it with -lincom-. Or you can get it from -margins-.
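
      A minimal sketch of the -lincom- route follows. The numeric codes are assumptions, since the thread does not show how the variables are coded: work1 is taken to be 1 = employed, 2 = never employed, 3 = previously employed, and calyear is taken to run from 1 (1980-1984) to 7.

      ```stata
      * Assumed coding (check yours with -codebook work1 calyear-):
      *   work1:   1 = employed (base), 2 = never employed, 3 = previously employed
      *   calyear: 1 = 1980-1984 (base), 2 = 1985-1989, ..., 7 = last period
      streg i.work1 i.calyear work1#calyear workex workex2 i.educ2 i.agegroup i.marital3, dist(exponential) robust

      * Hazard ratio for never employed vs employed within 1985-1989
      lincom 2.work1 + 2.work1#2.calyear, hr

      * The same contrast in each non-reference period
      forvalues c = 2/7 {
          lincom 2.work1 + 2.work1#`c'.calyear, hr
      }
      ```

      -lincom- sums the coefficients on the log-hazard scale, and the hr option exponentiates the result, so each line reports a hazard ratio with its confidence interval rather than a hand-multiplied point estimate.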

      I find the following useful when working with factor variables:

      Code:
      set showbaselevels on, permanently



      • #4
        Thank you so much for your kind and prompt replies!


        Dear Clyde, this is why I thought -margins- was not suitable for survival analysis, if I understand correctly:
        (https://www.stata.com/statalist/arch.../msg01214.html)

        I don't think marginal effects make any sense within the context of survival analysis: you have the usual problem that there can be substantial variation in marginal effects between observation and on top of that there can be substantial variation in marginal effects within an observation over time.
        Or maybe I did not understand it properly.
        Anyway, I am running the -margins- command as you suggested, but without the 'hr' option, because Stata gave me the error message "option hr not allowed".
        It also seems that it will take several more hours to finish.



        Dear Clyde and Paul, thank you so much for your kind explanations.
        So basically, the interpretation of main effects and interaction effects in survival analysis is entirely different from that in ordinary regression, right?


        This is what I understand:

        If I include both the individual variables and the interaction term in the regression at the same time, using the code:

        Code:
        streg i.work1 i.calyear work1#calyear workex workex2 i.educ2 i.agegroup i.marital3, dist(exponential) robust
        I need to calculate the real interaction effect by multiplying the three coefficients together, as Clyde mentioned:

        for never employed # 1985-1989, the hazard ratio relative to the double-base category (employed, 1980-1984) is 0.6884133 * the hazard ratio for 1985-1989 vs 1980-1984 * the hazard ratio for never employed vs employed,
        which is 0.6884133 * 1.477304 * 1.560006 = 1.586519


        And if I use the code as Paul suggested,
        Code:
         streg i.calyear i.work1#i.calyear workex workex2 i.educ2 i.agegroup i.marital3, dist(exponential) robust
        I can get the same result by multiplying the coefficients for (1985-1989) and (never employed#1985-1989), which is 1.560006*1.016996=1.586519
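
        The arithmetic can be checked with -display-, using only the estimates quoted in this thread. It also helps to keep the two quantities apart: the two-way product is the hazard ratio for never employed vs employed within 1985-1989, while the three-way product compares never employed in 1985-1989 with the double-base category (employed, 1980-1984).

        ```stata
        * Never employed vs employed, within 1985-1989
        * (interaction term times the work1 estimate)
        display 0.6884133 * 1.477304

        * Never employed in 1985-1989 vs employed in 1980-1984
        * (the three-way product from the original parameterisation)
        display 0.6884133 * 1.477304 * 1.560006

        * The same quantity from the reparameterised model
        display 1.560006 * 1.016996
        ```

        The first line should come out at about 1.017, matching the 1.016996 reported by the reparameterised model, and the last two at about 1.5865.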
        [Attached image: result2.PNG]

        Then I tried this code:
        Code:
         streg i.work1#i.calyear workex workex2 i.educ2 i.agegroup i.marital3, dist(exponential) robust
        which includes only the interaction term, and I got the same result as in the calculation above.
        [Attached image: result3.PNG]

        I think this is what I wanted to get.
        I really appreciate your help!
        Please add a comment if I have gotten anything wrong.
        Last edited by Jiwan Lee; 09 May 2019, 04:00.



        • #5
          Hi Clyde again,

          This is what I got from the -margins- command:

          [Attached image: margins.PNG]

          I have no idea how to interpret it.

          I think I got this result because I did not add the -hr- option at the end, but as I mentioned, I could not add it because of the error message.
          So it would be great if you could let me know of any other option I can use here.
          Or maybe I do not need margins anymore, since I think I already got what I wanted.

          Thanks,
          Jiwan
          Last edited by Jiwan Lee; 09 May 2019, 06:14.



          • #6
            Thanks, Clyde and Paul, for your responses. This is my first post here, but I have benefited a lot from the support you have provided to the group in the past. I have a similar interpretation question.

            I am conducting a similar analysis, but using an accelerated failure time (AFT) model because my interpretation focuses on time to event. I am interested in the interpretation of the main and interaction effects.

            The aim of my study is to evaluate the effect of a policy change on time to event (in years). I have 3 countries that changed policy from conservative to liberal, and 3 that maintained their conservative policies. The variable group is coded 1 for countries that changed their policies to liberal and 0 for countries that maintained their conservative policies. There is also a time variable spanning 1990 to 2018. For the countries that changed their policy, the change happened around 2008, so I created a period variable for both the intervention and control groups, coded 1 after the policy change (> 2008) and 0 before it (<= 2008). I am interested in the effect of the policy change as captured by the interaction coefficient. Here is my code and output:


            [Attached image: Screenshot (121).png]

            [Attached image: Screenshot (123).png]


            Here are my questions.

            First, am I right to interpret the interaction term as a difference in differences, i.e. the difference between the after and before periods in the gap in median time to event between countries with liberal and conservative policies? Is that a 0.6% change?

            DD = [median time to event in liberal group - median time to event in conservative group, after] - [median time to event in liberal group - median time to event in conservative group, before] = 0.0068
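
            For reference, here is a sketch of what an interaction coefficient represents in a log-linear AFT model. It assumes a model with group, period, and their product as the only terms, which may not match the specification in the screenshots. The notation is not from the post: m with subscripts (g, period) is the median time to event in group g (1 = liberal, 0 = conservative) in the given period.

            ```latex
            \begin{align*}
            \log T &= \beta_0 + \beta_1\,\text{group} + \beta_2\,\text{period}
                      + \beta_3\,(\text{group} \times \text{period}) + \sigma\varepsilon \\
            \beta_3 &= \bigl[\log m_{1,\text{after}} - \log m_{0,\text{after}}\bigr]
                     - \bigl[\log m_{1,\text{before}} - \log m_{0,\text{before}}\bigr] \\
            e^{\beta_3} &= \frac{m_{1,\text{after}} \,/\, m_{0,\text{after}}}
                                {m_{1,\text{before}} \,/\, m_{0,\text{before}}}
            \end{align*}
            ```

            Under these assumptions the coefficient is a difference in differences of log median times, and its exponential is a ratio of time ratios, so it is a relative change and not a difference in years.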


            My second question is about the interpretation of margins:

            [Attached image: Screenshot (125).png]

            My interpretation is that the average median time to event is 8% higher in the liberal group than in the conservative group before the intervention, and 22% higher after the policy change. Is this interpretation correct?

            Lastly, how do treatment effect estimates differ or compare when using margins vs stteffects? Is one better than the other? If so, in what circumstances? See my output below:

            [Attached image: Screenshot (127).png]

            From the output above, it looks like the average time to event is 0.044 years longer in the liberal group than in the conservative group, i.e. 0.238% higher.

            Thanks.

