  • Testing for interaction for survival with two categorical variables

    Dear all,

    I have searched Statalist but have not been able to find an answer. I have found a possible solution within Stata.

    I have two variables: variable A (dichotomous: 0/1) and variable B (8 categories, random order, coded 1-8).

    I want to test for an interaction between variables A and B on survival (death).

    Intuitively, I am inclined to create 8 dummy variables and test for interaction for each dummy separately. Based on the differences in HRs that I observe between the 8 groups, these interactions make sense. Is this a correct way of going about this?

    Alternatively, I have found this:
    Code:
     stcox i.A i.B i.A##i.B
    Here, I get a p-value for each category separately, yet these differ substantially from the p-values I get when I use dummy variables. Any help or feedback would be very much appreciated.

    Thank you.

    Best,

    Jasper

  • #2
    Hello Jasper,

    With ##, you get the main effect of each variable plus their interaction.

    When using the ## operator, you do not need to type the same variables again as main effects.

    Results obtained with hand-made dummies may differ from those obtained with factor-variable notation. One of the reasons is the baseline (reference) category.

    My advice is to stick to factor-variable notation. That said, make sure the command is correct.
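
    As a minimal sketch of the point about ##, assuming the data have already been -stset- and that A and B are named as in the first post, the model can be written with the interaction operator alone, and the interaction terms can then be tested jointly rather than one category at a time. The 7 degrees of freedom in the comment assume that none of the A-by-B cells is empty.

    ```stata
    * Sketch only: assumes -stset- data and the variable names A and B from post #1.
    * A##B already includes both main effects, so this is the same model as
    * -stcox i.A i.B i.A#i.B-.
    stcox i.A##i.B

    * Joint Wald test that all A-by-B interaction terms are zero
    * ((2-1)*(8-1) = 7 degrees of freedom if no cell is empty).
    testparm i.A#i.B
    ```

    The single p-value from the joint test answers "is there any interaction between A and B?", whereas each coefficient p-value only compares one level of B with the reference level.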

    Hopefully that helps.
    Last edited by Marcos Almeida; 25 Feb 2017, 02:35.
    Best regards,

    Marcos

    • #3
      Dear Marcos,

      Thank you very much for your answer. When using the i. prefix, Stata takes group 1 of factor variable B as the reference. I noticed that the interaction p-values change considerably depending on which group is taken as the reference. The coding 1-8 is rather arbitrary. How do you determine which group to take as the reference? With dummy variables, each group is compared with the rest of the population; isn't this fairer? Thank you.

      Best,

      Jasper

      • #4
        Jasper:
        as an aside to Marcos' helpful advice, please note that -help fvvarlist- and the related entry in the Stata .pdf manual will answer most of your questions.
        If -variable B- is not ordered (I assume that is what you meant by "random"), you can keep the coding arbitrary (or, better, you can -label- each code).
        As a rule of thumb, the reference group may be chosen according to the largest or smallest sample stratum size.
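
        A rough sketch of that rule of thumb, assuming -stset- data and the variable names A and B from the first post (with_int is just a hypothetical name for the stored estimates): inspect the stratum sizes, take the most frequent level of B as the reference, and compare the models with and without the interaction.

        ```stata
        * Sketch only: assumes -stset- data and variables A and B as in post #1.
        tabulate B A                  // stratum sizes, to inform the choice of base

        stcox i.A##ib(freq).B         // most frequent level of B as the reference
        estimates store with_int      // hypothetical name for the stored results

        stcox i.A ib(freq).B          // same model without the interaction
        lrtest with_int .             // likelihood-ratio test of the interaction
        ```

        Changing the reference level only reparameterizes the same model, so the fit, and therefore this likelihood-ratio test, should not depend on which level is the base. Only the individual coefficient p-values change, because each one compares a different pair of groups.
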
        Kind regards,
        Carlo
        (Stata 19.0)

        • #5
          Carlo explained the meanders of the so-called "ib." operator.

          Also, at the end of the day (maybe it is better to say "when starting to prepare the analysis"), it is up to you to decide which level should be taken as the reference. Should you "abdicate" the choice, Stata will do the work for you and apply the default.

          Below, an extract from the Stata Manual, echoing Carlo's advice to spend some - quite useful - time reading it.


          Base levels
          When we typed i.group, group = 1 became the base level. When we do not specify otherwise,
          the smallest level becomes the base level.

          You can specify the base level of a factor variable by using the ib. operator. The syntax is

          Base operator(a)   Description
          ib#.               use # as base, # = value of variable
          ib(##).            use the #th ordered value as base(b)
          ib(first).         use smallest value as base (default)
          ib(last).          use largest value as base
          ib(freq).          use most frequent value as base
          ibn.               no base level

          (a) The i may be omitted. For instance, you can type ib2.group or b2.group.
          (b) For example, ib(#2). means to use the second value as the base.
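
          Applied to the model in this thread, the operators listed in the manual extract might be used as in the following sketch. It assumes -stset- data and the variable names A and B from the first post; level 3 is an arbitrary choice made purely for illustration.

          ```stata
          * Sketch only: assumes -stset- data and variables A and B as in post #1.
          stcox i.A##ib3.B          // level coded 3 of B as the base (arbitrary choice)
          stcox i.A##ib(last).B     // largest value of B as the base
          stcox i.A##ib(freq).B     // most frequent level of B as the base

          * Alternatively, set the base level once so that a plain i.B uses it afterwards
          fvset base 3 B
          stcox i.A##i.B
          ```
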
          Hopefully that helps.
          Best regards,

          Marcos
