Testing for interaction for survival with two categorical variables

Jasper Tromp

Join Date: Jul 2014

Posts: 27
#1

24 Feb 2017, 22:36

Dear all,

I have searched Statalist but have not been able to find an answer. I have, however, found a possible solution within Stata.

I have two variables: variable A (dichotomous: 0/1) and variable B (8 categories, random order, coded 1-8).

I want to look at an interaction between variables A and B for survival (death).

Intuitively, I am inclined to create 8 dummy variables and test for interaction for each dummy separately. Based on the differences in HRs that I observe between the 8 groups, these interactions make sense. Is this a correct way of going about this?

Alternatively, I have found this:

Code:

stcox i.A i.B i.A##i.B

Here, I get a p-value for each category separately, yet these differ substantially from the p-values I get when I use dummy variables. Any help or feedback would be very much appreciated.

Thank you.

Best,

Jasper
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

25 Feb 2017, 02:26

Hello Jasper,

With the ## operator, you get the main effect of each variable plus their interaction.

When using the double (##) sign, there is no need to type the same variables again: i.A##i.B already includes i.A and i.B.

Results from hand-made dummies may differ from those obtained with factor-variable notation. One reason is that the baseline (reference) category differs.

My advice is to stick to factor-variable notation. That said, make sure the command is correct.
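
To illustrate the syntax, here is a minimal sketch. It assumes Stata's shipped cancer dataset as a stand-in for your data: the variable old is a hypothetical binary variable created only to play the role of A, and drug (3 levels rather than 8) plays the role of B. The testparm line is one possible way to obtain a single joint test of all interaction terms instead of one p-value per category.

```stata
* Illustration only: the shipped cancer dataset stands in for the real data
sysuse cancer, clear

* Hypothetical binary variable playing the role of A
generate byte old = age >= 56

* Declare the survival-time data
stset studytime, failure(died)

* ## expands to both main effects plus the interaction,
* so i.old and i.drug need not be typed separately
stcox i.old##i.drug

* Joint Wald test of all interaction terms at once
testparm i.old#i.drug
```

With your own data, the model line would presumably read stcox i.A##i.B, followed by testparm i.A#i.B.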

Hopefully that helps.

Last edited by Marcos Almeida; 25 Feb 2017, 02:35.

Best regards,

Marcos
Jasper Tromp

Join Date: Jul 2014

Posts: 27
#3

25 Feb 2017, 05:11

Dear Marcos,

Thank you very much for your answer. With the i. prefix, Stata takes group 1 of factor variable B as the reference. I noticed that the interaction p-values differ considerably depending on which group is taken as the reference. The coding 1-8 is rather arbitrary. How do you determine which group to take as the reference? With dummy variables, the rest of the population serves as the reference; isn't this fairer? Thank you.

Best,

Jasper
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17850
#4

25 Feb 2017, 05:25

Jasper:
as an aside to Marcos' helpful advice, please note that -help fvvarlist- and the related entry in the Stata .pdf manual will answer most of your questions.
If -variable B- is not ordered (I assume that is what you meant by "random"), you can keep the coding arbitrary (or, better, you can -label- each code).
As a rule of thumb, the reference group may be chosen according to the largest or smallest sample stratum size.
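
A minimal sketch of that rule of thumb, assuming Stata's shipped cancer dataset as a stand-in (old is a hypothetical binary variable in the role of -variable A-, and drug plays the role of -variable B-):

```stata
* Illustration only: the shipped cancer dataset stands in for the real data
sysuse cancer, clear
generate byte old = age >= 56        // hypothetical stand-in for A
stset studytime, failure(died)

* Inspect the stratum sizes of the multi-level variable
tabulate drug

* Let Stata use the most frequent level as the reference group
stcox i.old##ib(freq).drug
```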

Kind regards,
Carlo
(Stata 19.0)
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#5

25 Feb 2017, 06:42

Carlo explained the ins and outs of the so-called "ib." operator.

Also, at the end of the day (or perhaps better, "when starting to prepare the analysis"), it is up to you to decide which level should be taken as the reference. Should you decline to choose, Stata will do the work for you and apply the default option.

Below is an extract from the Stata manual, echoing Carlo's advice to spend some (quite useful) time reading it.

Base levels
When we typed i.group, group = 1 became the base level. When we do not specify otherwise,
the smallest level becomes the base level.

You can specify the base level of a factor variable by using the ib. operator. The syntax is

Base operator (a)   Description
ib#.                use # as base, # = value of variable
ib(##).             use the #th ordered value as base (b)
ib(first).          use smallest value as base (default)
ib(last).           use largest value as base
ib(freq).           use most frequent value as base
ibn.                no base level

(a) The i may be omitted. For instance, you can type ib2.group or b2.group.

(b) For example, ib(#2). means to use the second value as the base.
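
As a sketch of how the ib. operator might be used here, the lines below fit the same interaction model with two different base levels. They assume Stata's shipped cancer dataset as a stand-in: old is a hypothetical binary variable in the role of A, and drug (3 levels) plays the role of B. The individual interaction coefficients and their p-values change with the base level because each one is a contrast against a different reference, whereas the joint Wald test of all interaction terms should be the same under either base, up to numerical precision.

```stata
* Illustration only: the shipped cancer dataset stands in for the real data
sysuse cancer, clear
generate byte old = age >= 56        // hypothetical stand-in for A
stset studytime, failure(died)

* Level 1 of drug as the base (the default)
stcox i.old##ib1.drug
testparm i.old#i.drug
scalar chi2_base1 = r(chi2)

* Level 3 of drug as the base
stcox i.old##ib3.drug
testparm i.old#i.drug
scalar chi2_base3 = r(chi2)

* Compare the two joint test statistics
display "base 1: " chi2_base1 "   base 3: " chi2_base3
```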

Hopefully that helps.

Best regards,

Marcos