Why did my estimate change when I added more groups to the cox regression?

Tiange Tang

Join Date: Nov 2023

Posts: 14
#1

07 Nov 2023, 11:33

Hello, dear Statalist,

My name is Tiange and I am new to Stack Overflow. I have a quick question about my Cox regression results.

When I first ran the Cox regression with two groups, this is what I got (the estimate is shown below - 2 groups)

However, when I put all the groups in the regression, the results changed to this (4 groups)

I suspect that including groups 2 and 3 is what changed the estimate, but I don't know why. I am hoping to get some help from you, and I would really appreciate your help and time!

Sincerely, Tiange
Clyde Schechter

Join Date: Apr 2014

Posts: 30126
#2

07 Nov 2023, 11:58

It would have been helpful had you posted the actual -stcox- commands in addition to the output. But I think I know what you did that got you these results.

I'm inferring that for the first results you ran -stcox group-, and group is a variable in your data set that takes the values 1, 2, 3, and 4. When you run -stcox group-, Stata treats group as a continuous variable. Consequently the hazard ratio you get is actually a hazard ratio per unit increase in the group variable. If group is, in fact, not even an ordered variable (and variables named group usually are not ordered) then these results are just meaningless garbage and you should discard them. It is simply wrong to treat a grouping variable as if it were continuous and, unwittingly, that is what I believe you did.

For the second results I believe you ran -stcox i.group-. That i. prefix is critical; it tells Stata to treat your group variable as a discrete variable--which variables named group usually are. So these results are usable.

If I have not correctly inferred what your -stcox- commands were, then please post back and show the actual commands that produced each output.
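
To make the distinction concrete, here is a minimal sketch. It assumes the cancer dataset that ships with Stata, since the original data were not posted: its drug variable (values 1, 2, and 3) stands in for group, and studytime and died stand in for the time and event variables.

```stata
* Illustration only: sysuse cancer stands in for the poster's data
sysuse cancer, clear
stset studytime, failure(died)

* drug entered as continuous: a single hazard ratio per unit increase in drug
stcox drug

* drug entered as a factor variable: separate hazard ratios for
* drug 2 vs. 1 and drug 3 vs. 1 (level 1 is the default base)
stcox i.drug
```

The first command reports one row for drug; the second reports one row for each non-base level, which is the form that makes sense for an unordered grouping variable.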
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2403
#3

07 Nov 2023, 12:58

Originally posted by Tiange Tang

Hello, dear statalist,

My name is Tiange and I am new to Stack overflow.

Welcome Tiange. Please also note that since you have posted this question here and to Stack Overflow, you are requested to share that link here to avoid duplication of efforts.
Tiange Tang

Join Date: Nov 2023

Posts: 14
#4

07 Nov 2023, 20:46

Hi Leonardo,

Thank you for letting me know! I was not aware of the policy and will keep it in mind for future posts. I have deleted the post on Stack Overflow.

I really appreciate it!

Sincerely,
Tiange

Last edited by Tiange Tang; 07 Nov 2023, 20:59.
Tiange Tang

Join Date: Nov 2023

Posts: 14
#5

07 Nov 2023, 20:55

Dear Dr. Schechter,

I am so excited that you replied to my post. I have been following and searching for solutions on Statalist for a long time and have seen you answer other researchers' questions selflessly. It's an honor to talk to you here (huge fan of yours!).
Below is the code I used for the Cox regression:

preserve
stset time, failure(event == 1)
keep if group == 1 | group == 4
* I forgot the i. prefix in my earlier results. Adding it here changes the
* estimate to 0.689, still slightly different from 0.711.
stcox i.group
restore

preserve
stset time, failure(event == 1)
stcox i.group
restore

I am not sure why adding groups 2 and 3 would change the estimate for group 4. My guess is that some of the effect of being in group 4 is diluted by the larger sample?
Again, thank you for your help!
Sincerely,
Tiange

Last edited by Tiange Tang; 07 Nov 2023, 21:13.
Clyde Schechter

Join Date: Apr 2014

Posts: 30126
#6

07 Nov 2023, 21:09

In your first analysis, -stcox group if inlist(group, 1, 4)-, because you did not use i., Stata treats group as a continuous variable, even though only values 1 and 4 are instantiated in the estimation sample. The hazard ratio is the hazard ratio associated with a unit increase in group--but that unit increase never actually happens in the data because the only values are 1 and 4, with a difference of 4-1 = 3. So the hazard ratio you are getting here is something like the cube root of the hazard ratio directly comparing group 1 to group 4 as a discrete variable since you would have to apply this hazard ratio 3 times to get from 1 up to 4.

By contrast, in the second analysis, -stcox i.group-, each value of group other than 1 gives rise to a hazard ratio directly comparing that group to group 1.

Now let's look at the actual comparisons of group 4 to group 1 in both models. In the second model, the group4:group1 hazard ratio can be read directly off the output, and it is 0.7113975. For the first model (group treated as continuous but having only values 1 and 4), the hazard ratio should be roughly the cube root of that. The cube root is 0.893, which is very close to the 0.883 that we got. The small difference between that cube root and the observed value can be attributed easily to sampling variation (the estimation samples are different) and is well within the standard error.
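
The arithmetic above can be written out as a short derivation. This is a sketch that assumes the usual proportional hazards form with group entered linearly; the symbols $h_0(t)$, $\beta$, and $g$ are introduced here for illustration and do not appear in the posts.

```latex
\[
h(t \mid g) = h_0(t)\, e^{\beta g}
\quad\Longrightarrow\quad
\frac{h(t \mid g = 4)}{h(t \mid g = 1)} = e^{3\beta} = \left(e^{\beta}\right)^{3}
\]
\[
e^{\beta} = \mathrm{HR}_{4:1}^{1/3},
\qquad 0.7113975^{1/3} \approx 0.893,
\qquad 0.883^{3} \approx 0.688
\]
```

Read in the other direction, cubing the per-unit estimate of 0.883 gives about 0.688, which is in line with the 0.689 reported in #5 for -stcox i.group- on the sample restricted to groups 1 and 4.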

Last edited by Clyde Schechter; 07 Nov 2023, 21:12.
Tiange Tang

Join Date: Nov 2023

Posts: 14
#7

07 Nov 2023, 21:16

Dr. Schechter,

Thank you for your explanation. I think I now understand what is going on with my code.

Again, I really appreciate your help!

Sincerely,
Tiange