
  • Why did my estimate change when I added more groups to the cox regression?

    Hello, dear statalist,

    My name is Tiange and I am new to Stack Overflow. I have a quick question regarding my Cox regression results.

    When I first ran the Cox regression with two groups, this is what I got (the estimate is shown below, 2 groups):

    [Image: 2 groups.png]


    However, when I put all the groups in the regression, the results changed to this (4 groups)
    [Image: 4 groups.png]


    I kind of understand that including groups 2 and 3 changed the estimate, but I don't know why. I am hoping to get some help from you, and I would really appreciate your help and time!

    Sincerely, Tiange

  • #2
    It would have been helpful had you posted the actual -stcox- commands in addition to the output. But I think I know what you did that got you these results.

    I'm inferring that for the first results you ran -stcox group-, and group is a variable in your data set that takes the values 1, 2, 3, and 4. When you run -stcox group-, Stata treats group as a continuous variable. Consequently the hazard ratio you get is actually a hazard ratio per unit increase in the group variable. If group is, in fact, not even an ordered variable (and variables named group usually are not ordered) then these results are just meaningless garbage and you should discard them. It is simply wrong to treat a grouping variable as if it were continuous and, unwittingly, that is what I believe you did.

    For the second results I believe you ran -stcox i.group-. That i. prefix is critical; it tells Stata to treat your group variable as a discrete variable--which variables named group usually are. So these results are usable.

    If I have not correctly inferred what your -stcox- commands were, then please post back and show the actual commands that produced each output.
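
    A minimal sketch of the difference, assuming Stata's shipped example dataset (-webuse cancer-, in which drug takes the values 1, 2, and 3) as a stand-in for your data. The variable names studytime, died, and drug belong to that dataset, not to yours:

    ```stata
    * Example data shipped with Stata; a stand-in, not the poster's data
    webuse cancer, clear
    stset studytime, failure(died)

    * drug entered as continuous: a single hazard ratio per unit increase in drug
    stcox drug

    * drug entered as a factor variable: one hazard ratio per level, each vs. level 1
    stcox i.drug
    ```

    The first command reports one hazard ratio for drug; the second reports separate hazard ratios for levels 2 and 3, each relative to level 1.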


    • #3
      Originally posted by Tiange Tang:
      Hello, dear statalist,

      My name is Tiange and I am new to Stack Overflow.
      Welcome Tiange. Please also note that since you have posted this question here and to Stack Overflow, you are requested to share that link here to avoid duplication of efforts.


      • #4
        Hi Leonardo,

        Thank you for letting me know! I was not aware of the policy, and I will keep it in mind for future posts. I have deleted the post on Stack Overflow.

        Really appreciate it!

        Sincerely,
        Tiange
        Last edited by Tiange Tang; 07 Nov 2023, 20:59.


        • #5
          Dear Dr. Schechter,

          I am so excited that you replied to my post. I have been reading and searching for solutions on Statalist for a long time and have seen you answer other researchers' questions selflessly. It's my honor to talk to you here (huge fan of yours!).
          Below is the code I used for the Cox regression:

          preserve
          stset time, failure(event == 1)
          keep if group == 1 | group == 4
          stcox i.group   // I forgot to add i. in the previous results; treating group as a factor variable here changes the estimate to 0.689, still slightly different from 0.711
          restore

          preserve
          stset time, failure(event == 1)
          stcox i.group
          restore

          I am not sure why adding group 2 and group 3 would change the estimate for group 4. My guess is that some of the effect of being in group 4 is diluted by the increased sample size?
          Again, thank you for your help!
          Sincerely,
          Tiange
          Last edited by Tiange Tang; 07 Nov 2023, 21:13.


          • #6
            In your first analysis, -stcox group if inlist(group, 1, 4)-, because you did not use i., Stata treats group as a continuous variable, even though only values 1 and 4 are instantiated in the estimation sample. The hazard ratio is the hazard ratio associated with a unit increase in group--but that unit increase never actually happens in the data because the only values are 1 and 4, with a difference of 4-1 = 3. So the hazard ratio you are getting here is something like the cube root of the hazard ratio directly comparing group 1 to group 4 as a discrete variable since you would have to apply this hazard ratio 3 times to get from 1 up to 4.

            By contrast, in the second analysis, -stcox i.group-, each value of group other than 1 gives rise to a hazard ratio directly comparing that group to group 1.
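
            To see that relationship on data where it can be checked, here is a sketch on Stata's example dataset (-webuse cancer-, an assumption standing in for your data), restricted to drug values 1 and 3. There the gap is 3-1 = 2, so the per-unit hazard ratio is a square root rather than a cube root:

            ```stata
            * Example data shipped with Stata; a stand-in, not the poster's data
            webuse cancer, clear
            stset studytime, failure(died)

            * Continuous coding on a sample containing only drug == 1 and drug == 3
            stcox drug if inlist(drug, 1, 3)
            display exp(2*_b[drug])     // per-unit hazard ratio applied twice

            * Factor coding on the same sample: direct hazard ratio for 3 vs. 1
            stcox i.drug if inlist(drug, 1, 3)
            display exp(_b[3.drug])
            ```

            Because both models are fit to the same estimation sample and differ only in how the variable is coded, the two displayed values should agree.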

            Now let's look at the actual comparisons of group 4 to group 1 in both models. In the second model, the group4:group1 hazard ratio can be read directly off the output, and it is 0.7113975. For the first model (group treated as continuous but having only values 1 and 4), the hazard ratio should be roughly the cube root of that. The cube root is 0.893, which is very close to the 0.883 that we got. The small difference between that cube root and the observed value can be attributed easily to sampling variation (the estimation samples are different) and is well within the standard error.
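
            The arithmetic can be checked with -display-, using only the figures quoted in this thread:

            ```stata
            * Cube root of the group 4 vs. group 1 hazard ratio from the 4-group model
            display 0.7113975^(1/3)

            * Per-unit hazard ratio from the continuous model, applied three times
            display 0.883^3
            ```

            The first comes to about 0.893 and the second to about 0.688, in line with the 0.689 reported in #5 for -stcox i.group- on the two-group sample.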
            Last edited by Clyde Schechter; 07 Nov 2023, 21:12.


            • #7
              Dr. Schechter,

              Thank you for your explanation. I think I now understand what is going on with my code.

              Again, I really appreciate your help!

              Sincerely,
              Tiange
