  • Cumulative hazard ratio in Cox-regression

    I am doing a large registry-based study in which I use Cox regression to compare mortality rates in hyperthyroid individuals with those in euthyroid individuals. I further subdivide the hyperthyroid group into treated and untreated individuals. Cases are defined by a blood sample showing a decreased TSH measurement (below 0.3).

    I want to look at the length of exposure and the effect of longer exposure (low TSH) on mortality. The hypothesis is that prolonged low TSH puts more strain on the system and leads to earlier death than exposure lasting close to 0 months.

    To capture the length of exposure, I have a time-dependent cumulative covariate that counts the number of six-month periods during which the participant is exposed.

    Thus, two variables are created: a dummy variable for low TSH per half-year (0 or 1, for normal or low TSH) and a cumulative variable (the number of half-years counted as exposed, i.e. with low TSH, up to this point). The follow-up period is 32 half-years.

    The table shows an example dataset:

    id   low_tsh_1  low_tsh_2  low_tsh_3  cumulative_tsh_1  cumulative_tsh_2  cumulative_tsh_3
    134  1          1          1          1                 2                 3
    135  0          1          0          0                 1                 1
    136  1          0          1          1                 1                 2
    137  0          1          1          0                 1                 2

    I'm not sure how to implement this in a Cox regression, so I tried to find a solution by doing the following:


    Code:
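      * Half-year index, counted from day 12785 on Stata's daily-date scale
      generate halfyear=floor((rsdato-12785)/(365.25/2))
      * Mean TSH per person and half-year
      bysort pnr halfyear: egen avg_tsh=mean(TSH_n)
      * Keep one record per person and half-year
      bysort pnr halfyear: keep if _n==1
      * Exposure indicator; stays missing when avg_tsh is missing
      gen byte dummy_TSH_low=avg_tsh<0.3 if !missing(avg_tsh)
      * Running count of exposed half-years, then the maximum per person
      bysort pnr (halfyear): gen cumulative=sum(dummy_TSH_low)
      collapse (max) cumulative, by(pnr)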

    rsdato is the date of TSH-measurement, pnr is the id.

    I ran my Cox-regression:
    Code:
      stcox i.gruppe c.age i.charlson i.sex cumulative

    "gruppe" is the variable describing whether you are, at baseline, euthyroid (normal TSH: =1), treated hyperthyroid (TSH<0.3: =2), or untreated hyperthyroid (=3); charlson is the comorbidity score.

    Contrary to what I expected, the HR for cumulative is below 1 (the longer you are exposed, the longer you have lived and the lower your mortality). I suspect this is the same statistical problem seen in, e.g., cancer research when controlling for pack-years: the people who have smoked the longest have a lower mortality rate, because they have survived long enough to accumulate their pack-years.

    So the question is: how can I measure the effect of each exposed half-year on survival? In other words, how do I calculate the change in mortality rate per half-year exposed (lived with low TSH)?

  • #2
    Your sample data omits variables referenced in your code, so it's difficult to check your logic. Please create a proper sample; use dataex (SSC) and, as FAQ 12 requests, put all listings between CODE delimiters. I suggest that you make all variables lower case for ease of manipulation.
    Last edited by Steve Samuels; 25 Apr 2016, 07:13.
    Steve Samuels
    Statistical Consulting
    [email protected]

    Stata 14.2

    • #3
      I am following this

      • #4
        I am very sorry for not understanding the rules correctly; I'll try to explain better. Please disregard the code in my earlier post, as it might confuse. I have used dataex to create an example of my dataset. Please note that these are not real data:
        Code:
        clear
        input float(ID birthdate) long date_first_tsh double tsh long(treatment_date end_date) float died byte sex double age int(group low_tsh_1 low_tsh_2 low_tsh_3 cumulative_tsh_1 cumulative_tsh_2 cumulative_tsh_3)
         1 -12407 13152  .01 13158 14663 1 1  69.97672826830937 3 1 1 0 1 2 2
         2  -1052 13552 3.45     . 19327 0 1 39.983572895277206 1 0 0 0 0 0 0
         3  -5753 13941 2.14     . 19327 0 0   53.9192334017796 1 0 0 0 0 0 0
         4  -7368 12847  .28 12848 15431 1 1  55.34565366187543 3 1 0 0 1 1 1
         5  -7818 16593   .2     . 19298 1 1  66.83367556468173 2 1 1 1 1 2 3
         6  -4489 17354  .12 17354 19327 0 1  59.80287474332649 3 1 0 1 1 1 2
         7  -9487 18553  3.2     . 14045 1 0  76.76933607118411 1 0 0 0 0 0 0
         8 -12451 14188 1.25     . 16800 1 1  72.93360711841204 1 0 0 0 0 0 0
         9   7656 17571 2.01     . 19327 0 1 27.145790554414784 1 0 0 0 0 0 0
        10   3420 16964 .003     . 19327 0 1  37.08145106091718 2 1 1 0 1 2 2
        end
        format %tdD_m_Y birthdate
        format %tdD_m_Y date_first_tsh
        format %tdD_m_Y treatment_date
        format %tdD_m_Y end_date
        And the regression:
        Code:
        stset end_date, failure(died=1) enter(time date_first_tsh) origin(time date_first_tsh) scale(365.25) id(ID)
        
        stcox i.group c.age i.charlson i.sex
        To explain the variables that aren't obvious:
        died: 0 - did not die during observation, 1 - died during observation
        end_date: end of observation. If died=1, then this is the date of death
        sex: 0 - female, 1 - male
        charlson: Charlson comorbidity index, value ranges from 0-3 in this set.
        age: Person's age at the start of observation - (date_first_tsh-birthdate)/365.25
        treatment_date: date the person started treatment. If a person has not received treatment, this date is missing.


        There are several TSH-measurements per individual over time. The first measured TSH is used in my cox-regression, looking at the association between low TSH and mortality.

        But since I have access to other TSH measurements as well, I would like to investigate the effect of the length of exposure (half-years with low TSH).
        The problem is that when I add the cumulative variable (in this case cumulative_tsh_3) to stcox, those with the lowest mortality rate (i.e. those who live the longest) are those with the most exposure to low TSH. That is counterintuitive and contradicts the literature.

        So how can I include a cumulative exposure variable in my analysis without introducing survival bias?

        Hope that was a bit more clear.

        Mads

        • #5
          Edit: I notice I have included the Charlson Comorbidity Index in the stcox, but omitted it from the dataex. Removed.
          Code:
          stset end_date, failure(died=1) enter(time date_first_tsh) origin(time date_first_tsh) scale(365.25) id(ID)
          
          stcox i.group c.age i.sex

          • #6
            As you speculated, your setup has caused what is known as "guarantee-time bias" or "immortal time bias". Your TSH "half-year" exposures are known only after the start of follow-up. Someone with a three-half-year measure, for example, had to survive 18 months to get that measure. There are many references online, e.g. Giobbie-Hurder et al. (2013).

            You will need to organize your data as multiple-record data. There are examples in the manual entry for stset. Start with a dataset that has one record for each TSH measurement, containing the measurement on that date, the classified value (high or low), and the usual dates of entry, treatment start (if not the same as entry), and end of observation, plus status (failed or not).

            The idea is to be able to compute, for any follow-up time _t, the exposure measure at that time. Such changeable measures are "time-varying" covariates. The rule is that you can use only covariate information ascertained before time _t to estimate exposure at _t. You might need snapspan and will certainly need stsplit, at(failures).
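
            A minimal sketch of that layout, not the poster's actual data: the six rows are made up, and every variable name (meas_date, stop, event, first_date, low_tsh, cum_low) is a hypothetical choice for illustration. Each record covers the interval from one measurement to the next, and its covariate uses only readings taken by the start of that interval. It counts low measurements, not half-years, and assumes one measurement per date.

            ```stata
            clear
            * Made-up long data: one row per TSH measurement (all names hypothetical)
            input id meas_date tsh end_date died
            1 100 0.10  900 1
            1 300 0.20  900 1
            1 650 1.50  900 1
            2 120 2.00 1000 0
            2 500 0.25 1000 0
            3 150 0.05  720 1
            end

            bysort id (meas_date): gen double stop = meas_date[_n+1]  // interval ends at the next measurement
            by id: replace stop = end_date if _n == _N                // last interval ends at exit
            by id: gen byte event = died == 1 & _n == _N              // death counts only on the last record
            by id: gen double first_date = meas_date[1]               // follow-up starts at the first measurement

            gen byte low_tsh = tsh < 0.3 if !missing(tsh)
            by id: gen cum_low = sum(low_tsh)  // low readings up to and including the one that opens the interval

            stset stop, id(id) failure(event) time0(meas_date) origin(time first_date) scale(365.25)
            stsplit, at(failures)              // cut every record at each observed death time
            list id _t0 _t low_tsh cum_low _d, sepby(id)
            ```

            With real data, cum_low (or any other function of past readings) would then enter stcox as an ordinary covariate. The split at failure times matters mainly when a covariate is recomputed from _t itself, such as time since the last measurement.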

            At any time t, people with the same cumulative measure of "exposure" C(t) will also have the same average exposure, where the average is per unit time: A(t) = C(t)/t. If TSH ascertainments took place at equal intervals, this average will be proportional to C(t)/(number of measurements). In other words, cumulative totals and averages per time on study are equivalent measures. You can ask whether longer-term exposures are more important than shorter-term exposures by using "time-varying coefficients". See pp. 81 and 82 of the Stata 14 Survival manual.
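
            As a rough illustration of a time-varying coefficient, the sketch below uses Stata's example dataset drugtr (fetched with webuse, so it needs internet access) rather than the registry data. The tvc() option names the covariate whose coefficient is allowed to change, and texp() gives the function of analysis time it is multiplied by. In the thread's setting the cumulative exposure variable would presumably take the place of age; that substitution is an assumption, not something tested here.

            ```stata
            webuse drugtr, clear                 // Stata example data: studytime, died, drug, age
            stset studytime, failure(died)

            * Main effects of drug and age, plus an age-by-time interaction:
            * log hazard ratio of age at time _t = b_main + b_tvc * _t
            stcox drug age, tvc(age) texp(_t)
            ```

            A tvc coefficient near zero suggests the covariate's effect is roughly constant over follow-up; a clearly non-zero one suggests the effect grows or fades with time on study.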

            Here are some examples of cumulative measures.

            1. Fraction of prior measurements in the low category

            2. Average of prior measurements (denominator: time or number of measurements)

            3. Weighted average of prior measurements, with higher or lower weights given to more recent values, depending on the theory for the relation of TSH level to mortality.

            4. Number of lifetime months in low category.
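
            The four measures above could be computed as running quantities along these lines. This is a sketch on made-up data with hypothetical variable names; the 0.5 weight in measure 3 is arbitrary, and measure 4 assumes that a low reading persists until the next measurement, which is itself a modelling choice. Each row's value uses the readings up to and including that row, so it is known at the start of the interval that row opens.

            ```stata
            clear
            * Made-up long data: one row per TSH measurement, no missing values assumed
            input id meas_date tsh
            1 100 0.10
            1 300 0.20
            1 650 1.50
            2 120 2.00
            2 500 0.25
            end

            gen byte low_tsh = tsh < 0.3
            bysort id (meas_date): gen frac_low = sum(low_tsh) / _n   // 1. fraction of readings that were low
            by id: gen mean_tsh = sum(tsh) / _n                       // 2. running mean per measurement

            * 3. weighted running mean; replace works row by row, so each row sees the updated previous row
            by id: gen ewma_tsh = tsh if _n == 1
            by id: replace ewma_tsh = 0.5*tsh + 0.5*ewma_tsh[_n-1] if _n > 1

            * 4. time spent after a low reading, accumulated up to the current measurement
            by id: gen double days_low = sum(cond(low_tsh[_n-1] == 1, meas_date - meas_date[_n-1], 0))
            gen months_low = days_low / 30.4375

            list, sepby(id)
            ```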

            I think it is a mistake to start the analysis with dichotomous categories such as "low" or "high". Such dichotomization can lose too much information. You can decide later whether it is safe to categorize exposure.


            Reference:

            Giobbie-Hurder, Anita, Richard D Gelber, and Meredith M Regan. 2013. Challenges of guarantee-time bias. Journal of Clinical Oncology 31, no. 23: 2963-2969.

            http://bcb.dfci.harvard.edu/ibcsg/Pu...TBBias_JCO.pdf

            Steve Samuels

            • #7
              Steve,
              What a great reference - that seems to cover the subject matter brilliantly.

              Can I just ask, when you say stsplit at(failure), do you mean death, or half-years with low TSH?

              Lars

              • #8
                I should have written
                Code:
                stsplit, at(failures)
                "at(failures)" is an option of stsplit and is explained in the documentation.
                Steve Samuels

                • #9
                  Steve
                  Thank you very much for your explanation. As the first post mentions, I do have a dataset with multiple entries per ID, one for each date of TSH measurement.

                  I've used stsplit before; I looked into it in this post:

                  http://www.statalist.org/forums/foru...iates-in-stcox

                  In this, I found out how to control for the time before an untreated individual starts treatment, thus allowing these individuals to act as controls until the time they begin treatment. Thus my code looks like this:
                  Code:
                  stsplit treatment, at(0) after(treatment_date)
                  replace treatment=treatment+1
                  This gives every person a 0 in treatment before treatment starts. If a person starts treatment on the same day, their record will not be split.

                  If I understand your post correctly, you are proposing that I use stsplit after stset'ing my data with failures being the different dates of TSH measurement. This will let me create a variable for the total time exposed, so that I can determine the fraction time_exposed/follow_up_time. This does present a problem: some people have only 1 or 2 measurements, and if the last one is low, the person will stay in the low category for the remainder of follow-up, which most likely does not reflect their true status. To counter this, I propose that a person who has not had a TSH measurement in 180 days automatically shifts to the euthyroid group, regardless of the last measurement. I do not know how to code this, though. I am looking for code that will recode an entry of low_tsh=1 to low_tsh=0 if more than 180 days have passed since the last measurement.
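
                  One possible way to code the 180-day rule, shown only as a sketch: it assumes the data are already arranged as one record per interval, with hypothetical variables start and stop (daily dates), low_tsh (0/1, taken from the measurement at start) and event (1 only on a record that ends in death). The rows are made up. Whether the rule itself is defensible is a separate question, taken up in the next reply.

                  ```stata
                  clear
                  * Made-up interval data (all names hypothetical)
                  input id start stop low_tsh event
                  1 100  250 1 0
                  1 250  900 1 1
                  2 120  500 0 0
                  2 500 1000 1 0
                  end

                  * Low-TSH intervals that run more than 180 days without a new measurement
                  gen byte long_gap = low_tsh == 1 & (stop - start) > 180
                  expand 2 if long_gap, generate(late)                    // late == 1 marks the added copy

                  replace stop    = start + 180 if long_gap & late == 0  // first piece: day 0 to day 180
                  replace event   = 0           if long_gap & late == 0  // any death belongs to the later piece
                  replace start   = start + 180 if long_gap & late == 1  // second piece: day 180 onward
                  replace low_tsh = 0           if long_gap & late == 1  // reclassified as not low

                  sort id start
                  list id start stop low_tsh event, sepby(id)
                  ```

                  The result can then be declared with stset stop, id(id) failure(event) time0(start), plus whatever origin() and scale() the analysis uses.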

                  • #10
                    No, you misunderstand what stsplit, at(failures) does. It ensures that at each failure time t, the most recent measure of a time-varying covariate is recorded at t for everyone in the risk set at t. It has nothing to do with defining what a failure is; you already did that with your stset statement. Note that the actual covariate might not be a measured value of TSH, but one of the functions of previous values I mentioned.

                    Moving people to the euthyroid group ("low group") is likely an error. It apparently violates the well-known principle that "absence of evidence is not evidence of absence". But if the probability of a measurement was related to a person's observed health (e.g. no symptoms -> no test), then absence of a measurement is informative in itself.

                    A study of the kind that you are trying to do requires measurements at more or less regular intervals over the entire period of follow-up for each person. If data are very sparse for only a small fraction of individuals, then excluding them is a possible solution; otherwise I'm not sure what can be done.
                    Last edited by Steve Samuels; 05 May 2016, 14:11.
                    Steve Samuels

                    • #11
                      Please forgive me for not being that experienced in survival analysis. If I understand your last post correctly, I stset with failure being death, and then stsplit at the dates of TSH measurement. This will allow me to see whether a person is exposed in different periods of time. Is that correct?

                      Regarding the other matter, I have blood tests both from the patient's GP and from the hospital. For the sake of argument, take a person who has a low TSH at a given time and then no more blood samples for the rest of follow-up. It should then be safe to assume that this patient exhibits no symptoms and has perhaps been declared euthyroid by their GP or a hospital doctor. I think it is then fair to move this person to the euthyroid group for the rest of follow-up.

                      The same applies in reverse: if this person has another blood test with low TSH after, let's say, 1 or 2 years, they are moved back to the exposed group.

                      • #12
                        "This will allow me to see whether a person is exposed in different periods of time. Is that correct?"

                        Not exactly. -stsplit- will only assign the last value of whatever time-varying covariates there are to the current time (that of the failure of some subject). Whether that was the actual exposure at the current time is what is in question.

                        I understand your decision to move people into the euthyroid group; it is what I suspected. This makes the actual classification criterion not the TSH level but rather the presence or absence of TSH-level-related symptoms.
                        Steve Samuels