Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Advice regarding estimating competing risk

    If i desire to calculate the cumulative incidence of both outcome (fracture=1/no fracture=0) and death (dead=1/alive=0) in a cohort, using competing risk, does the "status" variable that i am using have to have outcome, death and missing combined (outcome = 1 dead = 2 no outcome/no death=3). I have been trying with the command below, but it wont seem to work:
    Code:
    gen outcome = had_fracture
    gen endpoint = min(first_fracture_date, end_follow_date)
    stset endpoint, failure(outcome = 1) scale(365.25) origin(start_follow_date)
    stcompet= , compet(dead=1)
    It is when using "stcompet= , compet(dead=1)", that i am suspecting that i need to combine outcome and dead into one variabel for it to be correct

    And then to depict the (competing) cumulated incidence of fractures and death:
    Code:
    twoway ///
    (line cummort _t if status==1 , c(J)) ///
    (line cummort _t if dead==1 , c(J))
    I cannot use dataex, because i am working with sensitive data, sorry. Please tell me if more information is needed for better comprehension, thank you

  • #2
    To use -stcompet- you must create a single variable with levels for all of the outcomes under consideration. You specify one of them as your failure outcome in the -stset- command, and you specify the others as a numerical list (Stata numlist) in the -compet()- option.

    Also I don't believe the syntax -stcompet = ,...- is allowed. I have very little experience with this command (I usually use -stcrreg-, which implements the Fine & Gray approach), but I believe you have to specify something like -stcompet cummort = ci, ...- And then you should be able to do the graphs the way you describe.

    Comment


    • #3
      Thank you for helping me again Mr. Schechter. I really appreciate it. You are right

      In regards to the help you gave me with the command below. I am trying to estimate the incidence-rate of fractures:

      Code:
      sort person_id stroke_date
      by person_id (stroke_date): gen start_follow_date = stroke_date[1]
      by person_id (admission_date), sort: egen first_fracture_date = min(cond(has_fracture_now, admission_date, .))
      by person_id (admission_date): gen end_follow_date = min(admission_date[_N], death_date)
      by person_id: egen had_fracture = max(has_fracture_now) 
       
      gen dead = 1
      replace dead = 0 if missing(death_date)
      by person_id: egen died = max(dead) 
       
      by person_id: keep if _n == 1
      
       //  FOR FRACTURE OUTCOME ONLY
      gen outcome = had_fracture
      gen endpoint = min(first_fracture_date, end_follow_date)
       
      // INCIDENCE-RATE OF FRACTURES
      stset endpoint, failure(outcome = 1) scale(36525) origin(start_follow_date) id(person_id)
      stptime
      As you can see in the attached picture, the analysis of the incidence-rate has "36.682 observations end on or before enter()"
      Is that because some patients has had a fracture before their stroke date, which means that they are therefore not included when stata does the analysis?
      And if that is the case, how can i assure that if they actually also have a fracture after their stroke, that that is the facture that is included.
      Attached Files

      Comment


      • #4
        Yes, this means that they had a fracture before their first stroke. If your goal is to estimate the incidence of fracture after stroke, though, it would seem appropriate to exclude them, as Stata does. They would not be incident cases.

        On another interpretation, if you want to simply disregard fractures that occur before the first stroke and consider the incidence rate of new fractures that do occur after the stroke, but your data is contaminated with those old, pre-stroke fractures, then you could do this:

        Code:
        sort person_id stroke_date
        by person_id (stroke_date): gen start_follow_date = stroke_date[1]
        by person_id (admission_date), sort: egen first_post_stroke_fx_date = ///
            min(cond(has_fracture_now & admission_date > start_follow_date, admission_date, .))
        by person_id (admission_date): gen end_follow_date = min(admission_date[_N], death_date)
        by person_id: egen had_fracture = max(has_fracture_now) 
         
        gen dead = 1
        replace dead = 0 if missing(death_date)
        by person_id: egen died = max(dead) 
         
        by person_id: keep if _n == 1
        
         //  FOR FRACTURE OUTCOME ONLY
        gen outcome = had_fracture
        gen endpoint = min(first_post_stroke_fx_date, end_follow_date)
         
        // INCIDENCE-RATE OF FRACTURES
        stset endpoint, failure(outcome = 1) scale(36525) origin(start_follow_date) id(person_id)
        stptime
        Changes to previous code are in italics. This code will give you the date of the first post-stroke fracture (if any), rather than the first stroke ever.

        Comment


        • #5
          This is exactly what i would like to do: "On another interpretation, if you want to simply disregard fractures that occur before the first stroke and consider the incidence rate of new fractures that do occur after the stroke, but your data is contaminated with those old, pre-stroke fractures, then you could do this:"

          I used the above command, but there is still 19192 observations end on or before enter(). I have attached a comparison photo, with before and after the new code.

          Attached Files

          Comment


          • #6
            Could your data have some people in it whose final admission date precedes, or is the same as their first stroke date? These would also be people whose data ends on or before their enter date.

            Comment


            • #7
              Yes of course. That is definitely the case. If i then have end of follow up at last admission date, would it then makes sense to include these patients if they did not have fracture or died, because then they would theoretically have zero risk time, because they have not registered admission AFTER their stroke.

              And how can one go about this, if this is the case?

              Comment


              • #8
                Well, Stata is already excluding those people for you: it's telling you about that in that warning message about observations entering on or before enter(). So if you just leave it the way you have it, you're automatically getting this.

                If it bothers you to see that message, you can change the -stset- command as follows:

                Code:
                stset endpoint if start_follow_date < endpoint, failure(outcome = 1) scale(36525) origin(start_follow_date) id(person_id)
                The warning from -stset- will go away. But the results will be unchanged.

                Comment


                • #9
                  Thank you so much! I have now settled on following patients until 2017.

                  In terms of estimating the incidence-rate of specific fractures (diagnosis codes), I have used the code below:

                  Code:
                  sort person_id stroke_date
                  by person_id (stroke_date): gen start_follow_date = stroke_date[1]
                  by person_id (admission_date), sort: egen first_post_stroke_fx_date = ///
                  min(cond(has_fracture_now & admission_date > start_follow_date, admission_date, .))
                  by person_id: egen diagnosiscode = min(cond(admissiondate == first_post_stroke_fx_date, diag, .))
                  by person_id (admission_date): gen end_follow_date = min(td(31dec2017), dødsdato)
                  by person_id: egen had_fracture = max(has_fracture_now)
                  
                  gen dead = 1
                  replace dead = 0 if missing(death_date)
                  by person_id: egen died = max(dead)
                  
                  by person_id: keep if _n == 1
                  
                  // FOR FRACTURE OUTCOME ONLY
                  gen outcome = had_fracture
                  gen endpoint = min(first_post_stroke_fx_date, end_follow_date)
                  
                  // INCIDENCE-RATE OF FRACTURES
                  stset endpoint, failure(outcome = 1) scale(365250) origin(start_follow_date) id(person_id)
                  stptime
                  
                  // INCIDENCE-RATE OF DIAGNOSIS SPECIFIC FRACTURE
                  stset endpoint, failure(diagnosiscode = 5) scale(365250) origin(start_follow_date) id(person_id)
                  stptime
                  
                  // INCIDENCE-RATE OF CATEGORICAL FRACTURE-CODE
                  stset endpoint, failure(diagnosiscode = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 //
                  24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45) scale(365250) origin(start_follow_date) id(person_id)
                  stptime
                  When adding all of the diagnosis-specific fractures together, it amounts to 16.256 failures (fractures), but when estimating the incidence rate of fractures in general, it amounts to 34.290 failures (fractures), which indicates that something probably goes wrong when i am generating the diagnosiscode-variable. I just can't seem to figure out what it is. I have attatched a small excerpt of the results, to give you an idea of what i am seeing.
                  Attached Files
                  Last edited by Jonas Kristensen; 04 Feb 2019, 09:06.

                  Comment


                  • #10

                    As you can see in the photo below, it apperently does not included all of the diagnosis codes, when generating the variable "diagnosiscode" (3 i for example missing) Is there another way to estimate these type-specific incidence rates to get around this issue?
                    Attached Files

                    Comment


                    • #11
                      I do not understand what the Table 4 you are showing is and how the numbers in it were calculated, so I can't begin to think about why your results are different from that.

                      Comment


                      • #12
                        Okay
                        Would there be another or better way, command-wise, to estimate the categorical-diagnosis code for fractures than using the below commands:
                        Code:
                        by person_id: egen diagnosiscode = min(cond(admissiondate == first_post_stroke_fx_date, diag, .)
                        
                        stset endpoint, failure(diagnosiscode = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 ///
                        24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45) scale(365250) origin(start_follow_date) id(person_id)
                        stptime
                        The above is for example diagnosis code DS02, which is the categorical/upper-hierarchical diagnosis code for Cranial and Face Fractures. It includes DS020-DS29A, which has the codes 1-45, which i use as failure above.
                        I think the approach above is correct, but there are simply some of the codes that it does not include in the calculations, which therefore results in to few failures/fractures. A possible solution could be creating af varibale, as you showed me with:
                        Code:
                        by person_id: egen had_fracture = max(has_fracture_now
                        Where every duplicate person_id gets the first diagnosis code that was registered. Because i think it is when removing duplicates that many of the diagnosiscodes are lost, and therefore there are not as many failures as for "had_fracture"
                        Last edited by Jonas Kristensen; 05 Feb 2019, 06:39.

                        Comment


                        • #13
                          I don't see why the code you are using would fail to pick up some fracture codes. Are you sure that all of those fracture codes actually occur in the data as the first fracture after a stroke?

                          Comment


                          • #14
                            Yes i agree. Thank you for answering!
                            The wierd thing is that after i have used the entire list of codes that you showed me, and i have set up time points and removed duplicates, i have 116.519 patients.
                            If i then tabulate "had fracture", there is 34.502 fractures (1's) and i get 34.290 failures when i run "stset endpoint, failure(outcome=1) scale(365.25) origin(start_follow_date)".
                            AND THEN if i tabulate "first_poststroke_fx_date" i get 16.273 who has a date (who had a fracture), and if i tabulate "diagnosiskode" i also get 16.273 who has a diagcode, which also adds up to the amount of failures when estimating the incidence rate for diagnosis specific fractures, which was 16.256 failures (see #9).

                            Doesn't this all mean that the problem actually lies with the variable "had_fracture", and not with diagnosis code? And if it does, could it be solved by generating the outcome variable on the basis of wether or not the patient has a first_poststroke_fx_date instead of using "had_fracture"?

                            The thing i have a hard time understanding, is how there can be 34.290 failures when i run the above stset command, because much more patients must have died before fracture than 34.502-34.290=212
                            Last edited by Jonas Kristensen; 05 Feb 2019, 09:53.

                            Comment


                            • #15
                              Well, the code does not show where the variable has_fracture_now comes from. If I remember from the earlier thread, which was removed from Statalist, that variable was not created by the code but was already existing (or assumed to exist) in the data set. So the question is whether or not has_fracture_now agrees with having a diagcode within that list of codes. Perhaps it does not. Perhaps try -tab diagcode has_fracture_now- to see if every fracture-related diagnosis code has all of its observations in the 1 column of has_fracture_now in that output. Maybe that has_fracture_now variable is missing some of the codes.

                              Comment

                              Working...
                              X