  • Calculating person time and incidence rates

    Hi all,

    I wanted to know how to calculate person time and incidence rates given data that is set up in a similar fashion:

    ID 001 Hearttransplantdate 3/23/1992 Testdate 12/30/2003 Cardiovascular disease (CVD) 0
    ID 001 Hearttransplantdate 3/23/1992 Testdate 5/2/2004 CVD 0
    ID 001 Hearttransplantdate 3/23/1992 Testdate 5/13/2004 CVD 0
    ID 001 Hearttransplantdate 3/23/1992 Testdate 7/19/2004 CVD 1
    ID 002
    ID 002
    ID 003
    ID 003
    ID 003
    .
    .
    .

    My failure event would be cardiovascular disease (indicated by 1) or the last time they had a test date. My origin would be when they had the transplant date. I want to calculate the person-time in months.
    Last edited by CEdward; 28 Dec 2014, 14:05.

  • #2
    Hi Jack,

    Not sure how big your data is, but you might want to import it into Excel and play around with it before putting it into Stata. You could use pivot tables to get the minimum Hearttransplantdate and the maximum Testdate for each ID, then export the result to Stata. Otherwise, the Introduction to Survival Analysis book has chapters on how to set survival time, etc. It's a good read; here is the link:

    http://www.stata.com/bookstore/survi...-introduction/

    Cheers,
    Dan



    • #3
      Thanks for the reply Daniel. Anybody else know how to do this?



      • #4
        the structure of the data shown in #1 above makes no sense to me; do you really have a variable (name not shown) that is equal to "ID" in every observation? at any rate, look at the help for -snapspan-; when you re-post, please show us (following the instructions in the FAQ) what your data really look like



        • #5
          I am confused by what you are asking Rich. What do you mean by "do you really have variable (name not shown)...observation?".



          • #6
            Originally posted by Rich Goldstein
            the structure of the data shown in #1 above makes no sense to me; do you really have a variable (name not shown) that is equal to "ID" in every observation? at any rate, look at the help for -snapspan-; when you re-post, please show us (following the instructions in the FAQ) what your data really look like

            Also, I don't think I should be using the snapspan command. I do not know the time span information, and my measurements differ at each test date, so my data violate the assumption that these measurements (i.e. height/weight) are constant between two dates.
            Last edited by CEdward; 28 Dec 2014, 15:21.



            • #7
              Jack:
              I believe that the confusion about the structure of your data comes from the fact that you intermingle variable names with variable values. Hopefully that is not the real structure of your data. Here is how I believe (or hope) that your first four observations look:

              Code:
              clear
               input id str10 s_htdate str10 s_testdate cvd
               1 "3/23/1992" "12/30/2003" 0
               1 "3/23/1992" "5/12/2004" 0
               1 "3/23/1992" "5/13/2004" 0
               1 "3/23/1992" "7/19/2004" 1
               end
              
              .  list , clean
                     id    s_htdate   s_testdate   cvd 
                1.    1   3/23/1992   12/30/2003     0 
                2.    1   3/23/1992    5/12/2004     0 
                3.    1   3/23/1992    5/13/2004     0 
                4.    1   3/23/1992    7/19/2004     1
              I gave your dates an s_ prefix because I had to enter them as strings. But to proceed you need them as Stata dates. I use the less ambiguous %td format:
              Code:
              gen htdate = date(s_htdate,"MDY")
               gen testdate = date(s_testdate,"MDY")
               format htdate testdate %td
              drop s_*
              
              .  list , clean
                     id   cvd      htdate    testdate 
                1.    1     0   23mar1992   30dec2003 
                2.    1     0   23mar1992   12may2004 
                3.    1     0   23mar1992   13may2004 
                4.    1     1   23mar1992   19jul2004
              Now, first find the enddate = last testdate for each id. Next, replace the enddate by the testdate when cvd occurred:

              Code:
              sort id testdate
               by id: egen enddate = max(testdate)
               format enddate %td
               replace enddate = testdate if cvd==1
              
              .  list, clean
                     id   cvd      htdate    testdate     enddate 
                1.    1     0   23mar1992   30dec2003   19jul2004 
                2.    1     0   23mar1992   12may2004   19jul2004 
                3.    1     0   23mar1992   13may2004   19jul2004 
                4.    1     1   23mar1992   19jul2004   19jul2004
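One caveat with the replace step: it changes enddate only in the observation where cvd==1, so if a subject has further test dates after the event, the rows for that id would carry different end dates. A minimal sketch of one way to get a single end date per id follows. The observations for ids 2 and 3 are made up for illustration, and the variable names eventdate and lastdate are hypothetical, not from the thread.

```stata
* Sketch only: id 1 is from the thread; ids 2 and 3 are invented test cases
clear
input id str10 s_testdate cvd
1 "12/30/2003" 0
1 "5/12/2004" 0
1 "5/13/2004" 0
1 "7/19/2004" 1
2 "1/15/2003" 0
2 "6/1/2003" 1
2 "9/1/2003" 0
3 "2/1/2003" 0
3 "8/1/2003" 0
end
generate testdate = date(s_testdate, "MDY")

* First test date with cvd==1 (missing if the subject never has the event)
bysort id: egen eventdate = min(cond(cvd == 1, testdate, .))
* Last test date, used as the censoring date when there is no event
bysort id: egen lastdate = max(testdate)
* One end date per id: event date if there is one, otherwise the last test date
generate enddate = cond(missing(eventdate), lastdate, eventdate)
format testdate eventdate lastdate enddate %td
list id cvd testdate enddate, sepby(id)
```

With this, every row of an id carries the same enddate, so any single row per id can be used to compute the follow-up time.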
              I would definitely not encourage doing these things in Excel, as suggested in post #2.

              Hope this helps



              • #8
                Jack,

                while I expect that Svend is correct about the actual structure of your data, the only way to be sure is to show us (using the "code" delimiters (reminder - read the FAQ))



                • #9
                  Hi Svend, this is an excellent response to my post and it is exactly what I am after. But, how would I now determine the time in months between the enddate and the htdate?



                  • #10
                    Find the time in days by:
                    Code:
                    generate pdays=enddate-htdate
                    With the time in days, any transformation to months, years, or millennia is trivial. For simple estimation of incidence rates, see:
                    Code:
                    help strate
                    help stptime
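As a minimal sketch of the days-to-months conversion, here are the four example observations from #7 taken through to person-months. It assumes an average month of 365.25/12 = 30.4375 days, which is a convention chosen for this sketch and not something stated in the thread; the variable name pmonths is likewise made up for illustration.

```stata
* Sketch only: rebuild the example observations from post #7
clear
input id str10 s_htdate str10 s_testdate cvd
1 "3/23/1992" "12/30/2003" 0
1 "3/23/1992" "5/12/2004" 0
1 "3/23/1992" "5/13/2004" 0
1 "3/23/1992" "7/19/2004" 1
end
generate htdate = date(s_htdate, "MDY")
generate testdate = date(s_testdate, "MDY")
format htdate testdate %td

* End of follow-up as in post #7
sort id testdate
by id: egen enddate = max(testdate)
format enddate %td
replace enddate = testdate if cvd == 1

* Person-time in days, then in months
generate pdays = enddate - htdate
* Assumption: one month = 30.4375 days (average month length)
generate pmonths = pdays / 30.4375
list id htdate enddate pdays pmonths in 1, clean
```

Dividing by 30 or by 365.25/12 gives slightly different month counts, so whichever divisor is used should be reported alongside the rates.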


                    • This topic by CEdward has been deleted by CEdward

                      #10
                      Also, I am not sure whether I should use -snapspan- on my data. Measurements are taken intermittently, each with its own date, but -snapspan- makes me assume that those measurements stay constant between dates, which isn't the case for my data.

                    • #11
                      Hi Svend, when we are using the snapspan command, is it possible to apply if qualifiers (e.g. Months if (Months !=.))?

                      Last edited by CEdward; 06 Jan 2015, 20:38.



                      • #12
                        Jack,

                        According to the documentation, you cannot use the -if- qualifier with -snapspan-. But I don't understand what you are trying to accomplish. What do you mean by "Months if (Months !=.)"?



                        • #13
                          Perhaps you could tell me whether the data above (which are similar to mine) are an example of data for snapspan. For each id, measurements are taken on various dates, and on those same dates the practitioner tries to determine whether the individual has CVD. Would that be data for snapspan?



                          • #14
                            I see no point in using -snapspan- for these data. You can -stset- them as they are and go ahead with, for example, -sts graph- or -stcox-.

                            There are worries, however, related to the circumstances leading to testing. The event of interest (cvd) is determined by a test. The interpretation is different if tests are taken at predetermined points in time or if they are taken due to changes in the patient's condition.

                            Code:
                            .  stset testdate , failure(cvd==1) origin(htdate) id(id)
                            
                                            id:  id
                                 failure event:  cvd == 1
                            obs. time interval:  (testdate[_n-1], testdate]
                             exit on or before:  failure
                                t for analysis:  (time-origin)
                                        origin:  time htdate
                            
                            ------------------------------------------------------------------------------
                                    4  total observations
                                    0  exclusions
                            ------------------------------------------------------------------------------
                                    4  observations remaining, representing
                                    1  subject
                                    1  failure in single-failure-per-subject data
                                 4501  total analysis time at risk and under observation
                                                                          at risk from t =         0
                                                               earliest observed entry t =         0
                                                                    last observed exit t =      4501
                            
                            . list, clean
                            
                                   id   cvd      htdate    testdate     enddate   _st   _d     _t    _t0 
                              1.    1     0   23mar1992   30dec2003   19jul2004     1    0   4299      0 
                              2.    1     0   23mar1992   12may2004   19jul2004     1    0   4433   4299 
                              3.    1     0   23mar1992   13may2004   19jul2004     1    0   4434   4433 
                              4.    1     1   23mar1992   19jul2004   19jul2004     1    1   4501   4434
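If the analysis time is wanted in months rather than days, one option is the scale() option of -stset-, followed by -stptime- for person-time and the incidence rate. A sketch on the same four example observations, again assuming 30.4375 days per month (an assumption made for this sketch, not taken from the thread):

```stata
* Sketch only: rebuild the example observations from post #7
clear
input id str10 s_htdate str10 s_testdate cvd
1 "3/23/1992" "12/30/2003" 0
1 "3/23/1992" "5/12/2004" 0
1 "3/23/1992" "5/13/2004" 0
1 "3/23/1992" "7/19/2004" 1
end
generate htdate = date(s_htdate, "MDY")
generate testdate = date(s_testdate, "MDY")
format htdate testdate %td

* scale(30.4375) divides analysis time by the assumed average month length,
* so _t and _t0 are in months since transplant instead of days
stset testdate, failure(cvd==1) origin(time htdate) id(id) scale(30.4375)

* Rate per 1,000 person-months
stptime, per(1000) title(person-months)
* Person-time, number of failures, and rate per person-month
stptime, title(person-months)
```

With a single subject the rate itself is not meaningful; the point is only that -stset- carries the time unit, so -stptime-, -strate-, -sts graph- and -stcox- all work in months afterwards.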
