  • Time-to-event analysis for panels STATA 13

    Hi all,

    I have the following database

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(idproduct Year recalls)
     2 2008 0
     2 2009 0
     2 2010 1
     2 2011 0
     2 2012 0
     3 2004 0
     3 2005 0
     5 2004 0
     5 2005 0
     5 2006 0
     5 2007 0
     5 2008 0
     5 2009 0
     5 2010 0
     5 2011 0
     5 2012 0
     5 2013 0
     6 2004 0
     6 2005 0
     6 2006 0
     6 2007 0
     7 2004 0
     8 2007 0
     8 2008 0
     9 2004 0
     9 2005 0
     9 2006 0
     9 2007 1
     9 2008 0
     9 2009 0
     9 2010 0
     9 2011 0
     9 2012 0
     9 2013 0
    10 2004 0
    10 2005 0
    10 2006 0
    10 2007 0
    10 2008 0
    10 2009 0
    10 2010 0
    10 2011 1
    10 2012 0
    10 2013 0
    11 2009 0
    12 2008 0
    12 2009 0
    12 2010 0
    12 2011 0
    12 2012 0
    12 2013 0
    13 2013 0
    14 2004 0
    14 2005 0
    14 2006 0
    14 2007 0
    14 2008 0
    14 2010 0
    14 2011 0
    14 2012 0
    14 2013 0
    15 2004 0
    15 2005 1
    15 2006 0
    15 2007 0
    16 2004 0
    16 2005 0
    16 2006 0
    16 2007 0
    16 2008 0
    16 2009 0
    16 2010 0
    16 2011 0
    16 2012 0
    16 2013 0
    17 2004 0
    17 2005 0
    17 2006 0
    17 2007 0
    18 2008 0
    18 2009 0
    18 2010 0
    19 2007 0
    19 2008 0
    19 2009 0
    19 2010 0
    19 2011 0
    19 2012 0
    19 2013 0
    20 2007 0
    20 2008 0
    21 2004 1
    21 2005 0
    21 2006 0
    21 2007 0
    21 2008 1
    21 2009 0
    21 2010 1
    21 2011 0
    22 2004 0
    end
    The data show products observed over a number of years in an unbalanced panel. recalls is a dummy indicating whether an event occurred for a product in a given year. I am running Stata 13 and would like to make a time-to-event plot. Is that possible? I have tried, but I am struggling.

    Thank you,

    Federico

  • #2
    So this is multiple-observation data with multiple failures. And it is somewhat more complicated because different idproducts begin observation in different years. I think the following is what you need:

    Code:
    by idproduct (Year), sort: gen origin_year = Year[1]
    stset Year, id(idproduct) failure(recalls) origin(time origin_year)
    
    sts graph
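
    By default, -stset- stops following a product at its first recall, so later recalls of the same product drop out of the risk set. A minimal sketch, assuming repeated recalls should stay in the analysis (this is an assumption, not something stated in #1), adds exit(time .) and then inspects the setup with -stdescribe- and -stsum-:

    Code:
    * Sketch only: assumes origin_year was created as in the code above
    * exit(time .) keeps every record of a product, including those after a recall
    stset Year, id(idproduct) failure(recalls) origin(time origin_year) exit(time .)
    * Describe the survival-time setup: records and failures per product
    stdescribe
    * Time at risk, incidence rate, and survival-time quartiles
    stsum
    * Kaplan-Meier curve under the repeated-failure setup
    sts graph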
    Last edited by Clyde Schechter; 13 Dec 2020, 11:37.

    • #3
      Clyde Schechter Thank you Clyde. My idea was to make the data cross-sectional by collapsing by idpr, giving a dataset with one row per idpr and three variables: years, recalls, and event. years is the number of years the product is present in the data, which ends either because the panel is unbalanced or because the event occurred; recalls is a dummy indicating whether the product underwent a recall; finally, event indicates, basically, that the product is censored. Then:

      Code:
      gen event = 1 if years<10
      stset years, failure(event==1)
      sts graph, by(recalls)
      I say that because I tried your idea of exploiting multiple indexing and, unfortunately, the result is quite obscure to me (please see K-M_multipleindex.pdf). What I would like to do is to compare the failure time of recalled products with that of, let's say, not recalled products (the result should be that the median time of failure is higher for not recalled products).

      With the code I entered above I obtained time-to-event-prod (1).pdf, which makes sense to me. But as this is the first time I am performing a time-to-event analysis in Stata, I am quite unsure about the outcome...

      • #4
        So if you want to do an analysis of time to first recalls, reducing the data to one observation per product, that would be:

        Code:
        by idproduct (Year), sort: egen time_to_first_recalls ///
            = min(cond(recalls, Year, .))
        by idproduct (Year): gen failed = !missing(time_to_first_recalls)
        by idproduct (Year), sort: replace time_to_first_recalls ///
            = time_to_first_recalls - Year[1]
        
        by idproduct (Year): keep if _n == 1
        drop Year
        
        stset time_to_first_recalls, failure(failed)
        sts graph
        I do not understand the code you show in #3. It refers to a variable, years, that has not been included in your example data, nor mentioned in earlier posts. And you seem to have set some arbitrary 10 year cutoff on this variable to define failure. That seems like a very bad idea because it conflates something happening within 10 years with censoring at the ten year mark. I don't understand what this code is trying to accomplish.

        What I would like to do is to compare the failure time of recalled products with that of, let's say, not recalled products (the result should be that the median time of failure is higher for not recalled products).
        This makes no sense to me. You stated that failure means having recalls. Therefore, by definition, non-recalled products have infinite failure time.

        I have not reviewed your PDFs. Like some others, I do not download attachments from people I don't know. If the contents of these files are code or Stata output, you can just paste the content here into the forum editor between code delimiters. If they contain graphs, you can embed them in the forum editor.
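
        One caveat on the code above: a product that is never recalled ends up with a missing time_to_first_recalls, and, as far as I know, -stset- excludes records whose time variable is missing as well as records whose time is zero. A minimal sketch of a variant, run on the example data from #1, assuming that never-recalled products should be treated as censored in their last observed year and that time is counted as years in the sample (so the entry year counts as 1). The names first_recall and duration are hypothetical:

        Code:
        * Year of the first recall, missing if the product is never recalled
        by idproduct (Year), sort: egen first_recall ///
            = min(cond(recalls, Year, .))
        by idproduct (Year): gen byte failed = !missing(first_recall)
        * Recalled products exit at the first recall; the others are censored
        * at their last observed year. The +1 is an assumption that avoids
        * zero durations for products recalled or last seen in their entry year.
        by idproduct (Year): gen duration ///
            = cond(failed, first_recall, Year[_N]) - Year[1] + 1

        * Reduce to one observation per product
        by idproduct (Year): keep if _n == 1
        drop Year

        stset duration, failure(failed)
        sts graph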

        • #5
          Clyde Schechter thank you very much for the reply. My original idea was to generate a variable years that represents the number of years until failure. You are right that I should have been clearer about the meaning of "failure time". Failure, here, is whenever a product is, for some reason, dropped from the sample before 10 years, which is the full length of the sample I have at my disposal. I am quite confused about whether failure is "having a recall" or, more generally, "being censored/cut for some reason". I'll try to be clearer below about what I would like to achieve. Sorry for the confusion.

          I don't understand what this code is trying to accomplish.
          The idea is that I should compare two groups, recalled products vs not recalled products, and be able to say something like: "If a product has been recalled, then its failure time is (I expect) less than the failure time of a product that did not undergo a recall". The event occurs every time the panel is unbalanced because, for some reason, there is a failure. I thought this was the easiest way to control for the unbalancedness of the panel: given that both recalled and not recalled products might be unbalanced for some reason, not necessarily the recall, what is the average failure time for each group? This is why I thought the graph should be done "by recalls". Below is the whole code that led me to the graph (hopefully) displayed after it:

          Code:
          egen rec_sum = sum(recalls), by(idpr)
          bys idpr: gen rec_year = Year if recalls == 1
          egen recyear = max(rec_year), by(idpr)
          drop if Year>recyear & rec_sum
          bysort idpr (Year): gen byte panelsize = _N
          
          
          collapse (first) panelsize (max) recalls, by(idpr)
          rename (panelsize) (years) // years is basically the "number of years in sample"
          
          gen event = 1 if years<10 //panels ending before the time (i.e. 10 years, which is the entire sample)
          stset years, failure(event==1)
          
          set scheme s1mono
          sts graph, by(recalls)
          As far as I understand, the code you proposed in #4 is in line with what I first asked. It generates a variable telling when a product underwent a recall, flags every observation of that product with 1 in the failed variable, and keeps the first observation of each panel. The problem is that it gives me weird results for recalled products: while not recalled products have a shape resembling the usual K-M graphs, for recalled products the result is a straight line at probability 1 (hopefully it is displayed in weird_result.pdf).

          P.S. I don't know if this is the right way to display PDFs, but it is the best I can do from the editor. In #3 I uploaded them as attachments; this time I tried the "image" icon.

          km_survival.pdf

          weird_result.pdf
          Last edited by Federico Nutarelli; 17 Dec 2020, 02:39.

          • #6
            Please post Stata graphs as .png not .pdf.

            This is explained directly at https://www.stata-journal.com/articl...article=st0313 12.4

            Pdf attachments will work widely -- even on my phone -- but they oblige people to open them in new windows, and then go back to your question.

            • #7
              Please post Stata graphs as .png not .pdf.
              Sorry, I will. By the way, the images in #5 are visible directly, at least on my screen. If they are not visible, I will change them to .png.

              • #8
                The graphs embedded in #5 are not visible on my setup, nor does clicking on them cause a new window to open with them. They are just inoperable. I do not download attachments from people I don't know, so I have not looked at your PDF files.

                What I have done is run the code you show in #5 on your example data from #1, with the intent of troubleshooting it. But there is nothing to troubleshoot. I cannot reproduce the difficulty you are having. I get two perfectly reasonable-looking K-M curves.

                [Image: nutarelli.png]

                I will add one thing. The command -egen recyear = max(rec_year), by(idpr)- leads to recyear being the last year in which the product experienced a recall. Now, that may in fact be what you want for your purposes. But it is more usual for the failure time to be defined as the first such event. If you want the first event to be the failure time, then replace -max- by -min- in that command.
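
                A minimal sketch of that change, applied to the first lines of the code in #5 and assuming the rest of that code stays as it is:

                Code:
                * total() is the documented name of the egen function written as sum() in #5
                egen rec_sum = total(recalls), by(idpr)
                bys idpr: gen rec_year = Year if recalls == 1
                * min() picks the first recall year; max() would pick the last
                egen recyear = min(rec_year), by(idpr)
                * For recalled products, drop the years after the first recall
                drop if Year>recyear & rec_sum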

                • #9
                  Clyde Schechter I see. Sorry for any inconvenience. I will try to review my code, though maybe I am almost done. Next time I will post images as .png. Thank you both for your patience and your time!
