  • Aggregating Kaplan Meier Data via Bootstrap Method

    Hi All,

    I am running a meta-analysis on aggregate time-to-event data. To do this, I have graphically extracted Kaplan-Meier (KM) curves from a large number of studies (about 400).
    I am now aggregating the data to produce a cumulative survival estimate across the included studies. The DerSimonian-Laird method was my initial choice, but because numbers at risk are often not reported, estimating the variance of each study has become difficult. I am therefore opting to use a bootstrap method to aggregate the multiple KM curves.

    A brief summary of the data:
    • Around 300 KMs, all extracted from different studies. The heterogeneity in each study will be handled via subgroup analysis (subgroup aggregation of KMs)
    • 30,000 individuals across all studies, with an average KM size of 20-50 individuals
    • The numbers at risk are available for roughly 40% of the extracted KMs, but they are reported at wider time intervals than the points extracted from the KM curves (so even if I used the numbers at risk, I would lose precision in the KM sampling)
    • The data were extracted graphically as pairs of X,Y coordinates, where X is time in months and Y is overall survival as a %. The total number (N) of individuals in each KM is known, but the individual censoring times (when individuals were lost to follow-up) are not. The hazard ratio for each KM is therefore assumed to vary with time.
    Am I right to assume that a bootstrap method is the most appropriate way to aggregate these data?

    If so, how should I format the data for import to Stata? Currently the KMs are formatted as:
    A      B      C     D      E      F     ...
    Ma1    %a1    Na    Mb1    %b1    Nb    ...
    Ma2    %a2    Na    Mb2    %b2    Nb    ...
    ...    ...    ...   ...    ...    ...   ...

    Here the first curve (KMa) has its time variable (Ma) in column A, the corresponding % survival (%a) in column B, and the total sample size (Na) in column C. The second curve (KMb) sits alongside it in columns D, E, and F, following the same pattern.
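
    One possible way to prepare this layout for Stata is to stack the side-by-side triplets into a long table with one row per extracted point. The sketch below is only an illustration: the function name `wide_to_long`, the output column names (`curve_id`, `month`, `surv_pct`, `n_total`), and the sample values are all made up, and it assumes shorter curves leave their trailing cells empty.

```python
import csv
import io

def wide_to_long(rows):
    """Stack repeated (month, survival %, N) triplets into long records.

    Each input row holds one triplet per KM curve, side by side, as in
    the A/B/C, D/E/F layout described above. Column names are hypothetical.
    """
    records = []
    for row in rows:
        # Step through the row three columns at a time: one triplet per curve.
        for curve_id, start in enumerate(range(0, len(row) - 2, 3), start=1):
            month, surv, n = row[start:start + 3]
            if month == "" or surv == "":
                continue  # assumed: shorter curves leave trailing cells empty
            records.append({
                "curve_id": curve_id,
                "month": float(month),
                "surv_pct": float(surv),
                "n_total": int(n),
            })
    # Order by curve, then by time, so each curve is a contiguous block.
    records.sort(key=lambda r: (r["curve_id"], r["month"]))
    return records

# Made-up example: two curves, the second with fewer extracted points.
wide_rows = [
    ["0", "100", "25", "0", "100", "40"],
    ["6", "80", "25", "3", "90", "40"],
    ["12", "60", "25", "", "", ""],
]
records = wide_to_long(wide_rows)

# Write the long table as CSV text (here into a string buffer).
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["curve_id", "month", "surv_pct", "n_total"]
)
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()
```

    Written to a file, a CSV like this can be read with Stata's `import delimited`. A long layout keeps one identifier per curve, which should make subgroup aggregation easier than working across column triplets.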

  • #2
    Eduardo, hi.

    I am not sure what your question is. It seems you have multiple questions.

    The bootstrapping approach to approximate the uncertainty around log-hazard ratio estimates (based on aggregate data) sounds odd to me. I am familiar with dozens of approaches to approximate the standard error (SE) of the log-hazard ratio, but I am not aware of any bootstrapping method using aggregate data. In principle, bootstrapping methods can be used with individual patient data, which does not seem to be the case.

    The most common approaches to approximate the SE(logHR) are:
    • Reconstructed KM plots (you will need the digitized coordinates, and the number of participants at risk per time point);
    • Two-sided P-values from survival models;
    • Log-rank tests;
    • The number of events in each group and the total number of participants in each group.
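
    As a rough illustration of two of the items above, the sketch below uses standard textbook approximations rather than the exact formulas from the references: SE = |log HR| / z when a hazard ratio is reported with a two-sided P-value, and SE ≈ sqrt(1/E1 + 1/E2) when only the number of events per group is known. The function names and the input values are hypothetical.

```python
import math
from statistics import NormalDist

def se_from_p_value(hr, p_two_sided):
    """Approximate SE(logHR) from a reported HR and its two-sided P-value.

    Uses z = |log HR| / SE, so SE = |log HR| / z. Not usable when
    HR == 1 or p == 1, where z is zero.
    """
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    return abs(math.log(hr)) / z

def se_from_events(events_a, events_b):
    """Approximate SE(logHR) from the number of events in each group.

    Common approximation: Var(logHR) ~ 1/E_a + 1/E_b.
    """
    return math.sqrt(1 / events_a + 1 / events_b)

# Made-up inputs, for illustration only.
se_p = se_from_p_value(hr=0.5, p_two_sided=0.05)
se_e = se_from_events(events_a=20, events_b=20)
```

    Both are approximations whose accuracy depends on the study (for example, on allocation ratio and event counts), so the references below remain the place to check which formula fits the data actually reported.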

    Check these references:
    1. Whitehead, A. Meta-Analysis of Controlled Clinical Trials, 2002 (page 235)
    2. https://trialsjournal.biomedcentral....1745-6215-8-16
    3. https://bmcmedresmethodol.biomedcent...74-021-01308-8

    After testing many alternative methods for reconstructing KM plots, I found the approach and Shiny interface from reference 3 to be the most user-friendly.

    Hope this helps.


    Tiago
    Last edited by Tiago Pereira; 12 Jul 2023, 17:20.



    • #3
      I will have a read of the above. Many thanks, Tiago.
