Hi All,
I am running a meta-analysis on aggregate time-to-event data. For this I have graphically extracted a series of Kaplan-Meier (KM) curves from a large number of studies (about 400).
I am now in the process of aggregating the data to offer a cumulative survival estimate for the included studies. The DerSimonian-Laird method was my initial choice, but because most studies do not report numbers at risk, estimating the variance of each study has become difficult. I am therefore opting to use the bootstrap to aggregate the multiple Kaplan-Meier curves.
A brief summary of the data:
- Around 300 KMs, all extracted from different studies. Heterogeneity across studies will be handled via subgroup analysis (aggregating KMs within subgroups).
- 30,000 individuals across all studies, with an average KM size of 20-50 individuals.
- Numbers at risk are available for roughly 40% of the extracted KMs, but they are reported at wider time intervals than the points I extracted from the curves, so relying on them would cost precision in the KM sampling.
- The data were extracted graphically as pairs of (X, Y) coordinates, where X is time in months and Y is overall survival in %. The total number of individuals (N) in each KM is known, but the individual censoring times (when individuals were lost to follow-up) are not. The hazard ratio for each Kaplan-Meier curve is therefore assumed to vary with time.
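To make the bootstrap idea concrete, here is a minimal sketch of one possible reading of it: resample whole curves with replacement and take an N-weighted mean of survival at a chosen time. It is written in Python rather than Stata purely for illustration. The dictionary layout, the function names (`surv_at`, `pooled`, `bootstrap_ci`) and the example numbers are all hypothetical, and the sketch ignores censoring and within-study sampling error, so it should not be read as a validated method.

```python
import bisect
import random

def surv_at(curve, t):
    """Step-function survival (%) at time t; None beyond the last extracted point."""
    times = [p[0] for p in curve["points"]]
    if t > times[-1]:
        return None  # no follow-up information at t for this curve
    i = bisect.bisect_right(times, t) - 1
    # Assumption: before the first extracted point, survival is 100%.
    return 100.0 if i < 0 else curve["points"][i][1]

def pooled(curves, t):
    """N-weighted mean survival at t over curves that still have follow-up at t."""
    num = den = 0.0
    for c in curves:
        s = surv_at(c, t)
        if s is not None:
            num += c["n"] * s
            den += c["n"]
    return num / den if den else None

def bootstrap_ci(curves, t, reps=2000, alpha=0.05, seed=1):
    """Percentile interval from resampling whole curves (studies) with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        sample = [rng.choice(curves) for _ in curves]
        est = pooled(sample, t)
        if est is not None:
            stats.append(est)
    if not stats:
        return pooled(curves, t), None, None
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return pooled(curves, t), lo, hi

# Made-up example: three small curves as (month, survival %) points plus total N.
curves = [
    {"n": 25, "points": [(0, 100), (6, 80), (12, 60)]},
    {"n": 40, "points": [(0, 100), (3, 90), (12, 70)]},
    {"n": 30, "points": [(0, 100), (6, 75)]},
]
est, lo, hi = bootstrap_ci(curves, t=6, reps=500)
```

Resampling at the curve level treats each study as the sampling unit, which sidesteps the missing numbers at risk but only captures between-study variability.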
Is this feasible in Stata and, if so, how should I format the data for import? Currently the KMs are formatted as:
A     B     C     D     E     F     ...
Ma1   %a1   Na    Mb1   %b1   Nb    ...
Ma2   %a2   Na    Mb2   %b2   Nb    ...
...   ...   ...   ...   ...   ...   ...
Here the first curve (KMa) has its time variable (Ma) in column A, the corresponding % survival (%a) in column B, and the total sample size (Na) in column C. The second curve (KMb) sits alongside in columns D, E and F, following the same pattern.
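As a rough illustration of an alternative layout, the side-by-side triplets could be stacked into a long format with one row per extracted point and a curve identifier. The sketch below is in Python and only shows the transformation; the column names (`km_id`, `month`, `surv_pct`, `n_total`) and the example values are hypothetical, and it assumes blank cells mark where a shorter curve has ended.

```python
import csv
import io

def wide_to_long(rows):
    """Stack repeated (time, survival %, N) triplets into long records.

    rows: one list per spreadsheet row; each curve occupies three adjacent cells.
    """
    records = []
    for row in rows:
        for start in range(0, len(row) - 2, 3):
            month, surv, n = row[start:start + 3]
            if month in ("", None) or surv in ("", None):
                continue  # this curve has no point on this row
            km_id = start // 3 + 1  # hypothetical curve identifier: 1, 2, ...
            records.append((km_id, float(month), float(surv), int(float(n))))
    records.sort(key=lambda r: (r[0], r[1]))  # order by curve, then time
    return records

def to_csv(records):
    """Render the long records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["km_id", "month", "surv_pct", "n_total"])
    writer.writerows(records)
    return buf.getvalue()

# Made-up example: two curves side by side, the first one shorter.
wide = [
    [0, 100, 25, 0, 100, 40],
    [6, 80, 25, 3, 90, 40],
    ["", "", "", 12, 70, 40],
]
long_records = wide_to_long(wide)
csv_text = to_csv(long_records)
```

A file written from `csv_text` could then be read with Stata's `import delimited`, giving one observation per curve and time point.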