
  • How to do bootstrap for internal validation in Cox regression: training Dxy

    Hi everyone,

    I am trying to do internal validation of a Cox regression, in order to compare with R (rms package, validate function) and to learn how to do the same for flexible parametric survival models in the near future.

    1) I have done this with the standard rms package in R, as below. Results: original Dxy = 0.1832, training Dxy = 0.1930. I use this as the gold standard.

    library(rms)
    library(rstpm2)
    data(brcancer)

    cphfit <- cph(Surv(rectime,censrec==1)~hormon + x1 + x2 + x3,data=brcancer, x=TRUE, y=TRUE)
    #logfit <- lrm(censrec~hormon + x1 + x2 + x3,data=brcancer, x=TRUE, y=TRUE)
    #stpm2fit <- stpm2(Surv(rectime,censrec==1)~hormon + x1 + x2 + x3,data=brcancer, x=TRUE, y=TRUE)

    #set.seed(129)
    validate(cphfit, B=200)

    2) I have done this in Stata, as below:
    webuse brcancer, clear
    stset rectime, failure(censrec)
    save brcancer.dta, replace

    *step0 original Dxy
    stcox hormon x1 x2 x3, nohr
    estat concor
    dis r(D) // Dxy original = 0.1832


    * step1 training Dxy: store the observed Dxy for bstat
    quietly stcox hormon x1 x2 x3, nohr
    estat concor
    matrix observe = (r(D)) // observed Dxy, to be passed to bstat

    *step2 training Dxy: program to fit the model in a bootstrap sample
    capture program drop myboot
    program define myboot
    preserve
    bsample
    stcox hormon x1 x2 x3, nohr
    estat concor // leaves r(D) for simulate to collect
    restore
    end

    *step3 training Dxy
    simulate Dxy=r(D), reps(200) seed(129): myboot

    *step4 training Dxy
    bstat, stat(observe) n(200)
    estat bootstrap, all // look for observed Dxy
    sum Dxy, de // look for training Dxy: mean = observed Dxy + bias ?
    dis .183164 + .0126265 // = mean .1957905

    I will try to upload the results as jpg pictures later.

    My question is: am I doing the right thing, and is my interpretation correct?

  • #2
    Results from R


    validate(cphfit, B=200)

              index.orig training   test optimism index.corrected   n
    Dxy           0.1832   0.1923 0.1726   0.0196          0.1636 200
    R2            0.0420   0.0487 0.0374   0.0113          0.0307 200
    Slope         1.0000   1.0000 0.9070   0.0930          0.9070 200
    D             0.0079   0.0093 0.0070   0.0023          0.0056 200
    U            -0.0006  -0.0006 0.0005  -0.0010          0.0005 200
    Q             0.0085   0.0098 0.0065   0.0033          0.0051 200
    g             0.3451   0.3695 0.3238   0.0457          0.2994 200
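
    If I read validate's output correctly, the columns are related by optimism = training - test and index.corrected = index.orig - optimism. Checking the Dxy row (the small discrepancy is rounding of the displayed values):

    ```stata
    display 0.1923 - 0.1726 // optimism for Dxy; validate reports .0196
    display 0.1832 - 0.0196 // index.corrected for Dxy = .1636
    ```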



    • #3
      Results from Stata

      estat bootstrap, all // look for observed Dxy

      Bootstrap results                         Number of obs    =       200
                                                Replications     =       200

      ------------------------------------------------------------------------------
                   |    Observed               Bootstrap
                   |       Coef.       Bias    Std. Err.  [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               Dxy |     .183164   .0126265   .03229908    .119859    .246469   (N)
                   |                                      .1360066   .2595819   (P)
                   |                                      .1106535    .235089  (BC)
      ------------------------------------------------------------------------------

      . sum Dxy, de // look for training Dxy: mean = observed Dxy + bias ?

                                  r(D)
      -------------------------------------------------------------
            Percentiles      Smallest
       1%     .1214481       .1106535
       5%     .1420535       .1187887
      10%     .1540226       .1241075       Obs                 200
      25%     .1737617       .1285118       Sum of Wgt.         200

      50%     .1957305                      Mean           .1957905
                              Largest       Std. Dev.      .0322991
      75%     .2167162       .2640784
      90%     .2381284       .2647654       Variance       .0010432
      95%      .249509        .269256       Skewness       .0089156
      99%     .2670107       .2846647       Kurtosis       2.713841

      . dis .183164 + .0126265 // = mean .1957905
      .1957905



        • #4
          If this is correct, I will try to compute the test Dxy and the optimism in the near future.



        • #5
          Thanks for using a built-in dataset to illustrate your problem. You've not had an answer, in part because the table columns are very jumbled. It did not help that much of the R output is unexplained. Before posting code and results again, please read FAQ 12, which, among other things, asks that code, results, and data listings be copied into the forum editor between [CODE] and [/CODE] delimiters.

          The greatest difficulty in evaluating your Stata code is that you don't tell us what bootstrap R uses. Perhaps it is the "regular bootstrap" referred to in Steyerberg et al. (2001):
          For the regular bootstrap procedure, the model as estimated in the bootstrap sample was evaluated in the bootstrap sample and in the original sample. The performance in the bootstrap sample represents estimation of the apparent performance, and the performance in the original sample represents test performance. The difference between these performances is an estimate of the optimism in the apparent performance. This difference is averaged to obtain a stable estimate of the optimism. The optimism is subtracted from the apparent performance to estimate the internally validated performance [7]: estimated performance = apparent performance - average(bootstrap performance - test performance).
          The Stata code in post #1 doesn't do the "regular bootstrap", because it does not apply model coefficients from each bootstrap sample to the original sample. If you need help in doing that, start a new topic with a link to this one.
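
          In case it helps, here is one rough sketch of a "regular bootstrap" in Stata (untested; the program and variable names are mine). After fitting in each bootstrap sample, it applies that model's linear predictor to the original data; refitting stcox on the predictor alone is a rank-preserving trick that lets estat concor report the test Dxy, and it assumes the refitted coefficient is positive so the ranking is unchanged:

          ```stata
          capture program drop regboot
          program define regboot, rclass
              preserve
              bsample
              stcox hormon x1 x2 x3          // fit in the bootstrap sample
              estat concor
              return scalar Dboot = r(D)     // apparent Dxy in the bootstrap sample
              restore                        // back to the original sample
              capture drop xb
              predict double xb, xb          // bootstrap-model linear predictor
              stcox xb                       // rank-preserving refit
              estat concor
              return scalar Dtest = r(D)     // test Dxy in the original sample
              drop xb
          end

          simulate Db=r(Dboot) Dt=r(Dtest), reps(200) seed(129): regboot
          gen optimism = Db - Dt
          sum optimism
          dis "optimism-corrected Dxy = " .183164 - r(mean)
          ```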

          Note: don't upload JPGs to the list! FAQ 12 is explicit: upload only PNG files.

          Reference:
          Steyerberg, E. W., Harrell, F. E., Jr., Borsboom, G. J. J. M., Eijkemans, M. J. C., Vergouwe, Y., & Habbema, J. D. F. (2001). Internal validation of predictive models: Efficiency of some procedures for logistic regression analysis. Journal of Clinical Epidemiology, 54(8), 774-781.
          Last edited by Steve Samuels; 20 Sep 2018, 16:42.
          Steve Samuels
          Statistical Consulting
          [email protected]

          Stata 14.2
