
  • How to do bootstrap for internal validation in Cox regression: training Dxy

    Hi everyone,

    I am trying to do internal validation of a Cox regression, in order to compare with R (rms package, validate function) and to learn how to do the same for flexible parametric survival models in the near future.

    1) I have done this with the standard rms package in R, as below. Results: original Dxy = 0.1832, training Dxy = 0.1930. I use this as the gold standard.

    library(rms)
    library(rstpm2)
    data(brcancer)

    cphfit <- cph(Surv(rectime,censrec==1)~hormon + x1 + x2 + x3,data=brcancer, x=TRUE, y=TRUE)
    #logfit <- lrm(censrec~hormon + x1 + x2 + x3,data=brcancer, x=TRUE, y=TRUE)
    #stpm2fit <- stpm2(Surv(rectime,censrec==1)~hormon + x1 + x2 + x3,data=brcancer, x=TRUE, y=TRUE)

    #set.seed(129)
    validate(cphfit, B=200)

    2) I have done this in Stata, as below:
    webuse brcancer, clear
    stset rectime, failure(censrec)
    save brcancer.dta, replace

    *step0 original Dxy
    stcox hormon x1 x2 x3, nohr
    estat concor
    dis r(D) // Dxy original = 0.1832


    * step1 training Dxy: store the observed Dxy for bstat
    quietly stcox hormon x1 x2 x3, nohr
    estat concor
    matrix observe = (r(D)) // observed Dxy, to be passed to bstat

    *step2 training Dxy: program to fit the model in a bootstrap sample
    capture program drop myboot
    program define myboot
    preserve
    bsample
    stcox hormon x1 x2 x3, nohr
    estat concor // leaves r(D) for simulate to collect
    restore
    end

    *step3 training Dxy
    simulate Dxy=r(D), reps(200) seed(129): myboot

    *step4 training Dxy
    bstat, stat(observe) n(200)
    estat bootstrap, all // look for observed Dxy
    sum Dxy, de // look for training Dxy: mean = observed Dxy + bias ?
    dis .183164 + .0126265 // = mean .1957905

    I will try to upload the results as jpg pictures later.

    My question is: am I doing the right thing, and is my interpretation correct?

  • #2
    Results from R


    validate(cphfit, B=200)

              index.orig training   test optimism index.corrected   n
    Dxy           0.1832   0.1923 0.1726   0.0196          0.1636 200
    R2            0.0420   0.0487 0.0374   0.0113          0.0307 200
    Slope         1.0000   1.0000 0.9070   0.0930          0.9070 200
    D             0.0079   0.0093 0.0070   0.0023          0.0056 200
    U            -0.0006  -0.0006 0.0005  -0.0010          0.0005 200
    Q             0.0085   0.0098 0.0065   0.0033          0.0051 200
    g             0.3451   0.3695 0.3238   0.0457          0.2994 200
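
    If I read validate's output correctly, the columns are related by optimism = training - test and index.corrected = index.orig - optimism. Checking the Dxy row (the small discrepancy is rounding of the displayed values):

    ```stata
    display 0.1923 - 0.1726 // optimism for Dxy; validate reports .0196
    display 0.1832 - 0.0196 // index.corrected for Dxy = .1636
    ```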



    • #3
      Results from Stata

      estat bootstrap, all // look for observed Dxy

      Bootstrap results                         Number of obs    =       200
                                                Replications     =       200

      ------------------------------------------------------------------------------
                   |    Observed               Bootstrap
                   |       Coef.       Bias    Std. Err.  [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               Dxy |     .183164   .0126265   .03229908    .119859    .246469   (N)
                   |                                      .1360066   .2595819   (P)
                   |                                      .1106535    .235089  (BC)
      ------------------------------------------------------------------------------

      . sum Dxy, de // look for training Dxy: mean = observed Dxy + bias ?

                                  r(D)
      -------------------------------------------------------------
            Percentiles      Smallest
       1%     .1214481       .1106535
       5%     .1420535       .1187887
      10%     .1540226       .1241075       Obs                 200
      25%     .1737617       .1285118       Sum of Wgt.         200

      50%     .1957305                      Mean           .1957905
                              Largest       Std. Dev.      .0322991
      75%     .2167162       .2640784
      90%     .2381284       .2647654       Variance       .0010432
      95%      .249509        .269256       Skewness       .0089156
      99%     .2670107       .2846647       Kurtosis       2.713841

      . dis .183164 + .0126265 // = mean .1957905
      .1957905



        • #4
          If this is correct, I will try to compute the test Dxy and the optimism in the near future.



        • #5
          Thanks for using a built-in dataset to illustrate your problem. You've not had an answer, in part because the table columns are very jumbled. It did not help that much of the R output is unexplained. Before posting code and results again, please read FAQ 12, which, among other things, asks that code, results, and data listings be copied into the forum editor between [CODE] and [/CODE] delimiters.

          The greatest difficulty in evaluating your Stata code is that you don't tell us what bootstrap R uses. Perhaps it is the "regular bootstrap" referred to in Steyerberg et al. (2001):
          For the regular bootstrap procedure, the model as estimated in the bootstrap sample was evaluated in the bootstrap sample and in the original sample. The performance in the bootstrap sample represents estimation of the apparent performance, and the performance in the original sample represents test performance. The difference between these performances is an estimate of the optimism in the apparent performance. This difference is averaged to obtain a stable estimate of the optimism. The optimism is subtracted from the apparent performance to estimate the internally validated performance [7]: estimated performance = apparent performance - average(bootstrap performance - test performance).
          The Stata code in post #1 doesn't do the "regular bootstrap", because it does not apply model coefficients from each bootstrap sample to the original sample. If you need help in doing that, start a new topic with a link to this one.
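
          In case it helps, here is one rough sketch of a "regular bootstrap" in Stata (untested; the program and variable names are mine). After fitting in each bootstrap sample, it applies that model's linear predictor to the original data; refitting stcox on the predictor alone is a rank-preserving trick that lets estat concor report the test Dxy, and it assumes the refitted coefficient is positive so the ranking is unchanged:

          ```stata
          capture program drop regboot
          program define regboot, rclass
              preserve
              bsample
              stcox hormon x1 x2 x3          // fit in the bootstrap sample
              estat concor
              return scalar Dboot = r(D)     // apparent Dxy in the bootstrap sample
              restore                        // back to the original sample
              capture drop xb
              predict double xb, xb          // bootstrap-model linear predictor
              stcox xb                       // rank-preserving refit
              estat concor
              return scalar Dtest = r(D)     // test Dxy in the original sample
              drop xb
          end

          simulate Db=r(Dboot) Dt=r(Dtest), reps(200) seed(129): regboot
          gen optimism = Db - Dt
          sum optimism
          dis "optimism-corrected Dxy = " .183164 - r(mean)
          ```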

          Note: don't upload JPGs to the list! FAQ 12 is explicit: upload only PNG files.

          Reference:
          Steyerberg, E. W., Harrell, F. E., Jr., Borsboom, G. J. J. M., Eijkemans, M. J. C., Vergouwe, Y., & Habbema, J. D. F. (2001). Internal validation of predictive models: Efficiency of some procedures for logistic regression analysis. Journal of Clinical Epidemiology, 54(8), 774-781.
          Last edited by Steve Samuels; 20 Sep 2018, 16:42.
          Steve Samuels
          Statistical Consulting
          [email protected]

          Stata 14.2
