  • Speed up Cox regression with bootstrap

    Dear Stata users,

    I am running a large number of Cox regressions with bootstrap standard errors (50 replications) and storing several estimates in a tempfile.
    This takes a lot of time (multiple hours). Is there a way to speed things up?
    I am using Stata MP 16.1.

    Thanks!
    Marianne Heins

    This is the code I am using:

    forval subgroeplft=1/4 {

        set more off
        tempname nc1tot5subgrlft`subgroeplft'
        postfile `nc1tot5subgrlft`subgroeplft'' subgroeplft icpc nicpc double(basesurv hr p) FUPcat using "$BEWERKT\Tussenbestanden\nc1tot5subgrlft`subgroeplft'.dta", replace

        * loop over the outcome indicators icpc1001-icpc2629 that exist
        forval i=1001(1)2629 {
            capture confirm variable icpc`i'
            if !_rc {
                gen exit=t_incdatplus5
                format exit %td

                gen entry=t_incdatplus1
                format entry %td

                stset exit, failure(icpc`i'==1) origin(time entry)

                * capture so that _rc below reflects stcox, not confirm variable
                capture quietly stcox case if subgroeplft==`subgroeplft', vce(boot, seed(12345))

                if _rc!=0 {
                    display "`i': regression failed"
                }
                else {

                    matrix t = r(table)
                    matrix list t
                    scalar hr = t[1,1]
                    scalar pwaarde= t[4,1]
                    predict xb, xb
                    predict s, basesurv
                    count if icpc`i'==1 & case==1 & subgroeplft==`subgroeplft'
                    scalar nicpc = r(N)
                    egen tt=max(_t) if subgroeplft==`subgroeplft'
                    sum s if _t==tt, meanonly
                    scalar basesurv=r(mean)
                    drop xb s tt
                    post `nc1tot5subgrlft`subgroeplft'' (`subgroeplft') (`i') (nicpc) (basesurv) (hr) (pwaarde) (3)
                }
                drop entry exit
                stset, clear
            }
            else {
                display "icpc`i' does not exist"
            }

        }
        postclose `nc1tot5subgrlft`subgroeplft''
    }
    Last edited by Marianne Heins; 08 Feb 2023, 04:13.

  • #2
    You could try using frames instead of temporary files: frames hold data in memory, whereas temporary files are written to disk, which slows things down.

    Code:
    help frames
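    A minimal sketch of the frames approach, assuming Stata 16 or later: each set of estimates is added to an in-memory frame with frame post, and the frame is saved to disk only once at the end. The cancer example dataset shipped with Stata stands in for the original data; the frame name results, the covariate loop, and the commented-out file name are hypothetical.

```stata
* Collect estimates in an in-memory frame instead of a postfile.
* Stand-in data: the cancer example dataset shipped with Stata.
sysuse cancer, clear
stset studytime, failure(died)

* Create an empty results frame with the variables to be filled
capture frame drop results
frame create results str8 covariate double(hr p)

* Fit one model per covariate and post one row per model
foreach v in drug age {
    quietly stcox `v'
    frame post results ("`v'") (r(table)[1,1]) (r(table)[4,1])
}

frame results: list

* Write to disk once, after all models have run (hypothetical file name)
* frame results: save "results.dta", replace
```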
    Your code could also be more efficient.

    matrix t = r(table)
    matrix list t
    There is no need to create a separate matrix that duplicates r(table), or to list it every time. In Stata 16 and above, you can reference r(table) and extract its elements directly.


    Code:
    sysuse auto, clear
    regress price mpg weight disp
    di r(table)["b", "mpg"]
    di r(table)["se", "weight"]
    di r(table)["pvalue", "displacement"]
    Result:

    Code:
    . regress price mpg weight disp
    
          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(3, 70)        =      9.74
           Model |   187000328         3  62333442.8   Prob > F        =    0.0000
        Residual |   448065068        70  6400929.54   R-squared       =    0.2945
    -------------+----------------------------------   Adj R-squared   =    0.2642
           Total |   635065396        73  8699525.97   Root MSE        =      2530
    
    ------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             mpg |  -51.30545   86.87821    -0.59   0.557    -224.5786    121.9677
          weight |   1.486438   1.026837     1.45   0.152    -.5615243      3.5344
    displacement |   2.357987   7.239564     0.33   0.746    -12.08087    16.79684
           _cons |   2304.461   3783.453     0.61   0.544    -5241.397     9850.32
    ------------------------------------------------------------------------------
    
    .
    . di r(table)["b", "mpg"]
    -51.30545
    
    .
    . di r(table)["se", "weight"]
    1.0268371
    
    .
    . di r(table)["pvalue", "displacement"]
    .74561667
    
    .
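    Applied to the Cox loop in #1, the same idea drops the matrix t copy and its listing, and summarize can stand in for the egen line that builds a new variable only to hold the maximum follow-up time. A minimal sketch, using the cancer example dataset shipped with Stata as a stand-in for the original data (the model is illustrative only):

```stata
* Stand-in data: the cancer example dataset shipped with Stata.
sysuse cancer, clear
stset studytime, failure(died)

quietly stcox drug

* Read the estimates straight from r(table); no matrix copy, no listing
scalar hr      = r(table)[1,1]   // row 1 ("b"), displayed as a hazard ratio
scalar pwaarde = r(table)[4,1]   // row 4 ("pvalue")

* Baseline survivor function at the last analysis time in the estimation sample
predict s, basesurv
summarize _t if e(sample), meanonly
scalar tmax = r(max)
summarize s if _t == scalar(tmax), meanonly
scalar basesurv = r(mean)
drop s

display "HR = " scalar(hr) ", p = " scalar(pwaarde) ", baseline survival = " scalar(basesurv)
```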
    I cannot be more helpful without a reproducible example. See FAQ Advice #12 on how to provide a data sample using the dataex command. But my guess is that the replications are taking the bulk of the time, and here there may be very limited options to speed things up.
    Last edited by Andrew Musau; 08 Feb 2023, 04:44.



    • #3
      I assume the central question is how long it takes you to estimate a single regression model. Say this takes 1 minute; then your bootstrap takes approximately 50 minutes in total. You could use the community-contributed parallel package to speed this up; see https://github.com/gvegayon/parallel
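      One way to measure this is Stata's timer command: time a single fit and the bootstrapped fit separately. A minimal sketch, again using the cancer example dataset as a stand-in for the original model and data:

```stata
* Stand-in data: the cancer example dataset shipped with Stata.
sysuse cancer, clear
stset studytime, failure(died)

timer clear

* Timer 1: one Cox fit with the default variance estimator
timer on 1
quietly stcox drug
timer off 1

* Timer 2: the same model with 50 bootstrap replications
timer on 2
quietly stcox drug, vce(bootstrap, reps(50) seed(12345))
timer off 2

* Elapsed seconds are listed and returned in r(t1) and r(t2)
timer list
```

      Multiplying the bootstrapped time by the number of models the loop in #1 fits (at most 4 subgroups × 1,629 values of i) gives a rough estimate of the total run time.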
      Best wishes

      (Stata 18.0 MP)
