Speed up Cox regression with bootstrap

Marianne Heins

Join Date: Oct 2018

Posts: 4
#1

08 Feb 2023, 04:11

Dear Stata-users,

I am running a large number of Cox regressions with bootstrap standard errors (50 replications) and storing several estimates in a tempfile.
This takes a long time (multiple hours). Is there a way to speed things up?
I am using Stata MP 16.1.

Thanks!
Marianne Heins

This is the code I am using:

forval subgroeplft=1/4 {

set more off
tempname nc1tot5subgrlft`subgroeplft'
postfile `nc1tot5subgrlft`subgroeplft'' subgroeplft icpc nicpc double(basesurv hr p) FUPcat using "$BEWERKT\Tussenbestanden\nc1tot5subgrlft`subgroeplft'.dta", replace

forval i=1001(1)2629 {
capture confirm variable icpc`i'
if !_rc {
gen exit=t_incdatplus5
format exit %td

gen entry=t_incdatplus1
format entry %td

stset exit, failure(icpc`i'==1) origin(time entry)

* capture is needed so that _rc below reflects whether stcox failed
capture stcox case if subgroeplft==`subgroeplft', vce(boot, seed(12345))

if _rc!=0 {
display "`i': regression failed"
}
else {

matrix t = r(table)
matrix list t
scalar hr = t[1,1]
scalar pwaarde= t[4,1]
predict xb, xb
predict s, basesurv
count if icpc`i'==1 & case==1 & subgroeplft==`subgroeplft'
scalar nicpc = r(N)
egen tt=max(_t) if subgroeplft==`subgroeplft'
sum s if _t==tt, meanonly
scalar basesurv=r(mean)
drop xb s tt
post `nc1tot5subgrlft`subgroeplft'' (`subgroeplft') (`i') (nicpc) (basesurv) (hr) (pwaarde) (3)
}
drop entry exit
stset, clear
}
else {
display "icpc`i' does not exist"
}

}
postclose `nc1tot5subgrlft`subgroeplft''
}

Last edited by Marianne Heins; 08 Feb 2023, 04:13.

Andrew Musau

Join Date: Oct 2014
Posts: 10207

08 Feb 2023, 04:38

You could try using frames, which hold data in memory, instead of temporary files, which are written to disk and therefore slow things down.

Code:

help frames
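
As a minimal sketch of the idea, the following collects results in a frame with frame create and frame post (both available from Stata 16) rather than posting to a file on disk. The auto dataset, the frame name results and the variable names xvar, b and p are placeholders chosen for illustration, not taken from the original code.

Code:

* Sketch only: results frame in memory instead of a postfile on disk
sysuse auto, clear
capture frame drop results
frame create results str12 xvar double(b p)

foreach x in mpg weight displacement {
    quietly regress price `x'
    * one row per model: regressor name, coefficient, p-value
    frame post results ("`x'") (r(table)["b", "`x'"]) (r(table)["pvalue", "`x'"])
}

frame results: list

* If a file is still needed, it can be saved once at the end, e.g.
* frame results: save myresults, replace

Whether this gives a noticeable gain depends on how much of the run time is actually spent writing results, which is probably small compared with the bootstrap itself.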

Your code could also be more efficient.

matrix t = r(table)
matrix list t

There is no need to create a separate matrix that duplicates r(table), or to list it every time. You can reference r(table) and extract its elements directly in Stata 16 and above.

Code:

sysuse auto, clear
regress price mpg weight disp
di r(table)["b", "mpg"]
di r(table)["se", "weight"]
di r(table)["pvalue", "displacement"]

Res.:

Code:

. regress price mpg weight disp

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(3, 70)        =      9.74
       Model |   187000328         3  62333442.8   Prob > F        =    0.0000
    Residual |   448065068        70  6400929.54   R-squared       =    0.2945
-------------+----------------------------------   Adj R-squared   =    0.2642
       Total |   635065396        73  8699525.97   Root MSE        =      2530

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -51.30545   86.87821    -0.59   0.557    -224.5786    121.9677
      weight |   1.486438   1.026837     1.45   0.152    -.5615243      3.5344
displacement |   2.357987   7.239564     0.33   0.746    -12.08087    16.79684
       _cons |   2304.461   3783.453     0.61   0.544    -5241.397     9850.32
------------------------------------------------------------------------------

.
. di r(table)["b", "mpg"]
-51.30545

.
. di r(table)["se", "weight"]
1.0268371

.
. di r(table)["pvalue", "displacement"]
.74561667

.
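
Applied to a Cox model, the same approach might look like the sketch below. It uses the drugtr example dataset (already stset) and the covariate drug as stand-ins for the original data and the variable case. The "b" row of r(table) should hold the hazard ratio when stcox reports hazard ratios, but it is worth checking that against the printed table once.

Code:

* Sketch only: pull estimates by row and column name after stcox
webuse drugtr, clear
quietly stcox drug age
scalar hr      = r(table)["b", "drug"]
scalar pwaarde = r(table)["pvalue", "drug"]
display "HR = " hr "   p = " pwaarde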

I cannot be more helpful without a reproducible example. See FAQ Advice #12 on how to provide a data sample using the dataex command. But my guess is that the replications are taking the bulk of the time, and here there may be very limited options to speed things up.
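
One way to check that guess is to time a single model with and without the bootstrap using the built-in timer command. The sketch below uses the drugtr example dataset as a stand-in, so the timings themselves say nothing about the original data; only the comparison between the two timers is of interest.

Code:

* Sketch only: compare one plain fit with one bootstrapped fit
webuse drugtr, clear
timer clear

timer on 1
quietly stcox drug age
timer off 1

timer on 2
quietly stcox drug age, vce(bootstrap, reps(50) seed(12345))
timer off 2

* timer 1 = single fit, timer 2 = fit with 50 bootstrap replications
timer list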

Last edited by Andrew Musau; 08 Feb 2023, 04:44.

Felix Bittmann

Join Date: Aug 2018

Posts: 701
#3

08 Feb 2023, 05:47

I assume the central question is how long it takes you to estimate a single regression model. If that is, say, 1 minute, then your bootstrap takes approximately 50 minutes in total. What you can do is use parallel to speed this up, see https://github.com/gvegayon/parallel
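
A rough sketch of what that could look like, assuming the user-written parallel package is already installed from the repository above and that its initialize and bs subcommands behave as described in its documentation. The number of child processes (4) and the regress model on the auto data are placeholders; the stcox call would take their place, and it should be checked that the stset data carry over to the child processes.

Code:

* Sketch only: requires the user-written parallel package
sysuse auto, clear

* number of child processes is an assumption; match it to the available cores
parallel initialize 4

* bootstrap replications are split across the child processes
parallel bs, reps(50): regress price mpg weight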

Best wishes
