  • stcrreg runs very slowly

    I'm trying to estimate the competing-risks hazard model of Fine and Gray (1999) using the stcrreg command in Stata/MP 15.0, running on a ~40-core Linux server with ~500 GB of RAM. My data are fairly large: a panel of delinquent mortgages with one observation per loan per month. The outcome of interest is foreclosure, with competing risks of prepayment (selling the house or refinancing) and curing (coming current on the loan). We do need the panel structure, as there are several important time-varying covariates.

    The problem is that stcrreg runs very, very slowly despite having plenty of RAM and computational power available. On a 0.1% sample of our data (about 15,000 observations on ~2,000 loans, with about 230 failures), it takes more than 3 hours for the model to converge. With a 1% sample, the model ran for at least 24 hours before we killed it. By comparison, ignoring the competing events and estimating the same specification with stcox takes less than 5 minutes, even on the 1% sample.
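    For reference, this is roughly the setup (the variable names loan_id, month, and outcome and the covariates x1-x3 are placeholders, not our actual names):

        * outcome: 0 = censored, 1 = foreclosure, 2 = prepayment, 3 = cure
        stset month, id(loan_id) failure(outcome == 1)

        * Fine-Gray subdistribution hazard model: this is the slow step
        stcrreg x1 x2 x3, compete(outcome == 2 3)

        * Single-risk Cox model for comparison: runs in minutes
        stcox x1 x2 x3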

    Anyone know why this happens, and what the best way is to deal with it? Is it just that stcrreg was never parallelized, or otherwise poorly optimized?

    I've come across the stcrprep package by Paul Lambert on SSC, which by its description sounds like it might speed things up, but I'm puzzled by the overall slowness of the built-in command.

    Thanks,
    -Ryan Sandler

  • #2
    I have now tried stcrprep from SSC (which is supposed to pre-calculate the weights used in stcrreg so as to speed up the process), and it turns out that it can't be used with data that have been stset with multiple observations per individual. So, it seems my question is really: how can I make stcrreg run faster on data with multiple observations per individual?

    • #3
      I expect you've already ruled this out, but it would be remiss not to ask: You told us your Linux box has ~40 cores, but how many of those is your copy of Stata/MP licensed for, and how many is it actually configured to make use of? The output of creturn list will reveal that information, as well as whether your Stata/MP environment has been set up to use the maximum number of cores allowed on your license.
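      For example, something along these lines will show the core settings (I believe these are the relevant c() values, but check the full creturn list output):

          creturn list
          display c(processors)        // cores this session is set to use
          display c(processors_lic)    // cores the license allows
          display c(processors_mach)   // cores detected on the machine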

      With that said, if you don't hear from other Statalist members with relevant experience to share, you might direct your question to Stata Technical Services. I imagine they have better knowledge of the performance of stcrreg under Stata/MP.

      • #4
        A fair question. The Stata install is licensed for 8 cores but defaults to using 4. Even so, I ought to get reasonable speed here, particularly given that the Cox model runs quickly.
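        For anyone following along, bumping a session up to the licensed maximum is just:

            * use all 8 licensed cores for this session
            set processors 8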

        Hoping to get suggestions of work-arounds from Statalist, since stcrprep isn't an option for my application.

        • #5
          Not a real answer to your question, but if your goal is to estimate cumulative incidence functions given covariates, you can do it "indirectly" by fitting separate cause-specific Cox regressions (or parametric survival models) and combining them, albeit under different proportionality assumptions than in the Fine-Gray model. See for example: http://data.princeton.edu/pop509/justices2.html
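          A rough sketch of the cause-specific route, reusing the hypothetical coding from the original post (0 = censored, 1 = foreclosure, 2 = prepayment, 3 = cure):

              * Cause-specific Cox model for the event of interest;
              * competing events are treated as censoring:
              stset month, id(loan_id) failure(outcome == 1)
              stcox x1 x2 x3

              * Repeat for each competing event:
              stset month, id(loan_id) failure(outcome == 2)
              stcox x1 x2 x3

              * The cumulative incidence functions can then be assembled
              * from the estimated cause-specific hazards.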

          However, this suggestion doesn't apply if you're interested in quantifying the impact of covariates on the subhazard function.

          • #6
            Hello, I have a similar problem: a dataset with 60k observations (1 observation = 1 id). Any update or solution?

            • #7
              Apart from my suggestion in #5, you might want to take a look at -stcrprep- (from SJ, see also https://www.stata-journal.com/articl...article=st0471) and at -stpm2cr- (from SJ, see also http://www.stata-journal.com/article...article=st0482).
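              Since your data have one record per id, -stcrprep- should be usable for you. The workflow in the SJ article looks roughly like the sketch below; I'm writing the options and generated variables (events(), trans(), failcode, tstart, tstop, weight_c) from memory, so verify them against the help file:

                  * status: 0 = censored, 1 = event of interest, 2 = competing event
                  stset time, failure(status == 1)
                  stcrprep, events(status) keep(x1 x2) trans(1)

                  * Re-stset the expanded data with the censoring weights;
                  * a weighted Cox model then reproduces the Fine-Gray fit:
                  gen byte event = (status == failcode)
                  stset tstop [iw = weight_c], failure(event == 1) enter(tstart)
                  stcox x1 x2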

              • #8
                I've recently had the same problem. A model with 90,000 subjects (1 row per subject) and daily time was computing for ages on a server with 500 GB of RAM; I killed it after 24 hours. But I've noticed that it computes faster after decreasing the time precision to months or years. "One reason for this is that every time you fit a model using stcrreg the probability of censoring weights are calculated and the data must be expanded (in the background) when maximising the likelihood." (https://pclambert.net/software/stcrp...onal_benefits/) So I guess monthly/yearly data take less space when expanded. It might not be an ideal solution in every case, but in my case the decreased precision was good enough.
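                The coarsening step itself is simple; in my case it was essentially this (my time variable was days from the origin; the names here are made up):

                    * Round daily time up to whole months so there are far
                    * fewer distinct failure times to expand over:
                    gen months = ceil(days / 30.4375)
                    stset months, failure(status == 1)
                    stcrreg x1 x2, compete(status == 2)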

                • #9
                  I'm having the same problem. Stata runs for more than an hour and then crashes when I fit a competing-risks regression on a multiply imputed dataset, both with one covariate and with several. I can't get my head around this.
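                  For what it's worth, what I'm attempting is essentially the following, with the data stset and mi set beforehand (covariate names are placeholders):

                      * Fit the Fine-Gray model on each imputation and
                      * combine the results with Rubin's rules:
                      mi estimate: stcrreg x1 x2, compete(status == 2)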
