  • stcrreg runs very slowly

    I'm trying to estimate the competing-risks hazard model of Fine and Gray (1999) using the stcrreg command in Stata/MP 15.0, running on a ~40-core Linux server with ~500 GB of RAM. My data are fairly large: a panel of delinquent mortgages with one observation per loan per month. The outcome of interest is foreclosure, with competing risks of prepayment (selling the house or refinancing) and curing (coming current on the loan). We do need the panel structure, as there are several important time-varying covariates.

    The problem is that stcrreg runs very, very slowly despite having plenty of RAM and computational power available. On a 0.1% sample of our data (about 15,000 observations on ~2,000 loans, with about 230 failures), it takes more than 3 hours for the model to converge. With a 1% sample, the model ran for at least 24 hours before we killed it. By comparison, ignoring the competing events and estimating the same specification with stcox takes less than 5 minutes, even on the 1% sample.
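    For reference, this is roughly the setup (the variable names loan_id, month, and outcome and the covariates x1-x3 are placeholders, not our actual names):

        * outcome: 0 = censored, 1 = foreclosure, 2 = prepayment, 3 = cure
        stset month, id(loan_id) failure(outcome == 1)

        * Fine-Gray subdistribution hazard model: this is the slow step
        stcrreg x1 x2 x3, compete(outcome == 2 3)

        * Single-risk Cox model for comparison: runs in minutes
        stcox x1 x2 x3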

    Anyone know why this happens, and what the best way is to deal with it? Is it just that stcrreg was never parallelized, or otherwise poorly optimized?

    I've come across the stcrprep package by Paul Lambert on SSC, which by its description sounds like it might speed things up, but I'm puzzled by the overall slowness of the built-in command.

    Thanks,
    -Ryan Sandler

  • #2
    I have now tried stcrprep from SSC (which is supposed to pre-calculate the weights used in stcrreg so as to speed up the process), and it turns out that it can't be used with data that have been stset with multiple observations per individual. So, it seems my question is really: how can I make stcrreg run faster on data with multiple observations per individual?

    • #3
      I expect you've already ruled this out, but it would be remiss not to ask: You told us your Linux box has ~40 cores, but how many of those is your copy of Stata/MP licensed for, and how many is it actually configured to make use of? The output of creturn list will reveal that information, as well as whether your Stata/MP environment has been set up to use the maximum number of cores allowed on your license.
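      For example, something along these lines will show the core settings (I believe these are the relevant c() values, but check the full creturn list output):

          creturn list
          display c(processors)        // cores this session is set to use
          display c(processors_lic)    // cores the license allows
          display c(processors_mach)   // cores detected on the machine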

      With that said, if you don't hear from other Statalist members with relevant experience to share, you might direct your question to Stata Technical Services. I imagine they have better knowledge of the performance of stcrreg under Stata/MP.

      • #4
        A fair question. The Stata install is licensed for 8 cores but defaults to using 4. Even so, I ought to get reasonable speed here, particularly given that the Cox model runs quickly.
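        For anyone following along, bumping a session up to the licensed maximum is just:

            * use all 8 licensed cores for this session
            set processors 8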

        Hoping to get suggestions of work-arounds from Statalist, since stcrprep isn't an option for my application.

        • #5
          Not a real answer to your question, but if your goal is to estimate cumulative incidence functions given covariates, you can do it "indirectly" by fitting separate cause-specific Cox regressions (or parametric survival models) and combining them, albeit under different proportionality assumptions than in the Fine-Gray model. See for example: http://data.princeton.edu/pop509/justices2.html
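          A rough sketch of the cause-specific route, reusing the hypothetical coding from the original post (0 = censored, 1 = foreclosure, 2 = prepayment, 3 = cure):

              * Cause-specific Cox model for the event of interest;
              * competing events are treated as censoring:
              stset month, id(loan_id) failure(outcome == 1)
              stcox x1 x2 x3

              * Repeat for each competing event:
              stset month, id(loan_id) failure(outcome == 2)
              stcox x1 x2 x3

              * The cumulative incidence functions can then be assembled
              * from the estimated cause-specific hazards.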

          However, this suggestion doesn't apply if you're interested in quantifying the impact of covariates on the subhazard function.

          • #6
            Hello, I have a similar problem: a dataset with 60k observations (1 observation = 1 id). Any update or solution?

            • #7
              Apart from my suggestion in #5, you might want to take a look at -stcrprep- (from SJ, see also https://www.stata-journal.com/articl...article=st0471) and at -stpm2cr- (from SJ, see also http://www.stata-journal.com/article...article=st0482).
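              Since your data have one record per id, -stcrprep- should be usable for you. The workflow in the SJ article looks roughly like the sketch below; I'm writing the options and generated variables (events(), trans(), failcode, tstart, tstop, weight_c) from memory, so verify them against the help file:

                  * status: 0 = censored, 1 = event of interest, 2 = competing event
                  stset time, failure(status == 1)
                  stcrprep, events(status) keep(x1 x2) trans(1)

                  * Re-stset the expanded data with the censoring weights;
                  * a weighted Cox model then reproduces the Fine-Gray fit:
                  gen byte event = (status == failcode)
                  stset tstop [iw = weight_c], failure(event == 1) enter(tstart)
                  stcox x1 x2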

              • #8
                I've recently had the same problem. A model with 90,000 subjects (1 row per subject) and daily time was computing for ages on a server with 500 GB of RAM; I killed it after 24 hours. But I've noticed that it computes faster after decreasing the time precision to months or years. "One reason for this is that every time you fit a model using stcrreg the probability of censoring weights are calculated and the data must be expanded (in the background) when maximising the likelihood." (https://pclambert.net/software/stcrp...onal_benefits/) So I guess monthly/yearly data take less space when expanded. It might not be an ideal solution in every case, but in my case the decreased precision was good enough.
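                The coarsening step itself is simple; in my case it was essentially this (my time variable was days from the origin; the names here are made up):

                    * Round daily time up to whole months so there are far
                    * fewer distinct failure times to expand over:
                    gen months = ceil(days / 30.4375)
                    stset months, failure(status == 1)
                    stcrreg x1 x2, compete(status == 2)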

                • #9
                  I'm having the same problem. Stata runs for more than an hour and then crashes when I fit a competing-risks regression on a multiply imputed dataset, both with one covariate and with several. I can't get my head around this.
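                  For what it's worth, what I'm attempting is essentially the following, with the data stset and mi set beforehand (covariate names are placeholders):

                      * Fit the Fine-Gray model on each imputation and
                      * combine the results with Rubin's rules:
                      mi estimate: stcrreg x1 x2, compete(status == 2)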
