Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Laura Myles
    started a topic stpm2 and mi predictnl

    stpm2 and mi predictnl

    Dear Listers,

    I am running survival analysis using flexible parametric models in -stpm2-.

    I have missing data in one of my covariates. As part of my sensitivity analyses, I am imputing this missing information using multiple imputation. I would like to produce survival plots based on my imputed results and I have found examples online showing this could be achieved using -mi predictnl- just after sptm2. However, this is not straightforward once I run it in on my data so I thought I tried on the original sample data (melanoma.dta).


    I use the following code:

    replace stage = . if stage==0

    mi set flong
    mi register regular subsite agegrp sex
    mi register imputed stage
    mi impute chained (mlogit) stage = i.subsite sex i.agegrp , add(10)

    mi stset surv_mm, fail(status==1 2) id(id) scale(12)
    mi stset surv_mm, failure(status=1,2) scale(12) id(id) exit(time 120.5)

    mi estimate, dots cmdok sav(mi_stpm2,replace): ///
    stpm2 ib1.stage i.agegrp, df(5) bhaz(rate) scale(hazard) nolog eform

    mi predictnl survimp2 = predict(survival at(agegrp 2)) using mi_stpm2

    On this sample data and my data, I get the following error message:

    variable _rcs1 not found
    xb(xb) invalid
    predict(survival at(agegrp 2)) invalid


    _rcs1 is part of the survival analysis output so I am not sure what could be going wrong. I tried updating my stato ado files (update all) but this has not resolved the issue. Any suggestion would be greatly appreciated.

  • Paul Lambert
    replied
    Hi,

    I agree with Paul Dickman that using the timevar() option is often preferable, but you do need to be careful here. We did not see what you wanted to with the predictions. Your original question was around the following
    Code:
     
     mi predictnl survimp2 = predict(survival at(agegrp 2) timevar(_t)) using mi_stpm2
    and we found a solution on how to replicate this. With this you could plot survimp2 vs _t at each level of stage, e.g.
    Code:
    line survimp2 _t if stage==1


    There is an explanation of using the timevar() option here. https://pclambert.net/software/stpm2/stpm2_timevar/

    In the recent code you used the meansurv() option. This predicts marginal survival functions rather than conditional survival functions. Essentially it predicts a survival curve for each individual in your study and then takes an average of the N survival curves. Use of the at() option forces the specified covariates to take specific values.

    So in most cases when you use the
    timevar() option, if you want conditional survival curves, (i.e. for a specific covariate pattern), you should specify all covariate values in the at() option or potentially use the zeros option as well to force unspecified covariates to be zero.

    When using meansurv you only want to specify covariates which you want to force to take specific values, for example when making contrasts between marginal survival functions.

    Paul

    Leave a comment:


  • Laura Myles
    replied
    Paul Dickman Thanks again for the clarification!

    Leave a comment:


  • Paul Dickman
    replied
    Originally posted by Laura Myles View Post
    Paul - I would like to check with you. In a previous thread (https://www.statalist.org/forums/for...r-net-survival), I was advised to create a timevar rather than using _t when using -predict- as below:
    Code:
    range timevar_new 0 10 100
    predict s1_new, meansurv (at sex 1) timevar(timevar_new)
    Is this still an option when working with imputed data or should I use _t?
    Yes, this is the recommended option. By default, -predict- provides predictions for all observations at values of _t. By specifying -timevar(_t)- we are just forcing what should be the default behaviour. I made the predictions at _t to avoid adding extra complexity to the code. The dataset in my example has over 15000 observations and we don't need that many predictions so in practice I would use another temptime variable, such as in your example where we only predict for 100 values of time. One of the reasons it took so long to spot this bug is that most of us routinely use temptime when making predictions.

    Leave a comment:


  • Paul Dickman
    replied
    Thanks Steve Samuels for the nice words about our course. Thanks Laura Myles for identifying this problem. Here is the link to the exercises for our course. I have updated the exercise (285) on multiple imputation. The do file for that exercise can be found here.

    Leave a comment:


  • Laura Myles
    replied
    Thank you all again for your help in resolving my query!

    Paul - I would like to check with you. In a previous thread (https://www.statalist.org/forums/for...r-net-survival), I was advised to create a timevar rather than using _t when using -predict- as below:
    range timevar_new 0 10 100
    predict s1_new, meansurv (at sex 1) timevar(timevar_new) Is this still an option when working with imputed data or should I use _t? Thanks again.
    Last edited by Laura Myles; 07 Nov 2018, 11:27.

    Leave a comment:


  • Steve Samuels
    replied
    Sorry, try this link.

    Leave a comment:


  • Matt Warkentin
    replied
    Steve Samuels I have been following this thread closely. The hyperlink for the course notes does not seem to work for me. Could you provide the link again?

    Leave a comment:


  • Steve Samuels
    replied
    I want to thank Laura, for bringing this problem to the list, and the two Pauls for contributing their great knowledge to this problem. I urge anyone who aspires to use survival techniques to read the notes and exercises for their short course, available here.
    Information about the course itself can be found here.

    Leave a comment:


  • Steve Samuels
    replied
    I wrote to Paul Lambert about this error, and he replied:

    "You can’t add a ci option here, but should add it as an option of the mi predictnl command, e.g., ci(lci uci)"
    I hadn't realized that ci was an option of predictnl. The entire command also requires a force option, so the correct version would be:
    Code:
    mi predictnl survimp2 = predict(survival at(agegrp 2) timevar(_t)) using mi_stpm2, ci(surv_ll surv_ul) force
    Last edited by Steve Samuels; 06 Nov 2018, 18:14.

    Leave a comment:


  • Steve Samuels
    replied
    Unfortunately, trying to add confidence limits to predict also throws an error:
    Code:
    mi predictnl survimp2 = predict(survival at(agegrp 2) timevar(_t) ci) using mi_stpm2
    variable __000009_lci already defined
    r(110);

    Last edited by Steve Samuels; 06 Nov 2018, 12:37.

    Leave a comment:


  • Laura Myles
    replied
    Dear Steve and Paul,

    Thank you so much for looking into my query and finding ways to fix the issue.

    I also appreciate your suggestions on how to improve my code. I will now leave my data with stset before I set up my imputation and include the outcome as suggested in your example.

    Laura



    Leave a comment:


  • Paul Lambert
    replied
    Thanks,

    What is going on here is that when you use -stpm2- it creates sets of restricted cubic spline variables (and their derivatives) that are needed to model the effect of time. When you use the -predict- option after fitting an -stpm2- model the default is to use the spline variables created by -stpm2- when calculating the predictions.

    When using multiple imputation a number of different models are fitted. When making the predictions, the -predict- command tries to use the spline variables, but in the examples above they do not exist and so the error is given.

    One way around this is to use the -timevar()- option of the -predict- command, which causes all the splines variables to be recalculated at the specified values of time. If the variable given is _t, then the mi commands will do what you have tried to do above, or you can choose specific values of time to predict at.

    Thus the following code works,

    Code:
    set more off
    use http://pauldickman.com/survival/colon, clear
    
    stset surv_mm, failure(status=1 2) scale(12) exit(time 10*12)
    replace stage = . if stage==0
    
    
    sts gen H=na
    
    // Comment the following line and mi predictnl returns an error
    // but using the timevar() option will fix this.
    //stpm2 ib1.stage i.agegrp , df(5) scale(hazard) eform nolog
    
    mi set wide
    mi register regular subsite agegrp sex
    mi register imputed stage
    mi register passive _rcs* _d_rcs*
    mi impute chained (mlogit) stage = i.subsite sex i.agegrp H _d, add(10)
    
    mi estimate, dots cmdok sav(mi_stpm2,replace): ///
    stpm2 ib1.stage i.agegrp, df(5) scale(hazard) nolog eform 
    
    mi predictnl survimp2 = predict(survival at(agegrp 2) timevar(_t)) using mi_stpm2
    I will see of I can add a warning to stpm2's -predict- command if it is being called from an mi predict command.

    Leave a comment:


  • Paul Dickman
    replied
    I managed to reproduce the error and identify why the code in my example runs, but Steve and Laura's code does not. It seems you have identified a bug Laura (although I won't speculate where).

    I am running Stata 15.1 and -stpm2- version 1.7.3 (08Oct2018).

    The following code runs without error, but if I do not fit a standard stpm2 model prior to mi set, then I get the same error as Laura ("_rcs1 not found"). This suggests that, in my example, mi predictnl is using the values of _rcs? (the spline basis vectors) from the complete-case model rather than those from mi estimate (which it should be using).

    Removing the "mi register regular" command (as in my code from the exercise) didn't change anything.

    I think I'll leave it for Paul Lambert (author of -stpm2-) to take it from here.

    Code:
    set more off
    use http://pauldickman.com/survival/colon, clear
    
    stset surv_mm, failure(status=1 2) scale(12) exit(time 10*12)
    replace stage = . if stage==0
    
    // Comment the following line and mi predictnl returns an error
    stpm2 ib1.stage i.agegrp , df(5) scale(hazard) eform nolog
    
    mi set wide
    mi register regular subsite agegrp sex
    mi register imputed stage
    mi impute chained (mlogit) stage = i.subsite sex i.agegrp, add(10)
    
    mi estimate, dots cmdok sav(mi_stpm2,replace): ///
    stpm2 ib1.stage i.agegrp, df(5) scale(hazard) nolog eform
    
    mi predictnl survimp2 = predict(survival at(agegrp 2)) using mi_stpm2
    Some notes:
    1. I fitted an all-cause (rather than excess) mortality model to simplify the code.
    2. I didn't put the outcome in the imputation model (as in Laura's code) but my understanding is, in general, the outcome should be in the imputation model.
    3. According to the documentation, "If you have set your data with [stset] before you mi set them, there is no problem; the settings were automatically imported"

    Leave a comment:


  • Steve Samuels
    replied
    The Exercise code did run without error in Stata 14.2 , Paul.
    I can't pin down why my personal code doesn't. The only big difference is that I mi stset after mi set, whereas you stset the data first, then mi set (without mi stset). When I use your sequence and harmonize other differences, the error messages appear. Very puzzling!

    Laura, I notice that you, like me, used mi stset. What happens if you run the entire Exercise code block from beginning to end? Did you also update -stpm2- to the latest version? It's version 1.7.3 08Oct2018)
    Last edited by Steve Samuels; 02 Nov 2018, 13:53.

    Leave a comment:

Working...
X