Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • stpm2 and mi predictnl

    Dear Listers,

    I am running survival analysis using flexible parametric models in -stpm2-.

    I have missing data in one of my covariates. As part of my sensitivity analyses, I am imputing this missing information using multiple imputation. I would like to produce survival plots based on my imputed results and I have found examples online showing this could be achieved using -mi predictnl- just after sptm2. However, this is not straightforward once I run it in on my data so I thought I tried on the original sample data (melanoma.dta).


    I use the following code:

    replace stage = . if stage==0

    mi set flong
    mi register regular subsite agegrp sex
    mi register imputed stage
    mi impute chained (mlogit) stage = i.subsite sex i.agegrp , add(10)

    mi stset surv_mm, fail(status==1 2) id(id) scale(12)
    mi stset surv_mm, failure(status=1,2) scale(12) id(id) exit(time 120.5)

    mi estimate, dots cmdok sav(mi_stpm2,replace): ///
    stpm2 ib1.stage i.agegrp, df(5) bhaz(rate) scale(hazard) nolog eform

    mi predictnl survimp2 = predict(survival at(agegrp 2)) using mi_stpm2

    On this sample data and my data, I get the following error message:

    variable _rcs1 not found
    xb(xb) invalid
    predict(survival at(agegrp 2)) invalid


    _rcs1 is part of the survival analysis output so I am not sure what could be going wrong. I tried updating my stato ado files (update all) but this has not resolved the issue. Any suggestion would be greatly appreciated.

  • #2
    Perhaps predict will work (not tried):
    Code:
    mi predict survimpt using mi_stpm2, survival  at(agegrp 2)
    ADDED: predict doesn't work either. The error message is "option survival" not found.
    I'm curious. Can you provide a link to one of the online examples that use predictnl? Thanks.
    Last edited by Steve Samuels; 01 Nov 2018, 14:27.
    Steve Samuels
    Statistical Consulting
    sjsamuels@gmail.com

    Stata 14.2

    Comment


    • #3
      Here's a round-about approach that gets the prediction for every completed data imputation, then takes the mean:
      Code:
      tempfile t1
      save `t1', replace
      save results, emptyok replace
      forvalues i = 1/10 {
                  use `t1', clear
                  mi extract `i'
                  mi estimate, dots cmdok sav(mi_stpm2,replace): ///
                       stpm2 ib1.stage i.agegrp, df(5) bhaz(rate) scale(hazard) nolog eform
                  predict surv, survival at(agegrp  2)
                  gen dset = `i'
                  keep  dset surv valve _t agegrp
                  append using results
                  save results , replace
      }
      use results, clear
      collapse (mean) surv,  by (agegrp _t)
      sort_t
      twoway connect surv _t
      Steve Samuels
      Statistical Consulting
      sjsamuels@gmail.com

      Stata 14.2

      Comment


      • #4
        Sorry, mi estimate isn't necessary and will trigger an error, because the extracted data set is not mi set. This should be okay.
        Code:
        tempfile t1
        save `t1', replace
        save results, emptyok replace
        forvalues i = 1/10 {
            use `t1', clear
            mi extract `i'
            stpm2 ib1.stage i.agegrp, df(5) bhaz(rate) scale(hazard) nolog eform
            predict surv, survival at(agegrp  2)
            gen dset = `i'
            keep  dset surv valve _t agegrp
            append using results
            save results , replace
        }
        use results, clear
        collapse (mean) surv,  by (agegrp _t)
        sort_t
        twoway connect surv _t
        Steve Samuels
        Statistical Consulting
        sjsamuels@gmail.com

        Stata 14.2

        Comment


        • #5
          Hi Steve,

          Thanks for your suggestion. I will have a go!

          The example I found is at the end of this document (page 250 onwards).

          http://www.pauldickman.com/survival/.../solutions.pdf

          Laura



          Comment


          • #6
            Thank you, Laura. I've written to the authors about this.
            Steve Samuels
            Statistical Consulting
            sjsamuels@gmail.com

            Stata 14.2

            Comment


            • #7
              Thanks Steve.

              This code is from an exercise Paul Lambert and I use in short courses. I'm not sure of the problem, but this is the latest version of the code for this exercise and it runs for me. I believe Paul Lambert (author of -stpm2- and co-author of this exercise) is looking into it. For the moment, could you confirm that this code runs for you.

              Neither Paul Lambert or I have great expertise in methods for multiple imputation, so I'm happy to receive any comments on how the code could be improved.

              Code:
              //======================================================================//
              // EXERCISE 285: 
              // From short-course on population-based cancer patient survival
              // http://cansurv.net/
              // October 2015
              // Paul Dickman
              //======================================================================//
              set more off
              version
              which stpm2
              
              use http://pauldickman.com/survival/colon, clear
              
              stset surv_mm, failure(status=1 2) scale(12) exit(time 10*12)
              gen _age = min(int(age + _t),99)
              gen _year = int(yydx + mmdx/12 + _t)
              merge m:1 _year sex _age using http://pauldickman.com/survival/popmort
              keep if _merge==3
              
              tab stage
              
              /* Check stage distribution over age and gender */
              tab stage agegrp, column
              tab stage sex, column
              
              /* Graphs of survival by age group and stage */
              stpm2 ib1.stage##i.agegrp , df(5) bhaz(rate) scale(hazard) eform nolog
              predict survival, surv
              
              line survival _t if stage==0, lpattern(dash) sort || ///
              line survival _t if stage==1, sort || ///
              line survival _t if stage==2, sort || ///
              line survival _t if stage==3, sort by(agegrp) ///
               legend(order(1 "Unknown" 2 "Localised" 3 "Regional" 4 "Distant"))
              
              /* Fit model using missing indicator approach */
              stpm2 ib1.stage i.agegrp , df(5) bhaz(rate) scale(hazard) eform nolog
              
              /* Refit model using complete records approach */
              replace stage=. if stage==0
              stpm2 ib1.stage i.agegrp , df(5) bhaz(rate) scale(hazard) eform nolog
              
              // The outcome should be included in the imputation model.
              // Falcaro et al suggest the Nelson-Aalen estimate of cum. hazard.
              sts gen H=na
              
              // Declare multiple-imputation data
              mi set wide
              
              //  Declare which variables have missing data
              mi register imputed stage
              
              // Perform imputation. 
              // This creates 10 additional copies of the obs with missing stage.
              // Carpenter and Kenward (2013) suggest 30 imputations. We use 10.
              set seed 29390
              mi impute chained (mlogit) stage = i.subsite sex i.agegrp H _d, add(10)
              
              list id _mi_m agegrp sex stage _t _d if id==2287
              list id _mi_m agegrp sex stage _t _d if id==3362
              list id _mi_m agegrp sex stage _t _d if id==3501
              
              // Note that as stpm2 is not an official Stata command we need to use cmd ok option
              // Save the estimates so that they can be used for making predictions 
              mi estimate, dots cmdok sav(mi_stpm2,replace): ///
                  stpm2 ib1.stage i.agegrp, df(5) bhaz(rate) scale(hazard) nolog eform
              
              // predict survival using -mi predictnl-    
              mi predictnl survimp2 = predict(survival at(agegrp 2)) using mi_stpm2
              
              // compare predictions to complete case analysis
              stpm2 ib1.stage i.agegrp if _mi_m==0, df(5) scale(h) bhaz(rate)
              predict surv, survival at(agegrp 2)
              
              line surv survimp2 _t if stage==1 & _mi_m==0, sort || ///
              line surv survimp2 _t if stage==2 & _mi_m==0, sort || ///
              line surv survimp2 _t if stage==3 & _mi_m==0, sort ///
              title("Predicted survival for agegrp==2 (60-74)") ///
              legend(order(1 "Localised (Complete)" 2  "Localised (Imputed)" ///
                            3 "Regional (Complete)" 4 "Regional (Imputed)" ///
                            5 "Distant (Complete)" 6 "Distant (Imputed)")) ///
                         name(imputed, replace)
                         
              // predict 5-year net survival using meansurv approach
              gen t5=5
              predict ns5, survival timevar(t5)
              mi predictnl ns5imp = predict(survival timevar(t5)) using mi_stpm2
              table stage agegrp, contents(mean ns5 mean ns5imp) format(%5.3f)
              table stage, contents(mean ns5 mean ns5imp) format(%5.3f)
              table agegrp, contents(mean ns5 mean ns5imp) format(%5.3f)

              Comment


              • #8
                The Exercise code did run without error in Stata 14.2 , Paul.
                I can't pin down why my personal code doesn't. The only big difference is that I mi stset after mi set, whereas you stset the data first, then mi set (without mi stset). When I use your sequence and harmonize other differences, the error messages appear. Very puzzling!

                Laura, I notice that you, like me, used mi stset. What happens if you run the entire Exercise code block from beginning to end? Did you also update -stpm2- to the latest version? It's version 1.7.3 08Oct2018)
                Last edited by Steve Samuels; 02 Nov 2018, 13:53.
                Steve Samuels
                Statistical Consulting
                sjsamuels@gmail.com

                Stata 14.2

                Comment


                • #9
                  I managed to reproduce the error and identify why the code in my example runs, but Steve and Laura's code does not. It seems you have identified a bug Laura (although I won't speculate where).

                  I am running Stata 15.1 and -stpm2- version 1.7.3 (08Oct2018).

                  The following code runs without error, but if I do not fit a standard stpm2 model prior to mi set, then I get the same error as Laura ("_rcs1 not found"). This suggests that, in my example, mi predictnl is using the values of _rcs? (the spline basis vectors) from the complete-case model rather than those from mi estimate (which it should be using).

                  Removing the "mi register regular" command (as in my code from the exercise) didn't change anything.

                  I think I'll leave it for Paul Lambert (author of -stpm2-) to take it from here.

                  Code:
                  set more off
                  use http://pauldickman.com/survival/colon, clear
                  
                  stset surv_mm, failure(status=1 2) scale(12) exit(time 10*12)
                  replace stage = . if stage==0
                  
                  // Comment the following line and mi predictnl returns an error
                  stpm2 ib1.stage i.agegrp , df(5) scale(hazard) eform nolog
                  
                  mi set wide
                  mi register regular subsite agegrp sex
                  mi register imputed stage
                  mi impute chained (mlogit) stage = i.subsite sex i.agegrp, add(10)
                  
                  mi estimate, dots cmdok sav(mi_stpm2,replace): ///
                  stpm2 ib1.stage i.agegrp, df(5) scale(hazard) nolog eform
                  
                  mi predictnl survimp2 = predict(survival at(agegrp 2)) using mi_stpm2
                  Some notes:
                  1. I fitted an all-cause (rather than excess) mortality model to simplify the code.
                  2. I didn't put the outcome in the imputation model (as in Laura's code) but my understanding is, in general, the outcome should be in the imputation model.
                  3. According to the documentation, "If you have set your data with [stset] before you mi set them, there is no problem; the settings were automatically imported"

                  Comment


                  • #10
                    Thanks,

                    What is going on here is that when you use -stpm2- it creates sets of restricted cubic spline variables (and their derivatives) that are needed to model the effect of time. When you use the -predict- option after fitting an -stpm2- model the default is to use the spline variables created by -stpm2- when calculating the predictions.

                    When using multiple imputation a number of different models are fitted. When making the predictions, the -predict- command tries to use the spline variables, but in the examples above they do not exist and so the error is given.

                    One way around this is to use the -timevar()- option of the -predict- command, which causes all the splines variables to be recalculated at the specified values of time. If the variable given is _t, then the mi commands will do what you have tried to do above, or you can choose specific values of time to predict at.

                    Thus the following code works,

                    Code:
                    set more off
                    use http://pauldickman.com/survival/colon, clear
                    
                    stset surv_mm, failure(status=1 2) scale(12) exit(time 10*12)
                    replace stage = . if stage==0
                    
                    
                    sts gen H=na
                    
                    // Comment the following line and mi predictnl returns an error
                    // but using the timevar() option will fix this.
                    //stpm2 ib1.stage i.agegrp , df(5) scale(hazard) eform nolog
                    
                    mi set wide
                    mi register regular subsite agegrp sex
                    mi register imputed stage
                    mi register passive _rcs* _d_rcs*
                    mi impute chained (mlogit) stage = i.subsite sex i.agegrp H _d, add(10)
                    
                    mi estimate, dots cmdok sav(mi_stpm2,replace): ///
                    stpm2 ib1.stage i.agegrp, df(5) scale(hazard) nolog eform 
                    
                    mi predictnl survimp2 = predict(survival at(agegrp 2) timevar(_t)) using mi_stpm2
                    I will see of I can add a warning to stpm2's -predict- command if it is being called from an mi predict command.

                    Comment


                    • #11
                      Dear Steve and Paul,

                      Thank you so much for looking into my query and finding ways to fix the issue.

                      I also appreciate your suggestions on how to improve my code. I will now leave my data with stset before I set up my imputation and include the outcome as suggested in your example.

                      Laura



                      Comment


                      • #12
                        Unfortunately, trying to add confidence limits to predict also throws an error:
                        Code:
                        mi predictnl survimp2 = predict(survival at(agegrp 2) timevar(_t) ci) using mi_stpm2
                        variable __000009_lci already defined
                        r(110);

                        Last edited by Steve Samuels; 06 Nov 2018, 12:37.
                        Steve Samuels
                        Statistical Consulting
                        sjsamuels@gmail.com

                        Stata 14.2

                        Comment


                        • #13
                          I wrote to Paul Lambert about this error, and he replied:

                          "You can’t add a ci option here, but should add it as an option of the mi predictnl command, e.g., ci(lci uci)"
                          I hadn't realized that ci was an option of predictnl. The entire command also requires a force option, so the correct version would be:
                          Code:
                          mi predictnl survimp2 = predict(survival at(agegrp 2) timevar(_t)) using mi_stpm2, ci(surv_ll surv_ul) force
                          Last edited by Steve Samuels; 06 Nov 2018, 18:14.
                          Steve Samuels
                          Statistical Consulting
                          sjsamuels@gmail.com

                          Stata 14.2

                          Comment


                          • #14
                            I want to thank Laura, for bringing this problem to the list, and the two Pauls for contributing their great knowledge to this problem. I urge anyone who aspires to use survival techniques to read the notes and exercises for their short course, available here.
                            Information about the course itself can be found here.
                            Steve Samuels
                            Statistical Consulting
                            sjsamuels@gmail.com

                            Stata 14.2

                            Comment


                            • #15
                              Steve Samuels I have been following this thread closely. The hyperlink for the course notes does not seem to work for me. Could you provide the link again?

                              Comment

                              Working...
                              X