  • Development of a nomogram to predict muscle invasiveness of a certain type of cancer

    Dear Statalist,

    I am working on a dataset of more than 1000 patients to develop a clinical nomogram that predicts muscle invasiveness (binary outcome: yes/no) at final histology after a specific surgical procedure. The study population consists of patients within a specified risk group who all underwent the same procedure.

    My objective is to compare several predictive models built on different sets of variables, most of them categorical, and to develop the nomogram from the model with the best AUC and decision curve.

    After fitting each logistic regression, I computed the AUC and ran leave-one-out cross-validation for each model without problems.

    The decision curve analysis, however, stops with a syntax error in Stata that I cannot explain. I suspect the cause is that the variable I want to use in the analysis does not exist in my original dataset, and I am unsure how to generate it correctly, since it must contain each model's predictions as probabilities.

    Below is a sample of the data generated with dataex, followed by the code I am using. From the point of the syntax error onward I have used generic variable names instead of the real ones, because I do not yet know which model will turn out best and be used for the nomogram.

    I would be grateful if anyone could find a solution and complete the code after the line "* Calculate the increase net benefit with different cut-off (5% increase) of the predictive model with the best AUC and net benefit" with the correct variable names. I also need to correct the DCA for overfitting; could you please add the command for that to the code?

    Your assistance is highly appreciated. Thank you in advance!
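
For reference, the net benefit that a decision curve plots at a threshold probability can be written as below. This is the commonly used definition, given here only as a reading aid; the symbols ($p_t$, $\pi$, $n$, TP, FP) are illustrative and do not come from the post or from the dca output.

```latex
\[
\mathrm{NB}_{\text{model}}(p_t)
  = \frac{\mathrm{TP}(p_t)}{n}
  - \frac{\mathrm{FP}(p_t)}{n}\cdot\frac{p_t}{1-p_t},
\qquad
\mathrm{NB}_{\text{all}}(p_t)
  = \pi - (1-\pi)\,\frac{p_t}{1-p_t}
\]
```

Here $\mathrm{TP}(p_t)$ and $\mathrm{FP}(p_t)$ count the patients whose predicted probability is at least $p_t$ and who do or do not have muscle-invasive disease, $n$ is the number of patients, and $\pi$ is the observed prevalence. Under this definition, the "increase in net benefit" computed as `model - all` in the code is $\mathrm{NB}_{\text{model}}(p_t) - \mathrm{NB}_{\text{all}}(p_t)$ at each threshold, which is why every predictor passed to dca has to be on the probability scale.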


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float muscle_invasive byte grade_bio float(clinicalt_high size_tumor_high_EAU size_tumor_high_NCCN) byte(preop_cyto_result multifocal) float previous_cystectomy byte variant_histology
    1 1 0 . . . 1 0 .
    0 1 1 0 0 . 0 0 0
    0 0 0 1 1 . 0 0 0
    0 1 0 0 0 . 0 0 0
    0 1 0 1 1 1 0 0 0
    0 1 0 . . . 0 0 .
    0 1 1 1 1 0 0 0 .
    0 1 1 1 1 . 0 0 0
    0 1 0 1 1 . 0 0 0
    0 1 0 1 1 1 0 0 0
    1 1 0 1 1 . 1 0 .
    1 1 1 1 1 1 0 0 0
    0 1 0 1 1 . 0 0 0
    1 1 1 1 1 1 0 0 0
    0 1 1 1 1 3 0 . 0
    0 0 0 1 1 0 1 0 0
    0 0 1 0 1 . 0 0 0
    1 1 0 1 1 . 1 1 0
    1 1 1 1 1 . 0 0 0
    1 1 1 1 1 . 0 0 0
    1 1 0 1 1 . 0 0 0
    1 1 0 1 1 . 0 0 0
    0 1 0 1 1 . 1 0 0
    1 1 0 . . . 1 0 0
    1 1 1 0 0 1 0 0 0
    1 1 0 1 1 1 0 0 0
    0 1 0 . . . 1 0 0
    0 0 0 1 1 . 0 0 0
    0 1 1 1 1 . 1 0 .
    0 0 1 1 1 1 1 0 0
    1 1 0 . . 1 0 0 0
    1 1 1 0 0 . 0 0 0
    1 1 0 . . . 0 0 0
    0 1 1 . . . 1 0 0
    1 1 0 1 1 . 0 0 0
    1 1 0 . . . 0 0 0
    1 0 0 0 1 . 0 0 0
    1 1 0 1 1 . 0 0 0
    1 1 1 . . . 0 0 0
    1 1 0 1 1 1 0 0 0
    0 1 0 . . 2 0 0 0
    0 0 0 1 1 0 0 0 0
    1 1 0 . . . 0 0 0
    0 0 1 . . . 0 0 0
    0 0 0 . . 2 1 0 0
    0 1 0 . . 2 0 0 0
    1 1 1 1 1 1 1 0 0
    1 1 0 0 1 1 0 0 0
    1 1 1 . . 2 0 0 .
    1 0 1 . . . 0 0 0
    0 1 0 0 0 . 0 0 0
    1 1 0 1 1 1 0 0 0
    1 1 1 1 1 . 0 0 0
    0 1 0 1 1 1 0 0 0
    1 1 1 0 0 1 0 0 0
    0 1 0 1 1 1 0 0 0
    0 1 0 1 1 0 1 0 0
    1 1 0 1 1 3 0 0 0
    0 1 0 . . 1 0 0 0
    0 1 0 . . 1 0 0 0
    1 1 0 1 1 1 1 0 0
    0 0 1 . . . 0 0 0
    0 0 0 0 1 . 0 0 0
    1 1 0 . . . 0 0 0
    1 1 0 1 1 . 0 0 0
    0 0 0 1 1 0 1 0 0
    1 1 1 1 1 0 0 0 .
    0 1 0 1 1 2 0 0 0
    0 1 0 . . 1 0 0 0
    1 1 0 0 0 2 0 0 0
    0 0 0 1 1 2 0 0 0
    1 0 0 1 1 . 0 0 0
    1 1 0 1 1 1 1 0 0
    1 1 0 0 0 1 0 0 0
    1 1 1 1 1 . 0 1 0
    1 1 0 1 1 2 0 0 .
    1 1 1 1 1 1 0 0 0
    1 1 1 1 1 . 0 0 0
    0 0 0 1 1 2 0 0 0
    1 1 1 1 1 . 0 1 0
    0 1 0 1 1 1 0 0 0
    1 1 1 1 1 . 0 0 0
    1 0 0 0 0 . 1 0 0
    1 0 0 . . 1 1 0 0
    0 1 0 1 1 . 1 0 0
    0 1 0 . . . 0 0 0
    1 1 1 1 1 . 1 0 0
    1 1 1 1 1 . 0 0 0
    1 0 1 1 1 2 0 0 0
    1 1 1 1 1 . 0 0 0
    1 1 0 1 1 . 0 0 0
    1 1 0 1 1 . 0 0 0
    1 1 1 1 1 . 0 0 0
    1 1 1 1 1 1 0 0 0
    1 1 0 0 1 . 0 0 0
    0 1 0 1 1 1 0 0 0
    0 0 1 1 1 1 0 0 0
    1 1 0 . . . 0 0 0
    1 1 0 1 1 2 0 0 0
    1 1 0 0 0 . 0 0 0
    end
    label values grade_bio grade_bio_
    label def grade_bio_ 0 "Low Grade", modify
    label def grade_bio_ 1 "High Grade", modify
    label values preop_cyto_result preop_cyto_result_
    label def preop_cyto_result_ 0 "Negative", modify
    label def preop_cyto_result_ 1 "Positive", modify
    label def preop_cyto_result_ 2 "Atypia/Suspicious", modify
    label def preop_cyto_result_ 3 "Not diagnostic", modify
    label values multifocal multifocal_
    label def multifocal_ 0 "No", modify
    label def multifocal_ 1 "Yes", modify

    *population setting

    keep if grade_bio==1 | clinicalt_high== 1 | size_tumor_high_EAU==1 | size_tumor_high_NCCN==1 | preop_cyto_result==1 | multifocal==1 | previous_cystectomy==1 |variant_histology==1

    keep if type_surg==2

    drop if pt_path==.

    //PREDICTIVE MODELS

    *univariate analysis

    logistic muscle_invasive grade_bio

    logistic muscle_invasive clinicalt_high

    logistic muscle_invasive size_tumor_high_EAU

    logistic muscle_invasive size_tumor_high_NCCN

    logistic muscle_invasive preop_cyto_result

    logistic muscle_invasive multifocal

    logistic muscle_invasive previous_cystectomy

    logistic muscle_invasive variant_histology

    *multivariate analysis

    //possibly add lsens to compute sensitivity and specificity

    //clinical model (based on variables only obtainable at CT and anamnestic evaluation)

    logistic muscle_invasive clinicalt_high size_tumor_high_EAU multifocal previous_cystectomy preop_cyto_result, coef

    lroc

    looclass muscle_invasive clinicalt_high size_tumor_high_EAU multifocal previous_cystectomy preop_cyto_result, model(logit) fig

    capture drop clinical_model_EAU_prediction

    predict clinical_model_EAU_prediction

    label variable clinical_model_EAU_prediction "Clinical model EAU"

    logistic muscle_invasive clinicalt_high size_tumor_high_NCCN multifocal previous_cystectomy preop_cyto_result, coef

    lroc

    looclass muscle_invasive clinicalt_high size_tumor_high_NCCN multifocal previous_cystectomy preop_cyto_result, model(logit) fig

    capture drop clinical_model_NCCN_prediction

    predict clinical_model_NCCN_prediction

    label variable clinical_model_NCCN_prediction "Clinical model NCCN"

    //endoscopic model (based on variables only verifiable after URS)

    logistic muscle_invasive grade_bio variant_histology, coef

    lroc

    looclass muscle_invasive grade_bio variant_histology, model(logit) fig

    capture drop endoscopic_model_prediction

    predict endoscopic_model_prediction

    label variable endoscopic_model_prediction "Endoscopic model"

    //tumor-related model (based only on tumor features)

    logistic muscle_invasive grade_bio clinicalt_high size_tumor_high_EAU multifocal variant_histology

    lroc

    looclass muscle_invasive grade_bio clinicalt_high size_tumor_high_EAU multifocal variant_histology, model(logit) fig

    capture drop tumor_model_EAU_prediction

    predict tumor_model_EAU_prediction

    label variable tumor_model_EAU_prediction "Tumor model EAU"

    logistic muscle_invasive grade_bio clinicalt_high size_tumor_high_NCCN multifocal variant_histology

    lroc

    looclass muscle_invasive grade_bio clinicalt_high size_tumor_high_NCCN multifocal variant_histology, model(logit) fig

    capture drop tumor_model_NCCN_prediction

    predict tumor_model_NCCN_prediction

    label variable tumor_model_NCCN_prediction "Tumor model NCCN"

    //staging model (based only on clinical tumor grade and stage, which are the strongest predictors of worse prognosis)

    logistic muscle_invasive grade_bio clinicalt_high

    lroc

    looclass muscle_invasive grade_bio clinicalt_high, model(logit) fig

    capture drop staging_model_prediction

    predict staging_model_prediction

    label variable staging_model_prediction "Staging model"

    *Run the decision curve with dca command (https://www.danieldsjoberg.com/dca-t...ial-stata.html) and save out net benefit

    dca muscle_invasive clinical_model_EAU_prediction clinical_model_NCCN_prediction endoscopic_model_prediction tumor_model_EAU_prediction tumor_model_NCCN_prediction staging_model_prediction, xstart(0.05) xstop(0.35) xlabel(0(0.01)0.35) smooth ///
    saving("DCA Output marker.dta", replace)

    *nomogram visual description is executed on the predictive model with the best AUC and net benefit

    nomolog

    *Calculate the interventions avoided of the predictive model with the best AUC and net benefit
    *(run while the patient data are still in memory: -use, clear- below replaces them)

    dca muscle_invasive model, prob(no) intervention xstart(0.05) xstop(0.35)

    * Calculate the increase net benefit with different cut-off (5% increase) of the predictive model with the best AUC and net benefit
    use "DCA Output marker.dta", clear
    g advantage = model - all
    label var advantage "Increase in net benefit from using Marker model"

    Thank you in advance!
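
On the request to correct the DCA for overfitting: one common approach is to feed dca predicted probabilities obtained by cross-validation instead of in-sample predictions. The following is a minimal sketch of a single 10-fold run, not a definitive implementation. It assumes the patient dataset from this post is in memory and that the user-written dca command is installed. It uses the staging model only as an example, since the best model has not been chosen yet. The names cv_staging, cv_u and cv_fold and the seed are hypothetical.

```stata
* Sketch: 10-fold cross-validated probabilities for the staging model,
* followed by a decision curve on those out-of-fold predictions.
* cv_staging, cv_u and cv_fold are hypothetical helper variables.
set seed 20240101

foreach v in cv_staging cv_u cv_fold {
    capture drop `v'
}

generate double cv_staging = .
generate double cv_u = runiform()

* Sort by outcome, then randomly, so each fold has a similar event rate
sort muscle_invasive cv_u
generate byte cv_fold = mod(_n, 10) + 1

forvalues k = 1/10 {
    * Fit on the other nine folds, then predict the held-out fold
    quietly logistic muscle_invasive grade_bio clinicalt_high if cv_fold != `k'
    tempvar p
    quietly predict double `p' if cv_fold == `k', pr
    quietly replace cv_staging = `p' if cv_fold == `k'
    drop `p'
}

label variable cv_staging "Staging model (10-fold CV)"

* Decision curve on the cross-validated probabilities only
dca muscle_invasive cv_staging if !missing(cv_staging), xstart(0.05) xstop(0.35) smooth
```

The same loop can be repeated for each candidate model by swapping the predictor list and the name of the stored variable. Because the result depends on the random fold assignment, repeating the whole procedure with different seeds and averaging the predictions may give a more stable curve.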
