Dear Statalist,
I am working on a dataset of more than 1,000 patients to develop a clinical nomogram that predicts muscle invasiveness (binary outcome: yes/no) at final histology after a specific surgical procedure. The study population consists of patients within a specified risk group who all underwent the same procedure.
My objective is to compare several predictive models built on different sets of variables (mostly categorical) and to develop the nomogram from the model with the best AUC and decision curve.
After fitting a logistic regression for each model, I computed the AUC and ran leave-one-out cross-validation for each model without problems.
When I get to the decision curve analysis, however, Stata returns a syntax error that I cannot explain. I suspect the cause is that the variable I want to use in the analysis does not exist in my original dataset, and I am unsure how to generate it correctly, since it has to contain each model's predictions as probabilities.
Below is a sample of the data generated with dataex, followed by the code I am using. Because I could not get past the syntax error, I do not yet know which model will be the best one, so the code after that point uses generic variable names rather than the real ones.
I would be grateful if anyone could find a solution and then complete the code after the "* Calculate the increase net benefit with different cut-off (5% increase) of the predictive model with the best AUC and net benefit" line with the correct variable names. I will also need to correct the DCA for overfitting; could you please add the command for that to the code?
Your assistance is highly appreciated. Thank you in advance!
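To make the question concrete, this is how I understand the predictor for dca should be built: one variable per model holding each patient's predicted probability of muscle invasion. A minimal sketch for the staging model only; the name p_staging is illustrative and not part of my actual do-file, and I may be missing something:

```stata
* Fit one candidate model, then store its predicted probabilities
logistic muscle_invasive grade_bio clinicalt_high
capture drop p_staging
predict double p_staging, pr      // pr = predicted probability of a positive outcome
label variable p_staging "Staging model (predicted probability)"
* Predictions should lie in [0,1]; patients with missing predictors get a missing prediction
summarize p_staging
count if missing(p_staging)
```

If the count of missing predictions is large, the models are being compared on different subsets of patients, which may matter for both the AUC and the decision curve.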
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float muscle_invasive byte grade_bio float(clinicalt_high size_tumor_high_EAU size_tumor_high_NCCN) byte(preop_cyto_result multifocal) float previous_cystectomy byte variant_histology
1 1 0 . . . 1 0 .
0 1 1 0 0 . 0 0 0
0 0 0 1 1 . 0 0 0
0 1 0 0 0 . 0 0 0
0 1 0 1 1 1 0 0 0
0 1 0 . . . 0 0 .
0 1 1 1 1 0 0 0 .
0 1 1 1 1 . 0 0 0
0 1 0 1 1 . 0 0 0
0 1 0 1 1 1 0 0 0
1 1 0 1 1 . 1 0 .
1 1 1 1 1 1 0 0 0
0 1 0 1 1 . 0 0 0
1 1 1 1 1 1 0 0 0
0 1 1 1 1 3 0 . 0
0 0 0 1 1 0 1 0 0
0 0 1 0 1 . 0 0 0
1 1 0 1 1 . 1 1 0
1 1 1 1 1 . 0 0 0
1 1 1 1 1 . 0 0 0
1 1 0 1 1 . 0 0 0
1 1 0 1 1 . 0 0 0
0 1 0 1 1 . 1 0 0
1 1 0 . . . 1 0 0
1 1 1 0 0 1 0 0 0
1 1 0 1 1 1 0 0 0
0 1 0 . . . 1 0 0
0 0 0 1 1 . 0 0 0
0 1 1 1 1 . 1 0 .
0 0 1 1 1 1 1 0 0
1 1 0 . . 1 0 0 0
1 1 1 0 0 . 0 0 0
1 1 0 . . . 0 0 0
0 1 1 . . . 1 0 0
1 1 0 1 1 . 0 0 0
1 1 0 . . . 0 0 0
1 0 0 0 1 . 0 0 0
1 1 0 1 1 . 0 0 0
1 1 1 . . . 0 0 0
1 1 0 1 1 1 0 0 0
0 1 0 . . 2 0 0 0
0 0 0 1 1 0 0 0 0
1 1 0 . . . 0 0 0
0 0 1 . . . 0 0 0
0 0 0 . . 2 1 0 0
0 1 0 . . 2 0 0 0
1 1 1 1 1 1 1 0 0
1 1 0 0 1 1 0 0 0
1 1 1 . . 2 0 0 .
1 0 1 . . . 0 0 0
0 1 0 0 0 . 0 0 0
1 1 0 1 1 1 0 0 0
1 1 1 1 1 . 0 0 0
0 1 0 1 1 1 0 0 0
1 1 1 0 0 1 0 0 0
0 1 0 1 1 1 0 0 0
0 1 0 1 1 0 1 0 0
1 1 0 1 1 3 0 0 0
0 1 0 . . 1 0 0 0
0 1 0 . . 1 0 0 0
1 1 0 1 1 1 1 0 0
0 0 1 . . . 0 0 0
0 0 0 0 1 . 0 0 0
1 1 0 . . . 0 0 0
1 1 0 1 1 . 0 0 0
0 0 0 1 1 0 1 0 0
1 1 1 1 1 0 0 0 .
0 1 0 1 1 2 0 0 0
0 1 0 . . 1 0 0 0
1 1 0 0 0 2 0 0 0
0 0 0 1 1 2 0 0 0
1 0 0 1 1 . 0 0 0
1 1 0 1 1 1 1 0 0
1 1 0 0 0 1 0 0 0
1 1 1 1 1 . 0 1 0
1 1 0 1 1 2 0 0 .
1 1 1 1 1 1 0 0 0
1 1 1 1 1 . 0 0 0
0 0 0 1 1 2 0 0 0
1 1 1 1 1 . 0 1 0
0 1 0 1 1 1 0 0 0
1 1 1 1 1 . 0 0 0
1 0 0 0 0 . 1 0 0
1 0 0 . . 1 1 0 0
0 1 0 1 1 . 1 0 0
0 1 0 . . . 0 0 0
1 1 1 1 1 . 1 0 0
1 1 1 1 1 . 0 0 0
1 0 1 1 1 2 0 0 0
1 1 1 1 1 . 0 0 0
1 1 0 1 1 . 0 0 0
1 1 0 1 1 . 0 0 0
1 1 1 1 1 . 0 0 0
1 1 1 1 1 1 0 0 0
1 1 0 0 1 . 0 0 0
0 1 0 1 1 1 0 0 0
0 0 1 1 1 1 0 0 0
1 1 0 . . . 0 0 0
1 1 0 1 1 2 0 0 0
1 1 0 0 0 . 0 0 0
end
label values grade_bio grade_bio_
label def grade_bio_ 0 "Low Grade", modify
label def grade_bio_ 1 "High Grade", modify
label values preop_cyto_result preop_cyto_result_
label def preop_cyto_result_ 0 "Negative", modify
label def preop_cyto_result_ 1 "Positive", modify
label def preop_cyto_result_ 2 "Atypia/Suspicious", modify
label def preop_cyto_result_ 3 "Not diagnostic", modify
label values multifocal multifocal_
label def multifocal_ 0 "No", modify
label def multifocal_ 1 "Yes", modify
*population setting
keep if grade_bio==1 | clinicalt_high==1 | size_tumor_high_EAU==1 | size_tumor_high_NCCN==1 | preop_cyto_result==1 | multifocal==1 | previous_cystectomy==1 | variant_histology==1
keep if type_surg==2
drop if pt_path==.
//PREDICTIVE MODELS
*univariate analysis
logistic muscle_invasive grade_bio
logistic muscle_invasive clinicalt_high
logistic muscle_invasive size_tumor_high_EAU
logistic muscle_invasive size_tumor_high_NCCN
logistic muscle_invasive preop_cyto_result
logistic muscle_invasive multifocal
logistic muscle_invasive previous_cystectomy
logistic muscle_invasive variant_histology
*multivariate analysis
//optionally add lsens to compute sensitivity and specificity
//clinical model (based on variables only obtainable at CT and anamnestic evaluation)
logistic muscle_invasive clinicalt_high size_tumor_high_EAU multifocal previous_cystectomy preop_cyto_result, coef
lroc
looclass muscle_invasive clinicalt_high size_tumor_high_EAU multifocal previous_cystectomy preop_cyto_result, model(logit) fig
capture drop clinical_model_EAU_prediction
predict clinical_model_EAU_prediction
label variable clinical_model_EAU_prediction "Clinical model EAU"
logistic muscle_invasive clinicalt_high size_tumor_high_NCCN multifocal previous_cystectomy preop_cyto_result, coef
lroc
looclass muscle_invasive clinicalt_high size_tumor_high_NCCN multifocal previous_cystectomy preop_cyto_result, model(logit) fig
capture drop clinical_model_NCCN_prediction
predict clinical_model_NCCN_prediction
label variable clinical_model_NCCN_prediction "Clinical model NCCN"
//endoscopic model (based on variables only verifiable after URS)
logistic muscle_invasive grade_bio variant_histology, coef
lroc
looclass muscle_invasive grade_bio variant_histology, model(logit) fig
capture drop endoscopic_model_prediction
predict endoscopic_model_prediction
label variable endoscopic_model_prediction "Endoscopic model"
//tumor-related model (based only on tumor features)
logistic muscle_invasive grade_bio clinicalt_high size_tumor_high_EAU multifocal variant_histology
lroc
looclass muscle_invasive grade_bio clinicalt_high size_tumor_high_EAU multifocal variant_histology, model(logit) fig
capture drop tumor_model_EAU_prediction
predict tumor_model_EAU_prediction
label variable tumor_model_EAU_prediction "Tumor model EAU"
logistic muscle_invasive grade_bio clinicalt_high size_tumor_high_NCCN multifocal variant_histology
lroc
looclass muscle_invasive grade_bio clinicalt_high size_tumor_high_NCCN multifocal variant_histology, model(logit) fig
capture drop tumor_model_NCCN_prediction
predict tumor_model_NCCN_prediction
label variable tumor_model_NCCN_prediction "Tumor model NCCN"
//staging model (based only on clinical tumor grade and stage, which are the strongest predictors of worse prognosis)
logistic muscle_invasive grade_bio clinicalt_high
lroc
looclass muscle_invasive grade_bio clinicalt_high, model(logit) fig
capture drop staging_model_prediction
predict staging_model_prediction
label variable staging_model_prediction "Staging model"
*Run the decision curve with dca command (https://www.danieldsjoberg.com/dca-t...ial-stata.html) and save out net benefit
dca muscle_invasive clinical_model_EAU_prediction clinical_model_NCCN_prediction endoscopic_model_prediction tumor_model_EAU_prediction tumor_model_NCCN_prediction staging_model_prediction, xstart(0.05) xstop(0.35) xlabel(0(0.01)0.35) smooth ///
saving("DCA Output marker.dta", replace)
*nomogram visual description is executed on the predictive model with the best AUC and net benefit
nomolog
* Calculate the increase net benefit with different cut-off (5% increase) of the predictive model with the best AUC and net benefit
use "DCA Output marker.dta", clear
g advantage = model - all
label var advantage "Increase in net benefit from using Marker model"
*Calculate the interventions avoided of the predictive model with the best AUC and net benefit
dca muscle_invasive model, prob(no) intervention xstart(0.05) xstop(0.35)
Thank you in advance!
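P.S. For the overfitting correction, my rough understanding is that the decision curve should be drawn from out-of-sample predictions, for example from 10-fold cross-validation, rather than from the in-sample fitted values. A sketch of that idea for the staging model, assuming a single repetition of 10 folds; the names u, fold, p_tmp and cv_staging and the seed are placeholders I made up, and I have not been able to confirm that this is the recommended approach:

```stata
* 10-fold cross-validated predictions for one model (sketch)
set seed 12345                        // arbitrary seed, only for reproducibility
capture drop u
capture drop fold
capture drop cv_staging
generate double u = runiform()
xtile fold = u, nq(10)                // assign each patient to one of 10 folds
generate double cv_staging = .
forvalues k = 1/10 {
    * fit on the other nine folds, predict for the held-out fold
    quietly logistic muscle_invasive grade_bio clinicalt_high if fold != `k'
    quietly predict double p_tmp if fold == `k', pr
    quietly replace cv_staging = p_tmp if fold == `k'
    drop p_tmp
}
label variable cv_staging "Staging model (10-fold CV prediction)"
* decision curve on the cross-validated predictions
dca muscle_invasive cv_staging, xstart(0.05) xstop(0.35)
```

If this is the right idea, I assume the same loop would be repeated for each candidate model so that all curves are compared on cross-validated predictions.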