Hi, I am working on bootstrap internal validation using Stata. I had developed these programs few months ago and tried running it again earlier this week, but I encountered
error.
Please find the programs here:
Here are the usage example:
My question:
1. What changed within Stata or parallel over the past few months? The codes had worked perfectly before but it's now returning error?
2. I have checked and there are no en dash or em dashes that might break the loop. I don't know what is wrong.
Thank you very much in advance.
Code:
r(3598)
Please find the programs here:
Code:
*===============================================================================
* PROGRAM 1: Forward Selection using QIC
* Returns: selected variables as local macro
*===============================================================================
capture program drop forward_select_qic
program define forward_select_qic, rclass
syntax varlist(min=1), outcome(varname) [penter(real 0.10)]
local candidates `varlist'
local selected ""
local best_qic = .
local keep_adding = 1
*---------------------------------------------------------------------------
* Step 1: Univariable screening (p < penter)
*---------------------------------------------------------------------------
local candidates_pass ""
foreach var of local candidates {
capture quietly xtgee `outcome' `var', ///
family(binomial) link(logit) corr(exchangeable) robust
if _rc == 0 {
local pval = 2*normal(-abs(_b[`var']/_se[`var']))
if `pval' < `penter' {
local candidates_pass `candidates_pass' `var'
}
}
}
if "`candidates_pass'" == "" {
return local selected ""
exit
}
*---------------------------------------------------------------------------
* Step 2: Forward selection based on QIC + p<0.05
*---------------------------------------------------------------------------
local remaining `candidates_pass'
while `keep_adding' {
local var_to_add = ""
local new_best_qic = .
local improved = 0
* Try adding each remaining variable
foreach var of local remaining {
local test_vars `selected' `var'
capture quietly qic `outcome' `test_vars', ///
family(binomial) link(logit) corr(exchangeable) robust
if _rc == 0 {
local current_qic = r(qic)
* Check if the new variable has p<0.05
capture quietly xtgee `outcome' `test_vars', ///
family(binomial) link(logit) corr(exchangeable) robust
if _rc == 0 {
local pval_new = 2*normal(-abs(_b[`var']/_se[`var']))
* Variable must have p<0.05
if `pval_new' < 0.05 {
* For first variable: find lowest QIC among all candidates
* For subsequent: only add if QIC improves
if "`selected'" == "" {
* First variable: select one with lowest QIC
if `new_best_qic' == . | `current_qic' < `new_best_qic' {
local new_best_qic = `current_qic'
local var_to_add = "`var'"
local improved = 1
}
}
else {
* Subsequent variables: must improve current best QIC
if `current_qic' < `best_qic' {
if `new_best_qic' == . | `current_qic' < `new_best_qic' {
local new_best_qic = `current_qic'
local var_to_add = "`var'"
local improved = 1
}
}
}
}
}
}
}
* Add variable if improvement found
if `improved' {
local selected `selected' `var_to_add'
local remaining : list remaining - var_to_add
local best_qic = `new_best_qic'
* Stop if no more candidates
if "`remaining'" == "" {
local keep_adding = 0
}
}
else {
* No improvement possible
local keep_adding = 0
}
}
return local selected "`selected'"
end
*===============================================================================
* PROGRAM 2: Bootstrap Internal Validation (Box 2)
* Follows the exact steps from Box 2
*===============================================================================
capture program drop boot_internal_validation
program define boot_internal_validation, rclass
version 16.0
syntax , reps(integer) [penter(real 0.10)]
display as text "{hline 80}"
display as text "Bootstrap Internal Validation (Following Box 2)"
display as text "{hline 80}"
display as text "Outcome: $outcome_var"
display as text "Candidate predictors: " _c
local n_cand : word count $candidates
display as result `n_cand'
display as text "Bootstrap replications: " as result `reps'
display as text "Univariable screening: p<" as result `penter'
display as text "Selection method: Forward selection by QIC (p<0.05 required)"
display as text "{hline 80}" _newline
preserve
*---------------------------------------------------------------------------
* STEP 1: Develop model on original data (apparent performance)
*---------------------------------------------------------------------------
display as text "STEP 1: Developing model on original dataset..."
quietly xtset $cluster_id visit_number
forward_select_qic $candidates, outcome($outcome_var) penter(`penter')
local original_selected = r(selected)
if "`original_selected'" == "" {
display as error "No variables selected on original data!"
display as text "Try increasing penter threshold or checking your data"
restore
exit
}
display as text "Selected variables: " as result "`original_selected'"
* Fit final model on original data
quietly xtgee $outcome_var `original_selected', ///
family(binomial) link(logit) corr(exchangeable) robust
predict xb_apparent, xb
quietly roctab $outcome_var xb_apparent
scalar auc_apparent = r(area)
quietly logit $outcome_var xb_apparent, vce(cluster $cluster_id)
scalar slope_apparent = _b[xb_apparent]
drop xb_apparent
display as text " Apparent AUC: " as result %5.3f auc_apparent
display as text " Apparent Calibration Slope: " as result %5.3f slope_apparent
display as text "{hline 80}" _newline
*---------------------------------------------------------------------------
* Initialize tracking matrices
*---------------------------------------------------------------------------
local n_candidates : word count $candidates
tempname selection_matrix corrected_aucs corrected_slopes
matrix `selection_matrix' = J(`reps', `n_candidates', 0)
matrix `corrected_aucs' = J(`reps', 1, .)
matrix `corrected_slopes' = J(`reps', 1, .)
*---------------------------------------------------------------------------
* STEPS 2-5: Bootstrap loop
*---------------------------------------------------------------------------
display as text "STEPS 2-5: Bootstrap resampling and optimism calculation..."
tempname total_optimism_auc total_optimism_slope
scalar `total_optimism_auc' = 0
scalar `total_optimism_slope' = 0
local successful = 0
forvalues rep = 1/`reps' {
if mod(`rep', 10) == 0 {
display as text " Bootstrap iteration `rep'/`reps' - " _c
display as result %3.0f (`rep'/`reps'*100) as text "% complete"
}
restore, preserve
capture noisily {
*-------------------------------------------------------------------
* STEP 2: Generate bootstrap sample
*-------------------------------------------------------------------
quietly bsample, cluster($cluster_id)
* Create new sequential time index to avoid duplicates
quietly bysort $cluster_id: gen _boot_time = _n
quietly xtset $cluster_id _boot_time
*-------------------------------------------------------------------
* STEP 3: Develop bootstrap model (same methods as original)
*-------------------------------------------------------------------
forward_select_qic $candidates, outcome($outcome_var) penter(`penter')
local boot_selected = r(selected)
* Track which variables were selected
if "`boot_selected'" != "" {
local col = 1
foreach var of global candidates {
local is_selected : list var in boot_selected
if `is_selected' {
matrix `selection_matrix'[`rep', `col'] = 1
}
local col = `col' + 1
}
*---------------------------------------------------------------
* STEP 3a: Bootstrap performance (in bootstrap sample)
*---------------------------------------------------------------
quietly {
xtgee $outcome_var `boot_selected', ///
family(binomial) link(logit) corr(exchangeable) robust
predict xb_boot, xb
roctab $outcome_var xb_boot
local auc_boot = r(area)
logit $outcome_var xb_boot, vce(cluster $cluster_id)
local slope_boot = _b[xb_boot]
drop xb_boot _boot_time
}
*---------------------------------------------------------------
* STEP 3b: Test performance (in original data)
*---------------------------------------------------------------
restore, preserve
quietly {
xtset $cluster_id visit_number
xtgee $outcome_var `boot_selected', ///
family(binomial) link(logit) corr(exchangeable) robust
predict xb_test, xb
roctab $outcome_var xb_test
local auc_test = r(area)
logit $outcome_var xb_test, vce(cluster $cluster_id)
local slope_test = _b[xb_test]
drop xb_test
}
*---------------------------------------------------------------
* STEP 4: Calculate optimism
*---------------------------------------------------------------
local optimism_auc = `auc_boot' - `auc_test'
local optimism_slope = `slope_boot' - `slope_test'
scalar `total_optimism_auc' = `total_optimism_auc' + `optimism_auc'
scalar `total_optimism_slope' = `total_optimism_slope' + `optimism_slope'
local successful = `successful' + 1
*---------------------------------------------------------------
* Store corrected values for this iteration
*---------------------------------------------------------------
local auc_corrected_i = auc_apparent - `optimism_auc'
local slope_corrected_i = slope_apparent - `optimism_slope'
matrix `corrected_aucs'[`rep', 1] = `auc_corrected_i'
matrix `corrected_slopes'[`rep', 1] = `slope_corrected_i'
}
}
if _rc {
* Skip failed iterations silently
}
}
restore
*---------------------------------------------------------------------------
* STEP 6: Average optimism
*---------------------------------------------------------------------------
if `successful' > 0 {
scalar avg_optimism_auc = `total_optimism_auc' / `successful'
scalar avg_optimism_slope = `total_optimism_slope' / `successful'
}
else {
display as error "No successful bootstrap iterations!"
exit
}
*---------------------------------------------------------------------------
* STEP 7: Optimism-corrected estimates
*---------------------------------------------------------------------------
scalar auc_corrected = auc_apparent - avg_optimism_auc
scalar slope_corrected = slope_apparent - avg_optimism_slope
*---------------------------------------------------------------------------
* Calculate 95% CIs from bootstrap distribution
*---------------------------------------------------------------------------
preserve
clear
svmat `corrected_aucs', names(auc_corr)
svmat `corrected_slopes', names(slope_corr)
drop if missing(auc_corr1)
_pctile auc_corr1, p(2.5 97.5)
scalar auc_ci_lower = r(r1)
scalar auc_ci_upper = r(r2)
_pctile slope_corr1, p(2.5 97.5)
scalar slope_ci_lower = r(r1)
scalar slope_ci_upper = r(r2)
restore
*---------------------------------------------------------------------------
* Display results
*---------------------------------------------------------------------------
display as text _newline "{hline 80}"
display as text "RESULTS: Internal Validation"
display as text "{hline 80}"
display as text "Original model variables: " as result "`original_selected'"
display as text "Successful bootstrap iterations: " as result `successful' as text "/" as result `reps'
display as text "{hline 80}"
display as text " Apparent Optimism Corrected"
display as text "{hline 80}"
display as text "AUC " ///
as result %9.3f auc_apparent " " %9.3f avg_optimism_auc " " %9.3f auc_corrected
display as text "Calibration Slope " ///
as result %9.3f slope_apparent " " %9.3f avg_optimism_slope " " %9.3f slope_corrected
display as text "{hline 80}"
display as text "AUC (95% CI) " ///
as result %9.3f auc_corrected " (" %5.3f auc_ci_lower " - " %5.3f auc_ci_upper ")"
display as text "Slope (95% CI) " ///
as result %9.3f slope_corrected " (" %5.3f slope_ci_lower " - " %5.3f slope_ci_upper ")"
display as text "{hline 80}" _newline
*---------------------------------------------------------------------------
* Variable selection frequencies
*---------------------------------------------------------------------------
display as text "Variable Selection Frequencies Across Bootstrap Samples"
display as text "{hline 80}"
display as text "{col 5}Variable{col 40}Selected (n){col 55}Frequency (%)"
display as text "{hline 80}"
local col = 1
foreach var of global candidates {
local count = 0
forvalues rep = 1/`reps' {
local count = `count' + `selection_matrix'[`rep', `col']
}
local freq = (`count' / `successful') * 100
if `freq' > 0 {
display as text "{col 5}`var'{col 40}" as result `count' ///
as text "{col 55}" as result %5.1f `freq'
}
local col = `col' + 1
}
display as text "{hline 80}"
*---------------------------------------------------------------------------
* Return results
*---------------------------------------------------------------------------
return local original_model "`original_selected'"
return scalar auc_apparent = auc_apparent
return scalar auc_corrected = auc_corrected
return scalar auc_ci_lower = auc_ci_lower
return scalar auc_ci_upper = auc_ci_upper
return scalar optimism_auc = avg_optimism_auc
return scalar slope_apparent = slope_apparent
return scalar slope_corrected = slope_corrected
return scalar slope_ci_lower = slope_ci_lower
return scalar slope_ci_upper = slope_ci_upper
return scalar optimism_slope = avg_optimism_slope
return scalar n_successful = `successful'
end
Code:
webuse airacc, clear
xtset airline time
global cluster_id airline
global candidates relcnt relsize pmiles ait uit
global outcome_var rec
parallel, prog(forward_select_qic): boot_internal_validation, reps(10) penter(0.10)
invalid '-'
stata(): 3598 Stata returned error
parallel_export_programs(): - function returned error
parallel_write_do(): - function returned error
<istmt>: - function returned error
r(3598);
1. What changed within Stata or parallel over the past few months? The codes had worked perfectly before but it's now returning error?
2. I have checked and there are no en dash or em dashes that might break the loop. I don't know what is wrong.
Thank you very much in advance.

Comment