Prediction model - IECV and performance estimates

Bjornar Berg

26 May 2024, 14:16

Dear members,

I'm developing a prediction model using internal-external cross-validation (IECV) across five geographical regions.
I have multiply imputed data and run IECV by looping over the regions: in each iteration I fit the model with one region left out and estimate performance in the held-out region. The performance metrics I'm interested in are discrimination (C-statistic) and calibration (calibration slope and calibration-in-the-large). Below is the code to run IECV and get the pooled results:

Code:

* Run IECV *
forval x = 1(1)5 {
    mi estimate, dots saving(miestiecv, replace): logistic dep_var $covariates if (cluster!=`x')
    replace iecv_xb = xb if iecv_xb==.
    drop xb
    display `x'
}

** Pooled metrics **
* Calibration slope *
mi estimate, dots: logistic dep_var iecv_xb

* Calibration-in-the-large *
* intercept-only model with the linear predictor as offset (slope fixed at 1) *
mi estimate, dots: logit dep_var, offset(iecv_xb)

* Discrimination *
mi xeq 0: roctab dep_var iecv_xb
return list
cap program drop eroctab
program eroctab, eclass
        version 12.0

        /* Step 1: perform ROC analysis */
        args refvar classvar
        roctab `refvar' `classvar'

        /* Step 2: save estimate and its variance in temporary matrices*/
        tempname b V
        mat `b' = r(area)
        mat `V' = r(se)^2
        local N = r(N)

        /* Step 3: make column names and row names consistent*/
        mat colnames `b' = AUC
        mat colnames `V' = AUC
        mat rownames `V' = AUC

        /*Step 4: post results to e()*/
        ereturn post `b' `V', obs(`N')
        ereturn local cmd "eroctab"
        ereturn local title "ROC area"
end

mi estimate, cmdok dots: eroctab dep_var iecv_xb

For the region-level estimates (which will later be pooled using random-effects meta-analysis), I get calibration results but have problems with the C-statistic:

Code:

* IECV for calibration slope *

capture postutil clear   
tempname slope_region
postfile `slope_region' slope slope_se val_size using slope_region.dta , replace 
  
  forval x = 1(1)5 {
  mi estimate, dots: logistic dep_var iecv_xb if cluster==`x'
  local slope = r(table)[1,1]
  local slope_se = r(table)[2,1]
  local val_size = e(N)
  post `slope_region' (`slope') (`slope_se') (`val_size')
  }
  
  postclose `slope_region' 

* IECV for calibration-in-the-large *

capture postutil clear   
tempname citl_region
postfile `citl_region' citl citl_se val_size using citl_region.dta , replace 
  
  forval x = 1(1)5 { 
  mi estimate, dots: logit dep_var if cluster==`x', offset(iecv_xb)
  local citl = r(table)[1,1]
  local citl_se = r(table)[2,1]
  local val_size = e(N)
  post `citl_region' (`citl') (`citl_se') (`val_size')
  }
  
  postclose `citl_region'

* IECV for C-statistic *

capture postutil clear   
tempname C_region
postfile `C_region' beta st_err val_size using C_region.dta , replace 

  forval x = 1(1)5 {
  mi estimate, cmdok dots: eroctab dep_var iecv_xb if cluster==`x' 
  local beta = r(table)[1,1]
  local st_err = r(table)[2,1]
  local val_size = e(N)
  post `C_region' (`beta') (`st_err') (`val_size')
  }
  
  postclose `C_region'
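
The region-level files written above are meant to feed a random-effects meta-analysis. A minimal sketch of that pooling step for the calibration slopes follows. It assumes Stata 16 or later (for the meta suite), and the choice of REML is an assumption for illustration, not something specified in this post:

```stata
* Sketch only: pool the region-level calibration slopes (requires Stata 16+)
use slope_region, clear

* declare the effect size and its standard error as posted by the loop
meta set slope slope_se

* random-effects pooled estimate; REML is an assumed choice of estimator
meta summarize, random(reml)
```

The same pattern would apply to citl_region.dta and C_region.dta once those files hold region-specific estimates.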

For the C-statistic, the loop does not restrict estimation to each region. Instead, it uses all the data and returns the same C-statistic five times.
So my question is: what do I need to change to get region-specific estimates of the C-statistic?

Any help on this would be much appreciated.

Thank you
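
A possible cause, offered as a sketch rather than a confirmed diagnosis: eroctab reads its input with args, which picks up only the two variable names, so the if qualifier passed through mi estimate would be silently dropped. The variant below parses the command line with syntax and marksample instead, so that if/in restrictions reach roctab. Everything else is kept as in the posted program; the local name touse is introduced here for illustration.

```stata
* Hypothetical revision of eroctab that honours if/in qualifiers
capture program drop eroctab
program eroctab, eclass
        version 12.0

        /* Step 1: parse two variables plus optional if/in */
        syntax varlist(min=2 max=2 numeric) [if] [in]
        marksample touse                      // 1 for observations to use, 0 otherwise
        gettoken refvar classvar : varlist

        /* Step 2: ROC analysis on the requested observations only */
        roctab `refvar' `classvar' if `touse'

        /* Step 3: save estimate and its variance in temporary matrices */
        tempname b V
        mat `b' = r(area)
        mat `V' = r(se)^2
        local N = r(N)

        /* Step 4: make column names and row names consistent */
        mat colnames `b' = AUC
        mat colnames `V' = AUC
        mat rownames `V' = AUC

        /* Step 5: post results to e() */
        ereturn post `b' `V', obs(`N')
        ereturn local cmd "eroctab"
        ereturn local title "ROC area"
end
```

With this definition, the existing loop call, mi estimate, cmdok dots: eroctab dep_var iecv_xb if cluster==`x', should pass the region restriction through, and e(N) should then differ across regions, which is a quick way to check that the subsetting took effect.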
