  • Prediction model - IECV and performance estimates

    Dear members,

    I'm developing a prediction model using internal-external cross-validation (IECV) based on five different geographical regions.
    I have multiply imputed data and use IECV to loop over the regions: in each iteration the model is fitted on the other four regions and the metrics are estimated on the held-out region. The performance metrics I'm interested in are discrimination (C-statistic) and calibration (calibration slope and calibration-in-the-large). Below is the code to run IECV and get the pooled results:

    Code:
    * Run IECV *
    forval x = 1(1)5 {
        mi estimate, dots saving(miestiecv, replace): logistic dep_var $covariates if (cluster!=`x')
        replace iecv_xb = xb if iecv_xb==.
        drop xb
        display `x'
    }
    
    ** Pooled metrics **
    * Calibration slope *
    mi estimate, dots: logistic dep_var iecv_xb
    
    * Calibration-in-the-large * 
    * Intercept-only model with the linear predictor as offset; the intercept is the CITL
    mi estimate, dots: logit dep_var, offset(iecv_xb)
    
    * Discrimination *
    mi xeq 0: roctab dep_var iecv_xb
    return list
    cap program drop eroctab
    program eroctab, eclass
            version 12.0
    
            /* Step 1: perform ROC analysis */
            args refvar classvar
            roctab `refvar' `classvar'
    
            /* Step 2: save estimate and its variance in temporary matrices*/
            tempname b V
            mat `b' = r(area)
            mat `V' = r(se)^2
            local N = r(N)
    
            /* Step 3: make column names and row names consistent*/
            mat colnames `b' = AUC
            mat colnames `V' = AUC
            mat rownames `V' = AUC
    
            /*Step 4: post results to e()*/
            ereturn post `b' `V', obs(`N')
            ereturn local cmd "eroctab"
            ereturn local title "ROC area"
    end
    
    mi estimate, cmdok dots: eroctab dep_var iecv_xb
    For the region-level estimates (which will later be pooled using random-effects meta-analysis), I get the calibration results but have a problem with the C-statistic:

    Code:
    * IECV for calibration slope *
    
    capture postutil clear   
    tempname slope_region
    postfile `slope_region' slope slope_se val_size using slope_region.dta , replace 
      
      forval x = 1(1)5 {
      mi estimate, dots: logistic dep_var iecv_xb if cluster==`x'
      local slope = r(table)[1,1]
      local slope_se = r(table)[2,1]
      local val_size = e(N)
      post `slope_region' (`slope') (`slope_se') (`val_size')
      }
      
      postclose `slope_region' 
    
    * IECV for calibration-in-the-large *
    
    capture postutil clear   
    tempname citl_region
    postfile `citl_region' citl citl_se val_size using citl_region.dta , replace 
      
      forval x = 1(1)5 { 
      * Intercept-only model with offset, so r(table)[1,1] is the intercept (CITL)
      mi estimate, dots: logit dep_var if cluster==`x', offset(iecv_xb)
      local citl = r(table)[1,1]
      local citl_se = r(table)[2,1]
      local val_size = e(N)
      post `citl_region' (`citl') (`citl_se') (`val_size')
      }
      
      postclose `citl_region'
    
    * IECV for C-statistic *
    
    capture postutil clear   
    tempname C_region
    postfile `C_region' beta st_err val_size using C_region.dta , replace 
    
      forval x = 1(1)5 {
      mi estimate, cmdok dots: eroctab dep_var iecv_xb if cluster==`x' 
      local beta = r(table)[1,1]
      local st_err = r(table)[2,1]
      local val_size = e(N)
      post `C_region' (`beta') (`st_err') (`val_size')
      }
      
      postclose `C_region'
    For the C-statistic, the loop does not restrict the estimation to each region. Instead, it uses all the data and returns the same C-statistic five times.
    So my question is: what do I need to change to get region-specific estimates of the C-statistic?

    Any help on this would be much appreciated.

    Thank you
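
    A possible explanation, offered as an untested sketch rather than a confirmed fix: eroctab reads its input with args refvar classvar, which keeps only the first two tokens of the command line. The if cluster==`x' qualifier that mi estimate passes on is therefore never applied, and roctab runs on all the data. One way around this, assuming the rest of the program stays as posted, is to parse the input with syntax and marksample and restrict roctab to the marked sample:

    ```stata
    * Sketch only: eroctab rewritten so that it honours if/in qualifiers
    capture program drop eroctab
    program eroctab, eclass
        version 12.0

        * Parse two variables plus optional if/in (args silently drops them)
        syntax varlist(min=2 max=2 numeric) [if] [in]
        marksample touse
        gettoken refvar classvar : varlist

        * ROC analysis restricted to the requested sample
        roctab `refvar' `classvar' if `touse'

        * Store the AUC and its variance as 1x1 matrices
        tempname b V
        matrix `b' = r(area)
        matrix `V' = r(se)^2
        local N = r(N)
        matrix colnames `b' = AUC
        matrix colnames `V' = AUC
        matrix rownames `V' = AUC

        * Post to e() so that mi estimate can combine across imputations
        ereturn post `b' `V', obs(`N')
        ereturn local cmd "eroctab"
        ereturn local title "ROC area"
    end

    * Usage: the same call as in the region loop above
    forvalues x = 1/5 {
        mi estimate, cmdok dots: eroctab dep_var iecv_xb if cluster == `x'
    }
    ```

    With this version the pooled call without an if qualifier behaves as before, because marksample then marks every observation with non-missing dep_var and iecv_xb.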