Out-of-sample AUC for imputed data

Birte Olsen

Join Date: Feb 2022
Posts: 16

15 May 2022, 06:40

Dear all, I am having trouble understanding how to compute the out-of-sample AUC and pseudo R-squared for multiply imputed datasets. That is, I want to estimate a logistic regression model on ¾ of the data and then test its predictive performance on the remaining ¼, which is not used for estimating the model. Here is the code I have so far:

Code:

splitsample, generate(split) split(0.30 0.70)

local rhs "lagfpe lagpolyarchy lnlaggdp lnlagpop lnlagmilper oceania asia europe americas lagtimesv lagtimesv2 lagtimesv3 lagtimesnv lagtimesnv2 lagtimesnv3 coldwar"
noi mi estimate, or saving(miest, replace): logistic nonviolent_success `rhs' if split==2, vce(cluster country_name)
qui mi query
local M=r(M)
scalar r2=0
scalar cstat=0
qui mi xeq 1/`M': logistic nonviolent_success `rhs' if split==1; scalar r2=r2+e(r2_p); lroc, nog; scalar cstat=cstat+r(area)
scalar r2=r2/`M'
scalar cstat=cstat/`M'
noi di "Pseudo R-squared over imputed data = " r2
noi di "AUC statistic over imputed data = " cstat

I am new to working with MI data in Stata, so it would be great to get some feedback on whether the code above is correct!
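
For comparison, here is a minimal sketch of how the AUC could be computed strictly on the holdout observations within each imputation. It assumes the data are already mi set, that split and the local rhs are defined as in the code above, and that split==2 is the estimation sample and split==1 the holdout sample. The scalar name cstat_oos is only illustrative. It also relies on the if qualifier of lroc moving the calculation off the estimation sample, which is how the lroc help describes it as far as I can tell, so treat it as something to verify rather than a settled answer. It does not attempt an out-of-sample pseudo R-squared.

```stata
* Sketch only: out-of-sample AUC averaged over the imputed datasets.
* Assumes split and the local macro rhs exist as defined earlier.
qui mi query
local M = r(M)
scalar cstat_oos = 0

* In each imputation: fit on split==2, then compute the ROC area on
* split==1 only (assumption: lroc honours the if qualifier for
* observations outside the estimation sample).
qui mi xeq 1/`M': logistic nonviolent_success `rhs' if split==2; lroc if split==1, nograph; scalar cstat_oos = cstat_oos + r(area)

* Average the per-imputation areas.
scalar cstat_oos = cstat_oos/`M'
di "Out-of-sample AUC averaged over imputed data = " cstat_oos
```

The difference from the code above is that the model is fitted on one part of the split and evaluated on the other, rather than being refitted on the part it is then scored on.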

Tags: AUC, imputed data
