Dear all, I am having trouble understanding how to compute the out-of-sample AUC and pseudo R-squared for multiply imputed datasets. That is, I estimated a logistic regression model on ¾ of the data and then tested its predictive performance on the remaining ¼, which had not been used to estimate the model. Here is the code I have so far:
Code:
```stata
* Random split: split==1 gets 30% of observations, split==2 gets 70%
splitsample, generate(split) split(0.30 0.70)
local rhs "lagfpe lagpolyarchy lnlaggdp lnlagpop lnlagmilper oceania asia europe americas lagtimesv lagtimesv2 lagtimesv3 lagtimesnv lagtimesnv2 lagtimesnv3 coldwar"

* Pooled estimates across imputations on split==2, reported as odds ratios
noi mi estimate, or saving(miest, replace): logistic nonviolent_success `rhs' if split==2, vce(cluster country_name)

* Number of imputations
qui mi query
local M=r(M)

* In each imputation, fit the model on split==1 and add up pseudo R-squared and AUC
scalar r2=0
scalar cstat=0
qui mi xeq 1/`M': logistic nonviolent_success `rhs' if split==1; ///
    scalar r2=r2+e(r2_p); lroc, nog; scalar cstat=cstat+r(area)

* Average over the M imputations
scalar r2=r2/`M'
scalar cstat=cstat/`M'
noi di "Pseudo R-squared over imputed data = " r2
noi di "AUC statistic over imputed data = " cstat
```
I am new to working with MI data in Stata, so it would be great to get some feedback on whether the code above is correct!
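For comparison, the procedure described in the question (fit on one part, evaluate on the held-out part) can be sketched per imputation as below. This is a minimal sketch under stated assumptions, not a verified solution: it assumes the data are already `mi set`, that it runs in the same do-file as the code above (so `split` and the local `rhs` exist), and that a simple average of the per-imputation AUCs is an acceptable summary. The variable `phat` and the scalar `cstat_oos` are hypothetical names. Each imputation's model is fitted on `split==2` only and then used to score `split==1`, rather than being refitted on `split==1`.

```stata
* Hypothetical sketch: out-of-sample AUC within each imputation, then averaged.
* Assumes the data are mi set and that split and the local rhs exist as above.
qui mi query
local M = r(M)
scalar cstat_oos = 0

* For each imputation: fit on the training part (split==2), predict
* probabilities for the held-out part (split==1), and compute the ROC area there.
qui mi xeq 1/`M': logistic nonviolent_success `rhs' if split==2; ///
    capture drop phat; ///
    predict double phat if split==1, pr; ///
    roctab nonviolent_success phat if split==1; ///
    scalar cstat_oos = cstat_oos + r(area)

* Simple average of the per-imputation held-out AUCs
scalar cstat_oos = cstat_oos/`M'
noi di "Out-of-sample AUC averaged over imputations = " cstat_oos
```

An out-of-sample pseudo R-squared would additionally require the log likelihood of the held-out observations under the training-set coefficients, which this sketch does not compute.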
