  • Out-of-sample AUC for imputed data

    Dear all, I am having trouble understanding how to compute the out-of-sample AUC and pseudo R-squared for multiply imputed datasets. That is, I estimate a logistic regression model on roughly ¾ of the data (70% in the code below) and then test its predictive performance on the remaining part of the dataset (30%), which was not used for estimating the model. Here is the code I have so far:

    Code:
    splitsample, generate(split) split(0.30 0.70)
    
    local rhs "lagfpe lagpolyarchy lnlaggdp lnlagpop lnlagmilper oceania asia europe americas lagtimesv lagtimesv2 lagtimesv3 lagtimesnv lagtimesnv2 lagtimesnv3 coldwar"
    noi mi estimate, or saving(miest, replace): logistic nonviolent_success `rhs' if split==2, vce(cluster country_name)
    qui mi query
    local M=r(M)
    scalar r2=0
    scalar cstat=0
    qui mi xeq 1/`M': logistic nonviolent_success `rhs' if split==1; scalar r2=r2+e(r2_p); lroc, nog; scalar cstat=cstat+r(area)
    scalar r2=r2/`M'
    scalar cstat=cstat/`M'
    noi di "Pseudo R-squared over imputed data = " r2
    noi di "AUC statistic over imputed data = " cstat
    I am new to working with MI data in Stata, so it would be great to get some feedback on whether the code above is correct!
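For comparison, here is a minimal sketch of what a hold-out version of the AUC step might look like. It assumes the data are already `mi set`, that the local `rhs` is defined as in the code above, and that `split==2` is the estimation sample and `split==1` the hold-out sample. The variable name `phat` is hypothetical, and using `roctab` on predicted probabilities is a swapped-in alternative to `lroc`, not something the code above does. Averaging the AUC across imputations follows the approach already used above.

```stata
* Sketch only: assumes mi data, `rhs' defined as above,
* split==2 = estimation sample, split==1 = hold-out sample.
qui mi query
local M = r(M)
scalar cstat = 0

* In each imputed dataset: fit on the estimation sample, predict
* probabilities for the hold-out sample only, and accumulate the
* hold-out AUC returned by -roctab- in r(area).
* phat is a hypothetical temporary variable; it is dropped each time.
qui mi xeq 1/`M': logistic nonviolent_success `rhs' if split==2; predict double phat if split==1, pr; roctab nonviolent_success phat if split==1; scalar cstat = cstat + r(area); drop phat

* Average the hold-out AUC over the M imputations
scalar cstat = cstat/`M'
di "Out-of-sample AUC averaged over imputations = " cstat
```

The difference from the loop in the post is that the model is fitted on `split==2` and evaluated on `split==1`, rather than being refitted on `split==1` before `lroc` is called. Whether `split` assigns each observation to the same part in every imputation depends on the `mi` style and on when `splitsample` was run, so that is worth checking separately.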