As outlined in a 2002 Stata Journal article by Mario Cleves, one can compute the area under the ROC curve using lroc or roctab, as follows:
Code:
logit refvar classvars
lroc
predict p
roctab refvar p
Of course, this is the only way to proceed if you have multiple classification variables. If you have a single classification variable, you can use roctab alone and get the same answer:
Code:
roctab refvar classvar
However, in cases where the prediction model is not that good (e.g., with a negative regression coefficient), the two methods give different answers. The reason seems to be that roctab expects a classification variable for which increasing values indicate increased risk of the outcome of interest. With a negative regression coefficient, increasing values of the classification variable indicate decreased risk, while the predicted probabilities from logit still increase with risk, so roctab ranks the observations in opposite orders in the two cases and reports a different area.
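Here is a minimal sketch of how I would expect to reproduce the discrepancy. It assumes Stata's bundled auto dataset (not the data from my actual analysis), with foreign standing in for refvar and weight for classvar. The coefficient on weight should come out negative there, and negweight is just an illustrative name for the sign-flipped variable:

```stata
* Illustrative only: auto.dta stands in for the real data
sysuse auto, clear

* Method 1: fit the model, then get the ROC area from the predicted probabilities
logit foreign weight
lroc, nograph
predict p
roctab foreign p

* Method 2: roctab on the raw classification variable
* (roctab treats larger values of weight as indicating higher risk)
roctab foreign weight

* Possible workaround (my own assumption, not taken from the article):
* flip the sign so that larger values indicate higher risk
generate negweight = -weight
roctab foreign negweight
```

If the explanation is right, the area from roctab foreign weight should be one minus the area from roctab foreign p, and the sign-flipped variable should bring the two methods back into agreement.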
My questions are these:
- Can anyone confirm that using logit (or probit) before roctab can give different results than using roctab alone?
- Is this documented anywhere?
- If it is not documented, should it be? Or is it something that I should have known about ahead of time?