  • Calculating ROC curve areas: problems with using predicted values from logit

    As outlined in a Stata Journal article from 2002 by Mario Cleves, one can compute ROC curve areas using lroc or roctab, as follows:

    Code:
    logit refvar classvars
    lroc
    predict p
    roctab refvar p
    Of course, this is the only way to proceed if you have multiple classification variables. If you have a single classification variable, you can use roctab alone and get the same answer:

    Code:
    roctab refvar classvar
    However, in cases where the prediction model is not that good (e.g., with a negative regression coefficient), the two methods give different answers. The reason seems to be that roctab expects a classification variable for which increasing values indicate increased risk of the outcome of interest. With a negative regression coefficient, however, increasing values of the classification variable correspond to decreasing predicted risk, so the predictions rank the observations in the opposite order from the original variable, which results in a different answer.

    My questions are these:
    1. Can anyone confirm that using logit (or probit) before roctab can give different results than using roctab alone?
    2. Is this documented anywhere?
    3. If it is not documented, should it be? Or is it something that I should have known about ahead of time?

  • #2
    I have run across the phenomenon where lroc after a logistic regression model using the classification variable gives a different result from roctab on the original classification variable, and it's for the reason that you mention, which is documented in the entry for roctab of the user's manual. But I don't recall ever getting different results using roctab on the predictions. Also, any discrepancy always went away when I reversed the sign of the original classification variable before using roctab.

    I cannot confirm that you can get a different result using predictions after fitting a logistic regression model, even with a poor predictor, with or without guaranteeing that the regression coefficient is negative.
    Code:
    version 13.1
    
    clear *
    set more off
    set seed `=date("2015-01-29", "YMD")'
    quietly set obs 100
    generate byte response = runiform() > 0.5
    quietly generate double predictor = .
    
    program define rocem, rclass
        version 13.1
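        // lowercase "no" makes this an optionally-off option, so `reverse' holds "noreverse" when specified
        syntax , [noREVerse]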
    
        tempvar xb
        tempname area
        quietly {
            replace predictor = runiform()
            
            if "`reverse'" == "" {
                correlate response predictor
                if r(rho) > 0 replace predictor = -predictor
            }
            logit response c.predictor
            predict double `xb', xb
            lroc, nograph
            scalar define `area' = r(area)
            roctab response `xb'
        }
        return scalar roctab = r(area)
        return scalar lroc = `area'
    end
    
    program define signflip
        version 13.1
        syntax , [flip]
    
        local noreverse = cond(("`flip'" == ""), "", "noreverse")
        
        tempname file_handle
        tempfile tmpfil0
        postfile `file_handle' double(lroc roctab) using `tmpfil0'
    
        forvalues rep = 1/500 {
            rocem , `noreverse'
            post `file_handle' (r(lroc)) (r(roctab))
        }
        postclose `file_handle'
        
        preserve
        use `tmpfil0', clear
        graph twoway scatter roctab lroc, mcolor(black) msize(vsmall) ///
            ylabel( , angle(horizontal) nogrid)
    
        generate double delta = lroc - roctab
        summarize delta
    end
    
    pause on
    signflip
    pause
    
    signflip , flip
    
    exit


    • #3
      I had deleted my first post to this thread, which illustrated the phenomenon and quoted the user's manual. Here's the illustration.
      Code:
      version 13.1
      
      clear *
      set more off
      
      sysuse auto
      
      logit foreign c.displacement, nolog
      lroc, nograph
      
      roctab foreign displacement
      
      quietly replace displacement = -displacement
      roctab foreign displacement
      
      exit
      The Description section of the entry in the user's manual for roctab says, "The rating or outcome of the diagnostic test or test modality is recorded in classvar, which must be at least ordinal, with higher values indicating higher risk."
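The sign dependence in this illustration can also be checked outside Stata. What follows is a minimal Python sketch, not Stata's implementation: the helper `auc` and the toy data are hypothetical. It computes the area as the Mann-Whitney probability that a randomly chosen positive case scores higher than a randomly chosen negative case (ties count one half), which equals the trapezoidal area under the empirical ROC curve. Under that definition, negating the classification variable turns an area A into 1 - A.

```python
from itertools import product

def auc(outcome, score):
    """Area under the empirical ROC curve via the Mann-Whitney statistic.

    Assumes higher scores indicate higher risk (the convention roctab documents).
    """
    pos = [s for y, s in zip(outcome, score) if y == 1]
    neg = [s for y, s in zip(outcome, score) if y == 0]
    # Each (positive, negative) pair scores 1 for a correct ordering, 0.5 for a tie.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: the outcome tends to occur at LOW scores.
outcome = [1, 1, 1, 0, 1, 0, 0, 0]
score = [1, 2, 3, 3, 4, 5, 6, 7]

a_original = auc(outcome, score)
a_negated = auc(outcome, [-s for s in score])
print(a_original, a_negated, a_original + a_negated)
```

The two areas sum to 1, which matches the pattern in the auto example: running roctab on the sign-reversed variable gives the mirror image of the area for the original variable.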


      • #4
        Joseph,

        Thanks for weighing in on this. My situation is similar to your example above (auto data set), except that in my case the regression coefficient, while negative, is not significantly different from 0. In other words, changing the sign of the classification variable makes the logit/lroc and roctab answers the same, but it doesn't make it any better of a predictor.

        Moreover, even though changing the sign makes the lroc and roctab results the same, in my case they are both wrong. My classification variable takes values 0 through 10, with higher values meaning higher risk of disease. In most cases, it is a reasonably good predictor, but in one particular case it is not. Simply changing the sign is not an appropriate solution because the classification variable generally works well the way it was designed. Instead, I want the ROC curve to be an accurate representation of the classification variable as it was originally designed, so my solution is to use the roctab answer and ignore the logit/lroc answer.

        So, I guess the take-home message here is to pay attention to the requirement that the classification variable be positively correlated with the outcome variable and not to blindly use roctab (or roccomp) on the predicted probabilities without paying attention to whether that requirement is satisfied.

        Incidentally, using roctab on the predicted probabilities from the regression is generally the same as using roctab on the original classification variable because the predicted probabilities are perfectly correlated with the classification variable and thus result in the same ROC curve.

        Regards,
        Joe


        • #5
          Just a couple of points.

          If the regression coefficient is negative (regardless of whether it is significantly different from zero), then take the value from logit/lroc (or from roctab on the predictions if you wish) and subtract it from 0.5. Alternatively, to use the original classification variable, change its sign, run roctab, and then subtract the result from 0.5.

          Using roctab on the predicted probabilities (or linear predictions) is never the same as using roctab on the original classification variable when the sign of the regression coefficient is negative.
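The role of the coefficient's sign can be seen in a small numeric check. This is a hedged Python sketch rather than Stata output: `auc` and `predicted_probability` are hypothetical helpers, the data are made up on a 0 to 10 scale, and the intercept and slopes are illustrative values, not fitted estimates. Only the sign of the slope matters here, because a logistic transform with a positive slope preserves the ranking of observations and one with a negative slope reverses it.

```python
import math
from itertools import product

def auc(outcome, score):
    """Mann-Whitney form of the ROC area; higher score is read as higher risk."""
    pos = [s for y, s in zip(outcome, score) if y == 1]
    neg = [s for y, s in zip(outcome, score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

def predicted_probability(x, intercept, slope):
    """Logistic prediction for a single covariate (coefficients are assumed, not fitted)."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

# Hypothetical data in which the 0-10 score does poorly: events cluster at low scores.
outcome = [1, 1, 0, 1, 0, 0, 1, 0]
score = [0, 1, 2, 3, 4, 6, 8, 10]

a_raw = auc(outcome, score)
# Positive slope: predictions rank observations exactly as the raw score does.
a_pos_slope = auc(outcome, [predicted_probability(x, -1.5, 0.3) for x in score])
# Negative slope: predictions rank observations in the reverse order.
a_neg_slope = auc(outcome, [predicted_probability(x, 1.5, -0.3) for x in score])
print(a_raw, a_pos_slope, a_neg_slope)
```

With a negative slope, the area from the predictions is the mirror image (one minus the area) of the area from the original variable, so the two agree only when both equal 0.5.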


          • #6
            Should have said "subtract 0.5 from the result", sorry.
