  • How to assess discrimination (e.g., -lroc-) and calibration (e.g., -estat gof-) after logistic regression with MULTIPLE IMPUTATION

    The Stata 13 multiple-imputation manual (mi.pdf) says: "Do not expect postestimation commands that depend on predicted values such as ... lroc and the like, to produce correct results, if they produce results at all."

    How does one get around this please?

  • #2
    Let's continue with the reading; the next paragraph says:

    Which brings us to the third point. Even when you specify mi estimate’s post option, mi estimate still does not post everything the estimation command expects to see. It does not post likelihood values, for instance, because there is no counterpart after MI estimation. Thus, you should be prepared to see unexpected and inelegant error messages if you use a postestimation command that depends on an unestimated and unposted result. All of which is to say that if you specify the post option, you have a responsibility beyond the usual to ensure the validity of any statistical results.
    Best regards,

    Marcos



    • #3
      I agree with the general principles Marcos sets out in #2. But in the case of an ROC area and a calibration assessment, these are reasonable things to do after fitting a model to a dichotomous outcome. What I am presuming is wanted is an assessment of how the MI-model, treated as a single model of the original data (not of the multiply imputed data sets) discriminates and is calibrated. These can be done.

      Code:
      // RUN YOUR -mi estimate, post- COMMAND
      matrix b = e(b)
      mi extract 0 // GO TO ORIGINAL DATA
      matrix score xb = b // GET LINEAR PREDICTIONS
      gen predicted_p = invlogit(xb)
      
      // CALCULATE ROC CURVE AREA
      roctab outcome_var predicted_p // ROC CURVE
      
      // EMULATE HOSMER-LEMESHOW CALCULATIONS
      xtile decile = predicted_p, nq(10)
      collapse (sum) outcome_var predicted_p, by(decile)
      list, noobs clean
      // CALIBRATION GRAPH: NOT PART OF H-L PROCEDURE
      // BUT USEFUL
      graph twoway scatter predicted_p outcome_var || line outcome_var outcome_var, sort
      gen chi2 = sum((outcome_var-predicted_p)^2/predicted_p)
      local h_l_chi2 = chi2[_N]
      display "Hosmer-Lemeshow Chi Square = " %5.2f `h_l_chi2'
      display "p = " %05.3f `=chi2tail(8, `h_l_chi2')'
      Note: not tested, beware of typos.

      Let me re-emphasize for clarity what this is. This takes the model defined by the MI-analysis coefficients and treats it as a model of the original (pre-imputation) data, and tests its discrimination and calibration. Again, as noted, this approach is valid with any procedure that produces predicted probabilities--even an oracle!

      Let me also be clear what it is not: it is not a multiply imputed version of ROC area or calibration test. I know of no reason to think that using Rubin's rules to combine the results of the ROC curves or H-L statistics in the imputed data sets would yield anything useful, or even meaningful. Probably not, in fact.

      Added Note: There is some controversy about the Hosmer-Lemeshow goodness of fit test and variants of it. I have opinions about those controversies, but I don't feel this is the place to go into them. I'm simply presenting the way one could emulate a Hosmer-Lemeshow test in this context, without commenting on whether it's a good idea or not.



      • #4
        In addition, you might want to use -mi predict- and then the invlogit() function; then you can use -roctab- to get the AUROC and -lowess- to get a calibration plot, e.g.,
        Code:
        lowess outcome predict
        where you replace "outcome" with the name of your outcome variable and replace "predict" with the name you gave to your predicted probabilities
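
        Putting the steps in #4 together, a minimal sketch might look like the following. This is untested; "outcome_var", "x1", "x2", and the file name "miest" are placeholders for your own outcome, covariates, and saved-estimates file, and it assumes you saved the MI estimation results with -saving()- so that -mi predict- can combine the linear predictor across imputations.
        Code:
        // FIT THE MODEL AND SAVE THE MI ESTIMATES
        mi estimate, saving(miest, replace): logit outcome_var x1 x2
        // LINEAR PREDICTOR COMBINED ACROSS IMPUTATIONS
        mi predict xb_mi using miest
        gen p_hat = invlogit(xb_mi) // PREDICTED PROBABILITIES
        roctab outcome_var p_hat    // AUROC (DISCRIMINATION)
        lowess outcome_var p_hat    // SMOOTHED CALIBRATION PLOT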
