
  • Hosmer-Lemeshow test after meqrlogit

    Hi Statlist!

    Could anyone help me with some code to calculate the Hosmer-Lemeshow test after meqrlogit?

    Thanks a lot!
    Andrea

  • #2
    There is a certain ambiguity about how one defines the Hosmer-Lemeshow test following a multi-level model. The H-L test assesses the extent to which the predictions of the logistic model are calibrated to the data. With a multi-level model there are two different kinds of predictions: one based on the fixed effects only (xb), and one based on the fixed plus random effects (mu). The results will, in general, be different, depending on which you use.

    Including the random effects gives you a good estimate of how well the current model is calibrated to the current data, but you cannot from there extrapolate to future applications of the model in other data sets, because the random effects are inherently not known in advance. The H-L test based only on the fixed portion will give you an assessment of how the model might fare in a different sample. Anyway, I will leave it to you to decide which of these you want to do.

    Code:
    // FIRST RUN THE meqrlogit MODEL
    
    // PICK ONE OF THE TWO VERSIONS OF phat BELOW:
    
    // EITHER GET MODEL PREDICTED RISK BASED ON FIXED EFFECTS
    predict phat, xb
    replace phat = invlogit(phat)
    
    // OR, IF YOU PREFER TO INCLUDE RANDOM EFFECTS
    predict phat, mu
    
    // GET DECILES OF PREDICTED RISK
    xtile decile = phat, nq(10)
    
    //  CALCULATE PREDICTED AND OBSERVED OUTCOMES
    //  IN EACH DECILE, AND SHOW TABLE
    collapse (sum) phat outcome, by(decile)
    list, noobs clean
    
    //  CALCULATE CHI SQUARE STATISTIC
    gen chi2 = (outcome-phat)^2/phat
    summ chi2, meanonly
    display "H-L Chi square = `r(sum)'"
    display "p = `=chi2tail(8, `r(sum)'')"
    Note: Not tested. Beware of typos. Also, note that the last line contains both double-quote (") characters and one sequence of two single quotes (''). They look very much alike. Be careful.

    Also, you might want to -preserve- the data in memory before you do the -collapse- if you need to return to it after this.
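    For readers who want to see the arithmetic spelled out independently of Stata, here is a Python sketch of the same steps on synthetic data. The data-generating code and the closed-form tail function (a stand-in for chi2tail(8, ...)) are my own illustrative assumptions, not part of the Stata workflow above.

```python
# Sketch of the Hosmer-Lemeshow arithmetic on synthetic data: group predicted
# risks into deciles, sum predicted and observed events per decile, and form
# the chi-square statistic with 8 degrees of freedom.
import random
from math import exp, factorial

random.seed(1)

def invlogit(x):
    return 1.0 / (1.0 + exp(-x))

def chi2tail_df8(x):
    # Upper-tail probability for chi-square with 8 df; closed form for even df:
    # exp(-x/2) * sum_{i=0}^{3} (x/2)^i / i!
    return exp(-x / 2) * sum((x / 2) ** i / factorial(i) for i in range(4))

# Synthetic linear predictors, with outcomes generated from the model itself
xb = [random.gauss(0, 1) for _ in range(1000)]
phat = [invlogit(v) for v in xb]
outcome = [1 if random.random() < p else 0 for p in phat]

# Deciles of predicted risk (analogous to -xtile decile = phat, nq(10)-)
order = sorted(range(len(phat)), key=lambda i: phat[i])
groups = [order[d * len(order) // 10:(d + 1) * len(order) // 10]
          for d in range(10)]

# Sum predicted and observed events per decile (analogous to -collapse (sum)-)
chi2 = 0.0
for g in groups:
    expected = sum(phat[i] for i in g)
    observed = sum(outcome[i] for i in g)
    chi2 += (observed - expected) ** 2 / expected

pvalue = chi2tail_df8(chi2)
print(f"H-L chi square = {chi2:.2f}, p = {pvalue:.3f}")
```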







    • #3
      Thanks a lot Clyde!

      I normally use only fixed effects (varying intercepts, but not slopes) for the reasons you explained.
      In this case, a p > .05 would prove a good fit, right?



      • #4
        In this case, a p > .05 would prove a good fit, right?
        If that is how you would interpret the Hosmer-Lemeshow test after ordinary logistic regression, then the same interpretation would apply here.

        I never use it that way. The problem is that if your sample size is too large or too small, the p vs. 0.05 comparison will be either too sensitive or not sensitive enough at diagnosing misfit.

        For me, the best use of the Hosmer-Lemeshow procedure is to examine the table of expected and observed outcomes in the deciles of predicted risk. I make a judgment as to whether the discrepancy between observed and expected is small enough for whatever my purpose at hand is. A "statistically significant" discrepancy might still be small enough that my model is good enough for practical purposes; this is especially likely if my sample is very large. In small samples the opposite can be true: the discrepancy may fail to be designated as statistically significant, yet the magnitudes of the observed-expected differences are large enough to defeat the practical benefits of using the model.

        Another benefit of looking closely at the tables of observed and expected values is that it gives you a sense of where the model fits well and where it fits less well. You might see, for example, that the model is well calibrated at low levels of risk, but not so much at high levels. Or you might see a model that is well calibrated at medium risk but does poorly at the extremes. Thinking about those things can sometimes suggest how to improve the model by adding new variables, interactions, quadratic terms, a transformation, etc. So I think the Hosmer-Lemeshow deciles-of-predicted-risk concept is very useful. The p-value, not so much.
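        As a rough sketch of that table-inspection approach, here is a Python example (with entirely synthetic predicted risks and outcomes, invented purely for illustration) that tabulates observed against expected events per decile so the discrepancies can be judged by eye:

```python
# Tabulate observed vs. expected events in deciles of predicted risk, so
# calibration can be judged directly rather than through a p-value.
# The predicted risks and outcomes below are synthetic.
import random

random.seed(2)

phat = sorted(random.random() for _ in range(500))  # predicted risks, sorted
outcome = [1 if random.random() < p else 0 for p in phat]

n = len(phat)
rows = []
for d in range(10):
    lo, hi = d * n // 10, (d + 1) * n // 10
    expected = sum(phat[lo:hi])   # model-expected events in this decile
    observed = sum(outcome[lo:hi])  # actually observed events
    rows.append((d + 1, expected, observed))

print(f"{'decile':>6} {'expected':>9} {'observed':>9}")
for d, e, o in rows:
    print(f"{d:>6} {e:>9.1f} {o:>9}")
```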



        • #5
          Hi

          I use this code to calculate the Hosmer-Lemeshow test after multilevel logistic regression.
          However, I get the following error message from the last line:

          display "p = `=chi2tail(8, `r(sum)'')"
          too few ')' or ']'

          I took the note into account: the last line contains both double-quote (") characters and one sequence of two single quotes ('').

          Any help would be appreciated.



          • #6
            Well, when I wrote that code in #2, I did say that it wasn't tested and might contain typos. It does. Sorry about that. It should be:

            Code:
            display "p = `=chi2tail(8, `r(sum)')'"



            • #7
              Thank you Clyde. It works.

