  • AUC/Classification after Mixed Effects models

    help melogit_postestimation says nothing about the ROC plots, AUC estimates, or goodness-of-fit tests that are available after logit. Does anyone have suggestions for the best way to do something similar after melogit, or know of any user-written commands that do this?

  • #2
    Well, both ROC plots and the Hosmer-Lemeshow goodness of fit statistic are calculated directly from the observed and predicted outcomes. Neither command will run directly after -melogit-, but it is easy to emulate them:

    Code:
    predict mu if e(sample), mu
    roctab observed_outcome mu // ROC CURVE & AUC
    
    xtile risk_decile = mu, nq(10) // DECILES OF PREDICTED RISK
    collapse (count) n_obs = mu (sum) observed_successes = observed_outcome predicted_successes = mu, by(risk_decile)

    At that point you will have, for each decile of predicted risk, a count of observations, the number of predicted successes, and the number of observed successes. If you want to calculate the Hosmer-Lemeshow chi-square statistic from that, it's just the standard chi-square calculation: sum, over the success and failure cells of every decile, the squared deviation of observed from predicted divided by predicted.

    Personally, I don't much like the Hosmer-Lemeshow chi-square. I usually assess model fit by plotting the predicted successes against the observed successes, overlaid on a diagonal line, to see whether the calibration looks good overall and whether there are particular ranges of predicted risk where the fit is not so good.

    Also, as others have commented in another recent thread, the ritualistic use of deciles of risk for this calculation is probably inappropriate. In large data sets it certainly makes sense to use more, narrower bins. In my personal practice, I generally choose the number of risk groups so as to have somewhere around 50-100 observations in each bin. I don't know what the sampling distribution of a chi-square statistic calculated this way would be (my guess is approximately chi-square with #bins - 2 df), but, as already said, I don't find the chi-square and p-value particularly useful in any case and prefer to assess the calibration of the model visually.
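
A minimal sketch of the visual calibration check described above, under assumptions that are not from the original post: it uses the bangladesh example data set and model that appear later in this thread as a stand-in, it targets about 75 observations per bin (an arbitrary point in the 50-100 range), and the names risk_bin, observed, and predicted are illustrative. It also assumes that -predict, mu- after -melogit- returns probabilities that include the estimated random effects, which is my understanding of the default.

```stata
* Sketch: calibration plot after melogit, with the bin count chosen from the sample size.
* Assumes web access for -webuse-; the model is illustrative only.
clear all
webuse bangladesh

melogit c_use urban age child* || district:
predict mu if e(sample), mu        // predicted probability of the outcome
keep if !missing(mu)               // keep the estimation sample only

* Choose the number of bins to get roughly 75 observations per bin (assumed target)
quietly count
local nbins = max(2, floor(r(N)/75))
xtile risk_bin = mu, nq(`nbins')

* One row per bin: bin size, observed successes, predicted successes
collapse (count) n_obs = mu (sum) observed = c_use predicted = mu, by(risk_bin)

* Points close to the diagonal indicate good calibration in that range of risk
twoway (function y = x, range(predicted))                          ///
       (scatter observed predicted),                                ///
       xtitle("Predicted successes") ytitle("Observed successes")  ///
       legend(order(1 "Perfect calibration" 2 "Risk bins"))
```

Bins that sit well above or below the diagonal point to ranges of predicted risk where the model under- or over-predicts; changing the divisor 75 changes how fine the bins are.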

    Hope this helps.



    • #3
      My reply crossed with Clyde's, but I'd written out some code so that users can verify for themselves that the roctab command works after feeding it predicted probabilities from a mixed effect logistic regression.

      Code:
      webuse bangladesh
      
      logit c_use urban age child*
      predict p_logistic
      lroc
      roctab c_use p_logistic
      
      logit c_use urban age child* i.district
      predict p_logistic_fe
      lroc
      roctab c_use p_logistic_fe
      /* In both cases, lroc and roctab should report identical ROC areas */
      
      melogit c_use urban age child* || district:
      predict p_melogit
      roccomp c_use p_logistic_fe p_melogit
      /*The ROCs from both models should be nearly equal, the chi-square test for equality of ROC areas just barely fails to reject*/

      As to the sampling distribution of the chi-square statistic for the Hosmer-Lemeshow test, the Stata manual says it has G - 2 degrees of freedom (G being the number of groups) when calculated within the estimation sample, and G degrees of freedom when calculated outside the estimation sample.

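
As a hedged illustration of how that degrees-of-freedom rule could be applied by hand, the sketch below computes a Hosmer-Lemeshow-style statistic from the collapsed groups and refers it to a chi-square distribution with G - 2 df. Two assumptions to flag: the rule is documented for the test after logit, so carrying it over to predictions from melogit is a guess rather than an established result, and the names risk_group, contribution, hl_chi2, hl_df, and hl_p are made up for this example.

```stata
* Sketch: Hosmer-Lemeshow-style statistic computed by hand after melogit.
* The G - 2 df reference distribution is an assumption borrowed from the logit case.
clear all
webuse bangladesh

melogit c_use urban age child* || district:
predict mu if e(sample), mu
keep if !missing(mu)               // estimation sample only

local G = 10                       // number of risk groups (deciles here)
xtile risk_group = mu, nq(`G')
collapse (count) n_obs = mu (sum) observed = c_use predicted = mu, by(risk_group)

* Success and failure cells combined:
*   (O - E)^2/E + ((n - O) - (n - E))^2/(n - E)  =  (O - E)^2 / (E*(1 - E/n))
generate double contribution = (observed - predicted)^2 / (predicted*(1 - predicted/n_obs))

quietly summarize contribution
scalar hl_chi2 = r(sum)
scalar hl_df   = _N - 2            // G - 2, using the groups actually formed
scalar hl_p    = chi2tail(hl_df, hl_chi2)

display "HL chi2(" hl_df ") = " %6.2f hl_chi2 "   p = " %6.4f hl_p
```

For predictions made outside the estimation sample, the same code would use _N rather than _N - 2 for hl_df, following the rule quoted above.
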
      Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

      When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



      • #4
        Clyde Schechter I started thinking about the same approach a little while after submitting the question and was glad to know my intuition wasn't heading in the wrong direction.



        • #5
          Clyde Schechter Also, glad that I started this thread a couple of years ago so I could find it now that the same question has come up again.



          • #6
            Originally posted by Clyde Schechter
            Well, both ROC plots and the Hosmer-Lemeshow goodness of fit statistic are calculated directly from the observed and predicted outcomes. Neither command will run directly after -melogit-, but it is easy to emulate them:

            Code:
            predict mu if e(sample), mu
            roctab observed_outcome mu // ROC CURVE & AUC
            
            xtile risk_decile = mu, nq(10) // DECILES OF PREDICTED RISK
            collapse (count) n_obs = mu (sum) observed_successes = observed_outcome predicted_successes = mu, by(risk_decile)

            At that point you will have, for each decile of predicted risk, a count of observations, the number of predicted successes, and the number of observed successes. If you want to calculate the Hosmer-Lemeshow chi-square statistic from that, it's just the standard chi-square calculation: sum, over the success and failure cells of every decile, the squared deviation of observed from predicted divided by predicted.

            Personally, I don't much like the Hosmer-Lemeshow chi-square. I usually assess model fit by plotting the predicted successes against the observed successes, overlaid on a diagonal line, to see whether the calibration looks good overall and whether there are particular ranges of predicted risk where the fit is not so good.

            Also, as others have commented in another recent thread, the ritualistic use of deciles of risk for this calculation is probably inappropriate. In large data sets it certainly makes sense to use more, narrower bins. In my personal practice, I generally choose the number of risk groups so as to have somewhere around 50-100 observations in each bin. I don't know what the sampling distribution of a chi-square statistic calculated this way would be (my guess is approximately chi-square with #bins - 2 df), but, as already said, I don't find the chi-square and p-value particularly useful in any case and prefer to assess the calibration of the model visually.

            Hope this helps.

            Please help me run diagnostics after melogit for a multilevel model of current contraceptive use (binary) with many categorical covariates and a few continuous covariates.

            Here is my command:

            melogit current_user i.age_group i.education i.parity i.marital i.womans_autonomy ///
            i.wealth i.insurance distance_to_nearest_FPprovider ///
            i.residence i.region first_birth_before17 dissatisfied_with_FP wait_2ormore_years fp_messages visited_by_CHW ///
            i.sdp_authority i.sdp_type i.bpjs_contract i.qoc || EA_ID: , or

            Please help.



            • #7
              Originally posted by Weiwen Ng
              My reply crossed with Clyde's, but I'd written out some code so that users can verify for themselves that the roctab command works after feeding it predicted probabilities from a mixed effect logistic regression.

              Code:
              webuse bangladesh
              
              logit c_use urban age child*
              predict p_logistic
              lroc
              roctab c_use p_logistic
              
              logit c_use urban age child* i.district
              predict p_logistic_fe
              lroc
              roctab c_use p_logistic_fe
              /* In both cases, lroc and roctab should report identical ROC areas */
              
              melogit c_use urban age child* || district:
              predict p_melogit
              roccomp c_use p_logistic_fe p_melogit
              /*The ROCs from both models should be nearly equal, the chi-square test for equality of ROC areas just barely fails to reject*/

              As to the sampling distribution of the chi-square statistic for the Hosmer-Lemeshow test, the Stata manual says it has G - 2 degrees of freedom (G being the number of groups) when calculated within the estimation sample, and G degrees of freedom when calculated outside the estimation sample.

              Please help me run diagnostics after melogit for a multilevel model of current contraceptive use (binary) with many categorical covariates and a few continuous covariates.

              Here is my command:

              melogit current_user i.age_group i.education i.parity i.marital i.womans_autonomy ///
              i.wealth i.insurance distance_to_nearest_FPprovider ///
              i.residence i.region first_birth_before17 dissatisfied_with_FP wait_2ormore_years fp_messages visited_by_CHW ///
              i.sdp_authority i.sdp_type i.bpjs_contract i.qoc || EA_ID: , or

              Please help.



              • #8
                Look at the code you quoted in #6. You can use that exact same code after running your -melogit- command.
