AUC/Classification after Mixed Effects models

wbuchanan

Join Date: Mar 2014

Posts: 1362
#1


15 Feb 2017, 06:21

help melogit_postestimation does not mention the ROC plots, AUC estimates, or goodness-of-fit tests that are available after logit. Does anyone have suggestions for the best way to do something similar, or know of any user-written commands for this?
Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#2

15 Feb 2017, 12:28

Well, both ROC plots and the Hosmer-Lemeshow goodness of fit statistic are calculated directly from the observed and predicted outcomes. Neither command will run directly after -melogit-, but it is easy to emulate them:

Code:

predict mu if e(sample), mu
roctab observed_outcome mu // ROC CURVE & AUC

xtile risk_decile = mu, nq(10) // DECILES OF PREDICTED RISK
collapse (count) n_obs = mu (sum) observed_successes = observed_outcome predicted_successes = mu, by(risk_decile)

At that point you will have, for each decile of predicted risk, a count of observations, the number of predicted successes and the number of observed successes. If you want to calculate the Hosmer-Lemeshow chi square statistic from that, it's just the standard chi square calculation of summing squared deviations divided by predicted. Personally, I don't much like the Hosmer-Lemeshow chi square, and usually I assess model fit by plotting the predicted successes against the observed successes, overlaid on a diagonal line, to see whether the calibration looks good overall, and also to see if there are particular ranges of predicted risk where the fit is not so good. Also, as others have commented in another recent thread, the ritualistic use of deciles of risk for this calculation is probably inappropriate. In large data sets it certainly makes sense to use more, narrower bins. In my personal practice, I generally choose a number of risk groups so as to have somewhere around 50-100 observations in each bin. I don't know what the sampling distribution of a chi square statistic calculated from this would be (my guess is approximately chi square with #bins - 2 df), but, as already said, I don't find the chi square and p-value particularly useful in any case and prefer to assess the calibration of the model visually.
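The steps described above can be sketched end to end as follows. This is only an illustration under stated assumptions: it borrows the -webuse bangladesh- example used elsewhere in this thread as a stand-in for your own data, the target of about 75 observations per bin is an arbitrary choice within the 50-100 range mentioned above, and the names risk_bin, hl_term, hl_chi2 and hl_bins are made up for this sketch. The chi square adds the failure-cell term to the success-cell term, which is how the Hosmer-Lemeshow statistic is usually written.

```stata
* Sketch only: the bangladesh example data stand in for your own model
webuse bangladesh, clear
melogit c_use urban age child* || district:

* Predicted probabilities for the estimation sample
predict mu if e(sample), mu

* Choose the number of bins so each holds roughly 75 observations (assumed target)
quietly count if e(sample)
local nbins = max(2, floor(r(N)/75))
xtile risk_bin = mu, nq(`nbins')

preserve
keep if e(sample)
collapse (count) n_obs = mu (sum) observed_successes = c_use ///
    predicted_successes = mu, by(risk_bin)

* Squared deviation divided by predicted, for the success and failure cells
generate double hl_term = ///
    (observed_successes - predicted_successes)^2 / predicted_successes + ///
    (observed_successes - predicted_successes)^2 / (n_obs - predicted_successes)
quietly summarize hl_term
scalar hl_chi2 = r(sum)
scalar hl_bins = _N
display "chi2 = " scalar(hl_chi2) "  bins = " scalar(hl_bins)

* Calibration plot: points close to the diagonal suggest good calibration
twoway (scatter observed_successes predicted_successes) ///
       (function y = x, range(predicted_successes)), ///
       xtitle("Predicted successes") ytitle("Observed successes") legend(off)
restore
```

The plot is the part to look at; the chi square is computed only for completeness, and its reference distribution after -melogit- is not something this sketch establishes.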

Hope this helps.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#3

15 Feb 2017, 13:25

My reply crossed with Clyde's, but I'd written out some code so that users can verify for themselves that the roctab command works after feeding it predicted probabilities from a mixed effect logistic regression.

Code:

webuse bangladesh
logit c_use urban age child*
predict p_logistic
lroc
roctab c_use p_logistic
logit c_use urban age child* i.district
predict p_logistic_fe
lroc
roctab c_use p_logistic_fe
/*The ROCs in both cases should be identical*/
melogit c_use urban age child* || district:
predict p_melogit
roccomp c_use p_logistic_fe p_melogit
/*The ROCs from both models should be nearly equal, the chi-square test for equality of ROC areas just barely fails to reject*/
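
One thing worth checking when feeding -melogit- predictions to the ROC commands is which predictions you are using. By default, predict after -melogit- includes the estimated district random effects, so the resulting AUC describes discrimination for clusters already in the data. The sketch below compares that with predictions from the fixed portion only. It assumes a Stata release in which predict after -melogit- accepts conditional(fixedonly); the variable names p_eb and p_fixed are made up for illustration.

```stata
* Sketch only: compare AUCs with and without the estimated random effects
webuse bangladesh, clear
melogit c_use urban age child* || district:

* Default prediction: conditional on the estimated random effects
predict p_eb if e(sample), mu

* Fixed portion only: random effects set to zero
predict p_fixed if e(sample), mu conditional(fixedonly)

* Test equality of the two ROC areas on the same observations
roccomp c_use p_eb p_fixed
```

If the model is meant to score observations from new clusters, the fixed-only AUC is probably the more relevant of the two.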

As to the sampling distribution of the chi-square statistic for the Hosmer-Lemeshow test, the Stata manual says it's G (i.e. # of groups) - 2 degrees of freedom when calculated within the estimation sample, and G degrees of freedom when calculated outside the estimation sample.
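
As a small illustration of that degrees-of-freedom rule, the p-value can be read off with chi2tail(). The number of groups and the chi-square value below are hypothetical placeholders, not results from any model in this thread, and the rule itself is the one documented for the test after logit, so applying it after -melogit- is an assumption.

```stata
* Hypothetical inputs for illustration only
local G = 10        // number of risk groups
local chi2 = 12.5   // chi-square statistic computed from the grouped data

* Within the estimation sample: G - 2 degrees of freedom
display "in-sample p-value     = " chi2tail(`G' - 2, `chi2')

* Outside the estimation sample: G degrees of freedom
display "out-of-sample p-value = " chi2tail(`G', `chi2')
```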

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
wbuchanan

Join Date: Mar 2014

Posts: 1362
#4

15 Feb 2017, 18:06

Clyde Schechter, I started thinking about the same approach a little while after submitting the question, and I was glad to know my intuition wasn't heading in the wrong direction.
wbuchanan

Join Date: Mar 2014

Posts: 1362
#5

28 Mar 2019, 16:57

Clyde Schechter
Also, glad that I started this thread a couple of years ago so I could find it now when the same question came up again.
Jay Abdullah

Join Date: Mar 2021

Posts: 3
#6

04 Mar 2021, 19:50

Originally posted by Clyde Schechter

Well, both ROC plots and the Hosmer-Lemeshow goodness of fit statistic are calculated directly from the observed and predicted outcomes. Neither command will run directly after -melogit-, but it is easy to emulate them:

Code:

predict mu if e(sample), mu
roctab observed_outcome mu // ROC CURVE & AUC

xtile risk_decile = mu, nq(10) // DECILES OF PREDICTED RISK
collapse (count) n_obs = mu (sum) observed_successes = observed_outcome predicted_successes = mu, by(risk_decile)

At that point you will have, for each decile of predicted risk, a count of observations, the number of predicted successes and the number of observed successes. If you want to calculate the Hosmer-Lemeshow chi square statistic from that, it's just the standard chi square calculation of summing squared deviations divided by predicted. Personally, I don't much like the Hosmer-Lemeshow chi square, and usually I assess model fit by plotting the predicted successes against the observed successes, overlaid on a diagonal line, to see whether the calibration looks good overall, and also to see if there are particular ranges of predicted risk where the fit is not so good. Also, as others have commented in another recent thread, the ritualistic use of deciles of risk for this calculation is probably inappropriate. In large data sets it certainly makes sense to use more, narrower bins. In my personal practice, I generally choose a number of risk groups so as to have somewhere around 50-100 observations in each bin. I don't know what the sampling distribution of a chi square statistic calculated from this would be (my guess is approximately chi square with #bins - 2 df), but, as already said, I don't find the chi square and p-value particularly useful in any case and prefer to assess the calibration of the model visually.

Hope this helps.

Please help me run diagnostics after melogit for a multilevel model of current contraceptive use (binary) with many categorical covariates and a few continuous ones.

Here are my commands:

melogit current_user i.age_group i.education i.parity i.marital i.womans_autonomy ///
i.wealth i.insurance distance_to_nearest_FPprovider ///
i.residence i.region first_birth_before17 dissatisfied_with_FP wait_2ormore_years fp_messages visited_by_CHW ///
i.sdp_authority i.sdp_type i.bpjs_contract i.qoc || EA_ID: , or

Please help.
Jay Abdullah

Join Date: Mar 2021

Posts: 3
#7

04 Mar 2021, 19:51

Originally posted by Weiwen Ng

My reply crossed with Clyde's, but I'd written out some code so that users can verify for themselves that the roctab command works after feeding it predicted probabilities from a mixed effect logistic regression.

Code:

webuse bangladesh
logit c_use urban age child*
predict p_logistic
lroc
roctab c_use p_logistic
logit c_use urban age child* i.district
predict p_logistic_fe
lroc
roctab c_use p_logistic_fe
/*The ROCs in both cases should be identical*/
melogit c_use urban age child* || district:
predict p_melogit
roccomp c_use p_logistic_fe p_melogit
/*The ROCs from both models should be nearly equal, the chi-square test for equality of ROC areas just barely fails to reject*/

As to the sampling distribution of the chi-square statistic for the Hosmer-Lemeshow test, the Stata manual says it's G (i.e. # of groups) - 2 degrees of freedom when calculated within the estimation sample, and G degrees of freedom when calculated outside the estimation sample.

Please help me run diagnostics after melogit for a multilevel model of current contraceptive use (binary) with many categorical covariates and a few continuous ones.

Here are my commands:

melogit current_user i.age_group i.education i.parity i.marital i.womans_autonomy ///
i.wealth i.insurance distance_to_nearest_FPprovider ///
i.residence i.region first_birth_before17 dissatisfied_with_FP wait_2ormore_years fp_messages visited_by_CHW ///
i.sdp_authority i.sdp_type i.bpjs_contract i.qoc || EA_ID: , or

Please help.
Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#8

04 Mar 2021, 22:25

Look at the code you quoted in #6. You can use that exact same code after running your -melogit- command.