Goodness of fit test for logistic regression on survey data

Susanne Helmschrott

Join Date: Nov 2014

Posts: 9
#1

Goodness of fit test for logistic regression on survey data

04 Nov 2014, 15:06

Dear Statalist members,

I would like to perform a goodness-of-fit test for logistic regression models that were run on survey data. I got the suggestion to use AIC or BIC, but as far as I know these tests cannot be run on survey data. In several papers, I found the F-adjusted mean residual goodness-of-fit test to be the appropriate test and applied the estat gof command in Stata after running my svyset regressions. Unfortunately, I have problems interpreting the results.

More in detail, I would like to choose the better fitting model out of two with the following results:

First model:

. estat gof

Logistic model for status_r, goodness-of-fit test

F(9,71) = 1.04
Prob > F = 0.4192

Second model:

. estat gof

Logistic model for status_r, goodness-of-fit test

F(9,71) = 1.27
Prob > F = 0.2679

What does the test tell me about the goodness of fit of each model and which is the better fit?

Thank you very much for your help!
Susanne
Tags: None
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#2

04 Nov 2014, 18:36

Welcome to Statalist, Susanne!

Ordinarily for tests of fit, a small p-value indicates lack of fit. You could have gotten information about estat gof by typing "help estat gof" and following the links to the Stata Manual. estat gof employs a chi square test known as the Hosmer-Lemeshow test.

However, because you have survey data, you have a more serious problem: possibly all of your logistic results are wrong: certainly the standard errors and p-values, the coefficients also if you did not weight the logistic analysis. The problem is that survey analyses do not base inference standard likelihood theory. Rather inference depends on the weights and on aspects of the survey design, primarily variation between primary sampling units, the top level clusters are knowns for short as PSUs. To do the correct analysis, you will need to svyset your data and then run svy: logistic.

estat gof won't work after svy: logistic, but the user-written command svylogitgof will; this command is the adaptation of the Hosmer-test to survey data, not surprising because Stanley Lemeshow is one of the authors.You can install svylogitgof by typing "findit svylogitof" and clicking on the "st0099_1" in the first entry. You can download the article that describes svylogitgof via the following link: http://www.stata-journal.com/sjpdf.html?articlenum=st0099 There's also a recent thread connected with the test: http://www.statalist.org/forums/foru...ng-svylogitgof.

Another test of fit, useful if you have continuous predictors, is linktest, run after svy: logistic.

A personal note: long-time practice on Statalist is to register with a user-name that includes first and last name. Many of us consider this practice responsible for the collaborative and professional atmosphere that has long been a hallmark of Statalist. Please re-register with your full name; -just hit the "CONTACT US" button on the bottom. After you've done the proper analysis, come back to the list with any questions that remain.

Steve

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2
Comment
Richard Williams

Join Date: Apr 2014

Posts: 5010
#3

04 Nov 2014, 22:28

Actually, somewhere along the way, estat gof did start working after svy: http://www.stata.com/manuals13/svyestat.pdf. To Steve's advice, I'll add that it is good to show the code used before a post-estimation command. Otherwise we can't tell if there are mistakes there or not. Hopefully you don't have the problems Steve mentions, but when only a fragment of the code is presented it is hard to say.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://academicweb.nd.edu/~rwilliam/
Comment
Susanne Helmschrott

Join Date: Nov 2014

Posts: 9
#4

05 Nov 2014, 04:17

Dear Steve, Dear Richard,

thank you very much for your replies!

Yes, I svysetted the analysis. The full command I use is
svyset [pw=WT0], jkrw(WT1-WT80, mult(.9875)) vce(jackknife)

svy: logistic y x1 x2 x3 x4 (all binary variables)
estat gof

As I use Stata 13 (the version as of which svy supports estat gof) the variance estimation should be correct. Or should it be svy: estat gof?

However, I am still a bit confused about the interpretation. From your answer I see that the first model should be the better model because it has the higher p-value.
But is there a threshold as of which a model is good? I guess that p=0.4192 is quite high and the model is well fitted?
Is the interpretation of the p-value the same as for the Hosmer-Lemeshow test? I guess there is more advise on interpretation of this test on the web than for the adjusted test.

Thanks again!
Susanne
PS: I'll add my last name
Comment
Richard Williams

Join Date: Apr 2014

Posts: 5010
#5

05 Nov 2014, 06:33

estat gof is correct. I am not clear what the difference is between your two models, but the test results indicate that neither is violating assumptions. You need other criteria for choosing between the two.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://academicweb.nd.edu/~rwilliam/
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#6

05 Nov 2014, 09:22

The problem with the Hosmer-Lemeshow test is precisely that it is a goodness of fit test, and goodness of fit is really only minimally relevant for most practical purposes. The logistic model is almost always a mismatch to a real-life data generating process. If you have a sufficiently large sample, that misfit will be detected, even if the model is doing a pretty good job of matching predicted to observed probabilities. If you have a sufficiently small sample (it looks like you have around 80 observations), when you divide them into deciles, as H-L does, you will have, optimally (if there are no ties), 8 in each group--so your power to detect even substantive deviations between predicted and observed is going to be fairly low, and almost any model will pass muster on the p < 0.05 criterion.

What you really want to know is how close the model's predicted probabilities are to the observed probabilities--are they close enough for whatever practical purposes this model was designed for? So I would re-run estat, gof with the -table- option to see the observed and predicted counts in each decile. Actually, given your sample size, I would probably do -estat, gof group(4) table-, because with quartiles instead of deciles you'll have enough observations in each group to get a sense of what is going on. In particular, you will get to see whether your model is systematically over-predicting at one end and under-predicting at another--which might help you figure out a way to improve the model.

Personally, I generally disregard the p-values from the H-L test unless the sample size is in that sweet ballpark of a few hundred where statistical significance and substantive mismatch are likely to be concordant.
Comment
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#7

06 Nov 2014, 17:32

I have to correct an error in my post: contrary to what I thought (and what was true for previous Stata versions), estat gof does work after svy binary regression models.(To answer your parenthetical question: estat gof doesn't take a svy prefix; no post-estimation command does. See the help for "svy_postestimation".) There is no longer a good reason to use svylogitgof, which is less flexible and can fail where estat gof does not.

I also agree in general with Clyde's thoughts about what are called "omnibus" tests of fit. It would help if, as the FAQ request, you show us actual logistic commands and results. If necessary for clarity, briefly describe the variables in each model. For example, I would not have recommended linktest had I known that there are four binary variables (in each model?).

Last edited by Steve Samuels; 06 Nov 2014, 17:44.

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2
Comment
Anshul Anand

Join Date: May 2015

Posts: 113
#8

26 May 2015, 16:03

Is estat gof good for the binary logistic regression (with svy), if there are only categorical variables as independent variables? And can I only interprete the output of estat gof, if I have two model for comparing (the higher Prob > F the better the model?). I have 8 categorical variable (not only binary) and about 11000 observation.
Comment
Andrew Lover

Join Date: Apr 2014

Posts: 182
#9

27 May 2015, 02:22

There's potentially a larger issue- the Hosmer-Lemeshow GOF test has been shown to be unstable based on exactly where the bin cut-points are, and Hosmer and coworkers recommend a newer test:

https://stat.duke.edu/~zo2/dropbox/goflogistic.pdf

but there's no implementation in Stata and the svy: performance are unknown. You could try using -Rsource- (SSC) and then run the Le Cessie test in R:

https://stat.ethz.ch/pipermail/r-sig...il/011534.html

__________________________________________________ __
Assistant Professor, Department of Biostatistics and Epidemiology
School of Public Health and Health Sciences
University of Massachusetts- Amherst
Comment
Andrew Lover

Join Date: Apr 2014

Posts: 182
#10

27 May 2015, 07:29

I'd be tempted to try -linktest- and -estat gof- and then take a good look at ROC curves. If everything looks consistent, then you're fine; if not, then you'll have consider your options in greater detail.

Last edited by Andrew Lover; 27 May 2015, 07:33.

__________________________________________________ __
Assistant Professor, Department of Biostatistics and Epidemiology
School of Public Health and Health Sciences
University of Massachusetts- Amherst
Comment
Baker Chowdhury

Join Date: Jul 2017

Posts: 20
#11

24 Jul 2017, 10:47

Dear Statalist members,

I would like to perform a goodness-of-fit test for logistic regression models with survey data. When I run the model for my entire sample using svy command I can do the goodness of fit test using estatgof. However, I need to do some subgroup analysis using svy,subpop command and estatgof does not work after subpopulations command. Also, I was wondering if there any command for Stata13 to plot ROC for the models with svy of svy, subpopcommands.

Thanks in advance.
Baker
Comment
Katharina Rogen

Join Date: Feb 2017

Posts: 16
#12

26 Jul 2017, 07:07

Hi all

I would like to pick up that post again. I have estimated several (svy) logit models and would like to determine the one with the best fit. I have calculated and compared the McKelvey and Zavoina's R2, the linktest results, the Hosmer-Lemeshow test and well as the lroc area under the curve.

For my main model all statistics look - as far as I can tell ok to pretty good-, however, the Hosmer-Lemeshow test is always very significant. If I change the number of groups however to a bit lower (5 instead of 10 groups) it falls insignificant. As I have a sample size larger than 1000 (n=approx. 4000) I dont think reducing the number of groups is the answer, though. I also estimated my model on a subsample of 500 cases and the Hosmer-Lemeshow test is significant as well (however, I am not quite comfortable with this result, because this sample may be heavily selected and include only very few members of subgroups).

My question now is: which statistic should I trust? Should the significant Hosmer-Lemeshow test be an indicator that something is worng with my model? I also omitted one predictor (set) at a time, but this didnt change the Hosmer-Lemeshow test results. I also added non-linearities, this did nothing either. Is there something else that could be wrong that I didnt think of before?

My syntax

PHP Code:

svy: logit $model`i' eststo m`i' linktest estat gof, group(10) fitstat qui: do "$dos\100 program post logit.do" //program that feeds the results from the svy logit in a normal logit model to be able to calculate additional tests logit_post lroc estat classification

The output

PHP Code:

mz r2: .53 ------------------------------------------------------------------------------ | Linearized t_s | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- _hat | .9759862 .0544445 17.93 0.000 .8686139 1.083358 _hatsq | -.0143201 .0221965 -0.65 0.520 -.0580948 .0294546 _cons | .0188437 .0938886 0.20 0.841 -.1663179 .2040054 ------------------------------------------------------------------------------ Logistic model for t_sen_s4, goodness-of-fit test F(9,188) = 2.92 Prob > F = 0.0029 lroc: .87 sensitivity: 50.34 specificity: 95.90000000000001

Thank you & all the best,

Katharina
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4485
#13

26 Jul 2017, 07:19

first, do not reduce the number of groups for the H-L test; you need to increase them instead; guidance is given on p. 75 of Paul, P, Pennell, ML and Lemeshow, S (2013), "Standardizing the power of the Hosmer-Lemeshow goodness of fit test in large data sets," Statistics in Medicine, 32: 67-80

second, each of the summary statistics you are using assesses the model differently; you can be better on discrimination (lroc) and worse on calibration (H-L, or, better, a calibration plot), etc.; you need to decide what is important to you
1 like
Comment
Katharina Rogen

Join Date: Feb 2017

Posts: 16
#14

26 Jul 2017, 07:55

Dear Mr. Goldstein,

thank you!

I have tried increasing as well (followed the instructions of the paper) and used 130 groups instead. Here, the test gets even more significant.

But what does it mean that I am worse on calibration? Is there a way to assess where the difficulties may be stemming from (other than problematic predictors or non-linearities)? I am a bit at loss at how to imporve my model...

Thank you for your help & all the best,

Katharina
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#15

26 Jul 2017, 08:52

Run the Hosmer-Lemeshow test with the -table- option. Stata will show you the observed and expected (by the model) number of 1 outcomes in each of the 130 groups. You can then see which groups are over-predicting, which under-predicting, and which look pretty good.

All of that said, remember that the H-L test is testing the calibration of a logistic model here. Logistic models are convenient to work with statistically and are a good approximation to many things in nature. But they are hardly ever an exact description of a real-world data generating process. With a sample size of 4,000 you are in territory where the H-L is sensitive to the discrepancy between the logistic form and the real-world data generating process. It may or may not be possible to make that go away by adding/removing variables/interactions/higher-order terms. I wouldn't worry about the p-value per se. Rather, I would look at those observed and expected counts and ask whether, for practical purposes, they are close enough. Practical purposes has to be interpreted in light of the actual substance and science of your project and is not defined statistically. In any case, trying to get a non-significant H-L test from a logistic model in a large sample is often a fool's errand: if simple changes to your model don't accomplish it and if the observed-expected differences are reasonable, I wouldn't put any more effort into it.
Comment

Announcement

Goodness of fit test for logistic regression on survey data

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment