Goodness of fit logistic regression

Katarina Vlcakova

Join Date: Jan 2015

Posts: 6
#1

14 Jan 2015, 02:42

Hello everybody. I am quite new to Stata and am working on survey data. I could figure most things out, but I have a simple question about the methodological background.
I ran a survey logistic regression (svy: logistic), and above the coefficient table there is a goodness-of-fit statistic. Normally this would be the LR test, but with svy there is an F test instead:

Number of strata   =         1          Number of obs     =       2622
Number of PSUs     =      2622          Population size   =  1883104.6
                                        Design df         =       2621
                                        F(  29,   2593)   =       8.68
                                        Prob > F          =     0.0000

Could someone please tell me which test is used in this case?
Thank you
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

14 Jan 2015, 03:57

Hello Katarina,

Maybe you will want to type "help svy postestimation" in Stata.

Also, on pages 44 to 45 of the Stata Survey Data Reference Manual (for Stata 13) you will find a comment on a case similar to yours, plus many examples worth seeing throughout the manual.

In short, judging by the results you got after "estat gof" (p < 0.05), and quoting the text of the manual: "the F statistic is significant at the 5% level, indicating that the model is not a good fit for these data".
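
For readers who want to see the kind of output that manual passage refers to, here is a minimal sketch. It assumes the NHANES II example dataset used in the Stata manuals (`webuse nhanes2d`), which comes with its survey design already declared. The model is only illustrative and is not the model from this thread.

```stata
* Sketch only: example data from the Stata manuals, not the poster's data
webuse nhanes2d, clear

* Show the survey design settings stored with the dataset
svyset

* Survey-weighted logistic regression; its header reports the overall model F test
svy: logistic highbp height weight age female

* Goodness-of-fit F test for the svy model (a different null hypothesis)
estat gof
```

After a svy estimation command, `estat gof` is the survey version documented under svy postestimation, so its F statistic should not be confused with the F statistic printed in the regression header.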

Best,

Marcos

Katarina Vlcakova

Join Date: Jan 2015

Posts: 6
#3

15 Jan 2015, 02:07

This is the test whose null hypothesis is that all betas are equal to 0. estat gof has a different hypothesis, and the model fits according to what you say :-) I was just wondering... I looked through the manual and couldn't find any good theoretical background; maybe I overlooked it. The test should be an equivalent of the Wald test or the likelihood-ratio test... I am wondering which F test it is...
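
A hedged pointer on which F test this is: as far as the Stata documentation on testing after svy estimation describes it, the F statistic in the svy header is an adjusted Wald test of the hypothesis that all slope coefficients are zero, which would explain why no likelihood-ratio test appears. A sketch of the adjustment follows, where W is the ordinary Wald chi-squared statistic, k is the number of coefficients tested, and d is the design degrees of freedom (number of PSUs minus number of strata). The symbols are labels chosen here for illustration.

```latex
F \;=\; \frac{d - k + 1}{k\,d}\; W \;\sim\; F(k,\; d - k + 1)
```

With the output in #1, d = 2622 - 1 = 2621 and the reported numerator df is k = 29, so the denominator df is 2621 - 29 + 1 = 2593, which matches the reported F(29, 2593).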
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17856
#4

15 Jan 2015, 02:15

Katarina:
you may want to take a look at -help svy postestimation- and the related entry in the Stata 13.1 .pdf manual.

Kind regards,
Carlo
(Stata 19.0)
Anshul Anand

Join Date: May 2015

Posts: 113
#5

27 May 2015, 16:09

Hello everyone. Can we say that the higher the p-value, the better the model, when using svy: logit with estat gof? So if I have to choose between two models, should I choose the one with the higher p-value? In the Stata 13.1 manual it is only written that if the p-value is < 0.05 then the model is not good.

Last edited by Anshul Anand; 27 May 2015, 16:12.
Clyde Schechter

Join Date: Apr 2014

Posts: 30358
#6

27 May 2015, 16:34

Choosing a model on the basis of an overall goodness of fit statistic is usually a bad idea, no matter which statistic, and no matter what rule you apply to it.

The most important thing you need to think about is what purpose you will use your model for. Then you need to figure out what aspects of fit are most important for that use. For example, depending on what you are doing, it may be most important to accurately identify people with a low probability of the outcome. In some other application, the total number of correct classifications may be the most important thing. In some other application, an integrated measure of fit (like the Hosmer-Lemeshow) may be appropriate.

There is no automated, value-free, purpose-free way to select a "best" model. Optimality is always relative to some particular loss function that reflects the decisions that will be made using the model and the consequences of getting those decisions right or wrong.
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#7

27 May 2015, 20:12

Anshul Anand: You asked this in the middle of another old thread: http://www.statalist.org/forums/foru...on-survey-data

Asking a question twice is very poor Statalist etiquette: it starts parallel discussions, but the posters in one thread may not be aware of the other thread. Please don't do this again.

Steve Samuels
Statistical Consulting

Stata 14.2
Anshul Anand

Join Date: May 2015

Posts: 113
#8

28 May 2015, 03:06

I am very sorry for that, Steve Samuels! It won't happen again!

Thank you, Clyde Schechter, for your answer! But I am not sure if I am doing the right thing in my case: in my binary logistic regressions (two regressions) I only included the most important variables, which according to theory should have an effect on the outcome. To be more precise, I am looking at which variables increase or decrease the probability of women being in a female occupation (first regression) and the probability of men being in a male occupation (second regression). Because these independent variables should have an effect on the outcome according to theory, I can't neglect them. The only thing I can do is check whether I should include an interaction term. So shouldn't the goodness of fit from estat gof (I am using xi: svy: logit y x...) tell me whether I should include the interaction term or not? For example, if both models had a p-value above 5%, should I take the one with the higher p-value?
Anshul Anand

Join Date: May 2015

Posts: 113
#9

29 May 2015, 04:38

According to page 1 of http://www.stata.com/manuals13/restatgof.pdf, I can only use estat gof if my sample is weighted with fweights and not pweights, and my logistic regression is weighted with pweights. But what is the difference between the output of svy: logit and the output I receive after estat gof? Both include an F statistic. The output I have is:

svy: logit only: F(27, 11329) = 25.14
Prob > F = 0.0000

with estat gof: F(9, 11347) = 0.69
Prob > F = 0.7175

So according to the first one the model is not good, because the F statistic is significant, and according to estat gof it is not a bad model, because its F statistic is insignificant.

Last edited by Anshul Anand; 29 May 2015, 05:32.
Clyde Schechter

Join Date: Apr 2014

Posts: 30358
#10

29 May 2015, 08:50

The two have almost nothing to do with each other. The first F-test you show is the F-test of the null hypothesis that all of the coefficients in the logistic model are zero. It is a test, if you will, of your model versus a model with no predictors at all: just predict for all comers the overall probability of success observed in the sample. So your model is better than nothing; not a very high bar. Given your large sample size, it would be hard to come up with a model that wasn't better than nothing.
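
One way to check that the header F is a joint test of all slope coefficients is to reproduce it with `test`. The following is a sketch on the NHANES II example data from the Stata manuals (`webuse nhanes2d`); the model is an assumption chosen for illustration, not the model discussed in this thread.

```stata
* Sketch only: illustrative model on example data, not the poster's model
webuse nhanes2d, clear
svy: logistic highbp height weight age female

* Model F statistic, numerator df, and design df stored by the svy command
display "F = " e(F) "   k = " e(df_m) "   design df = " e(df_r)

* Joint test that all four slopes are zero; after svy this is an adjusted Wald F
test height weight age female

* Same hypothesis without the survey df adjustment, for comparison
test height weight age female, nosvyadjust
```

The first `test` should agree with the F statistic in the regression header, with denominator df equal to the design df minus the number of tested coefficients plus one. The same arithmetic applied to the figures in #9 implies a design df of 11355 for both reported tests (11355 - 27 + 1 = 11329 and 11355 - 9 + 1 = 11347).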

The second F-test you show is from the Hosmer-Lemeshow goodness of fit test. It shows that overall there is a reasonably good calibration between your model's predicted probabilities and the observed outcomes, at least when grouped into deciles of the former. That's very nice, but it may or may not mean that your model is suitable for the particular purpose you intend to apply it to (whatever that may be), and it is not a basis for picking this particular model over some other one.

I don't work with complex survey designs often, so I was not aware that -estat gof- doesn't support pweights. But I would take the manual at its word on that. And when Stata Corp. doesn't support something in a command, in my experience, there is always a very good reason. So that would seem to rule out applying -estat gof- to your problem in any case.
Anshul Anand

Join Date: May 2015

Posts: 113
#11

29 May 2015, 09:35

Firstly, thank you very much for the answer! But something is not clear to me, and I am a bit confused. If the first F-test is the F-test of the null hypothesis that all of the coefficients in the logistic model are zero, and it gives me a p-value under 5%, shouldn't that be bad? Because I thought that we want to reject the null hypothesis that all the coefficients in the model are zero? Did I understand correctly that 25.14 is the overall probability of success observed in the sample, as you said?
Clyde Schechter

Join Date: Apr 2014

Posts: 30358
#12

29 May 2015, 09:44

No. The first F-test shows p = 0.0000, which is highly significant, and is presumably a good thing (unless you were hoping to show that your model is useless). So you do reject the null hypothesis that all your coefficients are zero, which means that your prediction model is doing better than a model with no predictors at all. That probability is not 25.14. First of all, probabilities range between 0 and 1. Second of all, that 25.14 is the value of the F statistic that was used to calculate that p-value. The overall predicted probability is not part of the logistic regression output from your model.
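
To make the link between an F statistic and its p-value concrete, the p-values can be recomputed from the numbers quoted in #9 with Stata's `Ftail()` function, which returns the upper-tail probability of the F distribution. Because the F values in the thread are rounded to two decimals, the results are only approximate.

```stata
* Upper-tail probability for the overall model test, F(27, 11329) = 25.14
display Ftail(27, 11329, 25.14)

* Upper-tail probability for the goodness-of-fit test, F(9, 11347) = 0.69
display Ftail(9, 11347, 0.69)
```

The first value is so small that it prints as 0.0000 at four decimals; the second should land close to the reported 0.7175.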
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#13

29 May 2015, 09:46

I lost track of this thread for a while. Only a few minutes ago did I come across the last messages.

This is to say that, when I commented in #2 on the interpretation of the F statistic after typing - estat gof -, I meant exactly the goodness-of-fit test, that is, contrary to what was pointed out in #3, not at all the (overall) F test after typing - logit -.

Kind regards,

Marcos

Anshul Anand

Join Date: May 2015

Posts: 113
#14

29 May 2015, 10:25

Thanks a lot to both of you! Now I understand it. So it's just to see whether the model is better than a model without predictors. But if I want to compare two models, one without an interaction term and the other with one, would a Wald test be a good way to see whether I should include the interaction term or not?
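
A minimal sketch of such a Wald test, assuming the NHANES II example data from the Stata manuals and an illustrative interaction between `female` and `age` (both are assumptions for the example, not the occupation model described above). It uses factor-variable notation instead of the `xi:` prefix. Likelihood-ratio tests are not available after svy estimation, so an adjusted Wald test via `testparm` is one common route.

```stata
* Sketch only: illustrative model on example data
webuse nhanes2d, clear

* Main effects plus the female-by-age interaction
svy: logistic highbp i.female##c.age height weight

* Adjusted Wald test that the interaction coefficient is zero
testparm i.female#c.age
```

A small p-value here would suggest the interaction adds something beyond the main effects; whether that matters still depends on what the model is to be used for.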