  • Goodness-of-fit test after logistic model

    Dear Statalist,

    I hope you are well. I have fit the following logistic regression in Stata (only a subset of the variables turned out to be significant in the model):

    Code:
    logit LIVE_BIRTHS ib3.Marital_Status HOUSEHOLD_INCOME Femaleearningsshareofhouseh AGE age2 age3 ib2.HISPANIC HISPANIC_BORN_OUT_THE_US i.White_NON o.Asian_NON i.Other_NON i.Black_NON i.EDUCATION

    Number of obs = 3,981
    LR chi2(16) = 1731.89
    Prob > chi2 = 0.0000
    Pseudo R2 = 0.3149
    Log likelihood = -1884.3771


    I would like to ask, please, whether Prob > chi2 = 0.0000 can be taken to mean that 'overall the model is significant'. And can I accept the model as constructed based on the outcome of the goodness-of-fit test?
    For clarification, I obtained the following goodness-of-fit results for the model:

    estat gof

    Goodness-of-fit test after logistic model
    Variable: LIVE_BIRTHS
    Number of observations = 3,981
    Number of covariate patterns = 3,883
    Pearson chi2(3866) = 3870.90
    Prob > chi2 = 0.4748

    Many thanks for your support.

    Kind regards,
    Hm Saleh

  • #2
    Originally posted by Hm Saleh:
    I would like to ask, please, whether Prob > chi2 = 0.0000 can be taken to mean that 'overall the model is significant'.
    Well, yes, but this is hardly ever worth reporting. The meaning of "overall the model is significant" is that the null hypothesis that all of the coefficients are zero can be rejected. That null hypothesis is almost never of any interest.
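
    To make concrete what is being tested, here is a hedged sketch that reproduces the header LR chi2 by hand: it compares the fitted model to an intercept-only model. The varlist is copied from the original post; nothing else is assumed.

    Code:
    * full model, as posted above
    quietly logit LIVE_BIRTHS ib3.Marital_Status HOUSEHOLD_INCOME Femaleearningsshareofhouseh AGE age2 age3 ib2.HISPANIC HISPANIC_BORN_OUT_THE_US i.White_NON o.Asian_NON i.Other_NON i.Black_NON i.EDUCATION
    estimates store full

    * intercept-only (null) model
    quietly logit LIVE_BIRTHS
    estimates store null

    * likelihood-ratio test that all coefficients are zero;
    * this should match the LR chi2(16) in the logit header
    lrtest full null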

    Originally posted by Hm Saleh:
    estat gof

    Goodness-of-fit test after logistic model
    Variable: LIVE_BIRTHS
    Number of observations = 3,981
    Number of covariate patterns = 3,883
    Pearson chi2(3866) = 3870.90
    Prob > chi2 = 0.4748
    This is the Pearson goodness-of-fit test. With this test, a low p-value is evidence of poor fit of the model. A high p-value such as the one you have is suggestive of satisfactory fit (or, more accurately, is a lack of evidence of poor fit). In a sample of this size, this is actually a pretty good result. By the way, do not confuse goodness of fit with overall significance of the model; they have nothing to do with each other.

    That said, I think the Hosmer-Lemeshow goodness of fit test is more widely used. The command for that is -estat gof, group(10)-, and the interpretation of the result is similar. In my own work, I usually also add the -table- option, which enables me to see the observed and expected outcomes in each of the deciles of predicted risk. This gives me a sense of where along the logistic curve the fit is closest and where it is farthest away, which can be helpful for going on to improve the model.
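
    For concreteness, a minimal sketch of those calls, run after the -logit- fit above (the group count and the -table- option are just as described; nothing else is assumed):

    Code:
    * Hosmer-Lemeshow test over deciles of predicted risk
    estat gof, group(10)

    * same test, also listing observed vs. expected outcomes per decile
    estat gof, group(10) table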



    • #3
      Thank you, Clyde Schechter, for replying to me.
      I ran this on another dataset:

      Code:
      estat gof, group(10)
      Number of observations = 26,871
      Number of groups = 10
      Hosmer–Lemeshow chi2(8) = 132.83
      Prob > chi2 = 0.0000

      The low p-value (0.0000) indicates a significant lack of fit. This means that the predicted probabilities from my model do not closely match the actual outcomes in the data.

      How should I deal with this?



      • #4
        With a sample size this large, you should probably ignore the result. The problem is simply that with N = 26,871, a "significant" lack of fit can be a small discrepancy that is of no practical importance.

        I recommend you re-run this with the table option. And given the large sample size, I would also use more than 10 groups: probably more like 50 or even 100. Then you can get a sense, looking at the tabulated output, of how far off the fit really is. There is an excellent chance that the differences between observed and expected will be negligibly small, notwithstanding the "significance" of the test. Statistical significance is a troublesome concept in almost any situation, and this is one of the situations where it is really quite misleading. A sketch of that re-run follows.
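
        Assuming the same fitted model is still in memory, the re-run would look like this (the group counts are just the ones suggested above):

        Code:
        * more groups give a finer view of where along the curve the fit drifts
        estat gof, group(50) table
        estat gof, group(100) table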



        • #5
          Hi Clyde Schechter,
          I have done this, but the result is still
          Prob > chi2 = 0.0000

          Number of observations = 29,657
          Number of groups = 125
          Hosmer–Lemeshow chi2(123) = 273.65
          Prob > chi2 = 0.0000

          If I ignore this result, what is the alternative? Do you think using the bootstrap would be a good idea?

          Best,



          • #6
            The following may also be of interest:

            Code:
            . estat gof
            
            Logistic model for crd, goodness-of-fit test
            
                   number of observations =       314
             number of covariate patterns =       288
                        Pearson chi2(273) =       291.80
                              Prob > chi2 =         0.2075
            The same model after -poisson-:

            Code:
            . estat gof
            
                     Goodness-of-fit chi2  =  162.5511
                     Prob > chi2(299)      =    1.0000
            Can these goodness-of-fit results be taken as evidence in favor of the Poisson regression?
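
            For reference, a hedged sketch of how such a comparison would be produced; crd is the outcome shown in the output above, and x1-x3 are hypothetical stand-ins for the unshown covariates:

            Code:
            * logistic model, then Pearson gof over covariate patterns
            logit crd x1 x2 x3
            estat gof

            * same specification as a Poisson model, then its deviance-based gof
            poisson crd x1 x2 x3
            estat gof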



            • #7
              Hm Saleh a model is defined as a simplification of reality, and simplification is just another word for "wrong in some useful way". For example, we can think of the number 3.14 as a model for the number $\pi$. It is wrong in the sense that it leaves a lot (I would argue that infinite is a lot) of digits out, but it is useful in the sense that it allows us to focus on the most important digits. So now we have a first result:

              Result 1: A model cannot be true. If a model were true, it would no longer be a simplification and thus would cease to be a model.
              Whether 3.14 is a good enough model for $\pi$ depends on how you want to use it; for some applications it is just fine, for others you need a lot more digits. So, here we have a second result:

              Result 2: The choice of whether a model is good enough cannot be absolute; it has to be relative to what you want to do with that model.
              So what does that mean for goodness-of-fit tests? Simple: they are absolute rubbish. They test a hypothesis that we already know is false, and they don't give us a metric we can use to assess whether the model is good enough for our purpose. Instead we need a descriptive analysis of how good the fit is, and then your subjective assessment of whether that is good enough or not. That is why Clyde Schechter suggested using the table option after -estat gof-. This gives you the expected and observed counts, which is one way to describe the fit that could be relevant for your application. If it is, then you can compare them and make up your mind whether the fit is good enough for your purposes.

              Bootstrap solves a completely different problem, so you can leave that out.

              Dart Stater: no, the p-value is not a relevant metric to decide whether a model is good enough.
              ---------------------------------
              Maarten L. Buis
              University of Konstanz
              Department of history and sociology
              box 40
              78457 Konstanz
              Germany
              http://www.maartenbuis.nl
              ---------------------------------



              • #8
                Originally posted by Maarten Buis:
                no, the p-value is not a relevant metric to decide whether a model is good enough.
                Thanks for the answer. I agree that the p-value is not a perfect metric. However, the p-value is the one criterion widely used to decide whether a model's fit is adequate; otherwise, all tests would have to be treated as non-informative.
                Using the table option, we immerse ourselves in a sea of doubts, because we move from a single decision criterion to multiple criteria, which can be contradictory (somewhat better here, somewhat worse there).
                Meanwhile, a solid conclusion must be reached as to whether we are done with the fit or the model should be improved further.



                • #9
                  Remember what a p-value is: it is the probability of drawing data at least as extreme as the data you have seen if the null hypothesis is true. How does that answer the question "Is my model good enough?" It does not. If you use that p-value with an artificial cut-off point, you get an answer, but you could just as well (or probably better) have gone to the zoo and asked the local chimpanzee. P-values are pretty bad in general, and combined with an artificial cut-off point awfully bad, but anyone who applies them to model selection should be boiled in oil before being (politely) asked to leave the profession.

                  False confidence is a lot worse than correct confusion.
                  ---------------------------------
                  Maarten L. Buis
                  University of Konstanz
                  Department of history and sociology
                  box 40
                  78457 Konstanz
                  Germany
                  http://www.maartenbuis.nl
                  ---------------------------------

