  • Goodness-of-fit test after logistic model

    Dear Statalist,

    I hope you are well. I have fit the following logistic regression in Stata (only a subset of the variables turned out to be significant in the model):

    Code:
    logit LIVE_BIRTHS ib3.Marital_Status HOUSEHOLD_INCOME Femaleearningsshareofhouseh AGE age2 age3 ib2.HISPANIC HISPANIC_BORN_OUT_THE_US i.White_NON o.Asian_NON i.Other_NON i.Black_NON i.EDUCATION

    Number of obs = 3,981
    LR chi2(16) = 1731.89
    Prob > chi2 = 0.0000
    Pseudo R2 = 0.3149
    Log likelihood = -1884.3771


    I would like to ask, please, whether Prob > chi2 = 0.0000 can be taken to mean that 'overall the model is significant'. And can I accept the model as constructed based on the outcome of the goodness-of-fit test?
    For clarification, I obtained the following goodness-of-fit results for the model:

    estat gof

    Goodness-of-fit test after logistic model
    Variable: LIVE_BIRTHS
    Number of observations = 3,981
    Number of covariate patterns = 3,883
    Pearson chi2(3866) = 3870.90
    Prob > chi2 = 0.4748

    Many thanks for your support.

    Kind regards,
    Hm Saleh

  • #2
    Originally posted by Hm Saleh:
    I would like to ask, please, whether Prob > chi2 = 0.0000 can be taken to mean that 'overall the model is significant'.
    Well, yes, but this is hardly ever worth reporting. The meaning of "overall the model is significant" is that the null hypothesis that all of the coefficients are zero can be rejected. That null hypothesis is almost never of any interest.
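
    To make concrete what is being tested, here is a hedged sketch that reproduces the header LR chi2 by hand: it compares the fitted model to an intercept-only model. The varlist is copied from the original post; nothing else is assumed.

    Code:
    * full model, as posted above
    quietly logit LIVE_BIRTHS ib3.Marital_Status HOUSEHOLD_INCOME Femaleearningsshareofhouseh AGE age2 age3 ib2.HISPANIC HISPANIC_BORN_OUT_THE_US i.White_NON o.Asian_NON i.Other_NON i.Black_NON i.EDUCATION
    estimates store full

    * intercept-only (null) model
    quietly logit LIVE_BIRTHS
    estimates store null

    * likelihood-ratio test that all coefficients are zero;
    * this should match the LR chi2(16) in the logit header
    lrtest full null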

    Originally posted by Hm Saleh:
    estat gof

    Goodness-of-fit test after logistic model
    Variable: LIVE_BIRTHS
    Number of observations = 3,981
    Number of covariate patterns = 3,883
    Pearson chi2(3866) = 3870.90
    Prob > chi2 = 0.4748
    This is the Pearson goodness-of-fit test. With this test, a low p-value is evidence of poor fit of the model. A high p-value such as the one you have is suggestive of satisfactory fit (or, more accurately, is a lack of evidence of poor fit). In a sample of this size, this is actually a pretty good result. By the way, do not confuse goodness of fit with overall significance of the model; they have nothing to do with each other.

    That said, I think the Hosmer-Lemeshow goodness of fit test is more widely used. The command for that is -estat gof, group(10)-, and the interpretation of the result is similar. In my own work, I usually also add the -table- option, which enables me to see the observed and expected outcomes in each of the deciles of predicted risk. This gives me a sense of where along the logistic curve the fit is closest and where it is farthest away, which can be helpful for going on to improve the model.
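
    For concreteness, a minimal sketch of those calls, run after the -logit- fit above (the group count and the -table- option are just as described; nothing else is assumed):

    Code:
    * Hosmer-Lemeshow test over deciles of predicted risk
    estat gof, group(10)

    * same test, also listing observed vs. expected outcomes per decile
    estat gof, group(10) table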



    • #3
      Thank you, Clyde Schechter, for replying to me.
      I ran this on another dataset:

      Code:
      estat gof, group(10)
      Number of observations = 26,871
      Number of groups = 10
      Hosmer–Lemeshow chi2(8) = 132.83
      Prob > chi2 = 0.0000

      The low p-value (0.0000) indicates a significant lack of fit. This means that the predicted probabilities from my model do not closely match the actual outcomes in the data.

      How should I deal with this?



      • #4
        With a sample size this large, you should probably ignore the result. The problem is simply that with N = 26,871, a "significant" lack of fit can be a small discrepancy that is of no practical importance.

        I recommend you re-run this with the table option. And given the large sample size, I would also use more than 10 groups: probably more like 50 or even 100. Then you can get a sense, looking at the tabulated output, of how far off the fit really is. There is an excellent chance that the differences between observed and expected will be negligibly small, notwithstanding the "significance" of the test. Statistical significance is a troublesome concept in almost any situation, and this is one of the situations where it is really quite misleading. A sketch of that re-run follows.
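
        Assuming the same fitted model is still in memory, the re-run would look like this (the group counts are just the ones suggested above):

        Code:
        * more groups give a finer view of where along the curve the fit drifts
        estat gof, group(50) table
        estat gof, group(100) table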



        • #5
          Hi Clyde Schechter,
          I have done this, but the result is still
          Prob > chi2 = 0.0000

          Number of observations = 29,657
          Number of groups = 125
          Hosmer–Lemeshow chi2(123) = 273.65
          Prob > chi2 = 0.0000

          If I ignore this result, what is the alternative? Do you think using the bootstrap would be a good idea?

          Best,



          • #6
            The following may also be of interest:

            Code:
            . estat gof
            
            Logistic model for crd, goodness-of-fit test
            
                   number of observations =       314
             number of covariate patterns =       288
                        Pearson chi2(273) =       291.80
                              Prob > chi2 =         0.2075
            The same model after -poisson-:

            Code:
            . estat gof
            
                     Goodness-of-fit chi2  =  162.5511
                     Prob > chi2(299)      =    1.0000
            Can these goodness-of-fit results be taken as evidence in favor of the Poisson regression?
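
            For reference, a hedged sketch of how such a comparison would be produced; crd is the outcome shown in the output above, and x1-x3 are hypothetical stand-ins for the unshown covariates:

            Code:
            * logistic model, then Pearson gof over covariate patterns
            logit crd x1 x2 x3
            estat gof

            * same specification as a Poisson model, then its deviance-based gof
            poisson crd x1 x2 x3
            estat gof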



            • #7
              Hm Saleh a model is defined as a simplification of reality, and simplification is just another word for "wrong in some useful way". For example, we can think of the number 3.14 as a model for the number $\pi$. It is wrong in the sense that it leaves a lot (I would argue that infinite is a lot) of digits out, but it is useful in the sense that it allows us to focus on the most important digits. So now we have a first result:

              Result 1: A model cannot be true. If a model were true, it would no longer be a simplification and thus would cease to be a model.
              Whether 3.14 is a good enough model for $\pi$ depends on how you want to use it; for some applications it is just fine, for others you need a lot more digits. So, here we have a second result:

              Result 2: The choice of whether a model is good enough cannot be absolute; it has to be relative to what you want to do with that model.
              So what does that mean for goodness-of-fit tests? Simple: they are absolute rubbish. They test a hypothesis that we already know is false, and they don't give us a metric we can use to assess whether the model is good enough for our purpose. Instead we need a descriptive analysis of how good the fit is, and then your subjective assessment of whether that is good enough or not. That is why Clyde Schechter suggested using the table option after -estat gof-. This gives you the expected and observed counts, which is one way to describe the fit that could be relevant for your application. If it is, then you can compare them and make up your mind whether the fit is good enough for your purposes.

              Bootstrap solves a completely different problem, so you can leave that out.

              Dart Stater: no, the p-value is not a relevant metric to decide whether a model is good enough.
              ---------------------------------
              Maarten L. Buis
              University of Konstanz
              Department of history and sociology
              box 40
              78457 Konstanz
              Germany
              http://www.maartenbuis.nl
              ---------------------------------



              • #8
                Originally posted by Maarten Buis:
                no, the p-value is not a relevant metric to decide whether a model is good enough.
                Thanks for the answer. I agree that the p-value is not a perfect metric. However, the p-value is the one criterion widely used to decide whether a model's fit is adequate; otherwise, all tests would have to be treated as non-informative.
                Using the table option, we immerse ourselves in a sea of doubts, because we move from a single decision criterion to multiple criteria, which can be contradictory (somewhat better here, somewhat worse there).
                Meanwhile, a solid conclusion must be reached as to whether we are done with the fit or the model should be improved further.



                • #9
                  Remember what a p-value is: it is the probability of drawing data at least as extreme as the data you have seen if the null hypothesis is true. How does that answer the question "Is my model good enough?" It does not. If you use that p-value with an artificial cut-off point, you get an answer, but you could just as well (or probably better) have gone to the zoo and asked the local chimpanzee. P-values are pretty bad in general, and combined with an artificial cut-off point awfully bad, but anyone who applies them to model selection should be boiled in oil before being (politely) asked to leave the profession.

                  False confidence is a lot worse than correct confusion.
                  ---------------------------------
                  Maarten L. Buis
                  University of Konstanz
                  Department of history and sociology
                  box 40
                  78457 Konstanz
                  Germany
                  http://www.maartenbuis.nl
                  ---------------------------------

