  • goodness of fit outcomes

    Dear Statalists,

    I hope you are well. I would like to ask: if the Pearson chi2 from -estat gof- is significant but the Hosmer-Lemeshow test is not significant, does this mean that my model fits reasonably well? I have used these tests on a probit regression (11 categorical independent variables for 300 firms).

    These are the outcomes of the goodness of fit tests

    . estat gof

    Probit model for Disc_APP_NO_informal_01, goodness-of-fit test

    number of observations = 152
    number of covariate patterns = 142
    Pearson chi2(115) = 141.36
    Prob > chi2 = 0.0481



    . estat gof, group(10) table

    number of observations = 152
    number of groups = 10
    Hosmer-Lemeshow chi2(8) = 11.45
    Prob > chi2 = 0.1774

    . lroc

    Probit model for Disc_APP_NO_informal_01

    number of observations = 152
    area under ROC curve = 0.8294


    Can you please help me interpret these results? Do they show an acceptable fit for the model? I have also attached the graphs from the -lroc- and -lsens- commands.

    Greatly appreciate your help and support

    Kind regards,
    Rabab

  • #2
    These two statistics have nothing in common except that they both have (approximate) chi square distributions under their associated null hypotheses.

    The Pearson chi square statistic you get from -probit- has nothing to do with the fit of your model to the data. It is a test of the null hypothesis that all of the coefficients in the model are 0. I should point out that this null hypothesis is almost always, at best, a straw man, and in most circumstances doesn't even rise to that level. It is usually simply not a question anybody really wants to ask and answer. Unless that was a specific research hypothesis, you surely have better things to do with your time than ponder that statistic.

    The Hosmer-Lemeshow chi square is a test of goodness of fit. Its null hypothesis is that the observed numbers of successes and failures in each decile of predictive risk are consistent with the numbers predicted by the model along with some Poisson-type noise. A low p-value on that test suggests some kind of systematic departure of the data from the predictions of the model. As with any test statistic, properly interpreting it depends on understanding the role of things like biased sampling or measurement and misclassification error, and, in very large samples, whether a departure from the predictions of the model might be due to trivialities such as the minor (and usually completely inconsequential) differences in the shape of the normal and logistic distributions.

    I am not a fan of hypothesis testing in general, and I think that in the context of assessing model goodness of fit the p-values are especially unhelpful. I do think, however, that if you add the -table- option to your -estat gof, group(10)- command (which you did, but you didn't show that part of the output), you will get an actual enumeration of the observed and expected results. Exploring those can give you a good sense of whether, in general, the model predictions are sufficiently good for the use you intend to put them to. And if they aren't, this table can also give you an understanding of where your model needs to be improved: you can see, for example, whether it is overpredicting in the mid-range and under-predicting at the extremes, or whatever other pattern there might be. That, in turn, can help you figure out how you might modify the model to get a better fit. (No p-value can ever give you that kind of really valuable information. So focus on that table--that's the useful part of -estat gof, group(10) table-.)
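
    For concreteness, a minimal sketch of that sequence (x1-x11 below are only placeholders, since you have not posted your actual covariate names):

        probit Disc_APP_NO_informal_01 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11
        estat gof, group(10) table    // decile-of-risk table of observed vs. expected counts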

    As for the area under the ROC curve, there are no hard and fast criteria for interpreting those. A value in the vicinity of 0.8, as you have gotten, is generally regarded as reasonably good performance for a model that is being used to characterize populations, but not really good enough for classifying individuals if anything important is at stake.
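
    If you want some sense of the statistical uncertainty around that 0.83, one possibility (a sketch; phat is just an illustrative variable name, and the predicted probability is the default -predict- statistic after -probit-) is:

        predict phat, pr                       // predicted probability of a positive outcome
        roctab Disc_APP_NO_informal_01 phat    // the same ROC area, now with a standard error and confidence interval
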
    Last edited by Clyde Schechter; 12 Aug 2020, 18:17.



    • #3
      Dear Clyde

      Thank you very much for your prompt reply and for providing the explanations above. I am sorry, but I do not know how to read or interpret the Hosmer-Lemeshow table. What does it tell you, please? It is included below. I am not an econometrician, but I am doing my best to learn.


      The model consists of 11 independent variables and 2 control variables.
      Number of observations = 152
      Group    Prob   Obs_1   Exp_1   Obs_0   Exp_0   Total
          1  0.2032       0     2.0      16    14.0      16
          2  0.4752       6     5.9      10    10.1      16
          3  0.5447      10     7.2       4     6.8      14
          4  0.6384      11     9.1       4     5.9      15
          5  0.7368      11    10.3       4     4.7      15
          6  0.8156      10    12.6       6     3.4      16
          7  0.8737      14    12.7       1     2.3      15
          8  0.9187      13    13.5       2     1.5      15
          9  0.9853      13    14.2       2     0.8      15
         10  1.0000      15    14.9       0     0.1      15

      number of observations = 152
      number of groups = 10
      Hosmer-Lemeshow chi2(8) = 11.45

      Prob > chi2 = 0.1774



      In addition, I am confused about the difference between the proportion correctly classified and the probability cutoff. Are they the same? Should the proportion correctly classified be greater than 0.5, and is higher always better? And for the probability cutoff, is a value below 0.5 or above 0.5 preferable? I am sorry if my questions seem silly, but I have really become confused about these statistics and need help understanding how to interpret them.


      May I please ask you to briefly interpret the following results from -estat classification-:

      Probit model for Disc_APP_NO_informal

                    -------- True --------
      Classified   |       D        ~D   |     Total
      -------------+---------------------+----------
           +       |      75        27   |       102
           -       |      28        76   |       104
      -------------+---------------------+----------
         Total     |     103       103   |       206

      Classified + if predicted Pr(D) >= .5
      True D defined as Disc_APP_NO_informal != 0

      Sensitivity                      Pr( +| D)   72.82%
      Specificity                      Pr( -|~D)   73.79%
      Positive predictive value        Pr( D| +)   73.53%
      Negative predictive value        Pr(~D| -)   73.08%

      False + rate for true ~D         Pr( +|~D)   26.21%
      False - rate for true D          Pr( -| D)   27.18%
      False + rate for classified +    Pr(~D| +)   26.47%
      False - rate for classified -    Pr( D| -)   26.92%

      Correctly classified                         73.30%

      Many thanks for your kind support and help.

      Thank you very much for understanding


      Kind regards,
      Rabab



      • #4
        Dear Clyde,

        I got information about the Hosmer-Lemeshow test from this website: https://www.statisticshowto.com/hosmer-lemeshow-test/. It says that "Specifically, the HL test calculates if the observed event rates match the expected event rates in population subgroups."

        So I understand that if the observed counts for outcome 1 match the expected counts for outcome 1, the model fits. Is this the way to read the Hosmer-Lemeshow table? And if the observed and expected results are close to each other but not exactly the same, can I still accept the model?

        For instance, in one group the observed 1 = 0 while the expected 1 = 2.0;

        in another, the observed 1 = 6 while the expected 1 = 5.9.

        I included the Hosmer-Lemeshow table in my previous message. Do you think a higher p-value indicates a better-fitting model?


        Many thanks again for your kind efforts to help

        Kind regards,
        Rabab



        • #5
          If you look at the numbers in the Obs_0 and Exp_0 columns, in every single row of that table the numbers are very close, never off by more than 3, and often much closer. This is an excellent fit! You will rarely see it better than that.
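
          If you want to see how those row-by-row differences add up to the reported chi2 of 11.45, here is a rough check you can run, assuming the usual Hosmer-Lemeshow formula (the sum, over groups and outcomes, of (observed - expected)^2 / expected) and re-entering the rounded Obs/Exp columns from your table. It will not reproduce 11.45 exactly, because the expected counts are rounded to one decimal in the display:

              preserve                   // -clear- below drops the data in memory, so keep a copy
              clear
              input obs1 exp1 obs0 exp0
               0  2.0 16 14.0
               6  5.9 10 10.1
              10  7.2  4  6.8
              11  9.1  4  5.9
              11 10.3  4  4.7
              10 12.6  6  3.4
              14 12.7  1  2.3
              13 13.5  2  1.5
              13 14.2  2  0.8
              15 14.9  0  0.1
              end
              generate double contrib = (obs1 - exp1)^2/exp1 + (obs0 - exp0)^2/exp0   // each decile's contribution
              quietly summarize contrib
              display "approximate Hosmer-Lemeshow chi2 = " r(sum)   // about 11.3 with these rounded inputs
              restore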

          Concerning the other matter, the reason you're confused about these statistics is that they are inherently confusing. The results you get from that depend on your choice of a cutoff defining a "positive" test. And, except for the sensitivity and specificity, the outputs also depend on the prevalence of true positives in the data. As such, these statistics are of dubious value. (The "percent correctly classified" is a particularly slippery statistic and is often quite misleading. Avoid even looking at it.) The choice of an appropriate cutoff for defining the positive test is a difficult matter and needs to be based on a decision analysis that takes into account the disutility of false positives and false negatives.
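
          To see where that dependence comes from: every figure in the output you posted can be recomputed from the single 2x2 table that -estat classification- built at the 0.5 cutoff, and the predictive values and "correctly classified" mix in the 103/103 split between D and ~D, which is where the prevalence enters. A quick arithmetic check:

              display 75/103        // sensitivity, the reported 72.82%
              display 76/103        // specificity, the reported 73.79%
              display 75/102        // positive predictive value, 73.53%
              display 76/104        // negative predictive value, 73.08%
              display (75+76)/206   // "correctly classified", 73.30%
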
          The default value of 0.5 that Stata uses is rarely helpful by itself.
          If a decision analysis is not possible, it is best to run -estat classification- several times, each with different cutoff probabilities, spanning the range from low to high and impressionistically select the cutoff that seems to produce the most acceptable results. But, again, what you consider "acceptable results" ideally requires something along the lines of a decision analysis to do it right.
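
          A minimal sketch of that, run right after your -probit- fit (the particular grid of cutoffs is just for illustration):

              foreach c of numlist 0.2(0.1)0.8 {
                  display _newline "---- cutoff = `c' ----"
                  estat classification, cutoff(`c')
              }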

          The area under the ROC curve, which you got from -lroc- is a better way to assess the overall discriminatory value of the predictor than anything that comes out of -estat classification-.
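
          Related to that, -lsens- (which you already ran) plots sensitivity and specificity against every possible cutoff in a single picture, and, if I remember the option names correctly, it can also save the plotted values so you can read off, for example, where the two curves cross (cut, sens, and spec below are just illustrative variable names):

              lsens, genprob(cut) gensens(sens) genspec(spec)       // option names from memory; see -help lsens-
              list cut sens spec if abs(sens - spec) < 0.02, clean  // cutoffs where sensitivity and specificity roughly balance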



          • #6
            Dear Clyde

            A million thanks for this clarification. It is now clear to me how to form my own judgment about the model's goodness of fit.


            Greatly appreciate your kind effort to help
            Kind regards,
            Rabab
