  • Prob > chi2 discrepancy when running logistic model versus estat gof

    Hi,

    Thank you in advance for your support! I'm trying to understand whether my model is better if I remove two variables. When I run my model, I get the Prob > chi2 value shown below:
    Code:
    logistic ab_ph_bp_high i.cn_agecat z_gender_n i.c_wi_quintiles i.c_sp_fr_cat c_sp_whodas_norm i.c_mh_phq_ca
    > t_n hu_bp_m hu_bp_td hu_drg_any
    
    Logistic regression                                     Number of obs =  2,804
                                                            LR chi2(16)   = 161.41
                                                            Prob > chi2   = 0.0000
    Log likelihood = -1511.2756                             Pseudo R2     = 0.0507
    It is statistically significant, which I thought was a good sign. However, when I run estat gof on the same model, I get a *different* value:
    Code:
    . estat gof
    
    Goodness-of-fit test after logistic model
    Variable: ab_ph_bp_high
    
          Number of observations =   2,804
    Number of covariate patterns =   2,164
              Pearson chi2(2147) = 2176.84
                     Prob > chi2 =  0.3215
    This is not statistically significant, which I have read is what you want from the estat gof output. How can the same model have two different Prob > chi2 values depending on which command I use in Stata?

    Thanks for your help!

    Arielle

  • #2
    Arielle:
    welcome to this forum.
    The two statistics measure different features of the model:
    1) the pseudo R2, when the associated LR chi2 is statistically significant, tells you that, given the set of predictors, a logistic/logit regression is more informative than a constant-only model:
    Code:
    . use "C:\Program Files\Stata17\ado\base\a\auto.dta"
    (1978 automobile data)
    
    . logit foreign
    
    Iteration 0:   log likelihood =  -45.03321  
    Iteration 1:   log likelihood =  -45.03321  
    
    Logistic regression                                     Number of obs =     74
                                                            LR chi2(0)    =   0.00
                                                            Prob > chi2   =      .
    Log likelihood = -45.03321                              Pseudo R2     = 0.0000
    
    ------------------------------------------------------------------------------
         foreign | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           _cons |  -.8602013   .2543331    -3.38   0.001    -1.358685   -.3617176
    ------------------------------------------------------------------------------
    
    . logit foreign price
    
    Iteration 0:   log likelihood =  -45.03321  
    Iteration 1:   log likelihood = -44.947363  
    Iteration 2:   log likelihood =  -44.94724  
    Iteration 3:   log likelihood =  -44.94724  
    
    Logistic regression                                     Number of obs =     74
                                                            LR chi2(1)    =   0.17
                                                            Prob > chi2   = 0.6784
    Log likelihood = -44.94724                              Pseudo R2     = 0.0019
    
    ------------------------------------------------------------------------------
         foreign | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           price |   .0000353   .0000844     0.42   0.676    -.0001301    .0002006
           _cons |  -1.079792   .5878344    -1.84   0.066    -2.231927    .0723419
    ------------------------------------------------------------------------------
    
    . di 1-( -44.94724/-45.03321)
    .00190904
    
    .
    2) -estat gof- calls the Pearson chi2 goodness-of-fit test, which tests "...the observed against expected number of responses using cells defined by the covariate patterns." (see the -estat gof- entry in the Stata .pdf manual).
    As your statistic does not reject the null, your model is reasonably well specified.
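    As a minimal sketch continuing the toy -auto- example above (using -sysuse auto- here instead of the local file path), you can see where each Prob > chi2 comes from:
    Code:
    * refit the toy model, then check its calibration
    sysuse auto, clear
    logit foreign price      // header reports the LR chi2 test against the constant-only model
    estat gof                // Pearson chi2 GOF test: observed vs. expected counts by covariate pattern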
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      The only thing the two results you show have in common is that the test statistics follow a \(\chi^2\) distribution under their respective null hypotheses. The likelihood ratio (LR) test at the head of the table has the null hypothesis that the maximized likelihood of the full model equals that of the intercept-only model, versus the alternative hypothesis of inequality. A replication of the test, storing the estimates of the intercept-only and full models and using lrtest, is the following:

      Code:
      webuse lbw, clear
      logit low
      est sto m1
      logit low lwt smoke i.race ui
      est sto m2
      lrtest m1 m2
      Res.:

      Code:
      . logit low lwt smoke i.race ui
      
      Iteration 0:   log likelihood =   -117.336  
      Iteration 1:   log likelihood =  -106.2426  
      Iteration 2:   log likelihood = -105.96551  
      Iteration 3:   log likelihood = -105.96529  
      Iteration 4:   log likelihood = -105.96529  
      
      Logistic regression                             Number of obs     =        189
                                                      LR chi2(5)        =      22.74
                                                      Prob > chi2       =     0.0004
      Log likelihood = -105.96529                     Pseudo R2         =     0.0969
      
      ------------------------------------------------------------------------------
               low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               lwt |  -.0119539   .0063357    -1.89   0.059    -.0243717    .0004638
             smoke |   1.029195    .381993     2.69   0.007      .280502    1.777887
                   |
              race |
            black  |   1.309936   .5096596     2.57   0.010     .3110214     2.30885
            other  |   .9561105   .4172589     2.29   0.022      .138298    1.773923
                   |
                ui |   .7785572   .4391435     1.77   0.076    -.0821481    1.639263
             _cons |  -.3897565   .8977654    -0.43   0.664    -2.149344    1.369831
      ------------------------------------------------------------------------------
      
      .
      . est sto m2
      
      .
      . lrtest m1 m2
      
      Likelihood-ratio test                                 LR chi2(5)  =     22.74
      (Assumption: m1 nested in m2)                         Prob > chi2 =    0.0004

      The logic is similar to the F-test of joint significance of the regressors in linear regression: if the added regressors do not improve the likelihood of the intercept-only model, then they do not add anything. On the other hand, estat gof implements the Pearson goodness-of-fit test (i.e., observed against expected number of responses using cells defined by the covariate patterns). A rejection of the null hypothesis indicates a poor fit. See the PDF manual entry for a more detailed discussion and for the options that let you compute the Hosmer–Lemeshow test with the same command: https://www.stata.com/manuals/restatgof.pdf
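
      As a minimal sketch continuing the lbw example above (the choice of 10 groups below is a common convention, not something the manual requires):

      Code:
      webuse lbw, clear
      logit low lwt smoke i.race ui
      estat gof                     // Pearson chi2 GOF test over covariate patterns
      estat gof, group(10) table    // Hosmer-Lemeshow test with 10 groups of predicted probabilities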

      Note: Crossed with #2 which makes the same points.
      Last edited by Andrew Musau; 02 Nov 2022, 03:03.

