  • Prob > chi2 discrepancy when running logistic model versus estat gof

    Hi,

    Thank you in advance for your support! I'm trying to understand whether my model is better if I remove two variables. When I run my model, I get the Prob > chi2 value shown below:
    Code:
    logistic ab_ph_bp_high i.cn_agecat z_gender_n i.c_wi_quintiles i.c_sp_fr_cat c_sp_whodas_norm i.c_mh_phq_ca
    > t_n hu_bp_m hu_bp_td hu_drg_any
    
    Logistic regression                                     Number of obs =  2,804
                                                            LR chi2(16)   = 161.41
                                                            Prob > chi2   = 0.0000
    Log likelihood = -1511.2756                             Pseudo R2     = 0.0507
    It is statistically significant, which I thought was a good sign. However, when I run estat gof on the same model, I get a *different* value:
    Code:
    . estat gof
    
    Goodness-of-fit test after logistic model
    Variable: ab_ph_bp_high
    
          Number of observations =   2,804
    Number of covariate patterns =   2,164
              Pearson chi2(2147) = 2176.84
                     Prob > chi2 =  0.3215
    This is not statistically significant, which I have read is what you want from the estat gof output. How can the same model have two different Prob > chi2 values depending on which command I use in Stata?

    Thanks for your help!

    Arielle

  • #2
    Arielle:
    welcome to this forum.
    The two statistics measure different features of the model:
    1) the pseudo R2, when the associated LR chi2 is statistically significant, tells you that, given the set of predictors, a logistic/logit regression is more informative than a constant-only model:
    Code:
    . use "C:\Program Files\Stata17\ado\base\a\auto.dta"
    (1978 automobile data)
    
    . logit foreign
    
    Iteration 0:   log likelihood =  -45.03321  
    Iteration 1:   log likelihood =  -45.03321  
    
    Logistic regression                                     Number of obs =     74
                                                            LR chi2(0)    =   0.00
                                                            Prob > chi2   =      .
    Log likelihood = -45.03321                              Pseudo R2     = 0.0000
    
    ------------------------------------------------------------------------------
         foreign | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           _cons |  -.8602013   .2543331    -3.38   0.001    -1.358685   -.3617176
    ------------------------------------------------------------------------------
    
    . logit foreign price
    
    Iteration 0:   log likelihood =  -45.03321  
    Iteration 1:   log likelihood = -44.947363  
    Iteration 2:   log likelihood =  -44.94724  
    Iteration 3:   log likelihood =  -44.94724  
    
    Logistic regression                                     Number of obs =     74
                                                            LR chi2(1)    =   0.17
                                                            Prob > chi2   = 0.6784
    Log likelihood = -44.94724                              Pseudo R2     = 0.0019
    
    ------------------------------------------------------------------------------
         foreign | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           price |   .0000353   .0000844     0.42   0.676    -.0001301    .0002006
           _cons |  -1.079792   .5878344    -1.84   0.066    -2.231927    .0723419
    ------------------------------------------------------------------------------
    
    . di 1-( -44.94724/-45.03321)
    .00190904
    
    .
    2) -estat gof- calls the Pearson chi2 goodness-of-fit test, which tests "...the observed against expected number of responses using cells defined by the covariate patterns." (see the -estat gof- entry in the Stata .pdf manual).
    As your statistic does not reject the null, your model is reasonably well specified.
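    As a minimal sketch continuing the toy -auto- example above (using -sysuse auto- here instead of the local file path), you can see where each Prob > chi2 comes from:
    Code:
    * refit the toy model, then check its calibration
    sysuse auto, clear
    logit foreign price      // header reports the LR chi2 test against the constant-only model
    estat gof                // Pearson chi2 GOF test: observed vs. expected counts by covariate pattern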
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      The only thing the two results you show have in common is that the test statistics follow a \(\chi^2\) distribution under their respective null hypotheses. The likelihood ratio (LR) test at the head of the table has the null hypothesis that the maximized likelihood of the full model equals that of the intercept-only model, versus the alternative hypothesis of inequality. A replication of the test, storing the estimates of the intercept-only and full models and using lrtest, is the following:

      Code:
      webuse lbw, clear
      logit low
      est sto m1
      logit low lwt smoke i.race ui
      est sto m2
      lrtest m1 m2
      Res.:

      Code:
      . logit low lwt smoke i.race ui
      
      Iteration 0:   log likelihood =   -117.336  
      Iteration 1:   log likelihood =  -106.2426  
      Iteration 2:   log likelihood = -105.96551  
      Iteration 3:   log likelihood = -105.96529  
      Iteration 4:   log likelihood = -105.96529  
      
      Logistic regression                             Number of obs     =        189
                                                      LR chi2(5)        =      22.74
                                                      Prob > chi2       =     0.0004
      Log likelihood = -105.96529                     Pseudo R2         =     0.0969
      
      ------------------------------------------------------------------------------
               low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               lwt |  -.0119539   .0063357    -1.89   0.059    -.0243717    .0004638
             smoke |   1.029195    .381993     2.69   0.007      .280502    1.777887
                   |
              race |
            black  |   1.309936   .5096596     2.57   0.010     .3110214     2.30885
            other  |   .9561105   .4172589     2.29   0.022      .138298    1.773923
                   |
                ui |   .7785572   .4391435     1.77   0.076    -.0821481    1.639263
             _cons |  -.3897565   .8977654    -0.43   0.664    -2.149344    1.369831
      ------------------------------------------------------------------------------
      
      .
      . est sto m2
      
      .
      . lrtest m1 m2
      
      Likelihood-ratio test                                 LR chi2(5)  =     22.74
      (Assumption: m1 nested in m2)                         Prob > chi2 =    0.0004

      The logic is similar to the F-test of joint significance of the regressors in linear regression: if the added regressors do not improve the likelihood of the intercept-only model, then they do not add anything. On the other hand, estat gof implements the Pearson goodness-of-fit test (i.e., observed against expected number of responses using cells defined by the covariate patterns). A rejection of the null hypothesis indicates a poor fit. See the PDF manual entry for a more detailed discussion and for the options that let you compute the Hosmer–Lemeshow test with the same command: https://www.stata.com/manuals/restatgof.pdf
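
      As a minimal sketch continuing the lbw example above (the choice of 10 groups below is a common convention, not something the manual requires):

      Code:
      webuse lbw, clear
      logit low lwt smoke i.race ui
      estat gof                     // Pearson chi2 GOF test over covariate patterns
      estat gof, group(10) table    // Hosmer-Lemeshow test with 10 groups of predicted probabilities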

      Note: Crossed with #2 which makes the same points.
      Last edited by Andrew Musau; 02 Nov 2022, 03:03.

