  • GOF of Logit Model: Pearson's chi2, Hosmer and Lemeshow's test

    Hi everyone,

    I am using a logit model (shown below) to investigate the impact of borrowers' minority status on the probability of loan approval, but both Pearson's chi2 test and the Hosmer–Lemeshow (HL) test indicate a poor goodness of fit (gof).

    So I have the following questions:

    1) Is the poor gof caused by the large sample size (2,491,476 observations)? I think my model already includes a rich set of controls in appropriate functional forms, because I followed the controls used in recent studies.

    2) Despite the poor gof from the Pearson and HL tests, the "percent correctly predicted" of the model is around 87%, which is very high. Can I regard my model as highly predictive despite the poor gof from the Pearson and HL tests?

    Thanks!
    Lei


    The test results follow.

    I used Pearson's chi2 test to examine the gof of the model and got:
    Number of observations = 2,491,476
    Number of covariate patterns = 1,636,678
    Pearson chi2(1636649) = 2.48e+06
    Prob > chi2 = 0.0000

    which indicates a poor gof for the model.

    In addition, I used the HL test to examine the gof and got:
    Number of observations = 2,491,476
    Number of groups = 10
    Hosmer–Lemeshow chi2(8) = 260.64
    Prob > chi2 = 0.0000

    which also indicates a poor gof. But in the table below, the observed and expected cell frequencies in each group are in very good agreement; on that basis, I think the model's gof should be good.

    Table collapsed on quantiles of estimated probabilities
    +-----------------------------------------------------------------+
    | Group |   Prob |  Obs_1 |    Exp_1 |  Obs_0 |    Exp_0 |  Total |
    |-------+--------+--------+----------+--------+----------+--------|
    |     1 | 0.6193 |  77335 |  77379.4 | 171813 | 171768.6 | 249148 |
    |     2 | 0.7471 | 170945 | 172131.2 |  78204 |  77017.8 | 249149 |
    |     3 | 0.8465 | 200800 | 198841.8 |  48346 |  50304.2 | 249146 |
    |     4 | 0.8855 | 216880 | 216712.1 |  32268 |  32435.9 | 249148 |
    |     5 | 0.9037 | 223495 | 223061.9 |  25652 |  26085.1 | 249147 |
    |-------+--------+--------+----------+--------+----------+--------|
    |     6 | 0.9166 | 227089 | 226821.4 |  22059 |  22326.6 | 249148 |
    |     7 | 0.9275 | 229835 | 229762.4 |  19314 |  19386.6 | 249149 |
    |     8 | 0.9378 | 232215 | 232373.8 |  16932 |  16773.2 | 249147 |
    |     9 | 0.9492 | 234556 | 235019.0 |  14591 |  14128.0 | 249147 |
    |    10 | 0.9900 | 237511 | 238557.9 |  11636 |  10589.1 | 249147 |
    +-----------------------------------------------------------------+
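
    To see how cell counts that look close can still produce a chi2 of about 260, the HL statistic can be recomputed by hand from the table above. This is only a sketch: the symbols O and E (observed and expected counts for outcome 1 or 0 in group g) are my own notation, and the inputs are the rounded table entries, so the total matches the reported 260.64 only up to rounding.

    ```latex
    \[
    \hat{C} \;=\; \sum_{g=1}^{10}\left[\frac{(O_{1g}-E_{1g})^2}{E_{1g}}
            + \frac{(O_{0g}-E_{0g})^2}{E_{0g}}\right]
    \]
    % Three groups account for most of the statistic:
    \begin{align*}
    g=3:\quad  & \frac{(200800-198841.8)^2}{198841.8} + \frac{(48346-50304.2)^2}{50304.2}
                 \approx 19.28 + 76.23 = 95.51 \\
    g=10:\quad & \frac{(237511-238557.9)^2}{238557.9} + \frac{(11636-10589.1)^2}{10589.1}
                 \approx 4.59 + 103.50 = 108.10 \\
    g=2:\quad  & \frac{(170945-172131.2)^2}{172131.2} + \frac{(78204-77017.8)^2}{77017.8}
                 \approx 8.17 + 18.27 = 26.44
    \end{align*}
    % The remaining seven groups together contribute about 30.58, so
    \[
    \hat{C} \approx 95.51 + 108.10 + 26.44 + 30.58 = 260.63 .
    \]
    ```

    The largest contributions come from the Obs_0 cells: in group 10 the observed 11636 exceeds the expected 10589.1 by roughly 10%, and with about 249,000 observations per group a gap of that size is enough to push the statistic far beyond the chi2(8) critical value.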


    The following is the logit model, with the approval decision as the outcome variable and a set of explanatory variables that are either dummy or continuous variables. There are no interaction or squared terms:

    logit approval income_w dti20 dti20_30 dti30_36 dti36_49 dti50_60 fico680_699 fico700_719 fico720_739 ltv80 ltv80_85 ltv85_90 ltv90_95 origination_2019 refinance minority female age62 lender_top100 shadowbank fintech aus tract_minority_population_percen tract_owner_occupied_units tract_one_to_four_family_homes tract_median_age_of_housing_unit cra fhfa_index
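
    For anyone who wants to reproduce the diagnostics quoted above, a minimal do-file sketch follows. It assumes the results came from Stata's standard postestimation commands after logit; the local macro name controls is my own shorthand and is not part of the original command.

    ```stata
    * Collect the regressors in a local macro (the name "controls" is hypothetical)
    local controls income_w dti20 dti20_30 dti30_36 dti36_49 dti50_60 ///
        fico680_699 fico700_719 fico720_739 ltv80 ltv80_85 ltv85_90 ltv90_95 ///
        origination_2019 refinance minority female age62 lender_top100 ///
        shadowbank fintech aus tract_minority_population_percen ///
        tract_owner_occupied_units tract_one_to_four_family_homes ///
        tract_median_age_of_housing_unit cra fhfa_index

    * Fit the logit model
    logit approval `controls'

    * Pearson chi2 goodness-of-fit test (based on covariate patterns)
    estat gof

    * Hosmer-Lemeshow test with 10 groups, plus the observed/expected table
    estat gof, group(10) table

    * Classification table, including the percent correctly classified (cutoff 0.5)
    estat classification

    * Area under the ROC curve, without drawing the graph
    lroc, nograph
    ```

    The classification table and the ROC area describe how well the model separates approvals from denials, which is a different question from the calibration that the two gof tests examine.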

    Here is the sample data. I divided it into two parts because dataex limits the number of variables:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float approval long income_w float(dti20 dti20_30 dti30_36 dti36_49 dti50_60 fico680_699 fico700_719 fico720_739 ltv80 ltv80_85 ltv85_90 ltv90_95 origination_2019 refinance)
    1 208 0 0 1 0 0 1 0 0 0 0 0 0 1 0
    1 190 0 0 0 1 0 1 0 0 0 0 0 0 1 0
    1 132 0 0 0 1 0 1 0 0 0 0 0 0 1 0
    1 127 0 0 0 1 0 1 0 0 0 0 0 0 1 0
    1 171 0 0 0 1 0 1 0 0 0 0 0 0 0 0
    1 125 0 0 0 1 0 1 0 0 0 0 0 0 1 0
    1 152 0 0 0 1 0 1 0 0 0 0 0 0 0 0
    1 150 0 0 0 1 0 0 1 0 0 0 0 0 1 0
    1 208 0 0 0 1 0 0 1 0 0 0 0 0 1 0
    1 208 0 0 0 1 0 1 0 0 0 0 0 0 1 0
    end
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(minority female age62 lender_top100 shadowbank fintech aus tract_minority_population_percen) int(tract_owner_occupied_units tract_one_to_four_family_homes) byte tract_median_age_of_housing_unit float cra double fhfa_index
    0 0 0 0 0 0 1 46.07 13975 15386  8 0  4.47
    0 0 0 0 0 0 1 46.07 13975 15386  8 0  4.47
    0 0 0 0 0 0 1 46.07 13975 15386  8 0  4.47
    0 0 0 0 0 0 1 46.07 13975 15386  8 0  4.47
    0 0 0 0 0 0 1 46.07 13975 15386  8 0  5.11
    0 0 0 0 0 0 1 46.07 13975 15386  8 0  4.47
    0 0 0 0 0 0 1 46.07 13975 15386  8 0  5.11
    0 0 0 0 0 0 1 11.43  6612  7636 12 0 11.99
    0 0 0 0 0 0 1  3.55  6004  6742 12 0  5.76
    0 1 0 0 0 0 1 34.96  6938  8788 13 0  6.11
    end





