I'm writing my master's thesis on the macroeconomic influence on company bankruptcies in Europe. The dependent variable equals 1 if the company went bankrupt and 0 otherwise. The dataset consists of 60,000 firm-year observations (panel data).

I've estimated a logit model with 6 variables, all with p-values below 0.05, and a pseudo-R2 of about 0.40. When I run the ROC analysis, I get an AUC of 0.98, which seems good.

On the other hand, when I perform the Hosmer-Lemeshow test, it rejects the null hypothesis of good fit (chi2(8) = 88.71, p < 0.0001), so the model does not seem to fit well.

Am I misinterpreting the statistics? Is this a valid model to use?

I'm sorry for this question; I'm a Stata newbie.

Code:

    . logit status V1 V4 V16 V19 V23 V24

    Iteration 0:   log likelihood = -2697.1256
    Iteration 1:   log likelihood =  -2567.349
    Iteration 2:   log likelihood = -2058.2462  (backed up)
    Iteration 3:   log likelihood = -1741.7657
    Iteration 4:   log likelihood = -1669.0762
    Iteration 5:   log likelihood = -1637.6074
    Iteration 6:   log likelihood = -1632.0543
    Iteration 7:   log likelihood = -1631.2841
    Iteration 8:   log likelihood =  -1623.069
    Iteration 9:   log likelihood = -1620.5694
    Iteration 10:  log likelihood =  -1620.115
    Iteration 11:  log likelihood = -1620.1138
    Iteration 12:  log likelihood = -1620.1138

    Logistic regression                             Number of obs     =     52,876
                                                    LR chi2(6)        =    2154.02
                                                    Prob > chi2       =     0.0000
    Log likelihood = -1620.1138                     Pseudo R2         =     0.3993

    ------------------------------------------------------------------------------
          status |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              V1 |   .0862743   .0220024     3.92   0.000     .0431504    .1293982
              V4 |  -5.499801   .1735809   -31.68   0.000    -5.840013   -5.159589
             V16 |   .1206167   .0415667     2.90   0.004     .0391474     .202086
             V19 |  -.6504074   .1139776    -5.71   0.000    -.8737993   -.4270155
             V23 |   1.833306   .2800535     6.55   0.000     1.284411    2.382201
             V24 |   3.021288   .3109562     9.72   0.000     2.411826    3.630751
           _cons |  -5.772843   .5186897   -11.13   0.000    -6.789457    -4.75623
    ------------------------------------------------------------------------------
    Note: 86 failures and 0 successes completely determined.

**Hosmer-Lemeshow test**

Code:

    Logistic model for status, goodness-of-fit test

      (Table collapsed on quantiles of estimated probabilities)

           number of observations =     52876
                 number of groups =        10
          Hosmer-Lemeshow chi2(8) =        88.71
                      Prob > chi2 =         0.0000
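For reference, output like the above typically comes from a command sequence along these lines. This is only a sketch: the `logit` line is taken from the log, while `lroc` and `estat gof, group(10)` are assumed to be the postestimation commands behind the AUC and the 10-group Hosmer-Lemeshow result, and the firm-year dataset is assumed to be already in memory.

```stata
* Assumes the firm-year dataset (status, V1, V4, ...) is already loaded
logit status V1 V4 V16 V19 V23 V24

* Area under the ROC curve (assumed source of the reported AUC);
* nograph suppresses the plot and prints only the summary
lroc, nograph

* Hosmer-Lemeshow test on 10 quantile groups of the fitted probabilities;
* table (not in the original log) lists observed vs. expected counts per group
estat gof, group(10) table
```

The `table` option is an addition to what was posted; it prints the observed and expected counts for each group, which shows in which part of the probability range the fit breaks down.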

