
  • Classification of logistic regression predicted probability score to include unequal misclassification cost

    Dear All,


    I'm working on bankruptcy prediction models using logit regression with data that are heavily unbalanced between bankrupt and healthy firms (see the table below). I found a correctly classified rate of around 95%, but this rate hides the very low rate of correct prediction for bankrupt firms (a consequence of assuming equal misclassification costs for the two classes). How can I assign a higher cost to a false negative than to a false positive, so as to minimize the average cost of misclassification? Thank you.

    . estat classification

    Logistic model for dep

                  -------- True --------
    Classified |         D            ~D  |      Total
    -----------+--------------------------+-----------
         +     |         2             8  |         10
         -     |       596         99342  |      99938
    -----------+--------------------------+-----------
       Total   |       598         99350  |      99948

    Classified + if predicted Pr(D) >= .5
    True D defined as dep != 0
    --------------------------------------------------
    Sensitivity                     Pr( +| D)    0.33%
    Specificity                     Pr( -|~D)   99.99%
    Positive predictive value       Pr( D| +)   20.00%
    Negative predictive value       Pr(~D| -)   99.40%
    --------------------------------------------------
    False + rate for true ~D        Pr( +|~D)    0.01%
    False - rate for true D         Pr( -| D)   99.67%
    False + rate for classified +   Pr(~D| +)   80.00%
    False - rate for classified -   Pr( D| -)    0.60%
    --------------------------------------------------
    Correctly classified                        99.40%
    --------------------------------------------------

  • #2
    There are two issues here, one of which you raise, and the other of which you have said nothing about.

    First the one you don't mention: -estat classification- by default calculates the predictions by saying that a bankruptcy is predicted if the logistic model calculates a probability of > 50%. But in using a predictive model, you can choose a different cutpoint. So for example, if you were to use a calculated probability of, say, 30% as the cutoff, more of the predictions would be considered positive, and so you would have fewer false negatives and the scheme might work better for you. You can do this in Stata by specifying the -cutoff()- option in your -estat classification- command. The impact of this on the "correctly classified" statistic is not predictable, however, because it depends on how well calibrated the model is.
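    For instance, a minimal sketch (the model specification below is hypothetical; substitute your own):

    Code:
    logit dep x1 x2 x3                     // hypothetical predictors; fit your actual model here
    estat classification, cutoff(0.3)      // re-tabulate using a 30% cutoff instead of the default 50%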

    As for what you do ask about, -estat classification- simply calculates the frequency with which the prediction and the observed agree. What you are asking for, having false negatives cost more than false positives, is a different matter altogether. You are talking about a loss function or a utility function. If you want a false negative to cost, say, twice as much as a false positive, you can just calculate Loss = 2*(1-sensitivity) + (1-specificity). If you need to program that calculation, -estat classification- leaves the sensitivity behind as r(P_p1) and the specificity as r(P_n0). Added: That will give you a cost-weighted assessment of the frequency of prediction errors. But, in actual use, you would have to further weight this loss function by the frequency of bankruptcy. So if 5% of the entities you are evaluating go bankrupt, your expected loss will be 0.05*2*(1-sensitivity) + 0.95*(1-specificity).
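    As a minimal sketch of that calculation (assuming, as in the example above, that a false negative costs twice a false positive and that 5% of entities go bankrupt; it also assumes r(P_p1) and r(P_n0) are on a 0-1 scale, so rescale if your Stata returns them as percentages):

    Code:
    estat classification
    local sens = r(P_p1)     // sensitivity; divide by 100 here if returned on a 0-100 scale
    local spec = r(P_n0)     // specificity; divide by 100 here if returned on a 0-100 scale
    local loss = 0.05*2*(1 - `sens') + 0.95*(1 - `spec')
    display "expected loss = " `loss'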

    By the way, these two issues, though distinct, are by no means mutually exclusive approaches to your problem. You will, in any case, want to calculate a loss function that weights false positives and negatives in proportion to their costs. But you may also want to choose a different cutoff to define the predictions in the first place. In fact, a sensible decision theoretic approach would be to run this analysis using a range of cutoffs, calculate the corresponding loss function, and then use the cutoff that provides the smallest expected loss function.

    Finally, I will just use this opportunity to reiterate my rant against the "correctly classified" statistic. It is useless. In real life there is almost never a situation in which false positives and false negatives are equally important. So for decision making, using a statistic that counts them equally is dangerous and misleading. Moreover, the "correctly classified" statistic is also contaminated by the prevalence of actual positives and negatives. So you can have a prediction mechanism that is seriously deficient at predicting, say, negatives, but as long as negatives are rare, the correctly classified statistic looks very good. So correctly classified is neither a decent measure of the accuracy of a prediction mechanism, nor is it useful for decision making. The only thing that can really be said for it is that if you only think about it superficially it's easy to understand.
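    To see the prevalence problem concretely with the sample above: a rule that simply classifies every firm as healthy identifies no bankrupts at all, yet scores essentially the same on this statistic as the fitted model does. A quick check of that arithmetic:

    Code:
    display "correctly classified by predicting 'healthy' for every firm: " 99350/99948   // = .9940, with zero bankrupts identified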
    Last edited by Clyde Schechter; 08 Oct 2018, 09:36.



    • #3
      Thank you for the quick response. When predicting the probability of default, it is much more costly to classify a bankrupt firm as healthy than to classify a healthy firm as bankrupt; that is why I would like to incorporate a weighted cost for bankrupt firms in the loss function. I used the Youden index (which minimizes the sum of the two false rates; the alternative Liu method uses a product criterion) via the Stata command -cutpt- to find an empirical cut point (an endogenous threshold). I found a cut point of .0077, with the results below:
                    -------- True --------
      Classified |         D            ~D  |      Total
      -----------+--------------------------+-----------
           +     |       405         15686  |      16091
           -     |       193         83664  |      83857
      -----------+--------------------------+-----------
         Total   |       598         99350  |      99948

      Classified + if predicted Pr(D) >= .0077005
      True D defined as dep != 0
      --------------------------------------------------
      Sensitivity                     Pr( +| D)   67.73%
      Specificity                     Pr( -|~D)   84.21%
      Positive predictive value       Pr( D| +)    2.52%
      Negative predictive value       Pr(~D| -)   99.77%
      --------------------------------------------------
      False + rate for true ~D        Pr( +|~D)   15.79%
      False - rate for true D         Pr( -| D)   32.27%
      False + rate for classified +   Pr(~D| +)   97.48%
      False - rate for classified -   Pr( D| -)    0.23%
      --------------------------------------------------
      Correctly classified                        84.11%
      --------------------------------------------------

      Is this the classification with unequal misclassification costs? Thank you.



      • #4
        No, the Youden index is not useful for this. It may be better than the arbitrary 50% cutoff, but it still fails to take into account your specific weighting of the costs of false positives and false negatives, and it also fails to take into account the frequency with which bankruptcies actually occur.

        You need to first decide just what the costs of false positive and false negative predictions in your situation are. Let's call them C_fp and C_fn, respectively. You also need an estimate of what proportion of the entities you are interested in actually go bankrupt. Let's call that B (between 0 and 1). (If your data are a random sample of the population of entities you plan to apply your prediction rule to, then you can get this from your data as just the proportion of bankruptcies observed in the data.)

        Then you need to run -estat classification- with a range of values of cutpoints. And then you have to calculate the weighted expression I showed in #2. Then you can pick the cutoff that has the lowest expected loss. Code would look something like this:

        Code:
        forvalues c = 5(5)95 {
            estat classification, cutoff(`=`c'/100')
            local fn = 1-r(P_p1)
            local fp = 1-r(P_n0)
            // IN THE NEXT LINE REPLACE B, C_fn, AND C_fp
            // BY THEIR ACTUAL VALUES
            local expected_loss = B*C_fn*`fn' + (1-B)*C_fp*`fp'
            display "At cutoff `c'%, expected loss is `expected_loss'"
        }

        Then you select the value of the cutoff that gives you the lowest expected loss. Evidently this approach is approximate, because we only look at cutoffs of 5%, 10%, ..., 95%. If you want a more refined approach, find the values here that bracket the apparent minimum, and then do a more fine-grained search on c between those values by modifying the -forvalues- command.
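        For example, if the coarse search suggested that the minimum lies somewhere between 5% and 15%, a finer pass might look like this (again a sketch, with the same B, C_fn, and C_fp placeholders to be replaced by their actual values):

        Code:
        forvalues c = 5(0.5)15 {
            estat classification, cutoff(`=`c'/100')
            local fn = 1-r(P_p1)
            local fp = 1-r(P_n0)
            local expected_loss = B*C_fn*`fn' + (1-B)*C_fp*`fp'
            display "At cutoff `c'%, expected loss is `expected_loss'"
        }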



        • #5
          Dear Mouldi Ben Ammar,

          Clyde Schechter has provided great advice (I particularly liked the rant against the "correctly classified" statistic) but let me try to add something to this interesting thread.

          1 - If the two kinds of errors have different costs, you have an asymmetric loss function as Clyde noted. In that case, you may want to estimate a quantile regression using the so-called (smooth) maximum score estimator.

          2 - There is potentially a bigger problem here. If the data you have was collected by a bank it will not be representative of the population of interest because the bank only lent money to clients considered creditworthy.

          Best wishes,

          Joao



          • #6
            Thank you so much @Clyde Schechter for your interest and detailed explanation. The code works fine; I would just like to store the results (each cutoff and its corresponding expected loss) and return the optimal cutoff and the minimum expected loss. How can I incorporate these two things into the code in #4?



            • #7
              Thank you @Joao Santos Silva for your help. My data were collected from Compustat and cover only non-financial firms. Using logistic quantile regression (lqreg), how can I include the unequal misclassification costs? Thank you.



              • #8
                You could do this:
                Code:
                capture postutil clear
                postfile handle float cutoff double expected_loss using results, replace
                
                forvalues c = 5(5)95 {
                    estat classification, cutoff(`=`c'/100')
                    local fn = 1-r(P_p1)
                    local fp = 1-r(P_n0)
                    // IN THE NEXT LINE REPLACE B, C_fn, AND C_fp
                    // BY THEIR ACTUAL VALUES
                    local expected_loss = B*C_fn*`fn' + (1-B)*C_fp*`fp'
                    display "At cutoff `c'%, expected loss is `expected_loss'
                    post handle (`c') (`expected_loss')
                }
                
                postclose handle
                
                use results, clear
                summ expected_loss
                list if float(expected_loss) == float(`r(min)'), noobs clean
                The last thing that gets displayed will be the cutoff(s) that produce the minimum expected loss, and the data set results will contain the expected loss generated at each cutoff tried in the loop.



                • #9
                  Thank you @Clyde Schechter. The code works, but I usually find the smallest expected loss at the first (lowest) cutoff value, near 0, especially with samples this imbalanced between bankrupt and healthy firms (in my example the proportion of bankrupts is 598/99948 = 0.006). By contrast, when I use expected loss = fn + fp, I recover the cut point already estimated by the Youden index (0.008).



                  • #10
                    Dear Mouldi Ben Ammar,

                    If your data was not collected by a bank then it should be OK. About quantile regression, note that I was not talking about logistic quantile regression (lqreg) but about the (smoothed) maximum score estimator:

                    Horowitz, J.L. (1992): A Smoothed Maximum Score Estimator for the Binary Response Model, Econometrica, 60, 505-531.
                    Manski, C.F. (1975): Maximum Score Estimation of the Stochastic Utility Model of Choice, Journal of Econometrics, 3, 205-228.
                    Manski, C.F. (1985): Semiparametric Analysis of Discrete Response: Asymptotic Properties of the Maximum Score Estimator, Journal of Econometrics, 27, 313-333.

                    Best wishes,

                    Joao



                    • #11
                      The code works, but I usually find the smallest expected loss at the first (lowest) cutoff value, near 0, especially with samples this imbalanced between bankrupt and healthy firms (in my example the proportion of bankrupts is 598/99948 = 0.006). By contrast, when I use expected loss = fn + fp, I recover the cut point already estimated by the Youden index (0.008).

                      The proportion of bankrupts you need to use is the proportion of bankrupts in the population of firms to which you will be applying your prediction rule. That may be different from the proportion of bankrupts in your sample. In your sample the proportion of bankrupts is approximately 6 per 1,000. Is that representative of the population to which you want to apply your prediction rule?

                      It is possible, however, that your loss function coincidentally does behave like fp + fn, in the following circumstance: if B*C_fn is nearly equal to (1-B)*C_fp. Another way of saying this is that your loss function will match fp + fn (up to a constant factor) if the odds of bankruptcy, B/(1-B), are (very nearly) equal to C_fp/C_fn. Perhaps that is the case for you. If not, then ending up with a loss function that behaves like fp + fn means that you have made an error somewhere in your work. But it is easy enough to calculate B/(1-B) and C_fp/C_fn to see whether they are (very nearly) equal.
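                      A quick way to run that check (a sketch; the cost values are placeholders to be replaced by your actual costs):

                      Code:
                      local B = 598/99948        // proportion of bankrupts; use the population proportion if it differs
                      local C_fp = 1             // placeholder cost of a false positive
                      local C_fn = 10            // placeholder cost of a false negative
                      display "odds of bankruptcy B/(1-B) = " `B'/(1-`B')
                      display "cost ratio C_fp/C_fn = " `C_fp'/`C_fn'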

                      Now, looking back at your original classification table, which was generated with the default cutoff of 0.5:
                      Code:
                                    -------- True --------
                      Classified |         D            ~D  |      Total
                      -----------+--------------------------+-----------
                           +     |         2             8  |         10
                           -     |       596         99342  |      99938
                      -----------+--------------------------+-----------
                         Total   |       598         99350  |      99948
                      This confirms that in your sample the bankrupts are a very small fraction of the total. It is also clear that requiring a 50% predicted probability to identify them misses all but 2 of them, leaving an enormous number of false negatives. So to capture more bankrupts as true positives, the cutoff needs to be much lower than 50%. If it turns out to be very near 0, that wouldn't be terribly surprising, but then you will pay a huge price in false positives. In fact, if the cutoff is exactly 0, everybody is predicted to be a bankrupt, which means that all of the non-bankrupts are false positives. I don't know what the costs of false positives and false negatives are, but unless the cost of a false positive is essentially 0, or the cost of a false negative is astronomical, this isn't likely to be right, especially since there are so many non-bankrupts and so few bankrupts (at least in your data). Without knowing more about the data and the costs of false positives and false negatives, I can't say anything more specific.

                      Finally, there is another thing we need to look at here. How well does your logistic model actually discriminate the bankrupts from the non-bankrupts? You can get that by running the command -lroc- after your logistic regression. That will tell you the area under the receiver operating characteristic (ROC) curve, which is the commonest statistic used to assess discrimination. It ranges, in principle, between 0.5 and 1.0. A value of 0.5 means that your model has no discriminatory power at all: it is just making random predictions. A value of 1.0 means that the model perfectly distinguishes bankrupts from non-bankrupts, and that there is a cutoff that will provide you with no false positives and no false negatives. Neither of those situations arises often in real life. Usually you get values in between. If your value is close to 0.5, it means that no matter where you set your cutpoint, you are going to have poor sensitivity and specificity. Generally speaking, models with an area under the ROC curve below 0.7 are not useful for practical purposes, and trying to calibrate a best cutpoint for them will typically be futile, as no cutpoint will be very good.
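                      For example (the model specification is again just a placeholder):

                      Code:
                      logit dep x1 x2 x3      // hypothetical predictors; fit your actual model
                      lroc                    // reports the area under the ROC curve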

                      Anyway, these are some things you can think about and pursue.





                      • #12
                        Dear Clyde Schechter, the target population to which I want to apply the prediction rule is all bankruptcies; I will check the model's accuracy in-sample, as Shumway did in 2001 (he used the full database as panel data rather than a matched sample). Regarding the area under the ROC curve, I found 0.8045. Perhaps we can interpret this as good, meaning the model is able to discriminate (classify correctly) around 80% of the time?



                        • #13
                          An area under the ROC curve of 0.80 is very good, indeed. Your interpretation of it is not quite correct, however.

                          One way of interpreting the ROC curve area is as a two-alternative forced-choice probability. That means that if you were to randomly select two firms, one of them a bankrupt and the other not, the probability that the model assigns a higher predicted probability of bankruptcy to the one that actually goes bankrupt is the area under the ROC curve: in your case, 80%.
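                          If you want to see this concordance interpretation directly, here is a sketch (hypothetical predictors again); the nonparametric ROC area computed from the predicted probabilities is the same quantity that -lroc- reports:

                          Code:
                          logit dep x1 x2 x3
                          predict phat, pr     // predicted probability of bankruptcy
                          roctab dep phat      // area = Pr(a bankrupt firm gets a higher phat than a healthy one), ties counted as half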



                          • #14
                            Thank you, Clyde Schechter, for your valuable help and advice.
