Predict command

Xinruo Wang

Join Date: Feb 2019

Posts: 10
#1

07 Apr 2019, 16:11

Hi,

I estimated the probability of default of loans using borrower characteristics. First, I ran a probit regression. The dependent variable is a dummy variable that equals one if the loan defaulted and zero otherwise. The independent variables are borrower characteristics, such as a homeowner dummy and credit history length. I then used the "predict" command in Stata to estimate each loan's probability of default as a percentage.

I'm now trying to run the probit regression on only half of the loans in my sample and use the coefficients from that regression to predict the probability of default, as a percentage, for the whole sample. How can I do this? Please see the following data sample.

input float(default_1 homeowner_1) long amount_delinquent float bankcard_utilization long revolving_balance float credit_history_length byte delinquencies_over60_days
0 0 0 .83 22144 19.164955 0
0 1 0 .45 23427 21.08145 3
1 1 0 .4 29815 26.55989 0
0 1 0 .2 7484 16.347708 0
0 1 0 .75 3622 21.141684 0
0 0 0 .2 331 14.557153 0
1 1 0 .67 67001 18.283367 0
0 1 0 .95 62094 17.18549 0
0 1 0 0 1092 12.364134 0
0 1 0 .5 15593 23.140314 1
0 1 0 .33 14145 28.249144 0
0 1 0 .77 50439 27.54278 0
0 1 0 .69 27053 24.0219 1
1 1 0 .71 15149 33.99589 2
0 1 232 .75 58265 19.28268 0
0 0 0 .01 29 12.427105 0
1 0 0 .59 55194 22.29432 0
end
Tags: predict, predicted probabilities
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

07 Apr 2019, 18:17

So the first step is to select the two halves of the sample. If you want a random selection, as is usually the case for this kind of analysis, you can do that with:

Code:

set seed 1234 // OR YOUR FAVORITE RANDOM NUMBER SEED
gen byte first_sample = runiformint(0, 1)
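Note that runiformint(0, 1) assigns each observation to a half independently, so the two groups will be roughly, but not necessarily exactly, equal in size. If an exact 50/50 split matters, one possible sketch is to sort on a random number and flag the first half. The toy dataset and the variable u below are illustrative assumptions, not part of the original data, and the sort changes the order of the observations.

```stata
* Toy dataset so the sketch runs on its own (hypothetical, 1,000 rows)
clear
set obs 1000
set seed 1234

* Random sort key; u is an illustrative name
gen double u = runiform()
sort u

* Flag the first half of the randomly ordered data
gen byte first_sample = (_n <= _N/2)
tabulate first_sample
```

With real data, the clear and set obs lines would be dropped and the rest run on the loans already in memory.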

Then run your probit regression restricted to that half of the sample:

Code:

probit outcome predictors if first_sample

Now run -predict-, and you will get predicted probabilities both in and out of sample:

Code:

predict predicted_prob
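Since the question asks for percentages: -predict- after -probit- returns a probability between 0 and 1, so it needs to be multiplied by 100. Here is a minimal sketch on simulated data. The variables y and x are hypothetical stand-ins for the default dummy and the borrower characteristics, and predicted_pct is an illustrative name.

```stata
* Toy data only: y and x are hypothetical stand-ins
clear
set seed 1234
set obs 2000
gen double x = rnormal()
gen byte y = (0.5*x + rnormal() > 1)
gen byte first_sample = runiformint(0, 1)

* Fit on one half only
probit y x if first_sample

* predict fills in every observation with nonmissing predictors,
* including those outside the estimation sample
predict predicted_prob, pr
count if missing(predicted_prob) & !first_sample

* Rescale from a 0-1 probability to a percentage
gen predicted_pct = 100 * predicted_prob
summarize predicted_pct if !first_sample
```

With the loan data, the probit line would instead use default_1 as the outcome and the borrower characteristics as predictors.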

Then you have to see how well the predicted probabilities match the observed outcomes in the half of the sample that was not used in the probit regression. There are several ways to do this. I like to test discrimination with the area under the ROC curve, and calibration with the Hosmer-Lemeshow statistic. To do that:

Code:

estat gof if !first_sample, group(10) outsample
lroc if !first_sample
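As a possible cross-check, the same out-of-sample discrimination can be computed directly from the stored prediction with -roctab-, which does not depend on the estimation results still being in memory. The sketch below uses simulated data, with y and x as hypothetical stand-ins for the default dummy and the borrower characteristics. It also computes the mean squared difference between outcome and predicted probability (a Brier-type score), which is an extra summary not mentioned above.

```stata
* Toy data only: y and x are hypothetical stand-ins
clear
set seed 1234
set obs 2000
gen double x = rnormal()
gen byte y = (0.5*x + rnormal() > 1)
gen byte first_sample = runiformint(0, 1)

* Fit on one half, predict for everyone
probit y x if first_sample
predict predicted_prob

* Area under the ROC curve in the held-out half,
* computed from the outcome and the stored prediction
roctab y predicted_prob if !first_sample

* Mean squared gap between outcome and predicted probability
gen double sq_err = (y - predicted_prob)^2
summarize sq_err if !first_sample
```

Lower values of the squared-error summary indicate predictions closer to the observed outcomes. It is only meaningful in comparison with another model or a baseline.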
Xinruo Wang

Join Date: Feb 2019

Posts: 10
#3

12 Apr 2019, 15:36

Thank you, Clyde! It worked very well.