  • Predict command

    Hi,

    I estimated the probability of default of loans using borrower characteristics. First, I ran a probit regression. The dependent variable is a dummy that equals one if the loan defaulted and zero otherwise. The independent variables are borrower characteristics, such as a homeowner dummy and credit history length. I then used the "predict" command in Stata to estimate each loan's probability of default, expressed as a percentage.

    I'm now trying to run the probit regression on only half of the loans in my sample and use the resulting coefficients to predict the probability of default, as a percentage, for the whole sample. How can I do this? A sample of my data follows.

    clear
    input float(default_1 homeowner_1) long amount_delinquent float bankcard_utilization long revolving_balance float credit_history_length byte delinquencies_over60_days
    0 0 0 .83 22144 19.164955 0
    0 1 0 .45 23427 21.08145 3
    1 1 0 .4 29815 26.55989 0
    0 1 0 .2 7484 16.347708 0
    0 1 0 .75 3622 21.141684 0
    0 0 0 .2 331 14.557153 0
    1 1 0 .67 67001 18.283367 0
    0 1 0 .95 62094 17.18549 0
    0 1 0 0 1092 12.364134 0
    0 1 0 .5 15593 23.140314 1
    0 1 0 .33 14145 28.249144 0
    0 1 0 .77 50439 27.54278 0
    0 1 0 .69 27053 24.0219 1
    1 1 0 .71 15149 33.99589 2
    0 1 232 .75 58265 19.28268 0
    0 0 0 .01 29 12.427105 0
    1 0 0 .59 55194 22.29432 0
    end

  • #2
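    So the first step is to split the sample into two halves. If you want a random split, as is usual for this kind of analysis, you can do that with the following, which assigns each observation to the first sample with probability 1/2, so the two halves will be roughly (not exactly) equal in size: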
    Code:
    set seed 1234 // OR YOUR FAVORITE RANDOM NUMBER SEED
    gen byte first_sample = runiformint(0, 1)
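    If you need the two halves to be exactly equal in size (or to differ by one observation when the sample size is odd), one alternative, not part of the approach above, is to put the observations in random order and take the first half. A minimal sketch, with Stata's built-in auto dataset standing in for the loan data and u as a hypothetical helper variable:

```stata
* Stand-in data; with your own data already in memory, skip this line.
sysuse auto, clear

set seed 1234                          // fixed seed so the split is reproducible
gen double u = runiform()              // one uniform random draw per observation
sort u                                 // put the observations in random order
gen byte first_sample = (_n <= _N/2)   // first floor(_N/2) observations form the first half
drop u
tabulate first_sample
```

    Note that this reorders the data. If the original order matters, record it first (for example with gen long orig_order = _n, a hypothetical variable name) and sort on it afterwards.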
    Then run your probit regression restricted to that half of the sample:
    Code:
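    // outcome and predictors taken from the posted data example; substitute your own list
    probit default_1 homeowner_1 amount_delinquent bankcard_utilization revolving_balance credit_history_length delinquencies_over60_days if first_sample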
    Now run -predict-. Because it computes predictions for every observation in the data, not just the estimation sample, you will get predicted probabilities both in and out of sample:
    Code:
    predict predicted_prob
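    The question also asks for the predictions as percentages. -predict- returns probabilities on a 0 to 1 scale, so multiplying by 100 is enough. The sketch below runs the whole sequence end to end on the auto dataset as a stand-in (foreign plays the role of default_1; mpg and weight play the role of the borrower characteristics). pd_pct is a hypothetical variable name, and runiformint() requires Stata 14 or later.

```stata
* Stand-in data: foreign (0/1) plays the role of the default dummy.
sysuse auto, clear

set seed 1234
gen byte first_sample = runiformint(0, 1)   // random, roughly 50/50 split

* Fit the probit on the first half only.
probit foreign mpg weight if first_sample

* -predict- fills in Pr(foreign == 1) for every observation,
* including those outside the estimation sample.
predict predicted_prob

* Express the predicted probability as a percentage.
gen double pd_pct = 100 * predicted_prob
label variable pd_pct "Predicted probability (%)"

* Compare predictions in the estimation half and the held-out half.
summarize pd_pct if first_sample
summarize pd_pct if !first_sample
```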
    Then you need to check how well the predicted probabilities agree with the observed outcomes in the half of the sample that was not used in the probit regression. There are several ways to do this. I like to assess discrimination with the area under the ROC curve, and calibration with the Hosmer-Lemeshow statistic. To do that:

    Code:
    estat gof if !first_sample, group(10) all outsample   // all: use the if condition, not e(sample)
    lroc if !first_sample, all
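    As a cross-check on discrimination, the out-of-sample ROC area can also be computed directly from the saved predictions with -roctab-, which takes the observed outcome and a classification variable and does not refer back to the estimation sample. This is an alternative to the -lroc- step, not part of the original advice; the sketch again uses the auto dataset as a stand-in for the loan data.

```stata
* Stand-in data: foreign (0/1) plays the role of the default dummy.
sysuse auto, clear

set seed 1234
gen byte first_sample = runiformint(0, 1)
probit foreign mpg weight if first_sample
predict predicted_prob

* ROC area computed only on the held-out half.
roctab foreign predicted_prob if !first_sample
display "Out-of-sample ROC area: " %5.3f r(area)
```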


    • #3
      Thank you, Clyde! It worked very well.
