  • Maximum Likelihood Estimation for a bankruptcy prediction model

    Dear Stata community,

    I’d like to start by saying that I am a beginner with Stata. I am working on my thesis, which is about bankruptcy prediction models, and I want to calibrate one of them. The model uses nine variables (all accounting ratios) to predict future bankruptcies. I want to estimate its coefficients by maximum likelihood. I have found a log-likelihood formula to use, but I can’t get it to produce estimates when all nine variables are included.
    My data look as follows:

    Bankrupt x1 x2 x3 x4 x5 x6 x7 x8 x9
    1 6,8 0,8 0,2 0,4 0,0 0,1 0,2 0,0 -20,3
    1 7,0 0,8 0,2 0,4 0,0 -0,1 -0,1 1,0 0,6
    1 6,3 0,8 0,4 0,5 0,0 -0,2 -0,3 1,0 0,2
    1 6,6 4,5 -2,2 12,6 1,0 0,1 0,0 0,0 1,6
    0 6,6 0,6 0,3 0,4 0,0 0,0 0,1 0,0 0,1
    0 6,5 0,5 0,0 0,0 0,0 0,2 0,6 0,0 0,3
    0 7,6 0,1 0,6 0,2 0,0 0,1 1,5 0,0 0,1

    If a firm goes bankrupt in the following year it receives a ‘1’ in the Bankrupt column, and ‘0’ otherwise. To find the maximum likelihood estimates I use the following program:

    program define mylogit
    args lnf Xb

    quietly replace `lnf' = -ln(1+exp(-`Xb')) if $ML_y1==1

    quietly replace `lnf' = -`Xb' - ln(1+exp(-`Xb')) if $ML_y1==0

    end


    To estimate the coefficients I run the program as follows:

    ml model lf mylogit (Bankrupt =x1 x2 x3 x4 x5 x6 x7 x8 x9)
    ml maximize

    However, if I try to use this many variables I get the following error:
    “Could not calculate numerical derivatives – discontinuous region with missing values encountered
    r(430)”

    If I don’t use all my variables I do get estimates, for example if I only use the first four variables:
    ml model lf mylogit (Bankrupt =x1 x2 x3 x4)
    ml maximize

    The estimates are found after 8 iterations. When I use more variables, more iterations seem to be needed. I have been able to get estimates for every variable, as long as I do not use more than five at the same time.

    I found in an earlier forum post that errors such as the one I am receiving may occur because the program needs better starting values for the parameters. However, it could also be that I am trying to fit a model that is too complicated, perhaps because it uses nine variables. I am in dire need of some direction to find my way in Stata. Any help is much appreciated.

    I look forward to any suggestions.

    Yours sincerely,

    Antonie Pronk

  • #2
    Antonie Pronk --

    Perhaps one variable doesn't vary in the sample, or perfectly predicts the outcome, or some subset of the variables is perfectly collinear. I would suggest running the model with the built-in logit command to see if this is the case: Stata does a pretty good job of checking these things beforehand, while a user-written ML routine will not.
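
    A minimal sketch of that check, assuming the variable names from the original post and that x1 through x9 are stored consecutively in the dataset:

    ```stata
    * Sketch only: variable names are taken from the original post.
    * The built-in command typically notes any variables it drops because
    * they predict the outcome perfectly or are collinear with others.
    logit Bankrupt x1 x2 x3 x4 x5 x6 x7 x8 x9

    * Compare each ratio across the two outcome groups. A variable with
    * zero standard deviation, or with non-overlapping ranges between
    * the groups, is a candidate for the problems described above.
    bysort Bankrupt: summarize x1-x9
    ```

    If logit reports dropped variables or observations, the same variables would likely need to be left out of the ml model statement before the user-written evaluator can converge.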

    Hope that helps!

    Matthew J. Baker

    • #3
      Thanks for your reply Matthew J. Baker! This helped me out a great deal.

      Kind regards,
      Antonie Pronk
