Dear Stata community,
I am a beginning Stata user working on a thesis about bankruptcy prediction models. I want to calibrate one of these models, which uses nine accounting ratios as variables to predict future bankruptcies. To do this I want to use maximum likelihood estimation. I have found a formula to use, but I cannot get it to produce estimates for all nine variables.
My data list is as follows:
Bankrupt x1 x2 x3 x4 x5 x6 x7 x8 x9
1 6,8 0,8 0,2 0,4 0,0 0,1 0,2 0,0 -20,3
1 7,0 0,8 0,2 0,4 0,0 -0,1 -0,1 1,0 0,6
1 6,3 0,8 0,4 0,5 0,0 -0,2 -0,3 1,0 0,2
1 6,6 4,5 -2,2 12,6 1,0 0,1 0,0 0,0 1,6
0 6,6 0,6 0,3 0,4 0,0 0,0 0,1 0,0 0,1
0 6,5 0,5 0,0 0,0 0,0 0,2 0,6 0,0 0,3
0 7,6 0,1 0,6 0,2 0,0 0,1 1,5 0,0 0,1
If a firm goes bankrupt in the following year it receives a ‘1’ in the Bankrupt column, and a ‘0’ otherwise. To obtain the maximum likelihood estimates I use the following program:
program define mylogit
    args lnf Xb
    quietly replace `lnf' = -ln(1+exp(-`Xb')) if $ML_y1==1
    quietly replace `lnf' = -`Xb' - ln(1+exp(-`Xb')) if $ML_y1==0
end
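For reference, this is how I read the two branches of the program against the logit log likelihood. The symbols p_i (the predicted bankruptcy probability of firm i) and x_i β (the linear index that Stata passes in as `Xb') are my own notation, not taken from the source where I found the formula:

```latex
\ln L = \sum_{i} \Big[\, y_i \ln p_i + (1 - y_i)\ln(1 - p_i) \Big],
\qquad p_i = \frac{1}{1 + e^{-x_i\beta}}

% Branch for Bankrupt == 1
\ln p_i = -\ln\!\big(1 + e^{-x_i\beta}\big)

% Branch for Bankrupt == 0
\ln(1 - p_i) = \ln\frac{e^{-x_i\beta}}{1 + e^{-x_i\beta}}
             = -x_i\beta - \ln\!\big(1 + e^{-x_i\beta}\big)
```

If this reading is right, the formula itself matches the logit model, so the problem would have to lie in how it is evaluated or maximized rather than in the algebra.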
To estimate the coefficients I run the program as follows:
ml model lf mylogit (Bankrupt = x1 x2 x3 x4 x5 x6 x7 x8 x9)
ml maximize
However, if I try to use this many variables I get the following error:
“Could not calculate numerical derivatives – discontinuous region with missing values encountered
r(430)”
If I don’t use all my variables I do obtain estimates, for example if I only use the first four variables:
ml model lf mylogit (Bankrupt = x1 x2 x3 x4)
ml maximize
The estimates are found after 8 iterations. When I use more variables, more iterations seem to be needed. I have been able to obtain estimates for every variable, as long as I do not use more than five at the same time.
I found in an earlier forum post that errors such as the one I am receiving might occur because the program needs better starting values for the parameters. However, it could also be that I am trying to fit a model that is too complicated, perhaps because it uses nine variables. I am in dire need of some directions to find my way in Stata. Any help is much appreciated.
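To make the starting-values idea concrete, here is a minimal sketch of what I think I could try next. I have not confirmed that it resolves the error. It rests on two assumptions of mine: that exp(-`Xb') might overflow to missing for extreme values of the index, and that the built-in logit command is an acceptable benchmark for the same model. The names mylogit2 and b0 are hypothetical.

```stata
* Sketch only: the same log likelihood written with Stata's invlogit(),
* which avoids computing exp(-Xb) directly. mylogit2 is a hypothetical name.
program define mylogit2
    args lnf Xb
    quietly replace `lnf' = ln(invlogit(`Xb'))  if $ML_y1 == 1
    quietly replace `lnf' = ln(invlogit(-`Xb')) if $ML_y1 == 0
end

* Benchmark: the built-in command fits the same logit model
logit Bankrupt x1 x2 x3 x4 x5 x6 x7 x8 x9

* Assumption: the benchmark coefficients are sensible starting values
matrix b0 = e(b)

ml model lf mylogit2 (Bankrupt = x1 x2 x3 x4 x5 x6 x7 x8 x9)
ml init b0, copy          // copy starting values by position
ml maximize, difficult    // alternative stepping for non-concave regions
```

If logit itself reports trouble with all nine variables (for example, notes about perfectly predicted outcomes), that would suggest the difficulty lies in the data rather than in my program.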
I look forward to any suggestions.
Yours sincerely,
Antonie Pronk