  • Discriminant Analysis

    Hey everyone,

    my goal is to calculate a credit score for new clients and use that score to assign each client to a group. The higher the score, the better the rating.
    For this I want to use discriminant analysis in Stata.

    I have data from 2012-2014 and a file for new clients from 2015. The goal is to provide a score for the new clients from 2015.

    The first step is to run the analysis for the old clients.

    discrim lda term_d emp_length q_annual_inc q_dti q_fico_high inq_last_6mths, group(grade)    // run on the 2012-2014 data

    estat loadings, unstandardized standardized
    predict d_score, dscore
    predict probability, pr
    predict newgroup, classification

    I have several questions:
    1) Is it possible to calculate the dscore by hand? That would help me understand it better.
    2) How can I get the critical values for each grade? For example, if the dscore is greater than 15 then grade 1, between 12 and 15 then grade 2, etc. And does the grade depend on the dscore at all?
    3) Are there ways to improve the model? At the moment it does not fit that well.
    4) Do you have an example of how I can apply the grouping result from the first data set to the second data set? I cannot find one in the literature.




  • #2
    Christoph, the "dscore" is simply a linear combination of your predictors weighted by the discriminant function coefficients, the same form as a linear regression (y = a + b1*x1 + b2*x2 + ...).
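    A minimal sketch of that calculation by hand, to be run right after discrim lda and predict d_score, dscore. It assumes that e(L_unstd) holds the unstandardized coefficients shown by estat loadings, unstandardized, including a _cons row, and that d_score is the first discriminant function. The names d_hand and coef are only illustrative.

    ```stata
    * unstandardized discriminant function coefficients (assumed to include a _cons row)
    matrix L = e(L_unstd)

    * start from the constant of the first discriminant function
    generate double d_hand = L[rownumb(L, "_cons"), 1]

    * add coefficient * predictor for each variable in the model
    foreach v of varlist term_d emp_length q_annual_inc q_dti q_fico_high inq_last_6mths {
        scalar coef = L[rownumb(L, "`v'"), 1]
        replace d_hand = d_hand + coef * `v'
    }

    * compare with the score from predict; small differences are float rounding
    summarize d_hand d_score
    ```

    If the two summaries do not agree, check the layout of the coefficient matrix with matrix list e(L_unstd) before relying on the hand calculation.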
    The easiest way to carry the 2012-2014 model over to the 2015 data is to append the two datasets so that you have a variable year covering 2012-2015. Then you can use an if condition to restrict the model to the relevant years and predict for all other years.
    Code:
    discrim lda term_d emp_length q_annual_inc q_dti q_fico_high inq_last_6mths if year<2015, group(grade)
    predict d_score, dscore
    predict probability, pr
    predict newgroup, classification
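    If the two periods are still in separate files, the append step could look like the sketch below. The file names clients_2012_2014 and clients_2015 and the marker variable newclient are assumptions for illustration. The generate() option of append records which file each observation came from, so it can stand in for the year variable.

    ```stata
    * stack the old and new clients (hypothetical file names)
    use clients_2012_2014, clear
    append using clients_2015, generate(newclient)   // 0 = 2012-2014 file, 1 = 2015 file

    * estimate on the old clients only
    discrim lda term_d emp_length q_annual_inc q_dti q_fico_high inq_last_6mths if newclient == 0, group(grade)

    * classify the new clients and store one posterior probability per grade
    predict newgroup if newclient == 1, classification
    predict pr* if newclient == 1, pr
    ```

    The stub pr* creates one probability variable per grade, which shows how clearly each new client falls into the predicted group.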
    Improving the fit depends on several aspects of your data. First, how do you decide whether the model fits or not: based on the existing literature in your field, or just a rule of thumb? If you have enough information to argue that, the first step is to fit the model with the agreed set of variables that other scholars in your field use. Then you can search for other variables that might discriminate between the groups. Another step is to check whether your data really represent the population and not just specific clients.
    As for your other questions, I'm not sure I understand them.
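    One way to put a number on how well the model fits, offered here only as one possible reading of "fit" for this problem, is the classification table and error rate of the estimation sample. Both commands are run after discrim lda. The looclass option uses leave-one-out classification, which is less optimistic than classifying the same observations the model was estimated on.

    ```stata
    * resubstitution classification table: true grade vs. predicted grade
    estat classtable

    * the same table with leave-one-out classification
    estat classtable, looclass

    * leave-one-out error rate per grade
    estat errorrate, looclass
    ```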
