Corporate failure prediction thesis- STATA help required

Matthias Demandt

Join Date: Mar 2016

Posts: 5
#1

31 Mar 2016, 05:59

Dear all,

For our master's thesis we are writing about corporate failure prediction, so we have to use Stata for much of the research. As we are real Stata dummies, we are stranded at this point, since we do not know how to solve our first Stata problem.

We have a sample of both failed and non-failed companies for each year (2010 - 2014). Now we want to match each company in the failed group to a company in the non-failed group on the following variables: country, industry (NACE codes), publicly quoted status, and size. Our supervisor suggested we do this by propensity score matching with psmatch2 in Stata. We first tried the 'joinby' command, but we noticed that it works like a merge that forms every pairwise combination within the matching groups, and therefore we got duplicates in the results.

So:
Problem: matching of failed and non-failed sample

Variables for matching:
Country

Industry

Size

Publicly quoted

Is there anybody of the other Statalist forum members who can help us with this problem?

PS: Our supervisor suggested that we 'append' the datasets of failed and non-failed companies as a first step. This step is completed, but we do not know whether it is still necessary for us...
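For readers following along, the append step mentioned in the PS might look roughly like the sketch below. It is only an illustration: the toy data, the variable names (id, year, size, fail) and the tempfile name are assumptions, not the posters' actual files.

```stata
// Toy "failed" file (hypothetical data), flagged with fail = 1
clear
input id year size
1 2010 50
2 2011 20
end
gen byte fail = 1
tempfile failed
save "`failed'"

// Toy "non-failed" file (hypothetical data), flagged with fail = 0
clear
input id year size
3 2010 48
4 2011 25
5 2011 70
end
gen byte fail = 0

// Stack the two files into one dataset with a single outcome indicator
append using "`failed'"
tabulate fail year
```

The point of the fail indicator is that any later matching or modelling command can work from one file instead of two.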
Mike Lacy

Join Date: Apr 2014

Posts: 2449
#2

31 Mar 2016, 08:44

For reasons I describe below, I think your study design is likely not a good choice, but I'll set that aside and offer some code that should be pretty close to doing what you want:

Code:

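// Assumes you start with a file in which you have appended all failed and nonfailed corporations
// Assumes fail = 1 and fail = 0 codes the outcome; x1,..., xk are the variables on which to match
// Note that -joinby- is not relevant
// psmatch2 is user-written; install it with: ssc install psmatch2
gen int pairid = .
foreach yr in 2010 2011 2012 2013 2014 {
    psmatch2 fail x1 x2 ...xk if (year == `yr'), neighbor(1)
    // tag matched pairs: a failed firm takes the _id of its nearest neighbor (_n1),
    // a nonfailed firm keeps its own _id; restricted to the current year so that
    // pairs from earlier years are not overwritten
    replace pairid = cond(fail == 1, _n1, _id) if year == `yr'
    drop _* // clean up from psmatch2
}
// pair identifiers may repeat across years, so identify a pair by year and pairid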

Why I think your study design is not a good choice:

My impression is that you want to treat "failure" as the outcome variable in this study. If that is correct, and as you may well know, your study design is what epidemiologists would call a "case-control" study, i.e., one in which you have a sample stratified on the response variable. This kind of design is useful when the cases (the "fails" in your study) are rare and therefore expensive to find. I don't see that to be true here, since you apparently already have all the data on the cases and controls. In that situation, a case-control design is almost certainly a harmful choice, as you are throwing away data on the controls that you already have! Further, matching is not always a good idea in case-control studies, for reasons beyond the current discussion. I hope some of the epi folk on this list will chime in here.

Regards, Mike
Roberto Liebscher

Join Date: Mar 2014

Posts: 92
#3

31 Mar 2016, 09:15

Adding to Mike's comment, and assuming that you want to model failure as a function of the named variables: is there any reason that keeps you from running a logit or probit model on the covariates (country, industry, etc.)?

Clearly, it is the objective of your research that determines what you should do, so stating your research question here might help clarify things.
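To make the suggestion concrete, here is a minimal sketch of such a logit model on simulated data. Everything in it is hypothetical (the variable names country, industry, quoted, size and fail, the sample size, and the data-generating coefficients); it only shows the shape of the command.

```stata
clear
set seed 12345
set obs 2000

// Hypothetical covariates
gen byte country  = 1 + floor(3*runiform())   // country code 1-3
gen byte industry = 1 + floor(4*runiform())   // industry code 1-4
gen byte quoted   = runiform() < 0.3          // 1 = publicly quoted
gen size          = rnormal(10, 2)            // e.g. log of total assets

// Hypothetical failure process
gen byte fail = runiform() < invlogit(-1 - 0.2*(size - 10) + 0.3*quoted)

// Failure modelled directly on the covariates, using all firms
logit fail i.country i.industry i.quoted size

// Predicted probability of failure for every firm
predict phat, pr
summarize phat fail
```

Replacing logit with probit gives the probit version with the same variable list.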
Matthias Demandt

Join Date: Mar 2016

Posts: 5
#4

01 Apr 2016, 02:30

Thank you for your responses!
First, our research purpose is to determine whether corporate failure prediction models are as accurate in transition economies as they are in developed economies. For this we have a dataset of companies in Europe (which we are going to clean according to the restrictions used in earlier corporate failure prediction models).
Now we want to find a sample of failed companies and a matched sample of non-failed companies (matching was done in the earlier research, so for consistency we have to match), on which we can test which variables predict failure best (the Z-score model of Altman and the O-score of Ohlson).

In this research we want to determine whether the variables of the earlier models are as effective for transition economies as they were for developed economies, and if they are not, we want to find which variables are the best addition for transition economies.

I hope this answers your questions and makes our goal clearer.
Mike Lacy

Join Date: Apr 2014

Posts: 2449
#5

01 Apr 2016, 07:59

For a sample stratified on the response variable, the predicted probability of failure from a logistic regression model will not be correct. (Odds ratios are invariant with respect to such stratification, but predicted probabilities are not.) It is possible to adjust for this fact, but that requires (if I recall correctly) that one know or be able to estimate the base rate of failure in the population. I presume that something like this was done in the original literature you are trying to follow, so if you don't mind giving an exact citation to one of the articles that describes the procedure, I'd be interested.
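One common form of the adjustment mentioned above (often called prior correction) shifts the intercept using the population base rate. The sketch below is a simulation under stated assumptions, not the procedure of any of the papers discussed: the population, the variable x, the coefficients, and the 1:1 sampling scheme are all made up, and the base rate tau is taken as known.

```stata
clear
set seed 2016
set obs 20000

// Hypothetical population with rare failures
gen x = rnormal()
gen byte fail = runiform() < invlogit(-4 + 0.8*x)
summarize fail, meanonly
scalar tau = r(mean)                       // population base rate, assumed known

// Case-control sample: all failures plus an equal number of random non-failures
gen u = runiform()
bysort fail (u): gen long draw = _n
count if fail == 1
local ncases = r(N)
keep if fail == 1 | draw <= `ncases'

// Logit on the stratified sample: slope is fine, intercept is not
logit fail x
summarize fail, meanonly
scalar ybar = r(mean)                      // share of failures in the sample
scalar offset = ln(((1 - tau)/tau) * (ybar/(1 - ybar)))

// Naive and intercept-corrected predicted probabilities
predict xb, xb
gen p_naive     = invlogit(xb)
gen p_corrected = invlogit(xb - offset)
summarize p_naive p_corrected
display "population base rate = " tau
```

The naive probabilities average about one half because half the sample failed by construction; the corrected ones are pulled back toward the population rate.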
Roberto Liebscher

Join Date: Mar 2014

Posts: 92
#6

01 Apr 2016, 08:34

Okay, Matthias. Now I believe I have a better understanding of what you are trying to do. Still, I do not know how propensity score matching relates to your research. Usually, the propensity score method is used to achieve independence of the potential outcomes (the y-variable) from treatment assignment. So, for instance, if you had a sample of firms of which some get subsidies and others do not, and you were interested in whether this predicts failure, one way could be:
Run a model of the form $$Pr(Subsidies_i=1|\mathbf{X})=\Phi(\mathbf{X}^T\boldsymbol{\beta})$$ where $$\mathbf{X}$$ contains the variables that could affect the probability of receiving subsidies. The predicted probabilities from this model are called the propensity score.

Match the companies with subsidies to the companies without subsidies but with a similar propensity score.

Compare failure rates of the matched subsamples (with vs. without subsidies) to get the ATT (average treatment effect on the treated; here: the effect of subsidies on failure rates among the subsidized firms).
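The three steps above can be sketched in Stata as follows. This is only an illustration on simulated data: the variables subsidy, size, age and fail and all coefficients are made up, and the sketch uses the official teffects psmatch command (Stata 13 or later) rather than the user-written psmatch2.

```stata
clear
set seed 101
set obs 1000

// Hypothetical firm characteristics
gen size = rnormal(10, 2)
gen age  = 1 + floor(30*runiform())

// Hypothetical treatment (subsidy) and outcome (failure)
gen byte subsidy = runiform() < normal(-0.5 + 0.15*(size - 10) + 0.01*age)
gen byte fail    = runiform() < invlogit(-1.5 - 0.5*subsidy - 0.1*(size - 10))

// Step 1: probit for treatment; predicted probabilities are the propensity score
probit subsidy size age
predict pscore, pr

// Steps 2 and 3: nearest-neighbor matching on the propensity score and the ATT
teffects psmatch (fail) (subsidy size age, probit), atet
```

Note that the treatment here is subsidy, not fail, which is why it is hard to see what plays the role of the treatment in the failure prediction setting.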

As you do not seem to have such a treatment variable, I am wondering what it is that you want to estimate the propensity for. Like Mike, I would be happy to see the original paper. This would certainly help.
Matthias Demandt

Join Date: Mar 2016

Posts: 5
#7

12 Apr 2016, 01:42

Hello! Sorry for my late response, but I was on vacation for the last ten days and didn't have a properly working internet connection.
Your responses are a lot of help to us, so I would like to thank you all for the efforts you have already made.
The research we are trying to follow is:
Edward Altman, The Journal of Finance (1968)

James A. Ohlson, Financial ratios and the probabilistic prediction of bankruptcy (1980)

Massimiliano Celli, Can Z-score model predict listed companies' failures in Italy (2014)

Alareeni & Branson, Predicting listed companies' failure in Jordan using Altman's models: a case study (2012)

These are the main papers we are following in our research. We are trying to replicate the models created by Altman and Ohlson in order to find out whether their models are applicable in transition economies. Hopefully these papers help to clarify our research purpose.

Regarding the matching: as can be read in the papers, the matching is only used to create a sample of matched failed and non-failed firms. The actual research is performed on this sample. (Matching is controversial in CFP, but we have to do it in order to have a true replication of the initial research.)

Greetings, Matthias
Nick Cox

Join Date: Mar 2014

Posts: 36053
#8

12 Apr 2016, 02:10

Not my field at all, but to try to keep this thread useful to everyone, I suggest that you ask only precise, specific questions based on data examples and Stata commands. Very general questions that invite (or even instruct) us to tell you what to do in entire projects almost always do not work. I could expand on that, but a strong hint I trust is enough.
Matthias Demandt

Join Date: Mar 2016

Posts: 5
#9

12 Apr 2016, 02:33

Dear Nick, thank you for your reply. I understand that we have wandered a bit off topic, but our main purpose at the moment is to establish the code to match our two samples of failed and non-failed firms, in order to have a single sample with failed and non-failed firms.
We were instructed by our team leader to do this by propensity score matching, because she suggested that this is the most commonly used method for matching a failed firm to a comparable non-failed firm.
I am very sorry that we aren't able to speak in more detail about Stata-related matters, because we haven't had a good introduction to or explanation of the methods and commands used in Stata.

greetings Matthias
Nick Cox

Join Date: Mar 2014

Posts: 36053
#10

12 Apr 2016, 02:43

Matthias: Thanks for your response, but it doesn't change my advice. Anyone can ask a question here, but it's a waste of your time to ask questions that won't be answered. Until and unless you learn to pose specific Stata questions you may get good strategic hints, but not much else.

Roberto Liebscher

Join Date: Mar 2014
Posts: 92

#11

01 May 2016, 10:29

Matthias, I cannot see that any of the papers makes a case for propensity score matching. Where matching is used, it is for the construction of the sample, so that each bankrupt company is matched to one non-bankrupt company. At no point, however, is a propensity score computed. Based on what you wrote in this thread and the literature you gave, I think your agenda should look like this:

Find a sample of bankrupt companies and gather all the balance sheet information you need to compute the Z-score (for an (arbitrary) number of years before the bankruptcy).
For each company(-year?) of a bankrupt company, find one company(-year?) in the same industry and country and of similar size.
Compute the Z-score based on the gathered information.
Find the number of correctly classified bankruptcies based on the Z-score (1 year ahead, 5 years ahead, etc.).
Repeat this exercise for another market and compare the prediction rates between the markets.
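As a rough sketch of the "compute the Z-score and count correct classifications" steps, assuming the commonly quoted form of Altman's (1968) coefficients and the 1.81 lower cutoff (please check both against the paper you replicate). The firms and ratio values are invented, and the variable names are hypothetical.

```stata
clear
// Hypothetical ratios: working capital/TA, retained earnings/TA, EBIT/TA,
// market value of equity/total liabilities, sales/TA
input id bankrupt wc_ta re_ta ebit_ta mve_tl sales_ta
1 1 -0.10 -0.20 -0.05 0.30 0.90
2 0  0.25  0.30  0.12 1.50 1.40
3 1  0.05  0.02  0.01 0.40 1.00
4 0  0.15  0.20  0.08 0.90 1.20
end

// Z-score with ratios entered as decimals (assumed coefficient form)
gen z = 1.2*wc_ta + 1.4*re_ta + 3.3*ebit_ta + 0.6*mve_tl + 1.0*sales_ta

// Classify as predicted failure below the lower cutoff;
// firms in the grey zone are treated as non-failures here (a simplification)
gen byte predicted = z < 1.81

// Classification table and share of correctly classified firms
tabulate bankrupt predicted
gen byte correct = predicted == bankrupt
summarize correct
```

Running the same block on each market's matched sample gives the prediction rates to compare.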

One could discuss why a 1 to 1 match should be used here but assuming that you want to follow this approach, here is one (quick) example using joinby:

Code:

//Set up
version 13.1
clear
set seed 452

// Constructing an artificial dataset
set obs 5
gen year = _n
expand 8
bysort year: gen id = _n
sort id year
gen bankrupt = cond(id<3, 1, 0)
gen industry = 1 if inlist(id, 1, 3, 4, 5)
replace industry = 2 if inlist(id, 2, 6, 7, 8)
gen size = round(runiform()*100,1) if year == 1
gen tmpvar = rnormal(0, 2)
bysort id (year): gen sumtmpvar = sum(tmpvar)
bysort id (year): replace size = size[_n-1]+sumtmpvar if year != 1
drop *tmpvar

****************************************************************
** Find for any company that goes bankrupt, a company that does not
** in the same industry and with similar size in year 1
****************************************************************

//Dataset with the non-bankrupt companies saved in extra file
tempfile bankrupts
preserve
keep if bankrupt == 0
rename (id bankrupt size) =0
save "`bankrupts'"
restore

//Match with dataset of bankrupt companies (only match companies of same industry)
keep if bankrupt == 1
joinby year industry using "`bankrupts'"

//Find the company with smallest difference to bankrupt company in terms of size in year 1
gen sizediff = abs(size-size0) if year == 1
bysort id: egen mindiff = min(sizediff)
gen matchid0 = id0 if sizediff == mindiff
bysort id: egen matchid = max(matchid0)
keep if id0 == matchid


Steve Samuels

Join Date: Mar 2014

Posts: 1786
#12

01 May 2016, 12:55

There is good reason to reject matching by propensity score as an analytic tactic. See the reference and link at http://www.statalist.org/forums/foru...y-king-article

Steve Samuels
Statistical Consulting

Stata 14.2