Binary dependent variable with heavily skewed zeroes; convergence not obtained for logit, probit, or cloglog

Neil Meredith

Join Date: Jul 2015

Posts: 12
#1

04 Oct 2016, 12:45

Hello Stata Listers,

I am using a quarterly panel dataset of U.S. commercial banks from 2000 to 2015 to predict bank performance. One of my dependent variables, failure, is 1 when a bank fails in a given quarter and 0 otherwise. Because bank failures are rare, only 0.15% of the 355,844 observations have failure equal to 1, while the other 99.85% are 0. My independent variables are a standard capital ratio, year dummies, and quarter dummies. I have tried running the following commands:

xi: xtlogit failure capital_ratio i.year i.quarter, fe vce(bootstrap)
xi: logit failure capital_ratio i.year i.quarter, cluster(bank_id)
xi: probit failure capital_ratio i.year i.quarter, cluster(bank_id)
xi: cloglog failure capital_ratio i.year i.quarter, cluster(bank_id)

I was not able to obtain convergence with any of the above estimators. I believe my highly skewed dependent variable is causing the problem. I am able to obtain linear probability model estimates using the following command:

xi: xtreg failure capital_ratio i.year i.quarter, fe vce(robust)

If the skew is indeed the cause, should I rely on the linear probability model results instead? They are flawed (e.g., negative predicted values for failure) but may still be useful for my research question, since I care only about the sign of the capital ratio coefficient and its magnitude does not matter much. Alternatively, is there another estimator or technique for handling a dependent variable as heavily skewed as failure?

Thank you for any help you can provide.

Sincerely,
Neil
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

04 Oct 2016, 13:26

Hello Neil,

Welcome to the Stata Forum!

Your topic is far from my field. That said, a zero-inflated Poisson (-zip-) or negative binomial (-zinb-) regression model may be what you are looking for.

Best,

Marcos

Clyde Schechter

Join Date: Apr 2014

Posts: 30116
#3

04 Oct 2016, 13:37

Well, I don't know if it will work for you or not, but take a look at -firthlogit-, which you can download from SSC. You have correctly identified the problem as being due to the extremely low probability of a failure. That tends to make parameter estimates very large numbers, which subvert the efforts to converge upon them. -firthlogit- uses penalized maximum likelihood estimation, which, for example, overcomes the problem of complete separation (wherein the maximum likelihood estimate is actually infinite), and it might help here.
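A minimal sketch of trying -firthlogit- on the model from post #1. The variable names are taken from that post. It assumes the installed version of -firthlogit- accepts factor-variable notation (check -help firthlogit- after installing), and it leaves out the clustering used in the original commands. There is no guarantee that it converges on these data.

```stata
* One-time installation of the community-contributed command from SSC
ssc install firthlogit

* Penalized maximum likelihood logit with the same regressors as post #1
* (assumption: this version of firthlogit accepts i.year and i.quarter)
firthlogit failure capital_ratio i.year i.quarter
```

If the factor variables are rejected, the dummies would have to be created by hand first, for example with -tabulate year, generate()-.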

A few other issues that, I think, are unrelated to your convergence problem:

1. Don't use -xi:-. In fact, unless you use some pretty esoteric, mostly out-of-date commands that don't support factor variable notation, or run into some oddball situations in multilevel modeling, you should try to forget that you ever knew about -xi-. Factor variable notation has replaced it, and it lets you use the -margins- command after estimation. Do read -help fvvarlist-, and see Richard Williams' excellent http://www.stata-journal.com/sjpdf.h...iclenum=st0260 for an introduction to the wonderful world of -margins-.

2. The use of i.year and i.quarter as separate variables is odd. It implies that you expect year-to-year shocks and that you also expect seasonal shocks in failure rates. The former seems sensible, but I'd be surprised if there is seasonality to bank failure. I'm not an economist or finance professional, but I would urge you to check with someone in your field to see if this really makes sense. If what you really intend is that there could be shocks in each quarter of any year, then you need to combine the year and quarter into a single quarterly date variable, and include that variable in the regression instead of year and quarter separately. Probably the simplest way for you to get from year and quarter to a quarterly date variable would be with Nick Cox's new -numdate- command, which you can also get from SSC.

3. Finally, venturing way beyond my expertise into rank speculation, I wonder whether the use of a fixed effects model, which is inherently a within-bank model, is appropriate here. Again, not really knowing anything about finance, I imagine that the variation in capital ratios between banks is a far more potent predictor of bank failure than the fluctuations over time of any single bank's capital ratio. I could, of course, be wrong about that. But if I'm not wrong, you are running a model that overlooks the bigger effect and tries to zero in on the fine-tuning. Perhaps that's actually your intent, and perhaps I have it all wrong. But it is something to think about. Again, if I were you, I'd consult somebody in your discipline about this substantive question.
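Points 1 and 2 can be sketched as follows. This is only an illustration under stated assumptions: year is taken to hold the calendar year and quarter the values 1 to 4, qdate is a hypothetical new variable name, and the built-in yq() function is used here in place of -numdate-. Whether the model converges on these data is a separate question.

```stata
* Build one quarterly date from year and quarter
* (assumes quarter is coded 1-4; qdate is a hypothetical name)
generate qdate = yq(year, quarter)
format qdate %tq

* Factor-variable notation replaces the xi: prefix
logit failure capital_ratio i.qdate, vce(cluster bank_id)

* Average marginal effect of the capital ratio on the failure probability
margins, dydx(capital_ratio)
```

Because the model is fit with factor-variable notation and not -xi:-, -margins- can be run directly after estimation.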

Added: Crossed with Marcos' reply. I like his suggestions as well.
Neil Meredith

Join Date: Jul 2015

Posts: 12
#4

05 Oct 2016, 13:42

Marcos and Clyde, thank you both for your helpful suggestions. I greatly appreciate your time and willingness to help. I am going to explore each of your suggestions and see what I find out. Clyde, I want to especially thank you for suggesting -firthlogit- as it is a command I otherwise believe I would not have come across. I'll also be reading through factor variable notation to update my knowledge and use of commands.