logit, vce(cluster) vs xtlogit, fe: coefficients' sign and r-squared

Sofi Gomes

#1


07 Dec 2017, 16:26

Hello,

I am trying to study the effect of gender (and of other accounting and market variables) on the probability of filing for bankruptcy in the previous period. My advisor suggested that I use -logit, vce(cluster)- and -xtlogit, fe-, since I am using panel data (N = 13,006, T = 40).

I had previously asked in this post whether I could use -logit- instead of -xtlogit- with panel data, and I was told that I couldn't. But later that day I found many posts in which people were advised to use -logit- with clustered standard errors instead of -xtlogit-, and when I discussed this with my advisor, his opinion was that I should present results for both.

However, the two commands give very different results, both statistically significant (the -xtlogit, fe- coefficient only at the 10% level).
For instance, for -logit, vce(cluster id)- I get:
gender coefficient of 0.49 with z = 2.09

-margins, dydx(gender) atmeans-: 0.0000459 with z = 2.01

pseudo R-squared of 0.2487

And for -xtlogit, fe- I get:
gender coefficient of -1.47 with z = -1.74

pseudo R-squared of 0.5785

The most obvious problem is that the coefficients have different signs! The large difference in the R-squared values (which I am not sure how to interpret for a logit model) is also striking.
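For context on the R-squared values: the "Pseudo R2" that Stata's -logit- reports is McFadden's measure, which compares the maximized log likelihood of the fitted model with that of a constant-only model estimated on the same sample:

$$R^2_{\mathrm{McF}} = 1 - \frac{\ln \hat{L}_{\mathrm{model}}}{\ln \hat{L}_{0}}$$

It is not a proportion of explained variance. A pseudo R-squared for -xtlogit, fe- would be built from the conditional likelihood, in which the firm effects have been conditioned out, and it is computed only on the firms whose outcome varies, so the two numbers are on different scales and should not be compared with each other.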

How can I explain these differences? And how can I decide which results to focus on?

Thank you so much in advance!

Best,
Sofi

Last edited by Sofi Gomes; 07 Dec 2017, 16:28.
Tags: coefficients, logit, panel, R-squared, xtlogit
Clyde Schechter

#2

07 Dec 2017, 17:36

Well, you don't tell us what the panel variable is and how it relates to your other variables, so it is hard to give a specific response here.

But there are some general principles to bear in mind:

1. When you use -xtlogit, fe-, you get answers that are conditional on the panel effects. This has the merit of adjusting for any time-invariant (or replication-invariant) attributes of the panels that might also be confounding variables. Thus, it eliminates the omitted variable bias that might be attributable to those confounders.

2. -logit, vce(cluster panel)- adjusts the standard errors for clustering of observations within panels. But it does nothing to mitigate omitted variable bias. That is why people are usually advised to use -xtlogit, fe-.

3. There are two circumstances where one would revert to -logit, vce(cluster panel)-, however. One is when the output of -xtlogit, fe- shows that in fact there is only negligible variance at the panel level. In that case any time-invariant attributes of the panels will not be confounding variables, so there is no omitted variable bias (from these attributes) to adjust for, and the -logit- model is simpler.

3 (cont'd.) The other circumstance is more complicated. It is quite common for models that include a covariate or set of covariates to produce very different results from a model that excludes them. (An -xtlogit, fe- can be thought of as including covariates representing the panels--that's not literally true, but for present purposes it works.) The differences can be dramatic, including opposite signs. This is known as Simpson's paradox and it is a direct reflection of omitted variable "bias." However, there's a catch. Depending on the actual causal relations, sometimes it is the model with the variables omitted that represents the true causal effect and the inclusion of the variables results in bias. This occurs, for example, if the covariates in question lie on the causal path between the predictor of interest and the outcome. In this situation, it is the -logit- model that is correct, and -xtlogit- would be wrong.

So you need to sit down and think carefully through the causal pathways in your data. If there are time-invariant attributes of the panels that lie on the pathway between gender and your outcome, then you must use -logit-, not -xtlogit-. On the other hand, if the time-invariant attributes of the panels all precede gender on the causal paths to your outcome (or are not on the causal paths at all), then you must use -xtlogit-, not -logit-.
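To make points 1 to 3 concrete, here is a minimal simulated sketch (all data and variable names are hypothetical, not the poster's). Each firm has an unobserved, time-invariant attribute u that raises the outcome probability and is also correlated with the predictor x, which is the case where the panel attribute is a confounder rather than a link on the causal path. Pooled -logit- with clustered standard errors should then return a positive coefficient on x, while -xtlogit, fe-, which conditions u away, should recover the negative within-firm effect built into the simulation:

```stata
* Hypothetical simulation: 500 firms observed for 20 quarters.
* u is an unobserved, time-invariant firm attribute that raises the
* outcome probability and is also correlated with the predictor x.
clear
set seed 20171207
set obs 500
generate id = _n
generate u = rnormal()                  // time-invariant firm effect
expand 20
bysort id: generate t = _n
xtset id t

generate x = 2*u + rnormal(0, 0.5)      // x is strongly correlated with u
* True within-firm effect of x on the log-odds is -0.5
generate y = rbinomial(1, invlogit(-1 + 2*u - 0.5*x))

* Pooled logit: clustered SEs, but u is omitted, so _b[x] is confounded
logit y x, vce(cluster id) nolog
scalar b_pooled = _b[x]

* Conditional FE logit: u is conditioned out; only within-firm variation is used
xtlogit y x, fe nolog
scalar b_fe = _b[x]

display "pooled: " b_pooled "   fixed effects: " b_fe
```

If u instead lay on the causal path from x to the outcome, the pooled estimate would be the one to report. The simulation cannot tell you which structure holds in real data; that judgment has to come from subject-matter reasoning about the causal pathways.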
Sofi Gomes

#3

08 Dec 2017, 08:59

Dear Clyde Schechter,

I am using panel data because I have information on 13,006 companies over 40 quarters, and I want to study the impact of accounting and market variables (such as net income, leverage, price per share, stock price volatility, ...) and of the CEO's gender (for which I have a dummy) on the probability of filing for bankruptcy (for the bankruptcy filing I also have a dummy, which equals 1 when there is a filing). I use year dummies, industry and country as control variables. (Industry and country would be my time-invariant variables.)

I do not have a very advanced knowledge of econometrics; in fact, everything I learnt was related to time series, so I am obviously having some trouble working with panel data. But here is what I understood, and my questions on each of the points you made:

1. What you are saying is that using -xtlogit, fe- will eliminate omitted variable bias, since time-invariant (possibly confounding) variables are automatically dropped from the regression? On this point, I must add that my regression using -xtlogit, fe- automatically drops 245,909 observations, using only 3,306 observations and analysing only 214 of my companies (whereas -logit, vce(cluster)- uses 249,215 observations). To what extent is it reasonable to use a command that drops so many observations?
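A minimal simulated sketch of why so many observations disappear (hypothetical data, not the poster's). In the conditional fixed-effects logit, a firm whose outcome is 0 in every period (or 1 in every period) contributes nothing to the conditional likelihood, so -xtlogit, fe- drops it; with a rare event such as a bankruptcy filing, that is most firms. By the same logic, each coefficient is estimated only from within-firm variation, so the gender coefficient rests on the firms whose CEO gender changes during the sample period. The code counts the firms and observations that can contribute and compares them with what -xtlogit, fe- stores in e(N) and e(N_g):

```stata
* Hypothetical data: 1,000 firms over 40 quarters with a rare binary outcome y
clear
set seed 12345
set obs 1000
generate id = _n
generate u = rnormal()                  // time-invariant firm effect
expand 40
bysort id: generate t = _n
xtset id t
generate x = rnormal()
generate y = rbinomial(1, invlogit(-5 + u + 0.5*x))

* Flag firms whose outcome changes at least once over the sample
bysort id: egen y_min = min(y)
bysort id: egen y_max = max(y)
generate byte varies = (y_min != y_max)

* Observations and firms that can enter the conditional likelihood
count if varies
local n_obs = r(N)
egen byte firm_tag = tag(id)
count if varies & firm_tag
local n_firms = r(N)
display "contributing obs: `n_obs'   contributing firms: `n_firms'"

* -xtlogit, fe- drops the remaining firms and notes how many it dropped
xtlogit y x, fe nolog
display "obs used: " e(N) "   groups used: " e(N_g)
```

The dropped firms are not discarded arbitrarily: given the firm effect, they carry no information about the within-firm coefficients.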

2. I understand this. But how could I then check whether there is any omitted variable bias in my -logit, vce(cluster)- model?

3. I can't find where, in the output of -xtlogit, fe-, I can see whether the panel-level variance is negligible. When using -xtlogit, re- the output shows rho and the LR test, but not when using -xtlogit, fe-.
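-xtlogit, fe- does not estimate a panel-level variance, because the firm effects are conditioned out rather than estimated. The random-effects model mentioned above does estimate it: rho is the share of the latent-scale variance attributable to the panel effect, rho = sigma_u^2 / (sigma_u^2 + pi^2/3), and the footer reports a likelihood-ratio test of rho = 0 against pooled -logit-. Note that the random-effects model assumes the panel effect is uncorrelated with the regressors, so it measures how much panel-level variance there is, not whether that variance confounds the gender effect. A minimal sketch on simulated data (hypothetical variables), reading those numbers from the stored results:

```stata
* Hypothetical data: 300 firms over 20 quarters
clear
set seed 4321
set obs 300
generate id = _n
generate u = rnormal(0, 1.5)            // panel-level effect with sd 1.5
expand 20
bysort id: generate t = _n
xtset id t
generate x = rnormal()
generate y = rbinomial(1, invlogit(-1 + u + 0.5*x))

* Random-effects logit: the footer shows rho and the LR test of rho = 0
xtlogit y x, re nolog

* rho = sigma_u^2 / (sigma_u^2 + pi^2/3): share of latent variance at firm level
display "rho = " e(rho) "   sigma_u = " e(sigma_u)
display "LR chi2 for rho = 0: " e(chi2_c)
```

With sigma_u = 1.5 in the simulation, the true rho is about 0.41, so the test should clearly reject rho = 0 here.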

3. (cont'd.) I guess that, in my case, this set of covariates would be industry and country. I think there might be a relationship between gender and industry, and I am starting to think that might be the reason behind these very different results. But what really confuses me is that these variables are omitted in -xtlogit, fe- (and not in -logit-). Also, I have no idea how to study this connection. I tried something along the lines of:

Code:

logit bnkr i.gender##industry nimta tlmta price, nolog

(I can't use vce(cluster) with this, can I?) and

Code:

xtlogit bnkr i.gender##industry nimta tlmta price, fe nolog

And for the -logit- I got the following results:

And for -xtlogit, fe-, the following:

It seems to me I should not even do this with -xtlogit, fe-, since it does not take industry into account in the first place.
But I believe these results show how the effect of gender on the probability of a bankruptcy filing depends on industry, and how this relationship affects the other explanatory variables. Please correct me if I am wrong in assuming this.
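On the question in parentheses above: factor-variable interactions can be combined with clustered standard errors in -logit-. A minimal sketch on simulated data (the variable names mirror the poster's, but the data, the three industries and the coefficients are hypothetical) that fits the interaction model with -vce(cluster id)- and then uses -margins- to report the effect of gender on the predicted probability separately for each industry. In -xtlogit, fe- the industry main effect would be dropped because it does not vary within firm, and the gender-by-industry terms could be estimated only from firms whose CEO gender changes.

```stata
* Hypothetical panel: 600 firms over 40 quarters
clear
set seed 2017
set obs 600
generate id = _n
generate industry = 1 + floor(3*runiform())   // time-invariant, 3 industries
generate byte gender = runiform() < 0.15       // CEO gender dummy, fixed per firm here
expand 40
bysort id: generate t = _n
xtset id t
generate nimta = rnormal()
generate bnkr = rbinomial(1, invlogit(-2 + 0.3*gender + 0.2*industry - 0.5*nimta))

* Interaction model with standard errors clustered on firm
logit bnkr i.gender##i.industry nimta, vce(cluster id) nolog

* Effect of gender on Pr(bnkr = 1), separately for each industry
margins industry, dydx(gender)
```

Reporting the industry-specific marginal effects from -margins- is usually easier to interpret than the raw interaction coefficients of a logit model.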

Thank you so much for the prompt and extensive reply!

Last edited by Sofi Gomes; 08 Dec 2017, 09:03.