Probit regression including (but not limited to) firm fixed effects

Ruud Elzen

Join Date: Jun 2016

Posts: 9
#1

16 Jun 2016, 11:48

Hello everybody,

I am doing research and recently finished collecting my data, so I am now running my regressions. I have an issue with my regression because of the fixed effects. My supervisor did not know how to do this either, so he advised me to look for help online. Google and YouTube did not work out; I'm hoping this will!

My dependent variable is a dummy, so I cannot run a normal OLS regression; I am running a probit regression instead. The first problem I had was the following note:
note: X9_D != 1 predicts failure perfectly
As a result, my sample shrank by 70%. Online I found advice to drop this variable from the regression, and then it was fine.
(I think I understand the principle: in my sample, whenever Y=1, X9=1, always. So that is OK.)
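
The perfect-prediction note can be checked directly with a cross-tabulation. This is only a minimal sketch, assuming the variable names Y_D and X9_D from the example data in this post:

```stata
* Cross-tabulate the outcome against the suspect dummy.
* An empty cell (no Y_D == 1 when X9_D == 0) is what triggers the
* "predicts failure perfectly" note.
tabulate Y_D X9_D

* Count the observations that probit sets aside for this reason.
count if X9_D != 1
```

If the cell for Y_D = 1 and X9_D = 0 is empty, no finite coefficient on X9_D can be estimated, which is why Stata drops the variable together with those observations.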

I still have a problem, though, and that is firm fixed effects. In a normal OLS regression I can include them, but not in a probit regression. I made an example dataset with dataex and will explain the variables and problems. I simplified the names to keep things as simple as possible.

I collected my data by analysing news articles that report on fraud at a company. I investigated three companies (Ari, Ben and Clair). Variables such as size and press coverage are always the same for a company, regardless of the particular article I am analysing. Many other things, though, vary by article (certain words used in the article, the number of words).

These are variables that do not vary per article, but just per company:
X1FE
X2FE
X3FE_D

The others (X4_D X5_D X6_D X7 X8_D X9_D) vary per article.

My dependent variable is Y_D, coded either 1 or 0 depending on the content of the article.

"probit Y_D X1FE X2FE X3FE_D X4_D X5_D X6_D X7 X8_D" is what I used for my regression, but this does not take into account that X1, X2 and X3 are firm fixed effects. Note: I dropped X9_D here (for the reasons I explained earlier in this post).

Hopefully you can help me with a probit regression with fixed effects!

Thanks in advance!

Ruud

The data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str5 Firm double X1FE int X2FE byte(X3FE_D X4_D X5_D X6_D) int X7 byte(X8_D X9_D Y_D L M N)
"Ari" 9.985248048844232 1339 0 1 0 0 823 0 0 0 . . .
"Ari" 9.985248048844232 1339 0 1 0 1 226 0 1 1 . . .
"Ari" 9.985248048844232 1339 0 1 0 0 114 0 1 0 . . .
"Ari" 9.985248048844232 1339 0 1 0 0 192 1 0 0 . . .
"Ari" 9.985248048844232 1339 0 1 0 0 244 0 0 0 . . .
"Ari" 9.985248048844232 1339 0 1 0 1 262 1 1 0 . . .
"Ari" 9.985248048844232 1339 0 1 0 0 128 0 0 0 . . .
"Ben" 9.226599905207358 1521 1 0 1 0 519 1 0 0 . . .
"Ben" 9.226599905207358 1521 1 0 0 0 166 0 0 0 . . .
"Ben" 9.226599905207358 1521 1 0 0 0 325 0 0 0 . . .
"Ben" 9.226599905207358 1521 1 0 1 0 469 1 1 0 . . .
"Ben" 9.226599905207358 1521 1 0 0 0 525 0 0 0 . . .
"Ben" 9.226599905207358 1521 1 0 1 0 222 1 1 0 . . .
"Ben" 9.226599905207358 1521 1 0 0 0 708 1 0 0 . . .
"Ben" 9.226599905207358 1521 1 0 0 0 770 1 0 0 . . .
"Ben" 9.226599905207358 1521 1 0 0 0 1421 0 0 0 . . .
"Ben" 9.226599905207358 1521 1 0 0 0 435 0 1 0 . . .
"Ben" 9.226599905207358 1521 1 0 0 0 370 1 0 0 . . .
"Ben" 9.226599905207358 1521 1 0 1 0 800 0 0 0 . . .
"Clair" 7.879730947834448 24 1 1 0 0 97 0 1 0 . . .
"Clair" 7.879730947834448 24 1 0 0 0 272 1 0 0 . . .
"Clair" 7.879730947834448 24 1 0 0 1 227 0 0 0 . . .
"Clair" 7.879730947834448 24 1 0 0 1 281 0 1 0 . . .
"Clair" 7.879730947834448 24 1 1 0 0 171 0 1 0 . . .
end

Last edited by Ruud Elzen; 16 Jun 2016, 12:05. Reason: Deleted a bunch of the data so you do not have to scroll/copy so much here ;)
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#2

16 Jun 2016, 12:56

Dear Ruud,

With respect to the first problem, dropping X9 is not a good option; I would leave it in and let Stata deal with it.

On the fixed effects: probit with fixed effects is not consistent. So I would suggest that you switch to a logit and use -xtlogit- with FE.
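
A minimal sketch of what this suggestion could look like with the example data from post #1. The name firm_id is a hypothetical identifier introduced here, and the choice of regressors is an assumption rather than a recommendation from the thread:

```stata
* Build a numeric firm identifier from the string variable Firm
* (firm_id is a hypothetical name, not from the thread).
encode Firm, generate(firm_id)

* Declare the panel variable; a time variable is not required here.
xtset firm_id

* Conditional (fixed-effects) logit. Firm-constant regressors such as
* X1FE, X2FE and X3FE_D have no within-firm variation, so they are
* left out: the firm effect already absorbs them.
xtlogit Y_D X4_D X5_D X6_D X7 X8_D X9_D, fe
```

Note that the panel variable is the firm identifier, not one of the regressors. In the shortened example data only one firm has any variation in Y_D, so this sketch illustrates the syntax rather than a model that will necessarily converge on that excerpt.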

Best regards,

Joao
Ruud Elzen

Join Date: Jun 2016

Posts: 9
#3

16 Jun 2016, 14:35

Dear Joao,

Thank you for your response!
I switched to xtlogit.

First I turned my data into panel data with the command:
xtset X1FE
and then do the same for all variables.

Next command:
xtlogit Y_D X1FE X2FE X3_D X4_D X5_D X6_D X7 X8_D X9_D
ml
ml display

Then I see the following:

Fitting comparison model:

Iteration 0: log likelihood = -110.35618
Iteration 1: log likelihood = -76.047226
Iteration 2: log likelihood = -67.157988
Iteration 3: log likelihood = -66.125677
Iteration 4: log likelihood = -65.921194
Iteration 5: log likelihood = -65.872987
Iteration 6: log likelihood = -65.862886
Iteration 7: log likelihood = -65.861288
Iteration 8: log likelihood = -65.861104
Iteration 9: log likelihood = -65.861068
Iteration 10: log likelihood = -65.86106 (not concave)
Iteration 11: log likelihood = -65.86106 (not concave)
.
.(this continues to number 1000 eventually)
.
Iteration 999: log likelihood = -52.773798 (not concave)
Iteration 1000: log likelihood = -52.773798 (not concave)
convergence not achieved

Fitting full model:

tau = 0.0 log likelihood = -52.773798
tau = 0.1 log likelihood = -35.588533
tau = 0.2 log likelihood = -27.290324
tau = 0.3 log likelihood = -21.911782
tau = 0.4 log likelihood = -17.930294
tau = 0.5 log likelihood = -14.740434
tau = 0.6 log likelihood = -12.034424
tau = 0.7 log likelihood = -9.599892
tau = 0.8 log likelihood = -7.3295819

Iteration 0: log likelihood = -9.6020362
Iteration 1: log likelihood = -1.5379352 (not concave)
Iteration 2: log likelihood = -1.4171534 (not concave)
cannot compute an improvement -- discontinuous region encountered
r(430);

.
. ml
no ml model defined

.
. ml display
last estimates not found
r(301);

Could you please tell me how to fix this, Joao? (If I delete the variable X9_D, the regression does work and I see the results, so I suspect it has something to do with this variable.)

Kind regards,

Ruud

Last edited by Ruud Elzen; 16 Jun 2016, 14:39.
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#4

16 Jun 2016, 14:52

Dear Ruud,

That suggests that -xtlogit- is not able to deal with perfect predictors, which I find surprising.

The solution for this depends on how interesting X9_D is. If, from a theory point of view, this is not an important variable, you might as well leave it out. If you want to keep it in, I guess one possible solution is to first estimate the model using plain -logit- and then estimate it with -xtlogit- using the estimation subsample from the -logit- (saved in e(sample)). I am not sure this will work, but it is worth a try.

All the best,

Joao
Ruud Elzen

Join Date: Jun 2016

Posts: 9
#5

16 Jun 2016, 15:33

Dear Joao,

Thanks for getting back to me! The variable is not a necessity, but it would be nice to have another variable that is a significant predictor of my dependent variable (currently, I only have one).

I ran the regression with -logit-, which worked once it reached iteration 16000. The output is normal (R2 is just 0.25 instead of 0.40, but okay)
except for the variable X9_D: its coefficient is 0 and the std. error says 'omitted'. The other values for X9_D in the regression output show only '.'.
Is that a problem?
Next, I ran the following command:

estimates save "logic", replace

Then Stata said:

(note: file logic.ster not found)
file logic.ster saved

Then I ran this command:
estimates use "logic"

and then this command:
xtlogit Y_D X1FE X2FE X3_D X4_D X5_D X6_D X7 X8_D X9_D
ml
ml display

Then Stata said:

note: X9_D != 1 predicts failure perfectly
X9_D dropped and 319 obs not used

Fitting comparison model:

Iteration 0: log likelihood = -71.003427
Iteration 1: log likelihood = -55.384707
Iteration 2: log likelihood = -53.220854
Iteration 3: log likelihood = -52.869259
Iteration 4: log likelihood = -52.795908
Iteration 5: log likelihood = -52.778511
Iteration 6: log likelihood = -52.7748
Iteration 7: log likelihood = -52.774024
Iteration 8: log likelihood = -52.77385
Iteration 9: log likelihood = -52.773808
Iteration 10: log likelihood = -52.773799 (not concave)
Iteration 11: log likelihood = -52.773798 (not concave)
.
.
.
Iteration 999: log likelihood = -52.773798 (not concave)
Iteration 1000: log likelihood = -52.773798 (not concave)
convergence not achieved

Fitting full model:

tau = 0.0 log likelihood = -52.773798
tau = 0.1 log likelihood = -35.588533
tau = 0.2 log likelihood = -27.290324
tau = 0.3 log likelihood = -21.911782
tau = 0.4 log likelihood = -17.930294
tau = 0.5 log likelihood = -14.740434
tau = 0.6 log likelihood = -12.034424
tau = 0.7 log likelihood = -9.599892
tau = 0.8 log likelihood = -7.3295819

Iteration 0: log likelihood = -9.6020362
Iteration 1: log likelihood = -1.5379352 (not concave)
Iteration 2: log likelihood = -1.4171534 (not concave)
cannot compute an improvement -- discontinuous region encountered
r(430);

.
. ml
no ml model defined

.
. ml display
last estimates not found
r(301);

Did I take the right steps? I hope to hear back from you. Thanks again, I really appreciate it!

Ruud
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#6

16 Jun 2016, 15:58

Dear Ruud,

If the variable is not very important, you might as well drop it because you won't be able to estimate a meaningful coefficient associated with it.

What you did is not exactly what I had in mind. My idea was something like this:

Code:

logit y x1 x2 ...
xtlogit y x1 x2 ... if e(sample)==1, fe

So, two differences: a) you need to include the -fe- option; b) use -if- to restrict estimation to the subsample selected by -logit-.
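
Spelled out with the variable names from post #1, the two steps might look roughly like this. It is a sketch under assumptions: insample is a hypothetical marker variable, the regressor list is illustrative, and the data are assumed to be already -xtset- on a numeric firm identifier:

```stata
* Step 1: plain logit. Stata sets aside the perfectly predicted
* observations and records the estimation sample in e(sample).
logit Y_D X4_D X5_D X6_D X7 X8_D X9_D

* Keep a permanent marker, because e(sample) is overwritten by the
* next estimation command (insample is a hypothetical name).
generate byte insample = e(sample)

* Step 2: fixed-effects logit restricted to that subsample.
xtlogit Y_D X4_D X5_D X6_D X7 X8_D X9_D if insample, fe
```

Storing the marker in a variable is optional, since -if e(sample)==1- can be used directly on the very next command, but it keeps the subsample available for later checks.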

All the best,

Joao
Ruud Elzen

Join Date: Jun 2016

Posts: 9
#7

16 Jun 2016, 17:22

Dear Joao,

Sorry for my late response.

Thanks for getting back to me again. You were right, it was not meaningful... I did more research and discovered that I had left off the 'fe' option at the end. Right after I found that out, I saw it in your post, haha. Either way, it works now! When I run:

xtlogit Y_D X1FE X2FE X3_D X4_D X5_D X6_D X7 X8_D X9_D, fe
ml
ml display

then it works! X9_D has a p-value of 0.95, so it is not significant and therefore indeed not meaningful. Thank you very much for the help. It is much appreciated!

Have a good day/night,

Ruud
Roger Clements

Join Date: Jun 2017

Posts: 40
#8

27 Nov 2019, 12:27

Hi,

I am trying to run an xtlogit with firm fixed effects using a matched-pair sample and panel data. The outcome variable is whether a firm went bankrupt or not (0 = non-bankrupt control firms; 1 = bankrupt firms in the year of bankruptcy and all following years). My trouble is that when I include firm fixed effects, Stata drops them, I assume because they are perfectly collinear with the outcome variable for the matched control firms (which always equals zero).

Is there any way to include firm fixed effects using a matched sample in a logistic regression model where the dependent variable does not vary at all for one group?

Thank you in advance.

RC
ruby swinny

Join Date: Dec 2019

Posts: 1
#9

03 Dec 2019, 08:25

Hello there, I was wondering if you found a solution to your problem. I have the same issue: I used logit with firm and year fixed effects and it just kept dropping the firm variables. It worked with areg, but it just doesn't with logit (my independent variable is binary).

xi: logit Y X1 X2 i.gvkey2 i.year , robust

Please help
Roger Clements

Join Date: Jun 2017

Posts: 40
#10

06 Dec 2019, 09:53

I did not, Ruby, so sorry I can't be much help. Still waiting for some help myself. Anyone?
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2175
#11

08 Dec 2019, 21:30

I'm not sure what you mean by a "matched pairs" analysis. You can't match on the outcome variable -- at least not in general. And, yes, any firm whose response doesn't change over time does not contribute to the logit FE (conditional MLE) estimates. It is NOT true that these are kept in a linear estimation using areg or xtreg, fe. Stata doesn't tell you these firms are being dropped, but they are.
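
One way to see how many firms fall into this category is to flag those whose outcome never changes. This is a minimal sketch; firm_id and Y are placeholder names, not variables from the posts above:

```stata
* Flag firms whose outcome never changes; these do not contribute to
* the conditional (fixed-effects) logit estimates.
* firm_id and Y are placeholder names.
bysort firm_id: egen byte y_min = min(Y)
bysort firm_id: egen byte y_max = max(Y)
generate byte no_variation = (y_min == y_max)

* Tag one observation per firm so that firms, not firm-years, are counted.
egen byte one_per_firm = tag(firm_id)
tabulate no_variation if one_per_firm
```

Firms with no_variation equal to 1 are the ones that drop out of the fixed-effects logit.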

What is the explanatory variable that you're interested in? This seems more like a duration analysis with grouped duration data: essentially a model of how long it takes before a firm goes bankrupt. Firms that do not go bankrupt play a role in duration analysis, but that is because you make functional form assumptions.
Roger Clements

Join Date: Jun 2017

Posts: 40
#12

31 Jan 2020, 04:04

Sorry for the delay, Jeff. I wanted to make more progress with the data before responding to you. The outcome variable is the likelihood that a firm goes bankrupt based on whether or not the firm was targeted by consumer boycotts. I see now how the sample size drops a lot when using FE, so that is helpful. So is it true that you can never include a control group AND include firm FE when the treatment variable (e.g., boycott) equals 0 and is constant throughout the panel (i.e., does not change from zero)?

Using another dependent variable (firm sales), I have two types of boycotts -- boycotts by consumers and boycotts by employees. How would I test which type of boycott has a stronger effect on sales? I assume I generate one treatment variable (consumer_boycott) using just consumer boycotts in one model and one treatment variable (employee_boycott) in another model and then test whether the coefficients on the treatment variables are significantly different between the two models? E.g., using suest. Does that make sense?
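
For reference, the cross-model comparison described here could be sketched as follows. The names sales and firm_id are hypothetical, and plain -regress- is used as an assumption because -suest- does not accept results from -xtreg, fe-:

```stata
* Fit each model without robust or cluster options; suest applies
* the variance estimator itself. sales and firm_id are placeholder names.
regress sales consumer_boycott
estimates store consumer

regress sales employee_boycott
estimates store employee

* Combine the two sets of estimates with firm-clustered standard errors.
suest consumer employee, vce(cluster firm_id)

* Test equality of the two treatment coefficients across models.
test [consumer_mean]consumer_boycott = [employee_mean]employee_boycott
```

Whether two separate models or a single model containing both treatment variables is the better design is a modelling question this sketch does not settle.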