  • Probit - omitted variables

    Hi all,

    I am currently encountering difficulties concerning a probit analysis.

    I am researching the factors that influence a company's decision to withdraw its offer. My dependent variable is withdrawn, equal to 1 if an offer is withdrawn and 0 otherwise.

    Now I am trying to run a probit on roughly 20 variables.


    . tab withdrawn

      Withdrawn |      Freq.     Percent        Cum.
    ------------+-----------------------------------
      completed |      5,644       72.81       72.81
      withdrawn |      2,108       27.19      100.00
    ------------+-----------------------------------
          Total |      7,752      100.00

    I ran the following regression as an example, with the following results:
    . probit withdrawn log_filing_size ff_tech_dummy age vc_dummy numberofbookrunners

    Probit regression                               Number of obs   =       1323
                                                    LR chi2(5)      =     299.51
                                                    Prob > chi2     =     0.0000
    Log likelihood = -168.45198                     Pseudo R2       =     0.4706

    -------------------------------------------------------------------------------------
              withdrawn |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    --------------------+----------------------------------------------------------------
        log_filing_size |  -.4581109   .0412589   -11.10   0.000    -.5389768    -.377245
          ff_tech_dummy |   -.225391   .2643792    -0.85   0.394    -.7435647    .2927826
                    age |  -.0174898   .0078032    -2.24   0.025    -.0327838   -.0021958
               vc_dummy |  -.9559697   .2673385    -3.58   0.000    -1.479943    -.431996
    numberofbookrunners |   .3216903   .4369625     0.74   0.462    -.5347405    1.178121
                  _cons |  -.2143128   .4574461    -0.47   0.639    -1.110891     .682265
    -------------------------------------------------------------------------------------

    However, if I run it with one more variable included, vc_dummy is omitted:
    . probit withdrawn log_filing_size ff_tech_dummy age vc_dummy numberofbookrunners uw_ranking


    note: vc_dummy != 0 predicts failure perfectly
    vc_dummy dropped and 333 obs not used

    Probit regression                               Number of obs   =        490
                                                    LR chi2(5)      =       2.17
                                                    Prob > chi2     =     0.8252
    Log likelihood = -55.281123                     Pseudo R2       =     0.0192

    -------------------------------------------------------------------------------------
              withdrawn |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    --------------------+----------------------------------------------------------------
        log_filing_size |     .06986   .1482238     0.47   0.637    -.2206532    .3603733
          ff_tech_dummy |   -.129397   .4416797    -0.29   0.770    -.9950733    .7362794
                    age |  -.0087463   .0086231    -1.01   0.310    -.0256472    .0081546
               vc_dummy |          0  (omitted)
    numberofbookrunners |    .295596   .3318717     0.89   0.373    -.3548605    .9460524
             uw_ranking |  -.0446172   .0654074    -0.68   0.495    -.1728133    .0835789
                  _cons |  -2.120634   .5625254    -3.77   0.000    -3.223163   -1.018104
    -------------------------------------------------------------------------------------

    . tab vc_dummy

      VC-backed |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |      3,666       61.60       61.60
              1 |      2,285       38.40      100.00
    ------------+-----------------------------------
          Total |      5,951      100.00



    If I include all the variables I want in the regression, I get the following messages and no output is produced:

    note: vc_dummy != 0 predicts failure perfectly
    vc_dummy dropped and 76 obs not used

    note: numberofbookrunners != 1 predicts failure perfectly
    numberofbookrunners dropped and 1 obs not used

    note: no_industry_filings != 56 predicts failure perfectly
    no_industry_filings dropped and 42 obs not used

    outcome = age <= 1 predicts data perfectly

    I don't really understand why vc_dummy is omitted in the second regression when it is included in the first one.
    I also don't know why I get no results for the regression on all the variables.

    I would really appreciate any help.


  • #2
    So, when you add a new variable to get your second regression, notice that your estimation sample size drops drastically, from 1323 in the first regression to 490. Most of that is because you have many observations with missing values for the new variable. Always remember that a regression model uses only those observations that have no missing values on any of the variables mentioned in the command.

    Now, if you restrict your attention to those remaining observations for which uw_ranking is not missing (and neither is any of the other variables in the regression command), you will find that whenever vc_dummy = 1, you have withdrawn = 0. This phenomenon, where one variable guarantees a specific value of the dependent variable, is known as perfect prediction (referred to in the warning message Stata gave you). It is not possible to include a variable like that in the model, because the maximum likelihood estimate of the corresponding coefficient would be infinite and the calculations have no hope of converging. To salvage the situation, Stata drops that variable and the observations containing the offending value.

    You can see for yourself what is happening by re-running your second regression and then running:

    Code:
    tab withdrawn vc_dummy if e(sample)
    As you add still more variables on the way to the full model, you run into the same problem with even more of them. Each time you add a new variable, you lose more observations to missing values. By the time you put all of your variables into the model, you are left with no usable observations at all, which is why you get no regression output.
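    A quick way to see how much each variable costs you in observations is Stata's misstable. A minimal sketch, assuming the variable names from the posted commands (extend the varlist to all 20 regressors in the full model):

```stata
* Audit missing values variable by variable
misstable summarize log_filing_size ff_tech_dummy age vc_dummy ///
    numberofbookrunners uw_ranking

* Count the complete cases actually available to this specification
egen nmiss = rowmiss(log_filing_size ff_tech_dummy age vc_dummy ///
    numberofbookrunners uw_ranking)
count if nmiss == 0
```

    The count from the last line is the largest estimation sample that specification could possibly use, before any perfect-prediction drops.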

    The underlying problem is that there are too many missing values in your data set to support the analyses you are trying to do. The only fix is to get better data, or to scale back your research goals and work with fewer variables.
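    To see which variables would trigger perfect-prediction drops in a given specification, you can cross-tabulate each dummy against the outcome within the complete cases. A sketch, again assuming the variable names from the posted commands:

```stata
* Flag complete cases for the specification of interest
egen anymiss = rowmiss(log_filing_size ff_tech_dummy age vc_dummy ///
    numberofbookrunners uw_ranking)

* A zero cell in any of these tables signals perfect prediction
foreach v of varlist vc_dummy ff_tech_dummy {
    tab withdrawn `v' if anymiss == 0
}
```

    An empty cell (for example, no withdrawn offers among observations with vc_dummy = 1) reproduces exactly the note Stata printed before dropping the variable.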



    • #3
      Thanks so much for your detailed explanations, Clyde! Helped a lot!
