  • Probit - omitted variables

    Hi all,

    I am currently encountering difficulties concerning a probit analysis.

    I am researching the factors that influence a company's decision to withdraw its offer. My dependent variable is withdrawn, equal to 1 if an offer is withdrawn and 0 otherwise.

    Now I am trying to run a probit on roughly 20 variables.


    . tab withdrawn

      Withdrawn |      Freq.     Percent        Cum.
    ------------+-----------------------------------
      completed |      5,644       72.81       72.81
      withdrawn |      2,108       27.19      100.00
    ------------+-----------------------------------
          Total |      7,752      100.00

    I ran the following regression as an example, with the following results:
    . probit withdrawn log_filing_size ff_tech_dummy age vc_dummy numberofbookrunners

    Probit regression                               Number of obs   =       1323
                                                    LR chi2(5)      =     299.51
                                                    Prob > chi2     =     0.0000
    Log likelihood = -168.45198                     Pseudo R2       =     0.4706

    -------------------------------------------------------------------------------------
              withdrawn |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    --------------------+----------------------------------------------------------------
        log_filing_size |  -.4581109   .0412589   -11.10   0.000    -.5389768    -.377245
          ff_tech_dummy |   -.225391   .2643792    -0.85   0.394    -.7435647    .2927826
                    age |  -.0174898   .0078032    -2.24   0.025    -.0327838   -.0021958
               vc_dummy |  -.9559697   .2673385    -3.58   0.000    -1.479943    -.431996
    numberofbookrunners |   .3216903   .4369625     0.74   0.462    -.5347405    1.178121
                  _cons |  -.2143128   .4574461    -0.47   0.639    -1.110891     .682265
    -------------------------------------------------------------------------------------

    However, if I run it with one more variable included, vc_dummy is omitted:
    . probit withdrawn log_filing_size ff_tech_dummy age vc_dummy numberofbookrunners uw_ranking


    note: vc_dummy != 0 predicts failure perfectly
    vc_dummy dropped and 333 obs not used

    Probit regression                               Number of obs   =        490
                                                    LR chi2(5)      =       2.17
                                                    Prob > chi2     =     0.8252
    Log likelihood = -55.281123                     Pseudo R2       =     0.0192

    -------------------------------------------------------------------------------------
              withdrawn |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    --------------------+----------------------------------------------------------------
        log_filing_size |     .06986   .1482238     0.47   0.637    -.2206532    .3603733
          ff_tech_dummy |   -.129397   .4416797    -0.29   0.770    -.9950733    .7362794
                    age |  -.0087463   .0086231    -1.01   0.310    -.0256472    .0081546
               vc_dummy |          0  (omitted)
    numberofbookrunners |    .295596   .3318717     0.89   0.373    -.3548605    .9460524
             uw_ranking |  -.0446172   .0654074    -0.68   0.495    -.1728133    .0835789
                  _cons |  -2.120634   .5625254    -3.77   0.000    -3.223163   -1.018104
    -------------------------------------------------------------------------------------

    . tab vc_dummy

      VC-backed |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |      3,666       61.60       61.60
              1 |      2,285       38.40      100.00
    ------------+-----------------------------------
          Total |      5,951      100.00



    If I include all the variables I want in the regression, I get the following messages and no output is produced:

    note: vc_dummy != 0 predicts failure perfectly
    vc_dummy dropped and 76 obs not used

    note: numberofbookrunners != 1 predicts failure perfectly
    numberofbookrunners dropped and 1 obs not used

    note: no_industry_filings != 56 predicts failure perfectly
    no_industry_filings dropped and 42 obs not used

    outcome = age <= 1 predicts data perfectly

    I don't really understand why vc_dummy is omitted in the second regression when it is included in the first one.
    I also don't know why I get no results for the regression on all the variables.

    I would really appreciate any help.


  • #2
    So, when you add a new variable to get your second regression, notice that your estimation sample size drops drastically, from 1323 in the first regression to 490. Most of that is because you have many observations with missing values for the new variable. Always remember that a regression model uses only those observations that have no missing values on any of the variables mentioned in the command.

    Now, if you restrict your attention to those remaining observations for which uw_ranking is not missing (and neither is any of the other variables in the regression command), you will find that whenever vc_dummy = 1, you have withdrawn = 0. This phenomenon, where one variable guarantees a specific value of the dependent variable, is known as perfect prediction (referred to in the warning message Stata gave you). It is not possible to include a variable like that in the model, because the maximum likelihood estimate of the corresponding coefficient would be infinite and the calculations have no hope of converging. To salvage the situation, Stata drops that variable and the observations containing the offending value.

    You can see for yourself what is happening by re-running your second regression and then running:

    Code:
    tab withdrawn vc_dummy if e(sample)
    As you add still more variables on the way to the full model, you run into the same problem with even more of them. Each time you add a new variable, you lose more observations to missing values. By the time you put all of your variables into the model, you are left with no usable observations at all, which is why you get no regression output.
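    A quick way to see how much each variable costs you in observations is Stata's misstable. A minimal sketch, assuming the variable names from the posted commands (extend the varlist to all 20 regressors in the full model):

```stata
* Audit missing values variable by variable
misstable summarize log_filing_size ff_tech_dummy age vc_dummy ///
    numberofbookrunners uw_ranking

* Count the complete cases actually available to this specification
egen nmiss = rowmiss(log_filing_size ff_tech_dummy age vc_dummy ///
    numberofbookrunners uw_ranking)
count if nmiss == 0
```

    The count from the last line is the largest estimation sample that specification could possibly use, before any perfect-prediction drops.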

    The underlying problem is that there are too many missing values in your data set to support the analyses you are trying to do. The only fix is to get better data, or to scale back your research goals and work with fewer variables.
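    To see which variables would trigger perfect-prediction drops in a given specification, you can cross-tabulate each dummy against the outcome within the complete cases. A sketch, again assuming the variable names from the posted commands:

```stata
* Flag complete cases for the specification of interest
egen anymiss = rowmiss(log_filing_size ff_tech_dummy age vc_dummy ///
    numberofbookrunners uw_ranking)

* A zero cell in any of these tables signals perfect prediction
foreach v of varlist vc_dummy ff_tech_dummy {
    tab withdrawn `v' if anymiss == 0
}
```

    An empty cell (for example, no withdrawn offers among observations with vc_dummy = 1) reproduces exactly the note Stata printed before dropping the variable.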



    • #3
      Thanks so much for your detailed explanations, Clyde! Helped a lot!
