
  • perfect prediction mlogit

    hi guys!

    While Stata automatically deals with perfect prediction for logit models, this is not the case for mlogit. I have two questions:
    a. If a covariate perfectly predicts the outcome of one equation, but not of the other(s), should it be disregarded altogether?
    b. A more general question: when encountering a p-value approaching 1 (say 0.96), should one consider this perfect prediction and disregard the covariate?

    any and all help is much appreciated

  • #2
    a) No. It's still relevant to distinguishing other options in any case.

    b) Again, no. If the estimated coefficient is within reasonable bounds and the standard error is neither missing nor astronomical, you can treat it just like any other covariate.

    P.S. I assume you didn't really mean p-value, but meant a predicted outcome probability approaching 1 for one value of the predictor. An actual p-value near 1 really has no connection to perfect prediction: if anything it is more like no prediction.
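To make the distinction in the P.S. concrete: in a logit model it is the predicted probability, not the p-value, that saturates toward 1 under (near-)perfect prediction. A minimal sketch, in Python purely for illustration (the function name and the example linear-predictor values are made up, not from the thread):

```python
import math

def logit_prob(xb):
    """Predicted outcome probability from a logit linear predictor xb."""
    return 1.0 / (1.0 + math.exp(-xb))

# A huge linear predictor drives the predicted probability toward 1;
# that saturation, not a p-value near 1, is the perfect-prediction symptom.
print(logit_prob(0.0))    # 0.5 -- no prediction at all
print(logit_prob(13.7))   # very close to 1
```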



    • #3
      I meant z = 0 (and P>|z| = 1). Long & Freese (Regression Models for Categorical Dependent Variables Using Stata) note that:

      "mlogit handles perfect prediction somewhat differently than the estimation commands for binary and ordinal models that we have discussed. logit and probit automatically remove the observations that imply perfect prediction and compute estimates accordingly. ologit and oprobit keep these observations in the model, estimate the z for the problem variable as 0, and provide an incorrect LR chi-squared, but also warn that a given number of observations are completely determined. You should delete these observations and re-estimate the model. mlogit is just like ologit and oprobit except that you do not receive a warning message. You will see, however, that all coefficients associated with the variable causing the problem have z = 0 (and P>|z| = 1). You should re-estimate the model, excluding the problem variable and deleting the observations that imply the perfect predictions. Using the tabulate command to generate a cross-tabulation of the problem variable and the dependent variable should reveal the combination that results in perfect prediction."

      Hence my question.
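The cross-tabulation check that Long & Freese recommend amounts to looking for empty cells: a covariate level that never co-occurs with some outcome level. A minimal sketch of that logic outside Stata, in Python (the function name and toy data are illustrative assumptions, not from the thread):

```python
from collections import Counter

def perfect_prediction_cells(outcome, covariate):
    """Cross-tabulate outcome vs. a categorical covariate and return, for
    each covariate level, the outcome levels it never co-occurs with
    (zero cells -- the pattern behind perfect prediction in mlogit)."""
    outcomes = sorted(set(outcome))
    counts = Counter(zip(covariate, outcome))
    return {lvl: [y for y in outcomes if counts[(lvl, y)] == 0]
            for lvl in sorted(set(covariate))}

# Toy data echoing the auto-data pattern from post #4 below this one:
# foreign cars (covariate level 1) never have repair records 1 or 2.
rep78   = [1, 1, 2, 3, 3, 4, 4, 5, 5]
foreign = [0, 0, 0, 0, 1, 0, 1, 0, 1]
print(perfect_prediction_cells(rep78, foreign))
# {0: [], 1: [1, 2]}
```

Any non-empty list in the result marks a covariate level that perfectly predicts "not those outcomes", which is exactly what a `tabulate` cross-tab would reveal.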



      • #4
        Well, except for retracting the post script, my answer is really the same.
        Code:
        . sysuse auto
        (1978 Automobile Data)

        . tab rep78 foreign

            Repair |
            Record |       Car type
              1978 |  Domestic    Foreign |     Total
        -----------+----------------------+----------
                 1 |         2          0 |         2
                 2 |         8          0 |         8
                 3 |        27          3 |        30
                 4 |         9          9 |        18
                 5 |         2          9 |        11
        -----------+----------------------+----------
             Total |        48         21 |        69

        . mlogit rep78 i.foreign

        Iteration 0:   log likelihood = -93.692061
        Iteration 1:   log likelihood = -80.470733
        Iteration 2:   log likelihood = -78.986489
        Iteration 3:   log likelihood = -78.794109
        Iteration 4:   log likelihood = -78.748501
        Iteration 5:   log likelihood = -78.738643
        Iteration 6:   log likelihood = -78.736602
        Iteration 7:   log likelihood = -78.736144
        Iteration 8:   log likelihood = -78.736032
        Iteration 9:   log likelihood = -78.736009
        Iteration 10:  log likelihood = -78.736004

        Multinomial logistic regression                   Number of obs   =         69
                                                          LR chi2(4)      =      29.91
                                                          Prob > chi2     =     0.0000
        Log likelihood = -78.736004                       Pseudo R2       =     0.1596

        ------------------------------------------------------------------------------
               rep78 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
        1            |
             foreign |
            Foreign  |   -13.6844   1986.817    -0.01   0.995    -3907.774    3880.405
               _cons |  -2.602702   .7328328    -3.55   0.000    -4.039028   -1.166376
        -------------+----------------------------------------------------------------
        2            |
             foreign |
            Foreign  |   -13.6844   993.4086    -0.01   0.989    -1960.729    1933.361
               _cons |  -1.216408   .4025404    -3.02   0.003    -2.005373    -.427443
        -------------+----------------------------------------------------------------
        3            |  (base outcome)
        -------------+----------------------------------------------------------------
        4            |
             foreign |
            Foreign  |   2.197466   .7698095     2.85   0.004     .6886674    3.706265
               _cons |  -1.098617   .3849011    -2.85   0.004    -1.853009   -.3442246
        -------------+----------------------------------------------------------------
        5            |
             foreign |
            Foreign  |    3.70116   .9906909     3.74   0.000     1.759442    5.642879
               _cons |  -2.602577     .73279    -3.55   0.000    -4.038819   -1.166335
        ------------------------------------------------------------------------------
        Looking at the -tab- results, there should be a problem with outcomes 1 and 2 here. Looking at the -mlogit- results, we do not get the z-value of zero and P>|z| = 1 that were promised, though they are close.

        But the diagnostic information here is really the coefficients and their standard errors. The standard errors are astronomical, and the coefficients are completely unrealistic: nothing on earth has odds ratios of exp(-13.6844)! That's the tip-off that something is seriously wrong.

        Now, had we not done the -tab- beforehand, we wouldn't necessarily know that this is due to perfect (or near-perfect) prediction. Other things can cause a serious problem: strong multicollinearity, or other issues that can cause the model to be poorly identified. So, if the first sign of trouble is in -mlogit- outputs like this, the next step is to figure out why. Omitting the offending covariate may be the solution--but other actions may be more appropriate in different situations. For example, it might make sense to combine some categories of the outcome variable. Or if collinearity is the issue, removing one of the other covariates might be best. Or excluding certain subsets of observations, etc.

        But if you run an -mlogit- and find z's near 0 and P>|z|'s near 1 at the same time that the coefficients and their standard errors are reasonable, then you don't have an estimation problem. Now, you might want to get rid of that covariate anyway because it evidently isn't predicting much--but the reasonableness of that action would depend on other circumstances.
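That diagnostic, reasonable versus astronomical coefficients and standard errors, can be sketched as a simple screen. A hedged Python illustration (the function name is mine, the cutoff of 50 is an arbitrary assumption rather than any established rule, and the values are copied from the `mlogit` output above):

```python
def flag_suspect_coefficients(coefs, std_errs, cutoff=50.0):
    """Return indices of coefficient/SE pairs whose magnitude or standard
    error is astronomical -- the real tip-off for perfect prediction,
    rather than z near 0 on its own."""
    return [i for i, (b, se) in enumerate(zip(coefs, std_errs))
            if abs(b) > cutoff or se > cutoff]

# Foreign coefficients from the output above: outcomes 1 and 2 have
# SEs near 2000 and 1000; outcomes 4 and 5 look perfectly ordinary.
coefs = [-13.6844, -13.6844, 2.197466, 3.70116]
ses   = [1986.817,  993.4086, 0.7698095, 0.9906909]
print(flag_suspect_coefficients(coefs, ses))
# [0, 1]
```

A flagged index says "investigate why", not "delete the covariate": as the post explains, the cause could equally be collinearity or a poorly identified model.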




        • #5
          Once again, thank you, Clyde. Their conclusion was a bit too drastic - glad I asked for more insight.
