
  • perfect prediction mlogit

    hi guys!

    While Stata automatically deals with perfect prediction for logit models, this is not the case for mlogit. I have two questions:
    a. If a covariate perfectly predicts the outcome of one equation, but not of the other(s), should it be disregarded altogether?
    b. A more general question: when encountering a p-value approaching 1 (say 0.96), should one consider this perfect prediction and disregard the covariate?

    any and all help is much appreciated

  • #2
    a) No. It's still relevant to distinguishing other options in any case.

    b) Again, no. If the estimated coefficient is within reasonable bounds and the standard error is neither missing nor astronomical, you can treat it just like any other covariate.

    P.S. I assume you didn't really mean p-value, but meant a predicted outcome probability approaching 1 for one value of the predictor. An actual p-value near 1 really has no connection to perfect prediction: if anything it is more like no prediction.
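To make the distinction in the P.S. concrete: in a logit model it is the predicted probability, not the p-value, that saturates toward 1 under (near-)perfect prediction. A minimal sketch, in Python purely for illustration (the function name and the example linear-predictor values are made up, not from the thread):

```python
import math

def logit_prob(xb):
    """Predicted outcome probability from a logit linear predictor xb."""
    return 1.0 / (1.0 + math.exp(-xb))

# A huge linear predictor drives the predicted probability toward 1;
# that saturation, not a p-value near 1, is the perfect-prediction symptom.
print(logit_prob(0.0))    # 0.5 -- no prediction at all
print(logit_prob(13.7))   # very close to 1
```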



    • #3
      I meant z = 0 (and P>|z| = 1). Long & Freese (Regression Models for Categorical Dependent Variables Using Stata) note that:

      "mlogit handles perfect prediction somewhat differently than the estimation commands for binary and ordinal models that we have discussed. logit and probit automatically remove the observations that imply perfect prediction and compute estimates accordingly. ologit and oprobit keep these observations in the model, estimate the z for the problem variable as 0, and provide an incorrect LR chi-squared, but also warn that a given number of observations are completely determined. You should delete these observations and re-estimate the model. mlogit is just like ologit and oprobit except that you do not receive a warning message. You will see, however, that all coefficients associated with the variable causing the problem have z = 0 (and P>|z| = 1). You should re-estimate the model, excluding the problem variable and deleting the observations that imply the perfect predictions. Using the tabulate command to generate a cross-tabulation of the problem variable and the dependent variable should reveal the combination that results in perfect prediction."

      Hence my question.
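The cross-tabulation check that Long & Freese recommend amounts to looking for empty cells: a covariate level that never co-occurs with some outcome level. A minimal sketch of that logic outside Stata, in Python (the function name and toy data are illustrative assumptions, not from the thread):

```python
from collections import Counter

def perfect_prediction_cells(outcome, covariate):
    """Cross-tabulate outcome vs. a categorical covariate and return, for
    each covariate level, the outcome levels it never co-occurs with
    (zero cells -- the pattern behind perfect prediction in mlogit)."""
    outcomes = sorted(set(outcome))
    counts = Counter(zip(covariate, outcome))
    return {lvl: [y for y in outcomes if counts[(lvl, y)] == 0]
            for lvl in sorted(set(covariate))}

# Toy data echoing the auto-data pattern from post #4 below this one:
# foreign cars (covariate level 1) never have repair records 1 or 2.
rep78   = [1, 1, 2, 3, 3, 4, 4, 5, 5]
foreign = [0, 0, 0, 0, 1, 0, 1, 0, 1]
print(perfect_prediction_cells(rep78, foreign))
# {0: [], 1: [1, 2]}
```

Any non-empty list in the result marks a covariate level that perfectly predicts "not those outcomes", which is exactly what a `tabulate` cross-tab would reveal.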



      • #4
        Well, except for retracting the post script, my answer is really the same.
        Code:
        . sysuse auto
        (1978 Automobile Data)

        . tab rep78 foreign

            Repair |
            Record |       Car type
              1978 |  Domestic    Foreign |     Total
        -----------+----------------------+----------
                 1 |         2          0 |         2
                 2 |         8          0 |         8
                 3 |        27          3 |        30
                 4 |         9          9 |        18
                 5 |         2          9 |        11
        -----------+----------------------+----------
             Total |        48         21 |        69

        . mlogit rep78 i.foreign

        Iteration 0:   log likelihood = -93.692061
        Iteration 1:   log likelihood = -80.470733
        Iteration 2:   log likelihood = -78.986489
        Iteration 3:   log likelihood = -78.794109
        Iteration 4:   log likelihood = -78.748501
        Iteration 5:   log likelihood = -78.738643
        Iteration 6:   log likelihood = -78.736602
        Iteration 7:   log likelihood = -78.736144
        Iteration 8:   log likelihood = -78.736032
        Iteration 9:   log likelihood = -78.736009
        Iteration 10:  log likelihood = -78.736004

        Multinomial logistic regression                   Number of obs   =         69
                                                          LR chi2(4)      =      29.91
                                                          Prob > chi2     =     0.0000
        Log likelihood = -78.736004                       Pseudo R2       =     0.1596

        ------------------------------------------------------------------------------
               rep78 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
        1            |
             foreign |
            Foreign  |   -13.6844   1986.817    -0.01   0.995    -3907.774    3880.405
               _cons |  -2.602702   .7328328    -3.55   0.000    -4.039028   -1.166376
        -------------+----------------------------------------------------------------
        2            |
             foreign |
            Foreign  |   -13.6844   993.4086    -0.01   0.989    -1960.729    1933.361
               _cons |  -1.216408   .4025404    -3.02   0.003    -2.005373    -.427443
        -------------+----------------------------------------------------------------
        3            |  (base outcome)
        -------------+----------------------------------------------------------------
        4            |
             foreign |
            Foreign  |   2.197466   .7698095     2.85   0.004     .6886674    3.706265
               _cons |  -1.098617   .3849011    -2.85   0.004    -1.853009   -.3442246
        -------------+----------------------------------------------------------------
        5            |
             foreign |
            Foreign  |    3.70116   .9906909     3.74   0.000     1.759442    5.642879
               _cons |  -2.602577     .73279    -3.55   0.000    -4.038819   -1.166335
        ------------------------------------------------------------------------------
        Looking at the -tab- results, there should be a problem with outcomes 1 and 2 here. Looking at the -mlogit- results, we do not get the z-value of zero and P>|z| = 1 that were promised, though they are close.

        But the diagnostic information here is really the coefficients and their standard errors. The standard errors are astronomical, and the coefficients are completely unrealistic: nothing on earth has odds ratios of exp(-13.6844)! That's the tip-off that something is seriously wrong.

        Now, had we not done the -tab- beforehand, we wouldn't necessarily know that this is due to perfect (or near-perfect) prediction. Other things can cause a serious problem: strong multicollinearity, or other issues that can cause the model to be poorly identified. So, if the first sign of trouble is in -mlogit- outputs like this, the next step is to figure out why. Omitting the offending covariate may be the solution--but other actions may be more appropriate in different situations. For example, it might make sense to combine some categories of the outcome variable. Or if collinearity is the issue, removing one of the other covariates might be best. Or excluding certain subsets of observations, etc.

        But if you run an -mlogit- and find z's near 0 and P>|z|'s near 1 at the same time that the coefficients and their standard errors are reasonable, then you don't have an estimation problem. Now, you might want to get rid of that covariate anyway because it evidently isn't predicting much--but the reasonableness of that action would depend on other circumstances.
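That diagnostic, reasonable versus astronomical coefficients and standard errors, can be sketched as a simple screen. A hedged Python illustration (the function name is mine, the cutoff of 50 is an arbitrary assumption rather than any established rule, and the values are copied from the `mlogit` output above):

```python
def flag_suspect_coefficients(coefs, std_errs, cutoff=50.0):
    """Return indices of coefficient/SE pairs whose magnitude or standard
    error is astronomical -- the real tip-off for perfect prediction,
    rather than z near 0 on its own."""
    return [i for i, (b, se) in enumerate(zip(coefs, std_errs))
            if abs(b) > cutoff or se > cutoff]

# Foreign coefficients from the output above: outcomes 1 and 2 have
# SEs near 2000 and 1000; outcomes 4 and 5 look perfectly ordinary.
coefs = [-13.6844, -13.6844, 2.197466, 3.70116]
ses   = [1986.817,  993.4086, 0.7698095, 0.9906909]
print(flag_suspect_coefficients(coefs, ses))
# [0, 1]
```

A flagged index says "investigate why", not "delete the covariate": as the post explains, the cause could equally be collinearity or a poorly identified model.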




        • #5
          Once again, thank you, Clyde. Their conclusion was a bit too drastic - glad I asked for more insight.
