
  • Variable predicts success perfectly

    For my stats training purposes, I would like to run a logit regression estimating the likelihood of a country applying a certain directive (yes or no). One of my independent variables is a dichotomous variable indicating (simplified to reduce it to the stats issue) whether the respective country is extremely wealthy or not. Only seven out of 190 cases meet this criterion (ergo are coded 1 on this variable). All seven apply the directive, so the variable gets dropped from the logit regression with the message: Variable predicts success perfectly.
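A minimal Python sketch (Python is used only for illustration) of why the variable gets dropped. It assumes the model logit(p) = intercept + beta * x and looks only at the seven x = 1 cases, all with y = 1. Their log-likelihood contribution keeps rising toward 0 as beta grows, and the x = 0 cases depend only on the intercept, so for any fixed intercept there is no finite beta that maximizes the likelihood. The intercept value of 0 below is an arbitrary choice.

```python
import math

# Toy version of the situation in the question: the 7 "extremely wealthy"
# countries (x = 1) all apply the directive (y = 1).
N_WEALTHY = 7

def loglik_wealthy(intercept: float, beta: float) -> float:
    """Log-likelihood contribution of the x = 1 group under logit(p) = intercept + beta * x.

    Every case in this group has y = 1, so the contribution is N * log(p),
    where log(p) = -log(1 + exp(-z)) is computed stably via log1p.
    """
    z = intercept + beta
    return -N_WEALTHY * math.log1p(math.exp(-z))

# The x = 0 group's contribution depends only on the intercept, so for a
# fixed intercept the total log-likelihood rises with beta exactly as here.
betas = [0.0, 2.0, 5.0, 10.0, 20.0]
values = [loglik_wealthy(0.0, b) for b in betas]
for b, v in zip(betas, values):
    print(f"beta = {b:5.1f}   log-likelihood contribution = {v:.10f}")
```

Each larger beta fits these seven cases slightly better, which is why an optimizer never converges on a finite estimate for this coefficient.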

    My question is: can I still do anything useful with this variable? I thought about conducting Chi² and/or Pearson's r tests on a contingency table, but would the results even be sensibly interpretable?
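With only seven wealthy countries, the expected counts in the wealthy row of the 2x2 table are small (about 3.6 and 3.4 under the hypothetical split used below), which makes the Pearson chi-square approximation doubtful. Fisher's exact test is the usual alternative for such tables (in Stata, `tabulate` with the `exact` option reports it). The sketch below computes the two-sided Fisher p-value from hypergeometric probabilities. The split among the 183 non-wealthy countries (90 apply, 93 do not) is invented purely for illustration.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows: wealthy / not wealthy; columns: applies / does not apply.
    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(k: int) -> float:
        # Probability that the top-left cell equals k, given fixed margins.
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-7))

# HYPOTHETICAL counts: 7 of 7 wealthy countries apply (from the question);
# 90 of the other 183 applying is invented for illustration.
p = fisher_exact_two_sided(7, 0, 90, 93)
print(f"two-sided Fisher exact p = {p:.4f}")
```

The test only addresses association in the 2x2 table; it does not adjust for the other covariates in the logit model.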

    Thank you in advance for your assistance.

  • #2
    Hello Jonas,

    Welcome to the Stata Forum / Statalist.

    This text may interest you.
    Best regards,

    Marcos



    • #3
      Jonas:
      there's nothing you can do but search for another predictor.

      PS: crossed in cyberspace with Marcos' reply (and its interesting reference).
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        Thanks to both of you. I've read the article, and it suggests using firthlogit to deal with complete separation. I'm not sure whether this makes sense in my model, especially since in a later step I will have to include fixed effects (xtlogit), and I wouldn't know how to combine firthlogit with xtlogit. Is something like this possible? Or should I just drop the variable?
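To illustrate what Firth's penalized likelihood does, here is a sketch for the simplest possible case: an intercept plus one binary predictor and nothing else. In that saturated two-group model the penalty has a closed form, where each group's fitted probability becomes (successes + 0.5) / (n + 1), equivalent to adding 0.5 to every cell of the 2x2 table, so the log odds ratio stays finite even under complete separation. This is not `firthlogit` itself, it does not handle other covariates, and it says nothing about combining the penalty with the fixed effects of `xtlogit`. The counts for the non-wealthy group are hypothetical.

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1.0 - p))

def firth_two_group(y1: int, n1: int, y0: int, n0: int) -> tuple:
    """Firth-penalized estimates (b0, b1) for logit(p) = b0 + b1 * x, x binary.

    Only valid for this saturated two-group model, where the penalty
    reduces to adding 0.5 to every cell of the 2x2 table.
    """
    p1 = (y1 + 0.5) / (n1 + 1.0)  # x = 1 group (wealthy)
    p0 = (y0 + 0.5) / (n0 + 1.0)  # x = 0 group
    b0 = logit(p0)
    b1 = logit(p1) - b0  # log odds ratio, finite even when y1 == n1
    return b0, b1

# 7 of 7 wealthy countries apply (from the question); 90 of 183 is HYPOTHETICAL.
b0, b1 = firth_two_group(7, 7, 90, 183)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, odds ratio = {math.exp(b1):.1f}")
```

The finite b1 comes entirely from the penalty, so its size reflects the 0.5 added to the empty cell as much as the data.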



        • #5
          In addition, what I don't understand is why there is _nothing_ we can do with a variable like this. Suppose there were not just seven such cases but 150, all of which predicted success perfectly. Wouldn't it then seem that extreme wealth has something to do with the outcome? To take another example: if, in a sample of 100 people, 60 were rich and all of them owned an expensive car, how could that tell us nothing about the relationship between wealth and owning an expensive car? My admittedly naive common sense can't get a grip on this.
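One way to see that the data do carry information, even though the coefficient has no finite estimate: the log-likelihood of the full model approaches a finite supremum as beta grows, so a likelihood-ratio test of "no difference between groups" can still be computed. The sketch below does this for a 2x2 table under hypothetical counts for the non-wealthy group. It relies on the asymptotic chi-square(1) distribution, which is questionable with counts this small, and it yields a test, not a usable point estimate.

```python
import math

def binom_ll(y: int, n: int, p: float) -> float:
    """Binomial log-likelihood without the constant, using 0 * log(0) = 0."""
    ll = 0.0
    if y > 0:
        ll += y * math.log(p)
    if n - y > 0:
        ll += (n - y) * math.log(1.0 - p)
    return ll

def lr_test(y1: int, n1: int, y0: int, n0: int) -> tuple:
    """Likelihood-ratio test of H0: same probability in both groups (1 df)."""
    # Supremum under the full model: each group's sample proportion.
    # For a perfectly separated group this is 1.0, which no finite beta
    # attains, but its log-likelihood contribution has the finite limit 0.
    ll_full = binom_ll(y1, n1, y1 / n1) + binom_ll(y0, n0, y0 / n0)
    p_pooled = (y1 + y0) / (n1 + n0)
    ll_null = binom_ll(y1, n1, p_pooled) + binom_ll(y0, n0, p_pooled)
    stat = 2.0 * (ll_full - ll_null)
    # Chi-square(1) upper tail: P(Z^2 > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))

# 7 of 7 wealthy countries apply (from the question); 90 of 183 is HYPOTHETICAL.
stat, p_value = lr_test(7, 7, 90, 183)
print(f"LR chi2(1) = {stat:.2f}, p = {p_value:.4f}")
```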



          • #6
            If we take a look at the text shared in #2, particularly from "What happens when..." onward, we realize why this has to be so.

            Besides, how can a model be "predictive" when it uses a variable that "predicts" 100% of cases? That would be a deterministic model, so to speak...

            Finally, the messages in the SAS output state the issue quite clearly:

            Complete separation of data points detected.
            WARNING: The maximum likelihood estimate does not exist.
            WARNING: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood iteration. Validity of the model fit is questionable.
            All in all, the gist of the matter is: if we know for sure that the parameter estimate for such a predictor is incorrect, we shouldn't use it in our "predictions".
            Last edited by Marcos Almeida; 06 Apr 2017, 07:42.
            Best regards,

            Marcos
