  • Separation and Quasi-Complete Separation in Logistic Regression

    Hello, Statalists.

    When I try to run the following logistic regression:

    logistic eletri presmae metrop area logrenpcdef nmorad refsexo refraca ncrian medescresp difescresp
    Stata returns the following output, which indicates that my data have a separation or quasi-separation problem:

    note: area != 0 predicts success perfectly
    area dropped and 470 obs not used

    note: refsexo != 1 predicts success perfectly
    refsexo dropped and 21 obs not used

    note: refraca != 0 predicts success perfectly
    refraca dropped and 17 obs not used

    note: metrop != 0 predicts failure perfectly
    metrop dropped and 2 obs not used

    outcome = logrenpcdef <= 6.828236 predicts data perfectly
    r(2000);
    Interpreting this result, I understand that the variable "logrenpcdef" is responsible for the separation, but I do not know what the messages mean for the other variables, which are binary.


    Thanks in advance

  • #2
    note: area != 0 predicts success perfectly
    area dropped and 470 obs not used
    means that, in the estimation sample, whenever area != 0, we have eletri = 1. It differs from perfect prediction in that if area == 0, we may have eletri = 0 or 1.

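    But the effect is the same: the maximum likelihood estimate of the coefficient would be infinite (positive here, because area != 0 always goes with eletri = 1; negative for a variable such as metrop, which predicts failure), so the variable must be omitted from the model.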
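
    A quick way to see this pattern in the data is to cross-tabulate the outcome against each flagged variable. This is only a sketch using the variable names from the command above. Because Stata drops observations after each note, the empty cell for the later variables may only show up once the earlier exclusions are applied.

    ```stata
    * Sketch: cross-tabulate the outcome against each binary predictor named in the notes.
    * A zero count in one cell (e.g. area != 0 with eletri == 0) is the
    * signature of quasi-complete separation.
    foreach v of varlist area refsexo refraca metrop {
        tabulate `v' eletri
    }
    ```

    If the row for area != 0 has no observations with eletri == 0, that is what "area != 0 predicts success perfectly" is reporting.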


    • #3
      Is my assertion correct that the variable "logrenpcdef" is responsible for the separation? Are all the other variables ("area", "refsexo", "refraca" and "metrop") also responsible for the separation? What is the difference in the analysis of these two groups of variables, ("logrenpcdef") and ("area", "refsexo", "refraca" and "metrop")?
      Since no estimates were produced, what does the second line of each note mean?


      • #4
        Yes, all of these other variables are also responsible for (partial) separation. There is no difference in the analysis: separation, whether complete or partial, requires omitting the variable from the model. "area dropped and 470 obs not used" means that the variable area was taken out of the model, and the 470 observations in which area != 0 were also dropped from the estimation sample.


        • #5
          Thanks Clyde,

          But I don't understand why Stata did not report a "drop" and the number of observations not used for the variable "logrenpcdef".

          I also have the following questions:

          The FAQ at http://www.ats.ucla.edu/stat/mult_pk...git_models.htm made clear to me what complete separation and partial (quasi-complete) separation are, and it led me to believe that the logistic regression I ran was showing quasi-complete separation. However, the attached paper states on page 7 that:

          [...] At one extreme, some software packages (e.g., Stata) are aggressively proactive, automatically omitting variables and dropping observations from the analysis when quasicomplete separation is present and simply failing to provide any estimate at all when separation is complete. [...]
          That made me doubt whether the logistic regression I ran was really showing quasi-complete separation, because Stata did not give me any estimates, which, according to the text above, would be consistent with complete separation.


          • #6
            [...] At one extreme, some software packages (e.g., Stata) are aggressively proactive, automatically omitting variables and dropping observations from the analysis when quasicomplete separation is present and simply failing to provide any estimate at all when separation is complete. [...]


            That's exactly what Stata did. At first it encountered quasi-complete separation due to variables like area, and each time it omitted those variables and dropped the offending observations. Then it looked at logrenpcdef and found complete separation, so it just gave you an error message and stopped (i.e. did not provide any estimates.)
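
            To check the complete separation on logrenpcdef yourself, you could compare its range across the two outcome groups within the observations that survived the earlier drops. This is a sketch that assumes the retained sample is roughly area == 0, refsexo == 1, refraca == 0 and metrop == 0, as the notes imply. Observations with missing values on other regressors are also excluded by -logistic-, so the sample may not match exactly.

            ```stata
            * Sketch: compare the range of logrenpcdef by outcome, restricted to the
            * observations left after the four drops reported in the notes.
            bysort eletri: summarize logrenpcdef ///
                if area == 0 & refsexo == 1 & refraca == 0 & metrop == 0
            ```

            Under complete separation, the maximum in one group should sit at or below 6.828236 and the minimum in the other group above it, with no overlap.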

            As the paper you attached suggests, you have two ways forward. You can either pursue your investigation without using the variables that Stata has omitted. Or you can keep all your variables and estimate your logistic model with penalized maximum likelihood: this estimator can converge in the presence of separation. If you run -findit penalized logistic-, you will see links to several user-written programs that do this.
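
            As an illustration of the second route, one such user-written program is, to my knowledge, -firthlogit- from SSC, which fits a logistic model by Firth's penalized likelihood. Treat the command name and its syntax as an assumption to confirm with -findit- and the command's help file. The variable list is simply the one from the original post.

            ```stata
            * Assumption: the user-written command firthlogit is available from SSC.
            ssc install firthlogit

            * Same outcome and regressors as the original -logistic- call.
            firthlogit eletri presmae metrop area logrenpcdef nmorad refsexo refraca ncrian medescresp difescresp
            ```

            Because the penalty keeps the estimates finite, this should return coefficients for all the variables, including those that -logistic- dropped.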


            • #7
              Thank you very much, Clyde

              Your explanation was very clear and removed all my doubts.
