Separation and Quasi-Complete Separation in Logistic Regression

Girlan Oliveira

Join Date: Feb 2016

Posts: 99
#1

14 Jun 2016, 12:24

Hello, Statalists.

When I try to run the following logistic regression:

logistic eletri presmae metrop area logrenpcdef nmorad refsexo refraca ncrian medescresp difescresp

Stata returns the following output, which indicates that my data set has a problem of complete or quasi-complete separation:

note: area != 0 predicts success perfectly
area dropped and 470 obs not used

note: refsexo != 1 predicts success perfectly
refsexo dropped and 21 obs not used

note: refraca != 0 predicts success perfectly
refraca dropped and 17 obs not used

note: metrop != 0 predicts failure perfectly
metrop dropped and 2 obs not used

outcome = logrenpcdef <= 6.828236 predicts data perfectly
r(2000);

Interpreting this output, I understand that the variable "logrenpcdef" is responsible for separation, but I do not know what the messages mean for the other variables, which are binary.

Thanks in advance
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

14 Jun 2016, 12:53

note: area != 0 predicts success perfectly
area dropped and 470 obs not used

means that, in the estimation sample, whenever area != 0, we have eletri = 1. It differs from complete separation in that when area == 0, eletri may be either 0 or 1.

But the effect is the same: the maximum likelihood estimate of the coefficient would be infinite (positive here, because area != 0 always goes with success; negative for a variable that predicts failure perfectly), so the variable must be omitted from the model.
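
To make the "predicts success perfectly" condition concrete, here is a minimal sketch in Python rather than Stata. It is not how Stata implements the check. The data are hypothetical toy values and the function name perfect_prediction_levels is made up for illustration. It reports every level of a predictor at which the outcome takes only one value.

```python
def perfect_prediction_levels(x, y):
    """Return {level of x: the single outcome seen at that level}.

    A level appears in the result only if every observation with that
    value of x has the same outcome, i.e. that level predicts perfectly.
    """
    outcomes_by_level = {}
    for xi, yi in zip(x, y):
        outcomes_by_level.setdefault(xi, set()).add(yi)
    return {
        level: next(iter(outcomes))
        for level, outcomes in outcomes_by_level.items()
        if len(outcomes) == 1
    }


# Hypothetical toy data, NOT the poster's data set.
area = [0, 0, 0, 0, 1, 1, 1]
eletri = [0, 1, 0, 1, 1, 1, 1]

result = perfect_prediction_levels(area, eletri)
print(result)  # {1: 1}: area == 1 always goes with eletri == 1
```

Only one level of area shows up in the result, which is the quasi-complete case: area == 0 still has both outcomes. If both levels showed up, the separation would be complete.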
Girlan Oliveira

Join Date: Feb 2016

Posts: 99
#3

14 Jun 2016, 13:21

Is my assertion correct that the variable "logrenpcdef" is responsible for the separation? Are all the other variables ("area", "refsexo", "refraca" and "metrop") also responsible for it? What is the difference in the analysis of these two groups of variables, "logrenpcdef" on the one hand and "area", "refsexo", "refraca" and "metrop" on the other?
Since no estimates were produced, what does the second line of each note mean?
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

14 Jun 2016, 13:56

Yes, all of these other variables are also responsible for (partial) separation. There is no difference in the analysis: separation, whether complete or partial, requires omitting the variable from the model. "area dropped and 470 obs not used" means that the variable area was taken out of the model, and the 470 observations in which area != 0 were also dropped from the estimation sample.
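
The "dropped and N obs not used" step can be sketched as follows. This is a Python illustration on hypothetical toy rows, not Stata's own code, and drop_perfect_predictor is an invented helper name. It removes the observations at the perfectly predicting level and then removes the variable, which no longer varies among the observations that remain.

```python
def drop_perfect_predictor(rows, var, outcome):
    """Drop rows where `var` predicts `outcome` perfectly, then drop `var`.

    Returns (kept_rows, number_of_rows_dropped).
    """
    outcomes_by_level = {}
    for row in rows:
        outcomes_by_level.setdefault(row[var], set()).add(row[outcome])
    # Levels of `var` at which only one outcome value ever occurs.
    perfect = {lvl for lvl, ys in outcomes_by_level.items() if len(ys) == 1}
    kept = [
        {k: v for k, v in row.items() if k != var}  # remove the variable itself
        for row in rows
        if row[var] not in perfect                  # remove the observations
    ]
    return kept, len(rows) - len(kept)


# Hypothetical toy rows, NOT the poster's data set.
rows = [
    {"eletri": 0, "area": 0, "nmorad": 3},
    {"eletri": 1, "area": 0, "nmorad": 4},
    {"eletri": 1, "area": 1, "nmorad": 2},
    {"eletri": 1, "area": 1, "nmorad": 5},
    {"eletri": 0, "area": 0, "nmorad": 6},
]

kept, n_dropped = drop_perfect_predictor(rows, "area", "eletri")
print(f"area dropped and {n_dropped} obs not used")
```

In the toy rows every remaining observation has area == 0, so the variable would be constant in the reduced sample and could not be estimated anyway. That is why the variable and the observations go together.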
Girlan Oliveira

Join Date: Feb 2016

Posts: 99
#5

14 Jun 2016, 21:31

Thanks, Clyde.

But I don't understand why Stata did not report a "drop" and the number of observations not used for the variable "logrenpcdef".

I also have the following questions:

The FAQ at http://www.ats.ucla.edu/stat/mult_pk...git_models.htm made clear to me what complete separation and partial (quasi-complete) separation are, and it led me to believe that the logistic regression I ran was showing quasi-complete separation. However, the attached paper states on page 7:

[...] At one extreme, some software packages (e.g., Stata) are aggressively proactive, automatically omitting variables and dropping observations from the analysis when quasicomplete separation is present and simply failing to provide any estimate at all when separation is complete. [...]

That made me doubt whether the logistic regression I ran was really showing quasi-complete separation, because Stata did not give me any estimates, which, according to the text above, would be consistent with complete separation.

Attached Files

A Solution to Separation in Binary Response Models.pdf (227.0 KB, 1 view)
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#6

15 Jun 2016, 07:39

[...] At one extreme, some software packages (e.g., Stata) are aggressively proactive, automatically omitting variables and dropping observations from the analysis when quasicomplete separation is present and simply failing to provide any estimate at all when separation is complete. [...]

That's exactly what Stata did. At first it encountered quasi-complete separation due to variables like area, and each time it omitted those variables and dropped the offending observations. Then it looked at logrenpcdef and found complete separation, so it just gave you an error message and stopped (i.e. did not provide any estimates.)
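
For a continuous variable, complete separation amounts to a cutoff with all of one outcome on one side and all of the other outcome on the other side. A minimal Python sketch of that check follows, using hypothetical toy values and an invented function name, separating_threshold. It assumes the message "outcome = logrenpcdef <= 6.828236" is read as "the outcome is 1 exactly when logrenpcdef is at or below the cutoff".

```python
def separating_threshold(x, y):
    """Look for a cutoff on x that splits y == 0 from y == 1 with no overlap.

    Returns (cutoff, outcome_at_or_below_cutoff) if one exists, else None.
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        return None  # the outcome does not vary, so nothing to separate
    if max(x1) < min(x0):
        return max(x1), 1  # every success lies below every failure
    if max(x0) < min(x1):
        return max(x0), 0  # every failure lies below every success
    return None  # the two groups overlap: no complete separation


# Hypothetical toy values, NOT the poster's data set.
logrenpcdef = [5.9, 6.3, 6.8, 7.1, 7.6, 8.2]
eletri = [1, 1, 1, 0, 0, 0]

print(separating_threshold(logrenpcdef, eletri))  # (6.8, 1)
```

When such a cutoff exists there are no observations to set aside and still leave a usable sample, which fits the absence of a "dropped and N obs not used" line for logrenpcdef.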

As the paper you attached suggests, you have two ways forward. You can either pursue your investigation without using the variables that Stata has omitted. Or you can keep all your variables and estimate your logistic model with penalized maximum likelihood: this estimator can converge in the presence of separation. If you run -findit penalized logistic-, you will see links to several user-written programs that do this.
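
To show why a penalty helps, here is a small sketch in Python, not Stata. It makes several assumptions: a no-intercept model with a single predictor, hypothetical toy data, and a Firth-type penalty (half the log of the Fisher information). Whether a given user-written program uses exactly this penalty is also an assumption. Under complete separation the ordinary log likelihood keeps rising as the coefficient grows, while the penalized version peaks at a finite value.

```python
import math


def log_sigmoid(z):
    # Numerically stable log(1 / (1 + exp(-z))).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def loglik(beta, x, y):
    # No-intercept logistic model: P(y = 1) = sigmoid(beta * x).
    return sum(
        log_sigmoid(beta * xi if yi == 1 else -beta * xi)
        for xi, yi in zip(x, y)
    )


def firth_penalty(beta, x):
    # 0.5 * log(Fisher information), information = sum of x^2 * p * (1 - p).
    info = 0.0
    for xi in x:
        e = math.exp(-abs(beta * xi))
        info += xi * xi * e / (1.0 + e) ** 2  # equals x^2 * p * (1 - p)
    return 0.5 * math.log(info)


# Hypothetical, completely separated toy data: y = 1 exactly when x > 0.
x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]

grid = [i / 100 for i in range(0, 1001)]  # candidate coefficients 0.00 .. 10.00
plain = [loglik(b, x, y) for b in grid]
penalized = [loglik(b, x, y) + firth_penalty(b, x) for b in grid]

best_plain = grid[plain.index(max(plain))]
best_penalized = grid[penalized.index(max(penalized))]
print("unpenalized maximizer on the grid:", best_plain)
print("penalized maximizer on the grid:", best_penalized)
```

The unpenalized maximizer sits at the upper edge of the grid however far the grid is extended, which is the numerical face of an infinite estimate. The penalized maximizer stays in the interior. A grid search is used only to keep the sketch short; real programs use iterative fitting.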
Girlan Oliveira

Join Date: Feb 2016

Posts: 99
#7

15 Jun 2016, 14:58

Thank you very much, Clyde

Your explanation was very clear and removed all my doubts.
