logit and multicollinearity

salam diam

Join Date: Nov 2014

Posts: 10
#1

16 May 2016, 16:47

Dear all,
I am working with data in the following format:
y x
1 50
1 5
1 10
1 15
1 20
0 1
0 1
0 1
0 1
0 1
0 1
0 1
0 1
1 12
1 15
1 45
1 78
1 12
1 13
1 11
1 7
1 4

My problem is that I am running a logit model with y as the dependent variable. After running the model, the results say I have multicollinearity coming from the section I highlighted in the data. I was wondering whether anyone can help me fix the issue, because my understanding is that I cannot simply drop the variables or change the information.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

16 May 2016, 18:27

Really? If this is really true, please show the actual data (using -dataex-) and the complete output you got from Stata (copied from the Results window or your log file and pasted directly into a code block on this forum without any editing). When I run -logit y x- on your data, it gives me an error message, but it has nothing to do with multicollinearity. And, indeed, there is no multicollinearity in your data: there can't be, because you have only one predictor variable, and it is not constant. Your problem is complete separation (also called perfect prediction): when x = 1, y = 0; when x > 1, y = 1.

-logit- estimates logistic regression models by maximum likelihood. When there is complete separation, as here, the maximum likelihood estimate of the regression coefficient is infinite (or negative infinite). Stata is able to recognize this situation and stop with an error message before wasting your time on an estimation that will never converge. So, if this is the way your data is, you need to use a command that estimates the logistic regression model without using maximum likelihood. The -exlogistic- command can do this, and it does converge in your sample data. It uses "exact" estimation (in the same sense that the Fisher exact test is "exact") and is suitable for small data sets. Another approach is Joseph Coveney's -firthlogit- (available from SSC): it uses penalized maximum likelihood estimation and usually converges in the presence of complete separation. I do not have this command installed myself, so I have not tested it on your sample data.
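
The steps described above can be sketched as a short do-file. This is only a minimal sketch, assuming the 22 sample observations posted in #1 are the data and that the variables are named y and x as in that post. The -tabstat- check is an addition for illustration, not something run in this thread, and the -firthlogit- lines are left commented out because that command is user-written, has to be installed from SSC first, and its syntax is assumed here to mirror -logit-.

```stata
* Sample data as posted in #1 (22 observations)
clear
input byte y int x
1 50
1 5
1 10
1 15
1 20
0 1
0 1
0 1
0 1
0 1
0 1
0 1
0 1
1 12
1 15
1 45
1 78
1 12
1 13
1 11
1 7
1 4
end

* Range of x within each level of y. The two ranges do not overlap,
* which is what complete separation looks like with a single predictor.
tabstat x, by(y) statistics(n min max)

* Maximum likelihood: expected to stop with an error instead of converging.
* -capture noisily- displays the message and lets the do-file continue.
capture noisily logit y x

* Exact logistic regression (official Stata command)
exlogistic y x

* Penalized (Firth) likelihood; user-written, so install it first.
* Assumption: syntax mirrors -logit-. Not run as part of this sketch.
* ssc install firthlogit
* firthlogit y x
```

If the -tabstat- output shows a maximum of 1 for x when y = 0 and a minimum above 1 when y = 1, the error from -logit- comes from separation rather than from anything resembling collinearity.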
Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#3

16 May 2016, 21:21

A further alternative is the user-written penlogit, available from The Stata Journal website.

Code:

search penlogit
salam diam

Join Date: Nov 2014

Posts: 10
#4

17 May 2016, 14:23

Dear Clyde and Joseph, we are trying your suggestions now. We are very grateful to have you here, and your recommendations are very helpful. I will soon post the results for each of the proposed approaches. Thank you very much.
salam diam

Join Date: Nov 2014

Posts: 10
#5

17 May 2016, 16:19

ID	yes_no	wtp	riskrespo	age	educat	revenue
11	0	1	6	5		5
12	1	3	6	6	4	3
13	0	1		5	4	1
14	0	1	6	5		1
15	1	3	6	4	3	3
16	0	1	5	5	2	3
17	1	2	6	6	3	6
18	0	1	6	6	2	4
19	1	2		6	4	2
21	1	2		5	3	1
22	1	3		6	2	3
23	0	1	6	6	4	1
24	0	1	2	5	2	2
25	0	1	6	5	3	3
26	0	1	6	6	2	2
27	1	2	2	4	4	5
28	1	9	4	6	5	4
29	1	3	6	6	5	6
30	0	1	2	6	5	6
31	0	1	2	6		1
32	1	2	6	6	4	1
33	0	1		5	2	1
34	0	1		6	4	1
35	0	1	2	6	5	4
36	0	1	1	6	1	5
37	1	8	6	6	2	1
38	0	1	5	6	4	1
39	1	4	3	6	4	1
40	0	1	1	5	4	7
41	0	1	6	5	2	1
42	0	1	2	5	4
43	1	6	6	6	4	2
44	1	3	6	6	4	5
45	1	2	4	5	2	5
46	1	3	2	5	4
47	0	1	4	6	5	3
48	0	1	2	6	4	6
49	0	1	1	6	4	1
50	1	2	4	6	3	1
51	0	1	6	6	2	5
52	0	1	6	6	4	2
53	1	2	1	5	3	1
54	1	2	4	5	2	4
55	0	1	6	3	3	4
56	0	1		6	2	1
57	0	1		6	1	1
58	1	9	6	6	4	3
59	0	1	6	6	4	5
60	0	1	6	5	3	5
61	1	2		6	5	3
62	0	1	1	6	5	2
63	0	1	2	5		2
64	1	2	6	5	3	2
65	1	2	6	6	3	1
66	1	2	1	6	5	3
67	0	1	1	6	4	1
68	0	1	6	6	2	1
69	0	1	6			4
70	0	1		5	2	1

salam diam

Join Date: Nov 2014

Posts: 10
#6

17 May 2016, 16:19

Age, education and revenue are continuous; wtp is grouped into categories (1 for paying one or less, 2 for paying 10, 3 for 20, and so on). The output obtained is:
salam diam

Join Date: Nov 2014

Posts: 10
#7

17 May 2016, 16:29

Dear Joseph and Clyde. This is my data from a survey we conducted earlier this year.
salam diam

Join Date: Nov 2014

Posts: 10
#8

17 May 2016, 16:37

logit wtpyn wtp educat owned_land rented_land riskrespo

note: wtp != 1 predicts success perfectly
wtp dropped and 28 obs not used
note: rented_land != 0 predicts failure perfectly
rented_land dropped and 24 obs not used
note: educat != 2 predicts failure perfectly
educat dropped and 7 obs not used
outcome = owned_land > 8 predicts data perfectly

logit wtpyn educat owned_land rented_land riskrespo

wtpyn         Coef.      Std. Err.   z       P>|z|   [95% Conf. Interval]
educat        .20514     .29325      0.70    0.484   -.36963    .77992
owned_land    .00117     .00086      1.37    0.172   -.00051    .00286
rented_land   -.000087   .00024      -0.36   0.722   -.00056    .00039
riskrespo     .156524    .15508      1.01    0.313   -.1474     .46049
_cons         -1.72439   1.3865      -1.24   0.214   -4.4420    .99324
salam diam

Join Date: Nov 2014

Posts: 10
#9

17 May 2016, 16:38

Clyde and Joseph, this is what I got from the Stata output after running the model.