  • logit and multicollinearity

    Dear all,
    I am working with data in the following format:
    y x
    1 50
    1 5
    1 10
    1 15
    1 20
    0 1
    0 1
    0 1
    0 1
    0 1
    0 1
    0 1
    0 1
    1 12
    1 15
    1 45
    1 78
    1 12
    1 13
    1 11
    1 7
    1 4
    My problem is this: I am running a logit model with Y as the dependent variable. After running the model, the results say I have multicollinearity coming from the section I highlighted in the data. I was wondering if anyone can help me fix this issue, because my understanding is that I cannot simply drop the variables or change the information.

  • #2
    Really? If this is really true, please show the actual data (using -dataex-) and the complete output you got from Stata (copied from the Results window or your log file and pasted directly into a code block on this forum without any editing). When I run -logit y x- on your data, it gives me an error message, but it has nothing to do with multicollinearity. And, indeed, there is no multicollinearity in your data: there can't be, because you have only one predictor variable, and it is not constant. Your problem is complete separation (also called perfect prediction): when x = 1, y = 0; when x > 1, y = 1.

    -logit- estimates logistic regression models by maximum likelihood. When there is complete separation, as here, the maximum likelihood estimate of the regression coefficient is infinite (or negative infinite). Stata is able to recognize this situation and stop with an error message before wasting your time on an estimation that will never converge. So, if this is the way your data are, you need to use a command that estimates the logistic regression model without relying on ordinary maximum likelihood. The -exlogistic- command can do this, and it does converge in your sample data. It uses "exact" estimation (in the same sense that the Fisher exact test is "exact") and is suitable for small data sets. Another approach is Joseph Coveney's -firthlogit- (available from SSC): it uses penalized maximum likelihood estimation and usually converges in the presence of complete separation. I do not have this command installed myself, so I have not tested it on your sample data.
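To illustrate why the maximum likelihood estimate diverges here, the following minimal Python sketch (not Stata) uses the 22 observations from the first post. The helper names `completely_separated` and `log_likelihood` are hypothetical and exist only for this example. The sketch first checks that every x with y = 0 lies strictly below every x with y = 1. It then evaluates the logit log-likelihood along a path that holds the decision boundary at x = 2.5 while the slope b gets steeper. Each step raises the log-likelihood toward 0 without ever reaching it, so -logit- has no finite maximum to converge to.

```python
import math

# (y, x) pairs from the first post, in the order listed there.
DATA = [(1, 50), (1, 5), (1, 10), (1, 15), (1, 20),
        (0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1),
        (1, 12), (1, 15), (1, 45), (1, 78), (1, 12), (1, 13), (1, 11),
        (1, 7), (1, 4)]


def completely_separated(data):
    """True if all x values for one outcome lie strictly below all x values for the other."""
    x0 = [x for y, x in data if y == 0]
    x1 = [x for y, x in data if y == 1]
    return max(x0) < min(x1) or max(x1) < min(x0)


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def log_likelihood(a, b, data):
    """Logit log-likelihood with linear predictor a + b*x."""
    return sum(y * (a + b * x) - softplus(a + b * x) for y, x in data)


if __name__ == "__main__":
    print(completely_separated(DATA))  # True
    # Keep the boundary halfway between the groups (x = 2.5) and steepen the slope.
    cut = (max(x for y, x in DATA if y == 0) + min(x for y, x in DATA if y == 1)) / 2
    for b in [1, 2, 4, 8, 16]:
        print(b, log_likelihood(-b * cut, b, DATA))
```

Running it prints True, followed by log-likelihood values that are negative and strictly increasing in b. That pattern is the numerical signature of an infinite coefficient.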

    • #3
      A further alternative is the user-written penlogit, available from The Stata Journal website.
      Code:
      search penlogit
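Both -firthlogit- and -penlogit- add a penalty to the log-likelihood, and the penalty keeps the estimates finite even under complete separation. As a rough illustration only, the following Python sketch fits the first post's data by maximizing the logit log-likelihood minus a simple quadratic (ridge) penalty λb²/2 on the slope, using Newton-Raphson with step-halving. This quadratic penalty is a stand-in chosen for simplicity. It is not the Firth penalty that -firthlogit- uses, and it is not claimed to match -penlogit-'s penalty. The function name `fit_ridge_logit` and the λ values are hypothetical.

```python
import math

# (y, x) pairs from the first post.
DATA = [(1, 50), (1, 5), (1, 10), (1, 15), (1, 20),
        (0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1),
        (1, 12), (1, 15), (1, 45), (1, 78), (1, 12), (1, 13), (1, 11),
        (1, 7), (1, 4)]


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def penalized_ll(a, b, data, lam):
    """Logit log-likelihood for a + b*x minus the quadratic penalty lam * b**2 / 2."""
    ll = sum(y * (a + b * x) - softplus(a + b * x) for y, x in data)
    return ll - 0.5 * lam * b * b


def fit_ridge_logit(data, lam=1.0, tol=1e-8, max_iter=100):
    """Maximize penalized_ll by Newton-Raphson with step-halving; the intercept is not penalized."""
    a = b = 0.0
    for _ in range(max_iter):
        ga = gb = haa = hab = hbb = 0.0
        for y, x in data:
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            ga += y - p
            gb += (y - p) * x
            haa += w
            hab += w * x
            hbb += w * x * x
        gb -= lam * b  # gradient of the penalty term
        hbb += lam     # curvature of the penalty term
        if max(abs(ga), abs(gb)) < tol:
            return a, b
        # Solve [[haa, hab], [hab, hbb]] @ (da, db) = (ga, gb) for the Newton step.
        det = haa * hbb - hab * hab
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        # Halve the step while it lowers the objective (small slack for rounding).
        old = penalized_ll(a, b, data, lam)
        step = 1.0
        while (penalized_ll(a + step * da, b + step * db, data, lam)
               < old - 1e-12 * (1.0 + abs(old)) and step > 1e-8):
            step /= 2.0
        a += step * da
        b += step * db
    raise RuntimeError("Newton-Raphson did not converge")


if __name__ == "__main__":
    for lam in (1.0, 10.0):
        a, b = fit_ridge_logit(DATA, lam=lam)
        print(f"lambda={lam}: intercept={a:.4f}, slope={b:.4f}")
```

A larger λ shrinks the slope further toward 0. With λ = 0 the objective reduces to the plain log-likelihood, which has no finite maximum for these data.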

      • #4
        Dear Clyde and Joseph, we are trying your suggestions now. We are very grateful to have you here, and your recommendations are very helpful. I will soon post the results for each of the suggestions. Thank you very much.

        • #5
          ID yes_no wtp riskrespo age educat revenue
          11 0 1 6 5 5
          12 1 3 6 6 4 3
          13 0 1 5 4 1
          14 0 1 6 5 1
          15 1 3 6 4 3 3
          16 0 1 5 5 2 3
          17 1 2 6 6 3 6
          18 0 1 6 6 2 4
          19 1 2 6 4 2
          21 1 2 5 3 1
          22 1 3 6 2 3
          23 0 1 6 6 4 1
          24 0 1 2 5 2 2
          25 0 1 6 5 3 3
          26 0 1 6 6 2 2
          27 1 2 2 4 4 5
          28 1 9 4 6 5 4
          29 1 3 6 6 5 6
          30 0 1 2 6 5 6
          31 0 1 2 6 1
          32 1 2 6 6 4 1
          33 0 1 5 2 1
          34 0 1 6 4 1
          35 0 1 2 6 5 4
          36 0 1 1 6 1 5
          37 1 8 6 6 2 1
          38 0 1 5 6 4 1
          39 1 4 3 6 4 1
          40 0 1 1 5 4 7
          41 0 1 6 5 2 1
          42 0 1 2 5 4
          43 1 6 6 6 4 2
          44 1 3 6 6 4 5
          45 1 2 4 5 2 5
          46 1 3 2 5 4
          47 0 1 4 6 5 3
          48 0 1 2 6 4 6
          49 0 1 1 6 4 1
          50 1 2 4 6 3 1
          51 0 1 6 6 2 5
          52 0 1 6 6 4 2
          53 1 2 1 5 3 1
          54 1 2 4 5 2 4
          55 0 1 6 3 3 4
          56 0 1 6 2 1
          57 0 1 6 1 1
          58 1 9 6 6 4 3
          59 0 1 6 6 4 5
          60 0 1 6 5 3 5
          61 1 2 6 5 3
          62 0 1 1 6 5 2
          63 0 1 2 5 2
          64 1 2 6 5 3 2
          65 1 2 6 6 3 1
          66 1 2 1 6 5 3
          67 0 1 1 6 4 1
          68 0 1 6 6 2 1
          69 0 1 6 4
          70 0 1 5 2 1

          • #6
            Age, education and revenue are continuous; wtp is grouped into categories (1 for paying one or less, 2 for paying 10, 3 for 20, and so on). The output I obtained is:

            • #7
              Dear Joseph and Clyde, this is my data from a survey we conducted earlier this year.

              • #8
                logit wtpyn wtp educat owned_land rented_land riskrespo

                note: wtp != 1 predicts success perfectly
                wtp dropped and 28 obs not used
                note: rented_land != 0 predicts failure perfectly
                rented_land dropped and 24 obs not used
                note: educat != 2 predicts failure perfectly
                educat dropped and 7 obs not used
                outcome = owned_land > 8 predicts data perfectly


                logit wtpyn educat owned_land rented_land riskrespo

                       wtpyn |      Coef.   Std. Err.       z   P>|z|     [95% Conf. Interval]
                -------------+------------------------------------------------------------------
                      educat |     .20514      .29325    0.70   0.484     -.36963      .77992
                  owned_land |     .00117      .00086    1.37   0.172     -.00051      .00286
                 rented_land |   -.000087      .00024   -0.36   0.722     -.00056      .00039
                   riskrespo |    .156524      .15508    1.01   0.313      -.1474      .46049
                       _cons |   -1.72439      1.3865   -1.24   0.214     -4.4420      .99324
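The first note in the output above ("wtp != 1 predicts success perfectly") can be checked directly against the data posted in #5. In every posted row, wtp equals 1 exactly when yes_no is 0, so wtp carries the same information as the outcome. The following Python sketch runs that kind of check. It assumes that wtpyn in the -logit- command corresponds to the yes_no column in #5. `perfect_predictors` is a hypothetical helper and not how Stata implements its check. The observation counts in the Stata notes need not match the posted rows.

```python
# yes_no and wtp columns from post #5, in ID order.
# Rows below: IDs 11-19, 21-30, 31-40, 41-50, 51-60, 61-70 (59 observations).
YES_NO = [0, 1, 0, 0, 1, 0, 1, 0, 1,
          1, 1, 0, 0, 0, 0, 1, 1, 1, 0,
          0, 1, 0, 0, 0, 0, 1, 0, 1, 0,
          0, 0, 1, 1, 1, 1, 0, 0, 0, 1,
          0, 0, 1, 1, 0, 0, 0, 1, 0, 0,
          1, 0, 0, 1, 1, 1, 0, 0, 0, 0]
WTP = [1, 3, 1, 1, 3, 1, 2, 1, 2,
       2, 3, 1, 1, 1, 1, 2, 9, 3, 1,
       1, 2, 1, 1, 1, 1, 8, 1, 4, 1,
       1, 1, 6, 3, 2, 3, 1, 1, 1, 2,
       1, 1, 2, 2, 1, 1, 1, 9, 1, 1,
       2, 1, 1, 2, 2, 2, 1, 1, 1, 1]


def perfect_predictors(y, x):
    """Return (v, outcome) pairs such that every observation with x != v has y == outcome."""
    hits = []
    for v in sorted(set(x)):
        outcomes = {yi for yi, xi in zip(y, x) if xi != v}
        if len(outcomes) == 1:
            hits.append((v, outcomes.pop()))
    return hits


if __name__ == "__main__":
    assert len(YES_NO) == len(WTP) == 59
    print(perfect_predictors(YES_NO, WTP))  # [(1, 1)]: wtp != 1 implies yes_no == 1
    # wtp != 1 exactly when yes_no == 1, i.e. wtp re-encodes the outcome.
    print(all((w != 1) == (y == 1) for y, w in zip(YES_NO, WTP)))  # True
```

A predictor that re-encodes the outcome in this way separates the data perfectly, so -logit- has to drop it.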

                • #9
                  Clyde and Joseph, this is the Stata output I got after running the model.
