
  • Residual plot after logistic regression. What is wrong?

    Dear all,
    I am running a logistic regression and want to check for influential observations. To do that, I plot the standardized residuals for each participant. The plot looks a little strange, since one would expect the residuals to form a single cluster around y=0. Does anyone have any suggestions as to why I get a plot like the one below? In the lower cluster, none of the observations have outcome = 0, whereas in the upper cluster there are observations with both outcome = 0 and outcome = 1.

    Code:
    predict stdres, rstandard
    scatter stdres PREG_ID_1569
    [Figure: scatter plot of stdres against PREG_ID_1569 (image_5870.png)]


    Code:
    logistic overweight c.noise##i.gender age_7y i.urbanity logincome i.education i.ethnicity i.emotional_7år dietscore_7y i.physact i.mat_smoke bmi_birth i.divorced if birthweight>2500 & zbfa_7år > -5 & zbfa_7år<5 & bmi_birth<25
    Output (Norwegian variable names):
    Code:
    Logistic regression                             Number of obs     =      1,484
                                                    LR chi2(19)       =      38.87
                                                    Prob > chi2       =     0.0046
    Log likelihood = -581.96559                     Pseudo R2         =     0.0323
    --------------------------------------------------------------------------------------------------
                            overvekt | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ---------------------------------+----------------------------------------------------------------
            Len_vei_7åringer_livstid |   1.005131   .0137301     0.37   0.708     .9785777    1.032405
                                     |
                               KJONN |
                               Pike  |   2.371608   2.357581     0.87   0.385     .3379681    16.64217
                                     |
    KJONN#c.Len_vei_7åringer_livstid |
                               Pike  |   .9743111   .0196758    -1.29   0.198     .9365005    1.013648
                                     |
                             age_7år |   1.000184   .0019205     0.10   0.923     .9964275    1.003956
                                     |
              sone_siste_adresse_7år |
                 Mellom Ring 2 og 3  |   1.137747   .2134702     0.69   0.492     .7876593    1.643435
                    Innenfor Ring 2  |   1.304274   .3594635     0.96   0.335     .7599305    2.238533
                                     |
                          loginntekt |   1.173329   .4466114     0.42   0.675     .5564443    2.474104
                                     |
                  mors_utdanning_num |
         Univ./høyskole inntil 4 år  |   1.414774   .2347284     2.09   0.037     1.022027    1.958448
      Inntil videregående påbygning  |   1.875568   .4195352     2.81   0.005     1.209852    2.907591
                                     |
                      landbakgr_barn |
                       Ikke-vestlig  |   1.509217   .3477814     1.79   0.074     .9607323    2.370834
                                     |
                       emotional_7år |
                 Vansker siste året  |   1.205125   .3618195     0.62   0.534     .6690715     2.17066
                        dietscore_7y |   .9987875   .0417213    -0.03   0.977     .9202731       1.084
                                     |
                          fysakt_7år |
                          3-7 t/uke  |   1.258417   .2436777     1.19   0.235     .8609925    1.839288
               Mindre enn 1-2 t/uke  |   1.102782   .2616349     0.41   0.680      .692695    1.755646
                                     |
                        røyk_mor_7år |
                                 Ja  |   1.511349    .364988     1.71   0.087     .9414586     2.42621
                           bmi_birth |   1.142935   .0669046     2.28   0.022     1.019047    1.281884
                                     |
                           skilt_7år |
                Foreldre bor sammen  |   1.202277   .2949885     0.75   0.453     .7432853    1.944706
                               _cons |   .0032108   .0179463    -1.03   0.304     5.61e-08    183.7438
    --------------------------------------------------------------------------------------------------

    Best,

    Kjell Weyde
    Last edited by Kjell Weyde; 14 Sep 2016, 08:52.

  • #2
    (Didn't see the post in the forum, so try posting a reply to make it appear there.)



    • #3
      I'm not familiar with the standardized residuals, but let's think for a moment about residuals.

      With the actual outcome either 0 or 1, and the prediction a probability between 0 and 1, values just below zero are likely to represent observations with 0 outcome and a predicted probability only slightly above zero.
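      To make that concrete, here is a small worked formula. It is only a sketch: it treats every observation as its own covariate pattern (Stata groups observations that share a covariate pattern), and the symbols \hat{p}_i (fitted probability) and h_i (leverage) are notation introduced here, not taken from the original post.

```latex
% Raw (response) residual for observation i with fitted probability \hat{p}_i
e_i = y_i - \hat{p}_i =
\begin{cases}
  -\hat{p}_i     & \text{if } y_i = 0 \quad (\text{so } -1 < e_i < 0) \\
  1 - \hat{p}_i  & \text{if } y_i = 1 \quad (\text{so } 0 < e_i < 1)
\end{cases}

% Pearson residual, and its standardized version with leverage h_i
r_i = \frac{y_i - \hat{p}_i}{\sqrt{\hat{p}_i (1 - \hat{p}_i)}}, \qquad
r_i^{S} = \frac{r_i}{\sqrt{1 - h_i}}
```

      Under these assumptions the sign of the residual is fixed by the outcome, so observations with outcome 0 and outcome 1 cannot overlap around zero.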

      In general I don't expect logistic regression residuals to be in a cluster around zero. I expect (perhaps) a cluster just below 0 from failures with a predicted probability just above 0, and I expect (perhaps) a cluster just above 0 from successes with a predicted probability just below 1. And I would expect (perhaps) clusters around +/- 0.5 from successes and failures with a predicted probability of 1/2. And so forth.
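      One way to look at the banding described above is to plot the residuals against the predicted probability, with a separate marker series per outcome. The following is a minimal sketch, assuming it is run from a do-file; it uses Stata's shipped auto dataset and the variables foreign, mpg and weight purely as a stand-in for the poster's data, and the names phat and stdres are illustrative.

```stata
* Stand-in example on the shipped auto data (not the poster's data)
sysuse auto, clear
logistic foreign mpg weight

predict phat, pr            // predicted probability of foreign == 1
predict stdres, rstandard   // standardized Pearson residuals

* One scatter series per outcome, so the bands can be told apart
twoway (scatter stdres phat if foreign == 0)                          ///
       (scatter stdres phat if foreign == 1),                         ///
       yline(0) legend(order(1 "outcome = 0" 2 "outcome = 1"))        ///
       xtitle("Predicted probability")                                ///
       ytitle("Standardized Pearson residual")
```

      Plotting against the predicted probability rather than the ID puts the expected shape of each band on display, which an ID axis cannot do.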



      • #4
        I believe deviance residuals are frequently used to assess the fit of logit models. Here's a brief comment on them from this UCLA web-page:

        Deviance residual is another type of residual. It measures the disagreement between the maxima of the observed and the fitted log likelihood functions. Since logistic regression uses the maximal likelihood principle, the goal in logistic regression is to minimize the sum of the deviance residuals. Therefore, this residual is parallel to the raw residual in OLS regression, where the goal is to minimize the sum of squared residuals.
        The same page shows that predict deviance generates the deviance residuals (following logit or logistic).
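        As a minimal sketch of that step (again on the shipped auto dataset as a stand-in, with illustrative variable names phat and dev, and assuming it is run from a do-file):

```stata
* Stand-in example on the shipped auto data (not the poster's data)
sysuse auto, clear
logistic foreign mpg weight

predict phat, pr        // predicted probability of foreign == 1
predict dev, deviance   // deviance residuals

* Deviance residuals against the predicted probability
scatter dev phat, yline(0)                                   ///
    xtitle("Predicted probability") ytitle("Deviance residual")
```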

        HTH.
        --
        Bruce Weaver
        Version: Stata/MP 18.5 (Windows)



        • #5
          William and Bruce, thanks for your answers!
          I made additional plots of deviance residuals vs. ID, standardized residuals vs. predicted probability, and Pregibon leverage vs. predicted probability, as shown below. I think the upper and lower plots look strange, at least compared with, for example, those shown at http://www.ats.ucla.edu/stat/stata/w.../statalog3.htm . Based on these plots, should I do something with my data, or do you think I can safely proceed with my analysis?


          [Figure: deviance residuals vs. ID (dev vs id.png)]
          [Figure: standardized residuals vs. predicted probability (stdres vs pred.png)]
          [Figure: Pregibon leverage vs. predicted probability (hat vs pred.png)]
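          For readers who want to reproduce plots of this kind, here is a minimal sketch on Stata's shipped auto dataset (a stand-in, not the data from this thread), assuming it is run from a do-file. The variable names phat, h and db are illustrative. The dbeta statistic is not mentioned in the post; it is included here only as one further influence diagnostic that predict offers after logistic.

```stata
* Stand-in example on the shipped auto data (not the poster's data)
sysuse auto, clear
logistic foreign mpg weight

predict phat, pr    // predicted probability of foreign == 1
predict h, hat      // Pregibon leverage
predict db, dbeta   // Pregibon delta-beta influence statistic

* Leverage and influence against the predicted probability
scatter h phat, xtitle("Predicted probability") ytitle("Pregibon leverage")
scatter db phat, xtitle("Predicted probability") ytitle("Pregibon dbeta")
```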

