  • outlier results interpretation

    Dear Statalisters,

    I hope you are well. Could you please help me interpret the table below, which shows the results of outlier tests?


    Is stdres similar to a studentized residual? Do the results below mean that I have outliers?


    sum stdres

        Variable |        Obs        Mean    Std. Dev.        Min        Max
    -------------+----------------------------------------------------------
          stdres |        300   -.0055664      1.1132  -2.326778     5.3328


    Firm number   Pearson residual (stdres)   Deviance residual (dv)   Pregibon leverage (hat)
             92                         3.9                      2.3                      .092
            194                         4.2                      2.4                      .075
             53                         5.1                      2.6                      .026
            148                         5.3                      2.6                      .031

    Many thanks for your support
    Kind regards,
    Rabab

  • #2
    You'll increase your chances of a useful answer by following the FAQ on asking questions. The regression postestimation documentation has extensive discussions of tools for outliers.

    There have been many discussions of outliers on this forum. Quantities like leverage are most easily interpreted relative to the distribution of leverage values across the sample. There are no hard and fast rules on outliers, although the documentation does provide some rules of thumb.

    One way to check whether these observations matter is to rerun the regression without them and see if dropping these 4 observations changes the results. I think of outliers as a generalization issue: if 3 or 4 unusual observations are driving the results, then I would be concerned about generalizing from results driven by those few observations to the larger portion of the sample/population.
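
The with-and-without comparison described above can be sketched on toy data. This is only an illustration in Python rather than Stata: the numbers are made up, the fit is a one-predictor least-squares line (an assumption, not the model from the original post), and `slope` and `suspect` are hypothetical names.

```python
def slope(xs, ys):
    """Least-squares slope of ys on xs (single predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Made-up data: five points on a falling line plus one unusual point.
xs = [1, 2, 3, 4, 5, 20]
ys = [5, 4, 3, 2, 1, 15]
suspect = {5}  # index of the observation being questioned (hypothetical)

# Refit without the suspect observation and compare the slope signs.
keep = [i for i in range(len(xs)) if i not in suspect]
slope_all = slope(xs, ys)
slope_trimmed = slope([xs[i] for i in keep], [ys[i] for i in keep])
sign_flips = (slope_all > 0) != (slope_trimmed > 0)
print(slope_all, slope_trimmed, sign_flips)
```

A sign flip like this one shows that the fit depends heavily on the questioned observation. It does not, by itself, say that the observation should be dropped.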


    • #3
      Dear Phil

      Many thanks for your explanations.

      In fact, I am confused by differing opinions in the literature. Some analysts believe that observations on a categorical variable coded 0, 1, 2, for instance, cannot be regarded as outliers, because the scale is bounded between its minimum and maximum values. What do you think? Can observations in a dataset of categorical variables be considered outliers if their values of stdres (studentized residuals) and hat (leverage) are high?

      I have checked the observations flagged as outliers (firm numbers 53, 92 and 148). When I compare the model with and without them, I find that the coefficient signs of some variables change. What do you think? Should I exclude them?

      Many thanks for your support

      Rabab


      • #4
        The other thread you started recently https://www.statalist.org/forums/for...or2-procedures contains one answer to your question.

        In that thread you had a concrete example of a categorical variable that is 0 or 1 or 2.

        Code:
        . tab value [w=freq]
        (frequency weights assumed)
        
              value |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  0 |         30       10.00       10.00
                  1 |         45       15.00       25.00
                  2 |        225       75.00      100.00
        ------------+-----------------------------------
              Total |        300      100.00
        So, for such a variable, the median is 2 and the IQR is 0.5, so one rule of thumb gives you that all the 0s and 1s are outliers, as 1.5 IQR below the median gets you only to 1.25. So, on this rule of thumb 25% of the dataset are outliers. Right or wrong? When a rule gives you bizarre answers, distrust the rule.
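
The arithmetic behind that rule of thumb can be checked with a short sketch, written here in Python rather than Stata. It assumes a percentile convention that averages the two neighbouring values when the cut falls exactly between observations, which is the convention that reproduces the IQR of 0.5 quoted above; `percentile` is a hypothetical helper, not a library function.

```python
from math import ceil

def percentile(sorted_values, p):
    """p-th percentile (0 < p < 100, integer p) of an already sorted list.

    Assumed convention: if n*p/100 is a whole number i, average the i-th
    and (i+1)-th ordered values; otherwise take the value at the next
    rank up.
    """
    n = len(sorted_values)
    if (n * p) % 100 == 0:
        i = (n * p) // 100
        return (sorted_values[i - 1] + sorted_values[i]) / 2
    return sorted_values[ceil(n * p / 100) - 1]

# Expand the frequency table: 30 zeros, 45 ones, 225 twos.
values = [0] * 30 + [1] * 45 + [2] * 225

median = percentile(values, 50)
q1 = percentile(values, 25)
q3 = percentile(values, 75)
iqr = q3 - q1
fence = median - 1.5 * iqr  # the cutoff used in the rule of thumb above

flagged = sum(1 for v in values if v < fence)
share = flagged / len(values)
print(median, q1, q3, iqr, fence, flagged, share)
# prints: 2.0 1.5 2.0 0.5 1.25 75 0.25
```

Under this convention every 0 and every 1 falls below the cutoff of 1.25, which is 75 of the 300 observations.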

        There is more than one way to take this, but the short story is that talk of outliers does not carry over easily to categorical variables. Even if the codes are ordered, they are still arbitrary to some extent, so taking them literally is dubious.

        Now there are sometimes clear-cut cases. If the possible values are 1 2 3 4 5 and everybody but one person is 1 or 2 and that one person is 5, that does look like an outlier. But that doesn't mean that you throw it out.



        • #5
          Dear Nick

          Many thanks for this clarification.


          Kind regards,
          Rabab
