  • Identification and deletion of outliers for multiple variables

    Hello,

    I want to eliminate outliers in all the variables; let's name them var1 var2 var3 ... To do this I did some research and came across the standard deviation method: deleting values more than 3 standard deviations from the mean. Is there any way I can do this for all the variables at once?

    Thank you in advance!
    Last edited by Koen Appeltans; 15 Apr 2024, 02:21. Reason: Outliers, Sd method

  • #2
    The definition of an outlier provided is not standard, although I won't focus on that issue here. Here's how to delete values of a variable that are more than 3 standard deviations from the mean:

    Code:
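    * var1 - var_N is a placeholder for your own first and last variable
    foreach var of varlist var1 - var_N {
        quietly summarize `var'
        * set to extended missing .a any value more than 3 SD above the mean;
        * !missing() stops existing missing values (which compare as very large) being recoded
        replace `var' = .a if `var' > r(mean) + 3*r(sd) & !missing(`var')
    }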


    • #3
      There are many objections to this procedure. Here are some:

      0. It doesn't have a stated rationale.

      1. By considering each variable separately you are discarding all information carried by relationships between variables.

      2. It assumes that the mean and SD are always good summaries, which is intensely problematic if there really are outliers in the data.

      3. It makes no use of the subject-matter knowledge that a researcher should have about their data and their generating process.

      4. It is not obviously superior to other methods of dealing with supposedly problematic data, e.g. working with an appropriate link function.
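      Point 2 can be made concrete. The SD is computed from the same data, so one extreme value inflates it and partly hides itself. With the sample SD that summarize reports (divisor n - 1), no observation in a sample of n can lie more than (n - 1)/sqrt(n) SD from the mean, which is below 3 whenever n is 10 or fewer. A minimal sketch with made-up data (nine values of 1 and one wild value of 1000; the variable x is hypothetical) shows the 3 SD rule flagging nothing:

      Code:
```stata
* Made-up data: nine values of 1 and one wild value of 1000
clear
set obs 10
generate double x = 1
replace x = 1000 in 10
* summarize stores the sample SD (divisor n - 1) in r(sd)
quietly summarize x
local m = r(mean)
local s = r(sd)
* z-score of the wild value: about 2.85, below 3 despite 1000 versus 1
display (1000 - `m') / `s'
* number of values the 3 SD rule would flag: 0
count if abs(x - `m') > 3*`s'
```

      In larger samples the bound is looser, but the same inflation of the SD still makes the rule less sensitive exactly when outliers are present.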


      In detail, note that Andrew Musau's code won't catch values more than 3 SD below their mean, which are perhaps uncommon but not utterly impossible.
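      If one did want the rule applied to both tails (the objections above still apply), a minimal sketch is to compare the absolute deviation from the mean with 3 SD. The data and the names var1 and var2 are made up for illustration: var1 has one very low value and var2 one very high value.

      Code:
```stata
* Made-up data: var1 has one very low value, var2 one very high value
clear
set obs 20
generate double var1 = 10
replace var1 = -1000 in 1
generate double var2 = 10
replace var2 = 1000 in 20
* var1-var2 stands in for your own variables
foreach var of varlist var1-var2 {
    quietly summarize `var'
    * abs() checks both tails; !missing() stops existing missing values,
    * which Stata treats as larger than any number, from being recoded
    replace `var' = .a if abs(`var' - r(mean)) > 3*r(sd) & !missing(`var')
}
```

      Here both extreme values end up as .a, whereas an upper-tail-only check would leave the -1000 in var1 untouched.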


      • #4
        Nick Cox, Andrew, thank you for answering. Nick Cox, what would you suggest using instead? I'm relatively new to Stata, so all help is more than welcome. Thank you in advance!


        • #5
          No precise advice is possible without knowing more about your data and your goals. I would omit observations if any value is an outlier in the sense that it is utterly impossible.
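          As a sketch only, dropping observations with any impossible value might look like this. The variables age (in years) and share (a proportion) and the small dataset are hypothetical; the real limits must come from knowing how your own data were generated.

          Code:
```stata
* Made-up data: age in years and share, a proportion (hypothetical variables)
clear
input double age double share
34 0.20
-2 0.50
51 1.70
28 0.90
end
* impossible here means a negative age or a proportion outside [0, 1];
* !missing() matters because Stata treats missing as larger than any number
generate byte impossible = (age < 0 & !missing(age)) | ((share < 0 | share > 1) & !missing(share))
list if impossible
drop if impossible
```

          Unlike a 3 SD rule, this flags a value only because it cannot occur, not merely because it is far from the others.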
