  • Carlo Lazzaro
    replied
    Litosbrito (please, as per the FAQ, re-register with your full name and surname. Just click on the Contact us button at the bottom-right corner of the screen):
    While I second all the sound advice given previously, one option for detecting outliers is to loop over a variable list, as in the following toy example:
    Code:
    clear
    set obs 100
    g A=runiform()
    g B=runiform()
    g C=runiform()
    foreach var of varlist A-C {
        quietly summarize `var'
        // the Z_ variable flags values beyond a threshold you decide to set
        // (here, more than 3 standard deviations away from the mean)
        g byte Z_`var' = abs(`var' - r(mean)) > 3*r(sd) if !missing(`var')
        // the list may well be empty for uniform toy data
        list `var' Z_`var' if Z_`var'==1
    }
    Kind regards,
    Carlo

  • Maarten Buis
    replied
    litosbrito: you have it the wrong way around. You need to tell Stata when a value is "too high". "Too high" is necessarily a subjective statement, so it can only be made by humans. You can think of a criterion and ask a computer (Stata) to apply that criterion, but you, and only you, can choose the criterion. But before you start down that road, try to answer this question: How can you hope to find anything new if you first remove all surprising observations from your data?

  • Nick Cox
    replied
    "Think on a logarithmic scale" solves many more problems than eliminating outliers.
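
    A minimal sketch of what this could look like in practice, assuming a strictly positive, right-skewed variable. The auto dataset shipped with Stata and its variable price are stand-ins chosen only for illustration; log_price is a hypothetical name.

    ```stata
    * Sketch only: auto and price stand in for your own data
    sysuse auto, clear
    * Raw scale: note the skewness and the largest values
    summarize price, detail
    * Log scale: ln() returns missing for zero or negative values
    generate double log_price = ln(price) if price > 0
    summarize log_price, detail
    ```

    Comparing the two summaries shows whether values that look extreme on the raw scale still stand out after the transformation.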

  • litosbrito
    replied
    Thanks for the answers!

    Yes, that is what I want to know: whether Stata can tell me that a value is "too high"!

    I will follow your suggestion and see if I can resolve my problem.

    Thank you once again for all the comments!

  • Nick Cox
    replied
    Thanks for trying to provide detail, but my answer remains pretty much the same.

    In effect, you are asking if there is a Stata command that will tell you if values are "too high". If you can translate that into some statistical criterion, then there will be Stata code to do it.
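
    To illustrate translating "too high" into a criterion, here is a hedged sketch that uses one common convention: values above the upper quartile plus 1.5 times the interquartile range. The rule, the auto dataset, and the names upper_fence and high_price are all assumptions for illustration, not a recommendation from this thread.

    ```stata
    * Sketch only: the 1.5*IQR rule is one of many possible criteria
    sysuse auto, clear
    quietly summarize price, detail
    * r(p25) and r(p75) are the quartiles stored by summarize, detail
    scalar upper_fence = r(p75) + 1.5*(r(p75) - r(p25))
    * Flag, rather than drop, the observations above the fence
    generate byte high_price = price > scalar(upper_fence) if !missing(price)
    list make price if high_price == 1
    ```

    Flagging keeps the observations in the data, so the decision about what to do with them stays separate from the act of identifying them.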

    In any case, eliminating outliers is a highly debatable tactic. It's just one of several possible actions and in my view usually one of the worst imaginable.

    There are entire books and many, many articles on the treatment of outliers; the discussion by Richard Williams that Anton cited in #3 is good and linked to Stata; another discussion is at http://stats.stackexchange.com/quest...iers-with-mean

    On graphics: I think you have it precisely the wrong way round. The more data you have, the easier it usually is to identify possible outliers or -- more importantly -- decide what to do given skewed or heavy-tailed distributions.
    Last edited by Nick Cox; 23 Sep 2014, 11:42.

  • Anton Ivanov
    replied
    Keep in mind that you need strong theoretical justification in order to eliminate outliers from the analysis.

  • litosbrito
    replied
    Hi Nick Cox,
    Thank you for the answer!
    I have a database with many variables, and I want to compare the values between two groups of countries. I want to compare the average, minimum, maximum and SD. But I want to eliminate the outliers, because I see that some values are too high.

    I would rather not use graphs because I have thousands of observations, which makes it more difficult to identify outliers! So I want to know whether there is any command I can use that can tell me that a value, for example one greater than 500, is an outlier.
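
    The two tasks described here can be sketched as follows, under stated assumptions: the auto dataset stands in for the country data, foreign stands in for the two-group indicator, and the cutoff of 4000 on weight is an arbitrary stand-in for the 500 mentioned above. The name too_high is hypothetical.

    ```stata
    * Sketch only: auto, foreign, weight and the cutoff are stand-ins
    sysuse auto, clear
    * Mean, minimum, maximum and SD of several variables by group
    tabstat price mpg weight, by(foreign) statistics(mean min max sd)
    * Flag values above a cutoff that you choose yourself
    local cutoff 4000
    generate byte too_high = weight > `cutoff' if !missing(weight)
    tabulate foreign too_high
    list make weight if too_high == 1
    ```

    Run the block as a whole from a do-file so that the local macro cutoff is still defined when generate uses it.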

  • Nick Cox
    replied
    You have to give much more specific detail on exactly what you are interested in to make fuller answers likely.

    Ignoring graphics here is a personal choice, as would be ignoring questions based on such a blinkered attitude.

    Please also note our preference for using full real names and for the correct spelling "Stata". See the FAQ Advice for more detail on this and other advice on posing questions.
    Last edited by Nick Cox; 23 Sep 2014, 11:00.

  • litosbrito
    replied
    Thank you for the answers!
    I want to know if there is any STATA command that I can use! I don't want to use graphs!

  • Anton Ivanov
    replied
    Hello!

    There are many ways to identify outliers. Here is a good document for reference: http://www3.nd.edu/~rwilliam/stats2/l24.pdf

    Anton

  • Nick Cox
    replied
    There are many, many ways, depending on your definition of outliers. A good one is to plot your data and think about data points that seem surprising.
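
    As a hedged illustration of plotting the data, the sketch below draws a box plot by group and a quantile plot. The auto dataset and the variables price and foreign are stand-ins, and these two plot types are only examples of many that could serve.

    ```stata
    * Sketch only: auto, price and foreign stand in for your own data
    sysuse auto, clear
    * Box plot by group: points beyond the whiskers are plotted individually
    graph box price, over(foreign)
    * Quantile plot: every ordered value is shown, so the tails are visible
    quantile price
    ```

    Both plots remain readable with thousands of observations, because each displays the distribution rather than a table of individual values.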
