  • Outliers

    Good afternoon, everyone. I have a question. I am working with a dataset that contains the salaries of affiliates, and I want to check whether it contains any outliers. I ran sum sueldo, but I feel that its output doesn't help me much.

  • #2
    Definitions differ across disciplines, but I was taught that an outlier refers to a data point that deviates so significantly from the other data points in a dataset that it is deemed implausible or inconsistent with the rest of the data. In effect, it is an impossible value. These often arise from errors in data entry. So if you can define what an outlier is in your dataset, then it will be easy to identify such data points.


    • #3
      I want to see whether any values of my sueldo variable deviate significantly from the others. I read that I can compute a z-score, by subtracting the mean from the variable and dividing by its standard deviation, but I don't know whether that helps me detect outliers.
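The z-score calculation described here can be sketched as follows. This is a minimal illustration in Python rather than Stata; the salary values are made up, z_scores is a hypothetical helper name, and the cutoff of 2.5 is an arbitrary convention rather than a rule.

```python
from statistics import mean, stdev

def z_scores(values):
    """Subtract the mean from each value, then divide by the sample SD."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(x - m) / s for x in values]

# Hypothetical salaries, for illustration only (not data from this thread).
sueldo = [450, 480, 500, 520, 550, 600, 620, 700, 750, 5000]

z = z_scores(sueldo)

# Flag values whose absolute z-score exceeds an assumed, arbitrary cutoff.
CUTOFF = 2.5
flagged = [x for x, zi in zip(sueldo, z) if abs(zi) > CUTOFF]
print(flagged)
```

Note that with n observations no sample z-score can exceed (n - 1) / sqrt(n) in absolute value, which is about 2.85 for n = 10, so the often-quoted cutoff of 3 could never flag anything in a sample this small.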


      • #4
        to me, an outlier is a value that is surprising given your, sometimes implicit, model of the data; in #3, you are apparently assuming that the data are normally distributed, but what if the data are right-skewed? before thinking about "outlier(s)", you need to think about the data-generating process or model of the data


        • #5
          In economics, we generally do not consider variability in the data as leading to outliers. Therefore, this seems ad-hoc to me.


          • #6
            Time of day references don't make much sense in a world with many time zones....

            More to the point, there isn't a recipe for identifying outliers without a definition of outliers.

            With salary data, I would expect to work on a log scale anyway, and would not expect Elon Musk or the like to be in my dataset.

            To get more focused answers, please show us the results of summarize, detail on your salary variable.
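For readers without Stata at hand, the kind of summary requested here can be roughly sketched in Python, on both the raw and the log scale. This is not a reimplementation of summarize, detail: the percentile rule and the layout differ, the salary values are made up, and describe is a hypothetical helper name.

```python
import math
from statistics import mean, quantiles

def describe(values):
    """Return basic summary statistics, including moment-based skewness."""
    n = len(values)
    m = mean(values)
    # Second and third central moments, using an n denominator.
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    q1, q2, q3 = quantiles(values, n=4)  # quartile cut points
    return {
        "n": n,
        "mean": m,
        "sd": math.sqrt(m2 * n / (n - 1)),  # sample SD (n - 1 denominator)
        "min": min(values),
        "p25": q1,
        "median": q2,
        "p75": q3,
        "max": max(values),
        "skewness": m3 / m2 ** 1.5,
    }

# Hypothetical salaries, for illustration only (not data from this thread).
sueldo = [450, 480, 500, 520, 550, 600, 620, 700, 750, 5000]

raw = describe(sueldo)
logged = describe([math.log(x) for x in sueldo])

for name, stats in (("raw", raw), ("log", logged)):
    print(name, {k: round(v, 3) for k, v in stats.items()})
```

Comparing the mean with the median and looking at the skewness on each scale gives a first impression of how far the variable is from symmetric before any value is labelled an outlier.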

            (This repeats points already made; I was distracted by other business and didn't see other posts until I had posted mine.)
            Last edited by Nick Cox; 05 Jan 2024, 12:26.


            • #7
              re: #5, let me re-state - what if the data are log-normally distributed (rather than normally distributed)?
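To illustrate the point about log-normal data, here is a minimal Python sketch. It builds an idealized log-normal sample from evenly spaced normal quantiles (an assumption made so that the result is reproducible; this is not real salary data, and the median of 1000 and log-scale SD of 0.8 are arbitrary), then counts how many values have |z| > 3 on the raw scale and on the log scale. The helper name count_flagged is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def count_flagged(values, cutoff=3.0):
    """Count values whose absolute z-score exceeds the cutoff."""
    m, s = mean(values), stdev(values)
    return sum(1 for x in values if abs(x - m) / s > cutoff)

# Idealized log-normal "salaries": exponentiated normal quantiles.
# N, MU and SIGMA are arbitrary illustrative choices.
N, MU, SIGMA = 200, math.log(1000), 0.8
std_normal = NormalDist()
log_salary = [MU + SIGMA * std_normal.inv_cdf((i + 0.5) / N) for i in range(N)]
salary = [math.exp(v) for v in log_salary]

raw_flags = count_flagged(salary)      # z-scores computed on the raw scale
log_flags = count_flagged(log_salary)  # z-scores computed on the log scale
print(raw_flags, log_flags)
```

On the raw scale the upper tail is flagged even though every value comes from the same distribution by construction; on the log scale nothing exceeds the cutoff. The flags therefore reflect the assumed model, not errors in the data.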
