  • Removing Outlier

    Dear all,

    I want to remove the outliers above the yellow line. Is there a Stata command that removes outliers from the Y variable separately for each value of X? For example, I want to remove the extreme values of Y when X is 1, 2, 3, ..., 27.

    Thanks and kind regards,

    Ariful
    [Image: Capture.JPG]

  • #2
    You may use the ‘if’ qualifier to combine conditions.

    That being said, I just wish to underline a point that has already become a litany: simply excluding outliers is not the most appropriate strategy.
    Best regards,

    Marcos


    • #3
      You need to describe how the code should choose which observations are outliers. For X=1 there are two points above the yellow line; for X=2 there is a single point above the yellow line, and for X=3 there are no points above the yellow line.
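
      As an illustration only, here is one rule that could be written down explicitly: flag any Y that lies more than 1.5 times the interquartile range above the upper quartile within its own X group. The rule itself and the variable names y_p25, y_p75 and flag_high are assumptions made for this sketch, not something taken from the original question, and the rule may not match the yellow line in the plot.

      ```stata
      * Hypothetical rule: within each X, flag Y above Q3 + 1.5*IQR
      bysort X: egen y_p25 = pctile(Y), p(25)
      bysort X: egen y_p75 = pctile(Y), p(75)

      * Missing Y counts as larger than any number in Stata, so exclude it explicitly
      generate byte flag_high = (Y > y_p75 + 1.5*(y_p75 - y_p25)) & !missing(Y)

      * Count the flagged observations per X before deciding what to do with them
      tabulate X flag_high
      ```

      A rule of this kind applies to all 27 values of X at once, so no separate line of code is needed for each X.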


      • #4
        Expanding on Marcos's suggestion, the following example code demonstrates a technique that could start you on your way.
        Code:
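        * Sort by X, then Y, so the largest Y values come last within each X
        sort X Y
        generate outlier = 0
        * X==1: flag the two largest Y values; X==2: flag only the largest
        by X (Y): replace outlier = 1 if X==1 & _n>=_N-1
        by X (Y): replace outlier = 1 if X==2 & _n==_N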
        ...
        and then in your subsequent commands you can exclude the outliers
        Code:
        scatter Y X if outlier==0
        This will enable you to do what you seek. It is poor statistical technique, as Marcos suggests.


        • #5
          What are the variables anyway? The plot looks to me consistent with a log-linear relationship, for which a logarithmic link function would give good results.
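
          A minimal sketch of what fitting with a logarithmic link could look like, keeping all observations. The Gaussian family and the name Y_hat are assumptions for illustration; the right family depends on what Y actually measures.

          ```stata
          * Model the mean of Y as exp(b0 + b1*X), without transforming Y itself
          glm Y X, family(gaussian) link(log)

          * Predicted mean of Y on its original scale (Y_hat is a hypothetical name)
          predict Y_hat, mu

          * Overlay the fitted curve on the raw data, extreme points included
          twoway (scatter Y X) (line Y_hat X, sort)
          ```

          If the fitted curve tracks the bulk of the data, the high points may turn out to be less extreme relative to the model than they appear on the raw scale.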


          • #6
            Thanks a lot. I solved the problem. Thanks, William, Nick and Marcos.
