  • dropping the outlier

    Hello, I want to know how I can identify and drop the outlier from my data, because when I drew a scatterplot it looked so weird. This is my descriptive table:

    Code:
     
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
       GDPgrowth |        174      -4.787       7.984    -56.308      43.48
    Many thanks in advance for your valuable time and advice.

    Best regards,


  • #2
    Khati:
    provided that no apparent mistakes occurred in data entry, you should be more than 100% sure that what you consider an outlier isn't a perfectly legal observation according to the data generating process you're interested in (as is quite often the case).
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      @Carlo Lazzaro thank you so much for your reply. But how can I drop it to get a rational scatterplot? How can I drop it based on the descriptive table?
      Many thanks in advance for your valuable time and advice.

      best regards,



      • #4
        Khati:
        while I do not sponsor your approach (who can decide when a scatterplot is "rational" without knowing the data generating process?), you may want to consider the following toy-example (that allows you to flag "outliers" without dropping them, which is rarely a good habit):
        Code:
        . sysuse auto.dta
        (1978 Automobile Data)
        
        . graph box price
        
        . sum price
        
            Variable |        Obs        Mean    Std. Dev.       Min        Max
        -------------+---------------------------------------------------------
               price |         74    6165.257    2949.496       3291      15906
        
        . g cut_off=2.5*r(sd)
        
        . gen flag=1 if price>=cut_off
        
        . replace flag=0 if flag==.
        
        . graph box price if flag==0
        
        .
        Kind regards,
        Carlo
        (Stata 19.0)
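
        A hedged variant of the toy example above: the log flags prices at or above 2.5 standard deviations measured from zero, whereas a common rule of thumb measures distance from the mean in both directions. The sketch below assumes that two-sided rule; the 2.5 multiplier is kept from the example, and flag2 is an illustrative name only.
        Code:
        * Sketch only: two-sided rule around the mean, an assumption not made in the log above
        sysuse auto, clear
        summarize price

        * 1 = more than 2.5 SDs from the mean, 0 otherwise; missing prices stay missing
        generate byte flag2 = abs(price - r(mean)) > 2.5 * r(sd) if !missing(price)
        tabulate flag2

        * Compare flagged and unflagged observations without dropping anything
        graph box price, over(flag2)
        As in the original example, nothing is dropped; the flag only marks observations for a closer look.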



        • #5
          Please show us a quantile plot using neglog or asinh scale. NB neglog is sign() * log(1 + abs()).
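
          A minimal sketch of the request above, assuming the variable is named GDPgrowth as in the descriptive table in #1. The new variable names and graph names are illustrative only. If log1p() is not available in your Stata version, log(1 + abs(GDPgrowth)) is the same thing.
          Code:
          * Sketch only: assumes GDPgrowth is in memory; new names are illustrative
          generate double gdp_asinh  = asinh(GDPgrowth)
          generate double gdp_neglog = sign(GDPgrowth) * log1p(abs(GDPgrowth))

          * One quantile plot per transformed scale
          quantile gdp_asinh,  name(q_asinh, replace)
          quantile gdp_neglog, name(q_neglog, replace)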



          • #6
            @Nick Cox and @Carlo Lazzaro thank you so much for your replies.

            I attached the quantile plot.
            In my scatterplot I want to represent the relationship between fertility rate and GDP growth.

            Best regards,

            Attached file: Graph(quantile).gph
            Last edited by Khati Zolfaghari; 27 Nov 2021, 01:07.



            • #7
              Khati:
              if no mistaken data entry was detected, I still think that outliers are part of the game here.
              Kind regards,
              Carlo
              (Stata 19.0)



              • #8
                @Carlo Lazzaro Thank you so much for your reply. Yes, I agree with you, but I want to see what happens if I exclude them. Because GDP growth is for 2020, and for sure there is no country with GDP growth=43 or even -56, I want to see the relationship between the fertility rate and GDP growth excluding the outlier.

                Best regards,



                • #9
                  Khati:
                  thanks for clarifying.
                  Then the issue rests on the way GDP growth (rate/year?) was calculated.
                  Could you please clarify that too? Thanks.
                  Kind regards,
                  Carlo
                  (Stata 19.0)



                  • #10
                    Thanks for #6 although .gph attachments are deprecated here and .png attachments explicitly preferred. See https://www.statalist.org/forums/help#stata 12.4 and 12.5

                    From your .gph attachments -- here rendered as these two graphs
                    [Image: anyoutliers.png]

                    [Image: anyoutliers2.png]



                    I pick up the following points:

                    1. Informally I would say that there are four outliers on GDP growth (and not one outlier, as implied by your wording).

                    2. I don't see much of a relationship between your two variables.

                    3. Nevertheless if you want to push this further I would compare modelling of

                    fertility given GDP growth

                    fertility given GDP growth without the outliers

                    fertility given GDP growth on an asinh or neglog scale.

                    You didn't pick up on my request to use one of the latter, but this graph shows that either scale pulls in both positive and negative outliers while being conservative for smaller values near zero (each function has slope 1 at argument zero)

                    [Image: anyoutliers3.png]



                    The graph was drawn with scheme s1color and more crucially for your question with

                    Code:
                    twoway function asinh = asinh(x), ra(-56.308 43.48) || function neglog = sign(x) * log1p(abs(x)), ra(-56.308 43.48) lp(dash)
                    i.e. over the range reported for GDP growth.

                    I share @Carlo Lazzaro's puzzlement over how the extreme values arise especially given your denial that they are present in the data.

                    You could helpfully report the results of

                    Code:
                    list if abs(GDPgrowth) > 20
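
                    The three-way comparison suggested in point 3 could be sketched as below. This is an illustration under stated assumptions, not a recommended analysis: fertility is a hypothetical variable name, plain linear regression is assumed as the model, and the threshold of 20 is borrowed from the list command above purely to mark the extreme values.
                    Code:
                    * Sketch only: fertility is a hypothetical name; the cutoff of 20 is illustrative
                    generate byte extreme = abs(GDPgrowth) > 20 if !missing(GDPgrowth)
                    generate double asinh_gdp = asinh(GDPgrowth)

                    * Same model fitted three ways: all data, without extremes, asinh scale
                    regress fertility GDPgrowth
                    estimates store m_all
                    regress fertility GDPgrowth if extreme == 0
                    estimates store m_trim
                    regress fertility asinh_gdp
                    estimates store m_asinh

                    * Side-by-side coefficients, standard errors, sample sizes and R-squared
                    estimates table m_all m_trim m_asinh, b se stats(N r2)
                    Flagging rather than dropping keeps all observations in the dataset, so the first and third fits still use the full sample.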



                    • #11
                      Originally posted by Khati Zolfaghari
                      Because GDP growth is for 2020, and for sure there is no country with GDP growth=43 or even -56, I want to see the relationship between the fertility rate and GDP growth excluding the outlier.
                      except Guyana and Macao... https://data.worldbank.org/indicator/NY.GDP.MKTP.KD.ZG



                      • #12
                        @Nick Cox Thank you so much for your reply. I apologize for not picking up on your request to show a neglog or asinh scale; I had a problem coding this. I attached a scatterplot (which was also the wrong thing to attach; thank you so much for letting me know).

                        Thanks a lot.

                        Best regards,
