  • Dealing with Outliers

    Hello everyone,

    In the scatter plot below, I examine the time effects of a categorical variable, expressed as a count of years (1–15) for local councils, on several economic outcomes in a panel of different countries. Local councils renew their term every 4–5 years, depending on the case, with rare cases lasting up to 15 years in irregular countries (i.e., same leader regime, lower degree of democracy, or iconic democracy). As a result, there are far more observations in years 4–5, when the renewal happens, than in the later years 6 to 15, which have only a few. However, in a graph of summary statistics, the mean for year 15 or so is higher for some variables, because, for instance, there are only 16 observations in year 15 compared with 500+ observations in years 1–5, which is misleading. According to the model, the mean should increase (or decrease) in the first year, with the effect dropping off afterward, depending on the case, the variable's economic meaning, and the sign of the effect. In other words, the early years are far more effective. That means there should be a sort of pattern there, as a close look at the scatter plot and the tabulation below suggests.

    I do not want to winsorize the data, but to treat the problem in some other way, and to produce a graph and a summary table that show the pattern, as well as a correlation table. It is hard to provide data: I have more than 200 local councils and up to 4000 observations, so it is difficult to locate those few observations and to assemble the right combination of data for an example. If you can suggest a way, I could provide a sample. How do I take care of that?
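
    One possible treatment that avoids winsorizing, sketched here on simulated data only: pool the sparse late years into a single top category so that every cell has a usable number of observations. The cutoff of 5, the new variable `count_grp`, and the label name `count_grp_lbl` are assumptions made for illustration; only `count` and `gdp_capita_r` come from the real data.

```stata
* Simulated stand-in data: NOT the real panel
clear
set seed 2023
set obs 500
generate byte count = ceil(5 * runiform())      // years 1-5 for most rows
replace count = 6  in 471/480                   // a few late-year rows
replace count = 11 in 481/490
replace count = 15 in 491/500
generate double gdp_capita_r = exp(rnormal(10, 0.7))

* Pool every count above 5 into one top category (the cutoff is an assumption).
* min() ignores missing values, so missing counts are excluded explicitly.
generate byte count_grp = min(count, 6) if !missing(count)
label define count_grp_lbl 6 "6+"
label values count_grp count_grp_lbl

* Cell sizes before and after pooling, then summary statistics by pooled group
tabulate count
tabulate count_grp
tabstat gdp_capita_r, by(count_grp) statistics(n mean median) nototal
```

    Whether pooling is defensible depends on whether the late years can reasonably be treated as one regime, which is a substantive question the code cannot settle.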

    My scatter code and plot are those below.

    Thanks in advance
    Giorgio
    [Attached image: Sample.png]

    Code:
        
        
    
    * Hollow-circle scatter, linear fit, and a faint filled-marker overlay of the same points
    scatter gdp_capita_r count, graphr(c(white)) m(Oh) yline(0, lc(navy) lp(dash)) ///
        || lfit gdp_capita_r count ///
        || scatter gdp_capita_r count, mc(navy%10) mlc(none) ml() mlabc(navy) mlabpos(3) ///
        xtitle("Count in time") yt("") title(" GDP per capita  ", size(medsmall)) ///
        legend(off) yl(, format(%03.1f))
    I also ran summary statistics by count for one of the variables in the scatter, for example:



    Code:
    -> count = 1
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
    gdp_capita_r |        173    30128.29    21966.47   581.2183   79299.84
    
    --------------------------------------------------------------------------------
    -> count = 2
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
    gdp_capita_r |        405    24231.69    14149.85   345.4215   61174.54
    
    --------------------------------------------------------------------------------
    -> count = 3
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
    gdp_capita_r |        401    27788.38    15816.52    517.717   81653.34
    
    --------------------------------------------------------------------------------
    -> count = 4
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
    gdp_capita_r |        465    25782.95    19592.54   364.7466   105264.8
    
    --------------------------------------------------------------------------------
    -> count = 5
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
    gdp_capita_r |        203     23246.8    26436.19   330.2053   111968.4
    
    --------------------------------------------------------------------------------
    -> count = 6
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
    gdp_capita_r |         21    1492.847     1629.38   328.0719   4139.031
    
    --------------------------------------------------------------------------------
    -> count = 7
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
    gdp_capita_r |          8    13650.78    2329.838   10347.18   16791.89
    
    --------------------------------------------------------------------------------
    -> count = 9
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
    gdp_capita_r |         10    394.2766    13.27976   373.5514   415.7148
    
    --------------------------------------------------------------------------------
    -> count = 11
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
    gdp_capita_r |         12    599.7394    71.06272   491.9627   713.6509
    
    --------------------------------------------------------------------------------
    -> count = 15
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
    gdp_capita_r |         14    393.3982    59.90866   302.0926   469.4377
    
    --------------------------------------------------------------------------------
    -> count = .
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
    gdp_capita_r |        560    20751.71    23475.95   322.3344   111043.5
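
    The per-count output above could be condensed into a single table that puts the cell size next to the mean and the median, which makes the thin late-year cells easy to spot. This is a minimal sketch on simulated data; the values, the seed, and the names `n_obs`, `mean_gdp`, `med_gdp`, and `sd_gdp` are illustrative assumptions.

```stata
* Simulated stand-in data: NOT the real panel
clear
set seed 12345
set obs 600
generate byte count = ceil(5 * runiform())      // years 1-5 for most rows
replace count = 15 in 591/600                   // ten late-year rows
generate double gdp_capita_r = exp(rnormal(10, 0.8))
replace gdp_capita_r = exp(rnormal(6, 0.1)) if count == 15

* One row per count: cell size next to mean, median and SD
tabstat gdp_capita_r, by(count) statistics(n mean median sd) nototal

* The same numbers as a small dataset, convenient for graphing or exporting
preserve
collapse (count) n_obs = gdp_capita_r (mean) mean_gdp = gdp_capita_r ///
    (median) med_gdp = gdp_capita_r (sd) sd_gdp = gdp_capita_r, by(count)
list, noobs clean
restore
```

    Reporting the median alongside the mean is a design choice: in small cells it is less sensitive to one or two extreme values.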
    Last edited by Giorgio Di Stefano; 19 Sep 2023, 14:42.

  • #2

    Below is an example of the misleading mean produced by the outliers.


    [Attached image: Graph3.png]
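
    A graph of means can be made less misleading by drawing an interval around each mean and labelling each point with its number of observations. The following is a sketch on simulated data, not a definitive recipe; the variable names `mean_gdp`, `se_gdp`, `n_obs`, `lo`, and `hi` are assumptions, and the t-based 95% interval is one choice among several.

```stata
* Simulated stand-in data: NOT the real panel
clear
set seed 4321
set obs 600
generate byte count = ceil(5 * runiform())      // years 1-5 for most rows
replace count = 15 in 591/600                   // ten late-year rows
generate double gdp_capita_r = exp(rnormal(10, 0.8))
replace gdp_capita_r = exp(rnormal(6, 0.1)) if count == 15

* Collapse to one row per count: mean, standard error of the mean, cell size
collapse (mean) mean_gdp = gdp_capita_r (semean) se_gdp = gdp_capita_r ///
    (count) n_obs = gdp_capita_r, by(count)

* Approximate 95% interval from the t distribution (missing when n_obs == 1)
generate double lo = mean_gdp - invttail(n_obs - 1, 0.025) * se_gdp
generate double hi = mean_gdp + invttail(n_obs - 1, 0.025) * se_gdp

* Means with interval bars; each marker is labelled with its cell size
twoway (rcap lo hi count, lcolor(navy)) ///
       (scatter mean_gdp count, mcolor(navy) mlabel(n_obs) mlabposition(3)), ///
       xtitle("Count in time") ytitle("Mean GDP per capita") ///
       legend(off) graphregion(color(white))
```

    With only a handful of observations in a cell, the interval mainly signals how little the mean can be trusted, and it does not account for observations clustered within the same council.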
