Outliers on Stata - Statalist

Attaullah Shah

Join Date: Aug 2014

Posts: 1669
#16

24 Sep 2014, 04:08

Nick is right about multivariate outliers. That said, I have seen many papers in finance that winsorize or drop values more than 3 SD away from the mean. In that case, we can use the following code:

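sysuse auto, clear
foreach x of varlist price mpg {
    * summarize leaves the mean and SD in r(mean) and r(sd)
    summarize `x'
    * drop values more than 3 SD from the mean in either direction;
    * missing values are kept rather than dropped by accident
    drop if abs(`x' - r(mean)) > 3*r(sd) & !missing(`x')
}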
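A minimal sketch of the other option mentioned above, capping rather than dropping, assuming that "winsorize at 3 SD" means replacing values beyond mean ± 3 SD with those limits. The `_w` suffix for the new variables is a hypothetical naming choice, not something from the papers referred to.

```stata
sysuse auto, clear
foreach x of varlist price mpg {
    quietly summarize `x'
    * limits at 3 SD either side of the mean
    local lo = r(mean) - 3*r(sd)
    local hi = r(mean) + 3*r(sd)
    * cap values at the limits; the if qualifier keeps missing values missing
    generate double `x'_w = min(max(`x', `lo'), `hi') if !missing(`x')
}
* compare the original and capped versions
summarize price price_w mpg mpg_w
```

The original variables are left untouched, so results can be compared with and without capping.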

Regards
Attaullah Shah

Last edited by Attaullah Shah; 24 Sep 2014, 04:10.

Attaullah Shah, PhD.
Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
Wesley Mokkink

Join Date: Apr 2016

Posts: 22
#17

20 Apr 2016, 08:12

Dear Nick,

I installed your "extremes" command and would like to use it to remove extreme values from my sample. However, it only lists them, and I do not know how to actually remove them. Is there a way to do this?

Thanks in advance!

Kind regards,

Wesley
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#18

20 Apr 2016, 08:24

You have now tacked a question on to a thread that was closed over a year ago. Start a new thread.

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2
Wesley Mokkink

Join Date: Apr 2016

Posts: 22
#19

20 Apr 2016, 08:34

Steve Samuels: I started a new thread:

http://www.statalist.org/forums/forum/general-stata-discussion/general/1336660-remove-outliers-on-stata
Denila Jinny

Join Date: Jun 2014

Posts: 25
#20

25 Oct 2018, 04:22

Originally posted by Nick Cox:

"Think on a logarithmic scale" solves many more problems than eliminating outliers.

Sir,

What should I do if the logged variable is also not normally distributed?

I have a dataset with 9000 observations. Can I assume normality just because the sample is large?

Denila.
Nick Cox

Join Date: Mar 2014

Posts: 35726
#21

25 Oct 2018, 04:33

Denila Jinny: Not at all. A large sample can be highly non-normal too. To give a better answer, we need to know more about your data and your goals, especially whether or why you think your data "should be" normal.
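To illustrate the point about sample size, here is a small simulation sketch. The lognormal distribution, the seed, and the variable name `y` are arbitrary assumptions chosen for illustration; they have nothing to do with the poster's data.

```stata
clear
set seed 2018
set obs 9000
* lognormal draws: strongly right-skewed however many observations there are
generate double y = exp(rnormal())
* skewness and kurtosis are reported with the detail option
summarize y, detail
* skewness and kurtosis test for normality
sktest y
* normal quantile plot; marked curvature indicates non-normality
qnorm y
```

Having 9000 observations does nothing to make the variable itself normal; it only makes the departure from normality easier to see.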
Denila Jinny

Join Date: Jun 2014

Posts: 25
#22

25 Oct 2018, 04:41

Originally posted by Nick Cox:

Denila Jinny Not at all. A large sample can be highly non-normal too. To give a better answer, we need to know more about your data and your goals, especially on whether or why you think your data "should be" normal.

Thank you very much, sir, for your prompt reply.
I am working with cross-sectional data. My objective is to study the causal relationships among funding, profitability and productivity. The literature suggests bidirectional relationships among these variables, so I intend to fit a non-recursive SEM, one of whose assumptions is normality. I have 4 continuous variables, 1 interaction between 2 continuous variables, 2 interactions between a continuous variable and a dichotomous variable, and a few other categorical variables. Is this enough information for you to help me with this issue?
Nick Cox

Join Date: Mar 2014

Posts: 35726
#23

25 Oct 2018, 05:42

I don't know anything you don't about SEM. My advice is to start a new thread with a title like "Non-normality and structural equation models" so that people who know about SEM can see that. Also, I would show some graphs of the distributions of your continuous variables to give us some flavour.
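A sketch of the kind of graphs that could accompany such a post. The poster's variables are not shown in this thread, so `price` from the auto data stands in for one of the continuous variables, and the graph names are hypothetical.

```stata
sysuse auto, clear
* histogram with a normal density overlaid for reference
histogram price, normal name(hist_price, replace)
* normal quantile plot of the same variable
qnorm price, name(qnorm_price, replace)
```

Repeating the two commands for each continuous variable gives readers a quick view of skewness and tail behaviour.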
Denila Jinny

Join Date: Jun 2014

Posts: 25
#24

25 Oct 2018, 08:30

Originally posted by Nick Cox:

I don't know anything you don't about SEM. My advice is to start a new thread with a title like "Non-normality and structural equation models" so that people who know about SEM can see that. Also, I would show some graphs of the distributions of your continuous variables to give us some flavour.

OK. Thank you very much.
Kithinji Charles

Join Date: Dec 2019

Posts: 1
#25

04 Dec 2019, 07:46

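* This example shows how to flag outliers using percentiles
clear
input x
1
2
12
14
15
14
16
15
14
98
76
end
* Show the outliers in a box plot
graph box x
* Summarize with the detail option to obtain the quartiles r(p25) and r(p75)
summarize x, detail
return list
* Flag values more than 1.5 IQR below p25 or above p75; missing values are not flagged
generate byte x_outlier = 1 if (x < r(p25) - 1.5*(r(p75) - r(p25)) | x > r(p75) + 1.5*(r(p75) - r(p25))) & !missing(x)
* Keep only the flagged observations (this discards every other row in memory)
keep if x_outlier == 1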
Nick Cox

Join Date: Mar 2014

Posts: 35726
#26

04 Dec 2019, 08:20

Re #25: John W. Tukey proposed a rule of thumb: plot points separately on a box plot if they are greater than p75 + 1.5 IQR or less than p25 - 1.5 IQR.

So far, so good. But this wasn't a recipe for identifying points to drop. In most cases the occurrence of outliers was, at least for Tukey, a signal to think about a transformation.
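A minimal sketch of that idea: count the points beyond the 1.5 IQR fences on the raw scale and on the logarithmic scale, and compare. The auto data and the variable name `log_price` are assumptions for illustration; whether the count shrinks after transformation depends on the data.

```stata
sysuse auto, clear
generate double log_price = ln(price)
foreach v of varlist price log_price {
    quietly summarize `v', detail
    * store the fences in locals before running another r-class command
    local upper = r(p75) + 1.5*(r(p75) - r(p25))
    local lower = r(p25) - 1.5*(r(p75) - r(p25))
    quietly count if (`v' > `upper' | `v' < `lower') & !missing(`v')
    display "`v': " r(N) " point(s) beyond the 1.5 IQR fences"
}
* the same comparison as box plots
graph box price, name(box_price, replace)
graph box log_price, name(box_log_price, replace)
```

Nothing is dropped here; the counts and plots are only a prompt to think about which scale suits the variable.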