Outliers in Panel Data Analysis

Samuel Renhoar

Join Date: May 2020

Posts: 25
#1

22 May 2020, 05:13

Hello everyone,

I was wondering how outliers could affect my (causal) analysis using regression for panel data.
Is there a proper way to detect them in Stata? And what should I do if I have some outliers?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#2

22 May 2020, 10:08

Samuel:
unless you are 100% sure that your (so-called) outliers are the result of data entry errors, they should usually be considered a fact of life and, as such, kept in the dataset and in the subsequent analysis.

Kind regards,
Carlo
(Stata 19.0)
Samuel Renhoar

Join Date: May 2020

Posts: 25
#3

23 May 2020, 01:24

Carlo:

Thank you, Carlo. I'm pretty sure I entered the data properly, and I even checked twice that I had input the right values, but the companies seem to have had problems in the period that I examined.

(I forget where I read this.) Someone said that if I have several outliers, they could affect my estimation and bias my results.

Code:

graph box Y, over(year)

I used this command to detect the outliers. I don't know whether it is the right procedure, but can I consider the dots above the upper line to be outliers?
Nick Cox

Join Date: Mar 2014

Posts: 35721
#4

23 May 2020, 01:52

What you call them is up to you. What you do about them is the key issue. graph box here follows a convention introduced by J.W. Tukey, which is to plot points individually if they lie more than 1.5 times the interquartile range (IQR) away from the nearer quartile, so data points are

more than upper quartile + 1.5 IQR

OR

less than lower quartile - 1.5 IQR.

The IQR is shown by the length of the box -- for each year in your case. Tukey was asked "Why 1.5?" and replied that 1 would be too small and 2 would be too large. (Why 1.5 rather than 1.4 or 1.6 or 1.45 or 1.56 is evident from looking carefully at the literature, mostly from the 1970s: Tukey was focusing on calculation procedures that were easy by hand -- without even a small calculator.) The recipe is pragmatic and based on experience with data rather than on any formal analysis.

The criterion for plotting points individually -- and Tukey by the way did not use the term "outliers" in this context -- is just to flag points for thinking about, and in no sense whatsoever is it a reliable criterion for deciding between good and bad data points.
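
A minimal sketch of the rule described above, assuming the data are in memory with the variables Y and year from #3. The names q1, q3, iqr and flagged are made up for illustration. The result should agree with what graph box plots individually, up to the convention used for computing quartiles.

```stata
* Sketch only: flag, year by year, the points beyond 1.5 IQR from the nearer quartile.
* (To experiment on a shipped dataset: webuse grunfeld, clear, then rename invest Y)
egen double q1 = pctile(Y), p(25) by(year)
egen double q3 = pctile(Y), p(75) by(year)
generate double iqr = q3 - q1

* 1 = beyond either limit, 0 = inside; left missing where Y is missing
generate byte flagged = (Y > q3 + 1.5*iqr | Y < q1 - 1.5*iqr) if !missing(Y)

* Count of flagged and unflagged points in each year
tabulate year flagged
```

The flag only marks points to think about. It is not a test of whether a value is good or bad.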

What is more:

"What is an outlier?" is a multivariate question: what about the other variables?

It also depends on the scales you use (original or transformed, and so forth).

It's important for whatever comes next in your project that you realise that ROA has a skewed distribution, which does not surprise me and which may affect what kind of model makes sense, or may be no issue at all when you look at the response in the context of the predictors.

And, indeed, the more extreme data points will have more influence on most models you fit. That usually makes sense too.
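
One way to look at points in the context of the predictors, sketched under assumptions: x1 is a hypothetical predictor (no predictor names are given in this thread), and a pooled regression is used only because the influence measures below are available after regress. The names cooksd and lev are made up.

```stata
* Sketch only: Y is the response from #3, x1 is a hypothetical predictor.
* (To experiment: webuse grunfeld, clear, then rename (invest mvalue) (Y x1))
regress Y x1

* Cook's distance: how much the fitted model changes if an observation is dropped
predict double cooksd, cooksd
* Leverage: how unusual an observation is in terms of the predictors
predict double lev, leverage

* Show the five observations with the largest Cook's distance
gsort -cooksd
list Y x1 cooksd lev in 1/5

* Plot the response against the predictor rather than on its own
scatter Y x1
```

A point that looks extreme in a box plot of Y alone can turn out to be quite unremarkable once x1 is taken into account, and the reverse can also happen.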
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#5

23 May 2020, 02:41

Samuel:
as usual, after Nick's reply, there is nothing to add (sadly enough for followers like me, who cannot compete with, but only benefit from, his knowledge!).
That said, firms belonging to a given industry may well have a return on assets that is far from the mean or another measure of location; this is simply a matter of fact.
In addition, since you have panel data, I would be interested in knowing whether or not the so-called outliers refer to the same firms across the 3-year timespan that you considered in your model.
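
A rough way to check this, assuming a firm identifier called firm (a hypothetical name) alongside Y and year from #3, and using the box plot rule only as a flagging device. The names q1, q3, flagged, nflagged and onefirm are made up.

```stata
* Sketch only: count in how many years each firm is flagged by the 1.5 IQR rule.
* (To experiment: webuse grunfeld, clear, then rename (invest company) (Y firm))
egen double q1 = pctile(Y), p(25) by(year)
egen double q3 = pctile(Y), p(75) by(year)
generate byte flagged = (Y > q3 + 1.5*(q3 - q1) | Y < q1 - 1.5*(q3 - q1)) if !missing(Y)

* Number of years in which each firm is flagged (missing flags count as 0)
egen nflagged = total(flagged), by(firm)

* Tag one observation per firm so that each firm is counted once
egen byte onefirm = tag(firm)
tabulate nflagged if onefirm
```

Firms with nflagged equal to the number of years are flagged throughout. Firms with 1 or 2 are flagged only in some years.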

Kind regards,
Carlo
(Stata 19.0)
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2173
#6

23 May 2020, 05:23

One of the best ways to account for “outliers” in panel data is to use fixed effects analysis because it controls for large locational differences.
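
A minimal sketch of such a fixed effects fit, assuming a numeric firm identifier called firm and a predictor called x1 (both hypothetical names; only Y and year appear in this thread). Clustering the standard errors by firm is a common choice, not something stated above.

```stata
* Sketch only: firm and x1 are hypothetical names.
* (To experiment: webuse grunfeld, clear, then rename (invest company mvalue) (Y firm x1))
* If the identifier is a string, create a numeric one first, e.g. with encode.
xtset firm year

* Fixed effects (within) estimator: each firm's own average level is removed
xtreg Y x1, fe vce(cluster firm)
```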
Samuel Renhoar

Join Date: May 2020

Posts: 25
#7

24 May 2020, 02:13

Nick:
Thank you Nick,
after all your good explanations, should I still try to detect outliers or not? And how do I do it correctly?

Carlo:
Thank you, Carlo. This is what it looks like if I label the points that I (now) consider as above and bottom +1,5 and -1.5 quartile

Jeff:
Thank you, Jeff. Does it mean that I've overcome the outlier problem by using a fixed effects model?
By the way, I admire your books a lot.
Best regards.

Last edited by Samuel Renhoar; 24 May 2020, 02:15.
Nick Cox

Join Date: Mar 2014

Posts: 35721
#8

24 May 2020, 03:56

I don't understand what different question you are asking in #7. I wouldn't call any of your data points outliers on the evidence here, but that is my view.

You have a moderately skewed distribution, but if any points are anomalous they can only be regarded as anomalous given also information on the predictors.

The wording here

above and bottom +1,5 and -1.5 quartile

is incorrect, as my earlier post implies.

It may seem unhelpful but the wording in our FAQ Advice often applies

Whether what you are doing is “correct” is very difficult to discuss helpfully.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#9

24 May 2020, 05:01

Samuel:
the "extreme observations" you detected in your dataset are actually changing across years.
I would investigate whether there's something in the data generating process that can explain/support that evidence.

Kind regards,
Carlo
(Stata 19.0)
Samuel Renhoar

Join Date: May 2020

Posts: 25
#10

24 May 2020, 07:31

Nick:
Thank you, Nick.
I'm sorry if I drew the wrong conclusion from your replies,
but it seems that I really don't need to worry about outliers in my data.

Carlo:
My data are actually from a secondary source, but I checked twice and am really sure that I'm using the correct numbers.
Can I assume that (maybe) unusual events happened to some of those "extreme observations" during the observation period?
I ask because some "extreme observations" appear in only one or two years of observations.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#11

24 May 2020, 08:24

Samuel:
this issue has more to do with what your data represent than with statistics/econometrics.
Hence, I cannot tell.

Kind regards,
Carlo
(Stata 19.0)
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2173
#12

24 May 2020, 11:30

You can never be sure that you have overcome any data/statistical problem by using a specific estimation method. But, for example, those graphs you have presented should, effectively, be done separately for each unit, pooling across time. FE removes the averages within each unit. With a short time series it seems unlikely that you'll see any "outliers."
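
A sketch of what removing the averages within each unit looks like, assuming a firm identifier called firm (a hypothetical name) and the response Y from #3. The names Ybar and Ywithin are made up.

```stata
* Sketch only: firm is a hypothetical identifier.
* (To experiment: webuse grunfeld, clear, then rename (invest company) (Y firm))

* Box plot for each firm, pooling its years (readable only with few firms)
graph box Y, over(firm)

* Within transformation: subtract each firm's own mean over time
egen double Ybar = mean(Y), by(firm)
generate double Ywithin = Y - Ybar

* Distribution of the demeaned response, which is what FE works with
graph box Ywithin
```

A firm that is far from the others in every year will sit near zero after demeaning. Only deviations from a firm's own average can still stand out.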