Testing for outliers and high leverage observations in survey data.

David Speed

Join Date: May 2015

Posts: 98
#1

09 Jan 2020, 11:21

Hi Everyone,

I was curious if there was a best approach to searching for and dealing with outliers and high leverage observations in survey data. Unfortunately,

Code:

predict dfbeta, dfbeta(varname)

and

Code:

predict cooksd, cooksd

do not work with survey weighting. Is this type of regression diagnostic imperative when you have thousands of observations?

Thanks,

David.
Tags: cooksd, dfbeta, postestimation, regress
Mike Lacy

Join Date: Apr 2014

Posts: 2416
#2

09 Jan 2020, 11:56

I've approached this kind of thing by calculating a sort of do-it-yourself dfbeta. I think I recall a previous Statalist thread about doing this with -jackknife-, but I've done something like this:

Code:

sysuse auto
gen b1drop1 = .
gen b2drop1 = .
forval i = 1/`=_N' {
    regress price weight headroom if (_n != `i' )
    replace b1drop1 = _b[weight] in `i'
    replace b2drop1 = _b[headroom] in `i'
}

The more filled-out version of this calculates each b*drop1 as a deviation from the full sample values.
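
A minimal sketch of what that filled-out version might look like, reusing the auto data from the loop above. The names b1full, b2full, d1drop1, and d2drop1 are made up for illustration, and the deviations are left unstandardized, so they are in the units of the coefficients rather than on the scale of Stata's own dfbeta:

```stata
sysuse auto, clear

* Full-sample fit: keep the coefficients as reference values
regress price weight headroom
scalar b1full = _b[weight]
scalar b2full = _b[headroom]

* Hypothetical variable names for the drop-one deviations
gen double d1drop1 = .
gen double d2drop1 = .

forvalues i = 1/`=_N' {
    * Refit without observation i
    quietly regress price weight headroom if _n != `i'
    * Full-sample coefficient minus the drop-one coefficient
    quietly replace d1drop1 = b1full - _b[weight] in `i'
    quietly replace d2drop1 = b2full - _b[headroom] in `i'
}

* Inspect the spread of the raw shifts
summarize d1drop1 d2drop1, detail
```

With survey data, the same loop could in principle wrap a weighted fit such as regress with [pweight = wt], where wt is a hypothetical weight variable; that variant is an assumption and is not tested here. With thousands of observations the loop refits the model once per observation, so it may take a while.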
David Speed

Join Date: May 2015

Posts: 98
#3

09 Jan 2020, 13:05

This looks really helpful, thanks Mike. A follow-up question: with 10,000+ observations, I commonly find that several observations have DFBETA or Cook's D values that are very small in absolute terms (e.g., 0.002) yet still exceed the usual rule-of-thumb cutoffs (e.g., 4/n for Cook's D). How concerning should these be?
Mike Lacy

Join Date: Apr 2014

Posts: 2416
#4

09 Jan 2020, 15:38

If I recall correctly, the dfbetas from Stata are standardized, so a "large" one could be large in relative but not absolute terms. I wouldn't worry unless it actually had a substantively meaningful effect on the coefficient. I would be interested, though, to find out what was unusual about that particular observation.
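
As a rough sketch of how one might check this on unweighted data, the standardized value can be set next to an approximate raw shift in the coefficient. The variable names dfb_weight, flag, approx_shift, and pct_shift are made up for illustration. The 2/sqrt(n) cutoff is the common rule of thumb for standardized DFBETAs, and multiplying by the full-sample standard error only approximates the raw shift, because Stata scales each DFBETA using a standard error computed without that observation:

```stata
sysuse auto, clear
regress price weight headroom

* Standardized DFBETA for weight, as computed by Stata
predict double dfb_weight, dfbeta(weight)

* Flag observations beyond the size-adjusted rule of thumb
gen byte flag = abs(dfb_weight) > 2/sqrt(e(N)) if !missing(dfb_weight)

* Approximate raw shift in the coefficient (assumption: the
* full-sample SE stands in for the drop-one SE)
gen double approx_shift = dfb_weight * _se[weight]

* The same shift as a percentage of the full-sample coefficient
gen double pct_shift = 100 * approx_shift / _b[weight]

list make dfb_weight approx_shift pct_shift if flag == 1
```

If pct_shift is tiny for the flagged observations, they clear the statistical cutoff without moving the coefficient by a substantively meaningful amount.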