
  • Testing for outliers and high-leverage observations in survey data.

    Hi Everyone,

    I was curious whether there is a best approach to finding and dealing with outliers and high-leverage observations in survey data. Unfortunately, -predict dfbeta, dfbeta- and -predict cooksd, cooksd- do not work with survey weighting. Is this type of regression diagnostic necessary when you have thousands of observations?

    Thanks,

    David.

  • #2
    I've approached this kind of thing by calculating a sort of do-it-yourself dfbeta. I think I recall a previous Statalist thread about doing this with -jackknife-, but I've done something like this:

    Code:
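    sysuse auto, clear
    * leave-one-out coefficients: row i holds the estimates with obs i dropped
    gen double b1drop1 = .
    gen double b2drop1 = .
    forval i = 1/`=_N' {
      quietly regress price weight headroom if (_n != `i')
      quietly replace b1drop1 = _b[weight]   in `i'
      quietly replace b2drop1 = _b[headroom] in `i'
    }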
    The more filled-out version of this calculates each b*drop1 as a deviation from the full-sample values.
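A minimal sketch of that filled-out version, under my own assumptions: each leave-one-out coefficient is stored as its change from the full-sample estimate, and the macro wexp and the variables d_weight, d_headroom and absd_weight are hypothetical names. For survey data with a plain probability weight, putting something like [pweight=yourweightvar] in wexp should give the same point estimates as -svy: regress-; this only examines the coefficients, not the survey variance estimates.

```stata
* Sketch: DIY dfbeta as the change in each coefficient when obs i is dropped.
* All new names (wexp, b_*_full, d_*, absd_weight) are illustrative.
sysuse auto, clear

* Hypothetical weight macro: empty here (unweighted). For survey data with a
* plain probability weight, e.g.  local wexp "[pweight=yourweightvar]"
* (point estimates then match -svy: regress-; variances are not examined).
local wexp ""

* Full-sample coefficients
quietly regress price weight headroom `wexp'
scalar b_weight_full   = _b[weight]
scalar b_headroom_full = _b[headroom]

generate double d_weight   = .
generate double d_headroom = .

* Refit once per observation, leaving that observation out
forvalues i = 1/`=_N' {
    quietly regress price weight headroom if _n != `i' `wexp'
    quietly replace d_weight   = _b[weight]   - b_weight_full   in `i'
    quietly replace d_headroom = _b[headroom] - b_headroom_full in `i'
}

* Observations whose removal shifts the weight coefficient the most
generate double absd_weight = abs(d_weight)
gsort -absd_weight
list make d_weight d_headroom in 1/5
```

Working with raw changes keeps the result in the coefficient's own units, which makes it easier to judge whether a shift matters substantively.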



    • #3
      Originally posted by Mike Lacy (post #2)
      This looks really helpful, thanks, Mike. A follow-up question: I've noticed that when I have 10,000+ observations, several of them commonly have large DFBETAs or Cook's D values, but these values tend to be very small (e.g., 0.002), although still greater than the 4/n cutoff. How concerning should these be?
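For reference, the commonly cited rules of thumb are 4/n for Cook's D and 2/sqrt(n) for Stata's scaled DFBETAs. At n = 10,000 they work out to 4/n = 0.0004 and 2/sqrt(n) = 0.02 (while 4/sqrt(n) = 0.04), so a value of 0.002 is above the Cook's D cutoff but below the DFBETA one. A quick check of that arithmetic; the scalar names are illustrative:

```stata
* Rule-of-thumb cutoffs evaluated at n = 10,000 (scalar names are illustrative)
local n = 10000
scalar cut_cooksd = 4/`n'         // Cook's D rule of thumb
scalar cut_dfbeta = 2/sqrt(`n')   // scaled DFBETA rule of thumb
scalar cut_4sqrt  = 4/sqrt(`n')

display "4/n       = " cut_cooksd
display "2/sqrt(n) = " cut_dfbeta
display "4/sqrt(n) = " cut_4sqrt

* Does a value of 0.002 exceed each cutoff? (1 = yes, 0 = no)
display "0.002 > 4/n?       " (0.002 > cut_cooksd)
display "0.002 > 2/sqrt(n)? " (0.002 > cut_dfbeta)
```

Because both cutoffs shrink as n grows, with very large samples many observations will cross them without moving any coefficient by a meaningful amount.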



      • #4
        If I recall correctly, Stata's dfbetas are standardized (the change in the coefficient is scaled by its estimated standard error), so a "large" one could be large in relative but not absolute terms. I wouldn't worry unless it actually had a substantively meaningful effect on the coefficient. I would be interested, though, to find out what was unusual about that particular observation.
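One way to act on that, sketched with an ordinary unweighted -regress- on the auto data (the thread notes these -predict- options do not work with survey weighting): flag observations whose scaled DFBETA exceeds 2/sqrt(N) and list them next to their raw values. Since the scaled DFBETA is roughly in standard-error units, a value of about 0.3 means dropping that observation moves the coefficient by roughly 0.3 standard errors, which can then be weighed against what counts as a substantively meaningful change. The names dfb_weight and flag are illustrative.

```stata
* Sketch: flag observations by Stata's scaled DFBETA and inspect them.
* Unweighted regression only; dfb_weight and flag are illustrative names.
sysuse auto, clear
quietly regress price weight headroom

* Scaled DFBETA for the weight coefficient
predict double dfb_weight, dfbeta(weight)

* Rule-of-thumb cutoff 2/sqrt(N), using the estimation sample size
local cut = 2/sqrt(e(N))
generate byte flag = abs(dfb_weight) > `cut' & !missing(dfb_weight)

* See what is unusual about the flagged cars
list make price weight headroom dfb_weight if flag
```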
