  • Dealing with extreme values in control variables

    Hi all,

    I have a variable called “holdings” that measures how much cash is held. I used -graph box holdings- to identify extreme values and found that many observations lie above the upper quartile range. I also used the -extreme- command.

    (1) Is it important to deal with extreme values even though this is a control variable?

    (2) I intend to impute the mean value for values above a certain threshold (yet to be determined but possibly anything above the upper quartile range).

    Any advice on this would be much appreciated. Thanks!

  • #2
    If you Google "outliers Statalist", or just search Statalist, you will find substantial discussions of these issues. Many contributors think we shouldn't eliminate or recode outliers unless we know they are really erroneous observations. Since your right-hand-side variables are probably correlated, outliers in a control variable will change those correlations and can therefore change your results. This is unlikely to be as bad as outliers in your more important variables, but it can happen.
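
    One way to see how much this matters in a given model is to fit it with and without the largest values of the control and compare the coefficients. A minimal sketch, assuming the auto dataset shipped with Stata as a stand-in: price plays the role of holdings, mpg and weight are placeholder outcome and regressor, and the 99th-percentile cutoff is an arbitrary illustrative choice, not a recommendation to drop observations.

    ```stata
    * Stand-in data: price plays the role of "holdings" (assumption)
    sysuse auto, clear

    * Flag the largest values of the control; the p99 cutoff is illustrative only
    summarize price, detail
    generate byte high_price = price > r(p99) if !missing(price)

    * Fit the model on all observations
    regress mpg weight price
    estimates store all_obs

    * Refit without the flagged observations, as a sensitivity check
    regress mpg weight price if high_price == 0
    estimates store no_high

    * Compare coefficients and standard errors side by side
    estimates table all_obs no_high, b se
    ```

    If the coefficients of interest barely move between the two columns, the extreme values in the control are probably not driving the results.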

    I don't understand why you're concerned about the interquartile range. Obviously, lots of your data should be in the top quartile – one quarter of it.
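
    The distinction can be checked directly. By default, -graph box- plots individually only the points lying more than 1.5 times the interquartile range above the upper quartile (or below the lower quartile), which is a much smaller set than everything above the upper quartile. A rough sketch, again assuming the auto dataset with price standing in for holdings; the scalar names are made up for illustration.

    ```stata
    * Stand-in data: price plays the role of "holdings" (assumption)
    sysuse auto, clear

    * Save the quartiles before any later command overwrites r()
    summarize price, detail
    scalar q1 = r(p25)
    scalar q3 = r(p75)
    scalar upper_fence = q3 + 1.5*(q3 - q1)

    * Observations above the upper quartile: about a quarter of the sample
    count if price > q3 & !missing(price)

    * Observations above the upper fence: the points -graph box- draws individually
    count if price > upper_fence & !missing(price)

    graph box price
    ```

    Only the second count corresponds to what the box plot marks as outside values.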

    • #3
      Thanks very much, Phil!
