Hello,
I am working with a dataset of more than 200K observations, with potential outliers numbering in the low thousands. I can easily identify outliers using, say, Cook's distance and similar measures, and I can generate a list of all of them in Stata without much trouble. My question is: what Stata code will quickly and efficiently drop all of these essentially random observations, whose only common link is, say, a large Cook's distance? So far the only way I have come up with is to drop them manually, and the number of outliers is considerable.
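To make the goal concrete, something along these lines is what I have in mind, though I am not sure it is the right approach. This is only a sketch: the built-in auto dataset and the model stand in for my own data, and the 4/N cutoff is just a common rule of thumb that I am assuming here, not a recommendation.

```stata
* Sketch only: the dataset, the model, and the 4/N cutoff are placeholder assumptions
sysuse auto, clear
regress price mpg weight

* Store Cook's distance for the observations used in the regression
predict double cooksd if e(sample), cooksd

* Drop flagged observations in one step; the !missing() guard matters because
* Stata treats missing values as larger than any number
drop if cooksd > 4/e(N) & !missing(cooksd)
```

The point is that a single drop if condition on the stored statistic would replace removing observations one at a time.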