Dear all,
Thank you in advance. I have a variable that clearly has outliers (the box plot from graph box shows them), but winsor does not work for some values of p(). For example:
. sysuse auto, clear
(1978 Automobile Data)
. adjacent price
------------------------------------------
    price | lower adjacent  upper adjacent
----------+-------------------------------
        . |           3291            8814
------------------------------------------
. graph box price
. winsor price, gen(price_n) p(0.01)
0 values to be Winsorized
r(198);
When I change p() to 0.02 or 0.05, it works. In that case, which value of p() should I use? With 0.02, the variable still has some outliers. With 0.05, there are fewer, but some remain. The number of outliers falls as I increase p(), but I wonder whether I need to replace all the outliers with non-outlying values before doing further analysis. This is just one variable; should I apply the same logic to all my variables?
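My guess at why p(0.01) fails, which I have not confirmed against the winsor help file or ado-file: if winsor sets the number of values to Winsorize in each tail to h = floor(p * N), then with the 74 observations in auto.dta, p(0.01) would give h = 0. The sketch below only does that arithmetic; the floor() rule is my assumption, not something I have verified.

```stata
* Hypothetical check: assumes winsor computes h = floor(p * N) per tail
sysuse auto, clear
quietly count if !missing(price)    // r(N) = nonmissing values of price
display "N = " r(N)
display "p(0.01): h = " floor(0.01 * r(N))
display "p(0.02): h = " floor(0.02 * r(N))
display "p(0.05): h = " floor(0.05 * r(N))
```

If that assumption is right, it would also explain why p(0.02) and p(0.05) run: they would give h = 1 and h = 3 respectively.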
Best,
Eddie