Hi all,
I'm trying to drop observations using the r(mean) and r(sd) results for a price variable that I get from the summarize command. For some reason, Stata drops all observations, even those whose price does not meet my restriction. My code is below:
sum price, d
drop if price < (r(mean) - 5 * r(sd) ) | price> (r(mean) + 5 * r(sd))
Below is my log when I run the above commands:
. sum price, d
                            price
-------------------------------------------------------------
     Percentiles       Smallest
 1%     .0969173              0
 5%     .1055639              0
10%     .1224311              0       Obs           2,584,269
25%     .1498837              0       Sum of Wgt.   2,584,269
50%     .1888972                      Mean           .1920862
                        Largest       Std. Dev.      .0571629
75%     .2265367        6.92407
90%     .2674918        6.92407       Variance       .0032676
95%     .2977228        6.92407       Skewness       3.061594
99%     .3403101       6.984922       Kurtosis       303.4658
. drop if price < (r(mean) - 5 * r(sd) ) | price> (r(mean) + 5 * r(sd))
(2,584,269 observations deleted)
The calculated value of r(mean) - 5 * r(sd) equals -0.0937 (i.e., 0.1920862 - 5*0.0571629), and, similarly, the calculated value of r(mean) + 5 * r(sd) equals 0.4779.
Based on the above statistics, there are no negative price values, and fewer than 1% of the observations have a price greater than 0.4779. However, Stata drops every observation.
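One way to check the hand-calculated bounds above is to copy the r() results into local macros immediately after summarize and then count, rather than drop, the observations outside the range. This is only a sketch: the macro names lo and hi are illustrative and not part of the original code, and it assumes nothing else runs between summarize and the two local statements, so that r(mean) and r(sd) still hold the values shown in the log.

```stata
* Sketch only: the macro names lo and hi are illustrative assumptions
summarize price, detail

* Copy the returned results right away, before another command
* has a chance to overwrite or clear r()
local lo = r(mean) - 5 * r(sd)
local hi = r(mean) + 5 * r(sd)

* Show the bounds so they can be compared with the hand calculation
display "lower bound = `lo'   upper bound = `hi'"

* Count the observations that would be dropped, without deleting anything
count if price < `lo' | price > `hi'
```

If the count looks right, the same condition could be reused with drop in place of count. Because the bounds are then fixed numbers held in the macros, the result no longer depends on what r() contains at the moment the condition is evaluated.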
Could someone please help me understand what is causing this?
Thanks,
DP