Why do the winsor and winsor2 functions add new missing values?
I want to winsorize my data, so I use the winsor and winsor2 functions. But after I winsorize, there are more missing values than before. How can this happen?
winsor and winsor2 are commands (not functions) from SSC (*). I first wrote winsor in 1998. Given an if condition, it ignores observations that don't satisfy that condition, with the side-effect of creating missing values for those observations in the new variable. This is standard for Stata commands. winsor2 was written by someone else but evidently behaves the same way in this detail.
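The mechanism can be sketched outside Stata. The following is a minimal Python illustration, not the code of winsor or winsor2: the function name winsorize_if, the keep mask standing in for an if condition, and the choice of h values pulled in from each tail are all assumptions made for the example.

```python
import math

def winsorize_if(values, keep, h=1):
    """Winsorize h values in each tail among observations where keep is True.

    Observations excluded by keep (or already missing) get NaN in the
    result, mimicking a new variable that is only filled in for the
    observations actually used. Hypothetical helper for illustration.
    """
    used = sorted(v for v, k in zip(values, keep) if k and not math.isnan(v))
    n = len(used)
    if h < 0 or 2 * h >= n:
        raise ValueError("h must be >= 0 and smaller than half the observations used")
    lo, hi = used[h], used[n - 1 - h]  # the values that replace the tails
    return [
        min(max(v, lo), hi) if k and not math.isnan(v) else math.nan
        for v, k in zip(values, keep)
    ]

values = [1.0, 2.0, 3.0, 4.0, 100.0, 50.0, math.nan]
keep = [True, True, True, True, True, False, True]  # stands in for an if condition

result = winsorize_if(values, keep, h=1)
missing_before = sum(math.isnan(v) for v in values)
missing_after = sum(math.isnan(v) for v in result)
```

Here the sixth observation is not missing in the original data but is missing in the result, simply because the condition excluded it. Comparing counts of missing values before and after, restricted to the observations that satisfy the condition, shows whether anything else is going on.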
As I've mentioned many times over the last 25 years, my command was written because someone on Statalist wanted to do it.
But I've never seen the point of Winsorizing except as a prelude to getting a Winsorized summary, a mean or a variance or SD, say. I'd appreciate good textbook or review paper references explaining why it is a good way otherwise to deal with supposed outliers or fat/long/heavy tails as a prelude to modelling -- as compared with say working with an appropriate transformation or link function. If the idea is that some fraction of data are rogue or irrelevant, then you need a better method to set them on one side. (How do you Winsorize multiple variables sensibly?)
And if it is a good method, how do you choose the fraction to be Winsorized?
Turn and turn about, with the loosely similar method of trimming it can be interesting and useful to check the sensitivity of results to the trimming fraction, but I've not heard of that being done for Winsorizing.
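A sensitivity check of that kind might look like the sketch below, again in Python rather than Stata. The helper winsorized_mean and the toy data are invented for illustration; h counts the values replaced in each tail, so h = 0 is the ordinary mean.

```python
def winsorized_mean(values, h):
    """Mean after replacing the h lowest and h highest values by their
    nearest retained neighbours. Hypothetical helper for illustration."""
    s = sorted(values)
    n = len(s)
    if h < 0 or 2 * h >= n:
        raise ValueError("h must be >= 0 and smaller than half the sample size")
    lo, hi = s[h], s[n - 1 - h]
    return sum(min(max(v, lo), hi) for v in s) / n

# Toy data with one wild value in the upper tail
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

# Winsorized mean for each feasible number of values per tail
means = {h: winsorized_mean(data, h) for h in range(5)}
for h, m in means.items():
    print(h, m)
```

With these data the mean drops sharply from h = 0 to h = 1 and is stable thereafter, which is the kind of pattern such a check is meant to expose. With real data the picture is rarely so clean, and the choice of fraction remains a judgement call.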
(*) Although usage elsewhere is often different, Stata commands are not considered to be functions, and Stata functions are not considered to be commands. This is naturally just a detail, but Stata discussions are clearest when Stata terminology is used.