Looping using forvalue in winsor2

Manish Kr

Join Date: Jan 2019

Posts: 5
#1


30 Mar 2019, 12:28

I am using the following command in Stata 14.2 on data stacked by industry-year. There are 357 industry-year groups. For groups 1 to 271 I used the following syntax:

. forval i = 1/271 {
2. capture {
3. winsor cfo_sc if group == `i', gen(work) h(1)
4. replace cfo_sc_w = work if group == `i'
5. drop work
6. }
7. }

This worked fine for winsorizing the first 271 groups. But when I use the following syntax to winsorize the remaining groups, I get winsorized values for group 272 only; for the other groups the cells are blank. Further, the temporary variable cfo_sc_t that is created does not get dropped:

forval i = 272/357 {
2. capture {
3. winsor2 cfo_sc if group == `i', suffix(_t) cuts(1,99)
4. replace cfo_sc_s = cfo_sc_t if group == `i'
5. drop cfo_sc_t
6. }
7. }

Kindly advise what is wrong with the syntax.

Thank you.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

30 Mar 2019, 18:55

I suggest you remove lines 2 and 6 and see whether Stata gives you an informative error message. The use of capture should be restricted to well-tested code that works.
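A sketch of this suggestion on hypothetical toy data: the second loop from #1 with lines 2 and 6 (the capture block) removed, so that any error stops the loop and is displayed. It assumes winsor2 is installed from SSC (ssc install winsor2) and writes cuts() in the space-separated form used in help winsor2. It also creates cfo_sc_s before the loop, because replace can only change a variable that already exists; whether cfo_sc_s existed in #1 is not stated, so treat that line as an assumption.

```stata
* Hypothetical toy data: groups 272, 273 and 274 with 200 observations each
clear
set seed 2019
set obs 600
generate int group = 271 + ceil(_n / 200)
generate cfo_sc = rnormal()

* Assumption: the target variable must exist before -replace- can fill it
generate double cfo_sc_s = .

* Second loop from #1 without -capture-: an error now halts the loop and is shown
forvalues i = 272/274 {
    winsor2 cfo_sc if group == `i', suffix(_t) cuts(1 99)
    replace cfo_sc_s = cfo_sc_t if group == `i'
    drop cfo_sc_t
}
```

One plausible reading of the symptoms in #1, offered only as a guess: if cfo_sc_s did not exist, replace failed inside capture, the rest of the block (including drop) was skipped, and cfo_sc_t survived holding values for group 272 only; later attempts to create cfo_sc_t again would then likely fail just as silently.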

Beyond that, help winsor2 suggests you could get the same sort of results with

Code:

winsor2 cfo_sc, suffix(_w) h(1) by(group)
replace cfo_sc_w = . if group >= 272
winsor2 cfo_sc, suffix(_s) cuts(1,99) by(group)
replace cfo_sc_s = . if group <= 271
Nick Cox

Join Date: Mar 2014

Posts: 35698
#3

31 Mar 2019, 01:12

winsor and winsor2 are quite different programs both on SSC, as people are asked to explain. I wrote winsor and Lian Yujun wrote winsor2. In Lian's help it is stated that

Code:

Codes from winsor by Nicholas J. Cox and -winsorizeJ.ado- by Judson Caskey have been incorporated.

which is correct. However, it wouldn't be correct to infer that winsor2 is a superset of winsor, because it isn't. Their functionality overlaps, but winsor supports something that winsor2 does not.

winsor (but not winsor2) supports these choices:

p(#) specifies the fraction of the observations to be modified in each tail. p should be greater than
0 and less than 0.5 and imply a value of h as just below.

h(#) specifies the number of the observations to be modified in each tail. h should be at least 1 and
less than half the number of non-missing observations.

Just one of p() and h() should be specified.

highonly and lowonly specify that Winsorizing should be one-sided, referring only to the tail with the
highest values or only to the tail with the lowest values, respectively. These options should not
be specified together.
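A minimal sketch of these winsor options on hypothetical toy data (x and the generated variable names are illustrative only; winsor is assumed installed from SSC with ssc install winsor). With h(1) the single lowest value is replaced by the next lowest and the single highest by the next highest; adding highonly leaves the low tail untouched.

```stata
* Hypothetical toy data: 50 draws from a standard normal
clear
set seed 2019
set obs 50
generate x = rnormal()

* h(1): modify one observation in each tail
winsor x, generate(x_h1) h(1)

* p(0.1): modify roughly a tenth of the observations in each tail
winsor x, generate(x_p10) p(0.1)

* highonly: one-sided, only the tail with the highest values is modified
winsor x, generate(x_hi) h(1) highonly

* Inspect the extremes after sorting on the original variable
sort x
list x x_h1 x_p10 x_hi in 1/3
list x x_h1 x_p10 x_hi in -3/l
```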

In contrast in winsor2 the option cuts() supports percentiles. cuts(1, 99) should correspond broadly to p(0.01). I say broadly because the calculations are not identical.
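To see the "broadly" in practice, a sketch that runs both commands on the same hypothetical toy variable and lists the observations where they disagree. Both programs are assumed installed from SSC. With 500 observations, p(0.01) and cuts(1 99) both aim at about 5 values per tail, but the replacement values need not be the same, because winsor2 works from a computed percentile while winsor works from a count of observations.

```stata
* Hypothetical toy data: 500 draws from a standard normal
clear
set seed 2019
set obs 500
generate x = rnormal()

* winsor: fraction of observations per tail
winsor x, generate(x_p) p(0.01)

* winsor2: percentile cut points, new variable x_c
winsor2 x, suffix(_c) cuts(1 99)

* Where do the two results differ?
count if x_p != x_c
list x x_p x_c if x_p != x_c
```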

It follows that

1. Manish Kr is trying to winsorize some groups in one way and the other groups in another way. On the face of it there should be a good story for anything so unusual.

2. The code suggested in #2 by William Lisowski

Code:

winsor2 cfo_sc, suffix(_w) h(1) by(group)

won't work as winsor2 doesn't have an h() option.

Note. I don't understand the question in #1 without an example.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

31 Mar 2019, 06:11

The main advice of post #2 should not be lost: don't choose to ignore error messages if your code is not working.

My understanding of the question in post #1 is limited to "my loop is not working"; my answer is "pay attention to what Stata tells you". I know little about winsorization. While looking for obvious syntax errors in the second (failing) loop, I saw that the second loop appears unnecessary. I simply overlooked the use of winsor rather than winsor2 in the first loop when I extended my suggestion to that loop.
Nick Cox

Join Date: Mar 2014

Posts: 35698
#5

31 Mar 2019, 08:14

William is quite right: the major point of #2 is that no loop is needed, as I should have emphasised.

Last edited by Nick Cox; 31 Mar 2019, 08:28.
Manish Kr

Join Date: Jan 2019

Posts: 5
#6

01 Apr 2019, 23:25

Thanks @William Lisowski for your help. I was able to sort out my problem using the first two lines of the code in #2, replacing h(1) with cuts(1,99), as suggested by Nick Cox in #3.

Nick Cox: regarding my initial question, I had a dataset of 357 industry-years, identified by the variable group. The first 271 groups have fewer than 100 observations each, so winsor2 with cuts(1,99) would not replace any observations in them, even though they contain outliers. I therefore used winsor in the loop shown in #1. Since the remaining groups have more than 100 observations, I chose to winsorize them at the 1st and 99th percentiles. I tried the same loop with a minor modification for the remaining groups, but it did not work, and I could not understand what the problem was.
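A sketch of the approach described here, on hypothetical toy data in which groups 1 to 3 stand in for the 271 small groups and groups 4 and 5 for the large ones. The large groups are handled in one winsor2 call through by(), as in #2, so no loop is needed for them; the small groups keep the winsor loop from #1 with h(1), but without capture. Both programs are assumed installed from SSC, and last_small and cfo_sc_win are hypothetical names.

```stata
* Hypothetical toy data: groups 1-3 have 40 obs each, groups 4-5 have 300 obs each
clear
set seed 2019
set obs 720
generate int group = cond(_n <= 120, ceil(_n / 40), 3 + ceil((_n - 120) / 300))
generate cfo_sc = rnormal()

local last_small = 3    // stands in for 271 in #1

* Large groups: percentile cuts within each group, no loop
winsor2 cfo_sc if group > `last_small', suffix(_s) cuts(1 99) by(group)

* Small groups: the winsor loop from #1, without -capture-
generate cfo_sc_w = .
forvalues i = 1/`last_small' {
    winsor cfo_sc if group == `i', generate(work) h(1)
    replace cfo_sc_w = work if group == `i'
    drop work
}

* One variable holding the winsorized values for every group
generate cfo_sc_win = cond(group <= `last_small', cfo_sc_w, cfo_sc_s)
```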

Anyway, thanks a lot William Lisowski for your suggestion, and thanks a lot Nick Cox for giving my problem a patient reading.