Advice on looping within dofile

Nicholas Marsh

Join Date: Aug 2016

Posts: 4
#1

15 Aug 2016, 06:48

Hello, I am trying to write a do-file that keeps my results at the most informative level of detail. I assumed this would need looping code, but I have struggled to come up with anything that works so far.

My dataset has 406 observations, and I am running Stata 14.1 for Windows. I have an id variable whose first two digits define a group (firstnum); longer ids are more specific levels within that group. You'll notice that ids 111, 112, 113, 114, and 115 add up to 3008, and that 1131, 1132, 1133, 1141, 1142, 1151, 1152, and 1153 add up to 3005. Therefore I want to keep the three-digit id observations for group 11, because the three-digit level accounts for the greatest amount (3008 at the three-digit level, compared to 3005 at the four-digit level and 3006 at the two-digit level).

Does anyone have any idea what my code should be to generate a check variable and then keep, for each unique firstnum, only the observations at the id length that accounts for the greatest total amt?

My dataset looks something like this:

id amt length firstnum
11 3006 2 11
21 5725 2 21
22 23919 2 22
111 0 3 11
112 0 3 11
113 1603 3 11
114 136 3 11
115 1269 3 11
211 951 3 21
212 2596 3 21
213 2177 3 21
221 23919 3 22
1131 51 4 11
1132 68 4 11
1133 1483 4 11
1141 42 4 11
1142 94 4 11
1151 558 4 11
1152 623 4 11
1153 86 4 11
2111 951 4 21
2121 15 4 21
2122 1343 4 21
2123 1237 4 21
2131 2177 4 21
2211 21263 4 22
2212 2387 4 22
2213 273 4 22
Tags: data, foreach, loop
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#2

15 Aug 2016, 08:31

I found your question unclear. But if I do understand it correctly, the following code will almost do it:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id amt length firstnum)
  11  3006 2 11
  21  5725 2 21
  22 23919 2 22
 111     0 3 11
 112     0 3 11
 113  1603 3 11
 114   136 3 11
 115  1269 3 11
 211   951 3 21
 212  2596 3 21
 213  2177 3 21
 221 23919 3 22
1131    51 4 11
1132    68 4 11
1133  1483 4 11
1141    42 4 11
1142    94 4 11
1151   558 4 11
1152   623 4 11
1153    86 4 11
2111   951 4 21
2121    15 4 21
2122  1343 4 21
2123  1237 4 21
2131  2177 4 21
2211 21263 4 22
2212  2387 4 22
2213   273 4 22
end

by firstnum length, sort: egen total = total(amt)
by firstnum (total), sort: keep if total == total[_N]

I say "almost" because you provide no information as to what is to be done if the totals at two or more levels for a given firstnum are tied for greatest. That situation does not arise in your example data, but might in your real data. The code above will keep all levels that are tied for greatest total. If that isn't what you need, then you will have to make some modifications to the code.
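
As one possible modification, here is a minimal sketch that assumes ties should go to the most detailed level, that is, the greatest length. That tie-breaking rule is an assumption for illustration, not something stated in the question. It replaces the last two lines of the code above: adding length to the sort order in parentheses puts the observation with the greatest total, and among tied totals the greatest length, last within each firstnum.

```stata
* Total amt within each firstnum-by-length level
by firstnum length, sort: egen total = total(amt)

* Sort each firstnum by total, then by length, so the last observation
* has the greatest total and, among tied totals, the greatest length
* (assumed tie-breaking rule: prefer the longest id)
by firstnum (total length), sort: keep if total == total[_N] & length == length[_N]
```

In the example data no tie for the greatest total occurs, so this should keep the same observations as the original two lines.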

Finally, in the future, please follow my example and post data examples in a code block using the -dataex- command (which you can get by running -ssc install dataex-).
Jack Gibson

Join Date: Aug 2016

Posts: 10
#3

15 Aug 2016, 08:44

This doesn't require looping...

Code:

sort firstnum length
by firstnum length: egen groupsum=total(amt)
gsort firstnum -groupsum
by firstnum: keep if length==length[1]

Note that if the amounts at two or more different lengths (within the same firstnum) add up to the same greatest total, you'll end up with an arbitrary choice between them, because gsort does not fix the order of tied observations. To keep them all, change the last line to:

Code:

by firstnum: keep if groupsum==groupsum[1]

Alternatively, to keep the ones with the shortest length, stick with the original last line and change the gsort command to:

Code:

gsort firstnum -groupsum length

or the longest:

Code:

gsort firstnum -groupsum -length
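
Put together, the "longest length wins ties" variant would look roughly like the sketch below. The preserve/collapse/list step is an optional addition for inspecting the per-level totals before anything is dropped; it is not part of the commands above.

```stata
* Optional check: list total amt for each firstnum-by-length level
preserve
collapse (sum) amt, by(firstnum length)
list, sepby(firstnum)
restore

* Keep, within each firstnum, the length with the greatest total;
* ties go to the longest length
sort firstnum length
by firstnum length: egen groupsum = total(amt)
gsort firstnum -groupsum -length
by firstnum: keep if length == length[1]
```

With the example data posted in #1, the level totals for firstnum 11 are 3006, 3008, and 3005 for lengths 2, 3, and 4, so the three-digit ids are the ones kept for that group.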
Nicholas Marsh

Join Date: Aug 2016

Posts: 4
#4

15 Aug 2016, 08:44

Clyde, thank you for your response. Your answer helped me out very much. I apologize for posting my data example in the incorrect fashion; in the future I will follow the example you set out for me.

As to your "almost" concern, you are quite right: I mistakenly left out what to do if the totals at two or more levels for a given firstnum are tied for greatest. In that situation I would prefer to keep the observations with the greatest length value. That is, if the total amt is the same for both length 2 and length 3, I would want to keep the length 3 observations and drop the length 2 observations.

Thanks again for your assistance!