  • Advice on looping within dofile

    Hello, I am trying to write a do-file that uses a loop to keep the most specific results possible, but I have struggled to come up with anything that works so far.

    My dataset has 406 observations and I am running Stata 14.1 for Windows. I have an id variable that is grouped by its first two digits (firstnum), with longer ids giving more detail within a group. You'll notice that 111, 112, 113, 114, and 115 add up to 3,008, and that 1131, 1132, 1133, 1141, 1142, 1151, 1152, and 1153 add up to 3,005. Therefore I want to keep the three-digit id observations for group 11, because the three-digit level accounts for the greatest amount (3,008 at the three-digit level, compared to 3,005 at the four-digit level and 3,006 at the two-digit level).

    Does anyone have any idea what my code should be to generate a check variable and then keep only the id observations at the level that accounts for the greatest amount within each unique firstnum?

    My dataset looks something like this:

    id amt length firstnum
    11 3006 2 11
    21 5725 2 21
    22 23919 2 22
    111 0 3 11
    112 0 3 11
    113 1603 3 11
    114 136 3 11
    115 1269 3 11
    211 951 3 21
    212 2596 3 21
    213 2177 3 21
    221 23919 3 22
    1131 51 4 11
    1132 68 4 11
    1133 1483 4 11
    1141 42 4 11
    1142 94 4 11
    1151 558 4 11
    1152 623 4 11
    1153 86 4 11
    2111 951 4 21
    2121 15 4 21
    2122 1343 4 21
    2123 1237 4 21
    2131 2177 4 21
    2211 21263 4 22
    2212 2387 4 22
    2213 273 4 22
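
    The comparison described above for group 11 can be checked directly. A minimal sketch, using only the firstnum 11 rows from the listing; the variable name total is chosen here for illustration and is not part of the original dataset:

    ```stata
    clear
    input float(id amt length firstnum)
      11 3006 2 11
     111    0 3 11
     112    0 3 11
     113 1603 3 11
     114  136 3 11
     115 1269 3 11
    1131   51 4 11
    1132   68 4 11
    1133 1483 4 11
    1141   42 4 11
    1142   94 4 11
    1151  558 4 11
    1152  623 4 11
    1153   86 4 11
    end

    * reduce the data to one row per level of detail, holding the sum of amt
    collapse (sum) total = amt, by(firstnum length)
    list, noobs
    ```

    This should leave three rows, with totals of 3006, 3008, and 3005 for lengths 2, 3, and 4 respectively. Note that collapse replaces the data in memory, so it is only suitable as a check.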


  • #2
    I found your question unclear. But if I do understand it correctly, the following code will almost do it:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(id amt length firstnum)
      11  3006 2 11
      21  5725 2 21
      22 23919 2 22
     111     0 3 11
     112     0 3 11
     113  1603 3 11
     114   136 3 11
     115  1269 3 11
     211   951 3 21
     212  2596 3 21
     213  2177 3 21
     221 23919 3 22
    1131    51 4 11
    1132    68 4 11
    1133  1483 4 11
    1141    42 4 11
    1142    94 4 11
    1151   558 4 11
    1152   623 4 11
    1153    86 4 11
    2111   951 4 21
    2121    15 4 21
    2122  1343 4 21
    2123  1237 4 21
    2131  2177 4 21
    2211 21263 4 22
    2212  2387 4 22
    2213   273 4 22
    end
    
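    * total of amt at each level of detail (length) within each firstnum
    by firstnum length, sort: egen total = total(amt)
    * within each firstnum, sort by total and keep the level(s) with the largest total
    by firstnum (total), sort: keep if total == total[_N]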
    I say "almost" because you provide no information as to what is to be done if the totals at two or more levels for a given firstnum are tied for greatest. That situation does not arise in your example data, but might in your real data. The code above will keep all levels that are tied for greatest total. If that isn't what you need, then you will have to make some modifications to the code.
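
    To find out whether such ties actually occur, one option is to count how many levels survive for each firstnum after the keep. A rough sketch on made-up data: the six rows below are hypothetical, chosen so that firstnum 11 ties at lengths 2 and 3, and tag and n_levels are illustrative names.

    ```stata
    clear
    input float(id amt length firstnum)
     11 100 2 11
    111  60 3 11
    112  40 3 11
     21  50 2 21
    211  30 3 21
    212  25 3 21
    end

    by firstnum length, sort: egen total = total(amt)
    by firstnum (total), sort: keep if total == total[_N]

    * flag one observation per surviving firstnum-length pair
    egen byte tag = tag(firstnum length)
    * number of distinct levels still present within each firstnum
    by firstnum, sort: egen n_levels = total(tag)
    * any firstnum listed here had a tie for the greatest total
    list firstnum length total if tag & n_levels > 1, noobs
    ```

    If the list comes back empty on the real data, the tie question is moot and the two-line solution can be used as is.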

    Finally, in the future, please follow my example and post data examples in a code block using the -dataex- command (which you can get by running -ssc install dataex-).



    • #3
      This doesn't require looping...
      Code:
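      * sum amt within each firstnum-length group
      sort firstnum length
      by firstnum length: egen groupsum=total(amt)
      * put the group with the largest sum first within each firstnum
      gsort firstnum -groupsum
      * keep only the observations at the length of that first group
      by firstnum: keep if length==length[1]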
      Note that if the amounts at two or more different lengths (within the same 2-digit id) add up to the same total, you'll end up with a random choice between them. To keep them all, change the last line to:
      Code:
      by firstnum: keep if groupsum==groupsum[1]
      Alternatively, to keep the ones with the shortest length, stick with the original last line and change the gsort command to:
      Code:
      gsort firstnum -groupsum length
      or the longest:
      Code:
      gsort firstnum -groupsum -length



      • #4
        Clyde, thank you for your response. Your answer helped me out very much. I apologize for posting my data example in the incorrect fashion; in the future I will follow the example you set out for me.

        As to your "almost" concern, you are quite right: I mistakenly left out what to do if the totals at two or more levels for a given firstnum are tied for greatest. In that situation I would prefer to keep the observations that have the greatest length value. That is, if the total amt is the same for both length 2 and length 3, I would want to keep the length 3 observations and drop the length 2 observations.

        Thanks again for your assistance!
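
        The tie-breaking rule described above (prefer the greatest length when totals are equal) might be handled by adding length to the sort order in the second line of the code from #2. This is a sketch under that assumption, not a tested solution for the real data; the six rows below are hypothetical, chosen so that firstnum 11 ties at lengths 2 and 3.

        ```stata
        clear
        input float(id amt length firstnum)
         11 100 2 11
        111  60 3 11
        112  40 3 11
         21  50 2 21
        211  30 3 21
        212  25 3 21
        end

        by firstnum length, sort: egen total = total(amt)
        * within firstnum, sort by total and then length, so the last observation
        * has the largest total and, among equal totals, the greatest length
        by firstnum (total length), sort: keep if total == total[_N] & length == length[_N]
        list, noobs
        ```

        On this toy data the length 3 observations are kept for both firstnum values: for 11 because the tie is broken in favor of the longer id, and for 21 because length 3 has the larger total outright.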
