Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • summing across groups

    Hi, following is an example chunk of my data set.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float countyid str6 naics long estab long total
    1001 "11"    23 0
    1001 "113"    0 
    1001 "114"    0 
    1001 "1141"   0
    1001 "1142"   0
    1001 "115"    0 
    1001 "1151"   0
    1001 "1152"   0
    1001 "1153"   0
    1001 "23"   344 98
    1001 "233"   98 
    1001 "2331"  13
    1001 "2332"  73
    1001 "2333"  12
    1001 "234"    0 
    1001 "2341"   0
    1001 "2349"   0
    1001 "235"    0 
    1001 "2351"  16
    1001 "2352"  35
    1001 "2353"  22
    1001 "2354"  31
    1001 "2355"  43
    1001 "2356"  22
    1001 "2357"   0
    1001 "2359"  67
    end
    What I want to do is compute totals in the following fashion:
    I want to add the numbers in the estab column for naics codes that are 3 digit but within broader groups. In this example, the broader groups start from naics codes 11 and 23.
    For 11, I want to add estab numbers for naics codes 113, 114 and 115.
    For 23, I want to add estab numbers for naics codes 233, 234 and 235.

    Then I want to do the same for naics codes that are 4 digit. For example: 233 is the broader group and I want to add the numbers in estab column for naics 2331, 2332 and 2333.


    Any help would be greatly appreciated, thank you!

  • #2
    Code:
    gen prefix2 = substr(naics, 1, 2)
    gen prefix3 = substr(naics, 1, 3)
    
    by prefix2, sort: egen totals2 = total(cond(length(naics) == 3, estab, .))
    by prefix3, sort: egen totals3 = total(cond(length(naics) == 4, estab, .))
    You mutilated your -dataex- output by editing in a few things, so it did not produce any results beyond the first two observations when run. I edited out your edits, to just have countyid naics and estab, not total. When you want to show the results of a handworked solution, you should edit the data set itself to include the new variable(s) and then run -dataex-. Never hand edit -dataex- output: it is likely to break it.

    By the way, this code assumes that the naics values are, as in your example data, entered without any leading, trailing, or internal blanks. The code will give incorrect results if there are any exceptions to that assumption. You should probably precede the above code with a validity check for this assumption: -assert naics == trim(itrim(naics))-.

    Comment


    • #3
      I'm not going to attempt to answer this now, as I'm not seeing a straightforward solution and its the end of the evening for me.

      But I want to point out to anyone else looking at this that the data example you posted is misleading and will not be read successfully by Stata in its current state.

      The "long total" on the input command, and the extra numbers on naics 11 and naics 23, were added by you to the output of dataex and must be removed in order to make the example data readable in Stata. I've done that below, for the convenience of anyone who hopes to work on this problem.
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float countyid str6 naics long estab
      1001 "11"    23
      1001 "113"    0
      1001 "114"    0
      1001 "1141"   0
      1001 "1142"   0
      1001 "115"    0
      1001 "1151"   0
      1001 "1152"   0
      1001 "1153"   0
      1001 "23"   344
      1001 "233"   98
      1001 "2331"  13
      1001 "2332"  73
      1001 "2333"  12
      1001 "234"    0
      1001 "2341"   0
      1001 "2349"   0
      1001 "235"    0
      1001 "2351"  16
      1001 "2352"  35
      1001 "2353"  22
      1001 "2354"  31
      1001 "2355"  43
      1001 "2356"  22
      1001 "2357"   0
      1001 "2359"  67
      end
      Added in edit: Crossed with Clyde's solution, which perhaps needs countryid included as the first item in the by clause if you have more than one country in your dataset.
      Last edited by William Lisowski; 08 Aug 2018, 20:24.

      Comment

      Working...
      X