  • Sum of values within a variable

    Hi all,

    I have a dataset with upwards of 100k observations. Observations are nested in villages (vidvillage), and villages are nested in unions (vidunion). The group counts are recorded at the village level, so every observation in a village carries the same value. I want to generate a variable that holds, for each union, the total number of groups across its villages, counting each village once. However, when I do egen qualified_farming_union = total(village_farming_group), by(vidunion) it sums over every observation, so each village's value is counted as many times as the village has observations. Here's an example -

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(vidunion vidvillage) float(village_farming_group village_aquaculture_group village_fishing_group)
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    1 1001 2 . .
    end
    label values vidunion labels2
    label values vidvillage labels3
    Is it possible to get a variable which would only sum one value for each distinct village within a union? Thanks!

  • #2
    Thanks for the data example -- but the observations are all the same.

    Please invent an example with similar variable names, and show what you want to get. On the face of it only the first three variables are material.



    • #3
      Hi Nick,

      Thanks for the reply. Apologies for the poor example. Here's a better one -

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int(vidunion vidvillage) float(village_farming_group village_aquaculture_group village_fishing_group)
      20 1058 5 . 1
      20 1058 5 . 1
      20 1058 5 . 1
      20 1058 5 . 1
      20 1058 5 . 1
      20 1058 5 . 1
      20 1058 5 . 1
      20 1058 5 . 1
      20 1058 5 . 1
      20 1058 5 . 1
      20 1058 5 . 1
      20 1058 5 . 1
      20 1058 5 . 1
      20 1058 5 . 1
      20 1058 5 . 1
      20 1058 5 . 1
      20 1058 5 . 1
      20 1058 5 . 1
      20 1058 5 . 1
      20 1058 5 . 1
      20 1058 5 . 1
      20 1058 5 . 1
      20 1058 5 . 1
      20 1058 5 . 1
      20 1058 5 . 1
      20 1058 5 . 1
      20 1058 5 . 1
      20 1058 5 . 1
      20 1058 5 . 1
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      20 1059 2 . .
      end
      label values vidunion labels2
      label values vidvillage labels3
      In the above, I want to create a variable that sums village_farming_group within each vidunion, counting each village only once. In this example, the new variable would have the value 5 + 2 = 7 on every observation. Does this make it clearer?



      • #4
        Perhaps

        Code:
        egen tag = tag(vidunion vidvillage) 
        egen wanted = total(tag * village_farming_group), by(vidunion)
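
        Why this should work: tag() marks exactly one observation in each distinct (vidunion, vidvillage) pair with 1 and all others with 0, so multiplying by tag zeroes out the repeats before total() adds them up. A minimal sketch on toy data (the five rows below are invented for illustration and are not the poster's data; the variable naive is a hypothetical name for the plain total):

        Code:
        * Toy data: union 20 with two villages, values repeated within village
        clear
        input int(vidunion vidvillage) float village_farming_group
        20 1058 5
        20 1058 5
        20 1058 5
        20 1059 2
        20 1059 2
        end

        * One observation per union-village pair gets tag == 1, the rest 0
        egen tag = tag(vidunion vidvillage)

        * Each village contributes its value once: 5 + 2 = 7
        egen wanted = total(tag * village_farming_group), by(vidunion)

        * The plain total counts every repeat: 5*3 + 2*2 = 19
        egen naive = total(village_farming_group), by(vidunion)

        list, sepby(vidvillage)
        assert wanted == 7
        assert naive == 19

        This relies on the value being constant within each village, as in the examples posted; if it varied, the result would depend on which observation happens to be tagged.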



        • #5
          It worked! Thank you so much!
