Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Add column with the sum of unique value of subgroups


    Hi, can someone please help?

    I want to add a new column called ss, it should be constant across all rows within the same group_exp. As you can see, observations with the same group_exp and product have the same value of exp_shareUS_k. I want this new column to have the sum of .20564187 and .7964094. that is, the sum of this unique value of exp_shareUS_k across subgroups.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float group_exp str6 product str4 j float exp_shareUS_k
    8 "852432" "ECU"  .7964094
    8 "852432" "PER"  .7964094
    8 "852432" "URY"  .7964094
    8 "852432" "USA"  .7964094
    8 "852439" "MEX" .20564187
    8 "852439" "COL" .20564187
    8 "852439" "USA" .20564187
    end
    The only way I thought would be to preserve, delete the column that varies across observations (i.e j), get the unique values exp_shareUS_k within the subgroups using collapse (i.e group_exp + product), sum those within group_exp, and then merging that back to the original data. I was looking for something easier than that
    Last edited by Arthur Carvalho Brito; 24 Sep 2023, 20:56.

  • #2
    Taking literally what you say you want, the following will do:
    Code:
    egen tag = tag(group_exp exp_shareUS_k)
    by group_exp, sort: egen ss = total(cond(tag, exp_shareUS_k, .))
    But is that really what you want? In the example data there is a 1:1 correspondence between product and exp_shareUS_k. But it is not hard to imagine that in the full set of data, there could be two (or more) products that have the same value of exp_shareUS_k. In that case, this code would count that value of exp_shareUS_k only once. Is that what you want? Or do you really want to sum the values of exp_shareUS-k for distinct values of product? Your proposed method following the example data, wherein you propose to collapse on group_exp and product suggests that is really distinct products you are interested in, not distinct values of exp_shareUS_k. So if I'm right and you really want to sum the values of exp_shareUS_k for distinct products within groups:
    Code:
    egen tag = tag(group_exp product)
    by group_exp, sort: egen ss = total(cond(tag, exp_shareUS_k, .))

    Comment


    • #3
      Hello I have a question related to heckman model and how it run on stata?

      Comment


      • #4
        Hi how to create a participation variable in Stata?

        Comment


        • #5
          Tahira Khan Please start a new thread for each new question that you have.

          Comment


          • #6
            Clyde Schechter thank you!

            Your second approach is more robust, and it does the job. While highly unlikely, it could be that different products in the same group have the same exp_share_US_k

            Comment

            Working...
            X