Add column with the sum of unique value of subgroups

Arthur Carvalho Brito

Join Date: Jan 2021

Posts: 45
#1

Add column with the sum of unique value of subgroups

24 Sep 2023, 20:51

Hi, can someone please help?

I want to add a new column called ss, it should be constant across all rows within the same group_exp. As you can see, observations with the same group_exp and product have the same value of exp_shareUS_k. I want this new column to have the sum of .20564187 and .7964094. that is, the sum of this unique value of exp_shareUS_k across subgroups.

Code:

* Example generated by -dataex-. For more info, type help dataex clear input float group_exp str6 product str4 j float exp_shareUS_k 8 "852432" "ECU" .7964094 8 "852432" "PER" .7964094 8 "852432" "URY" .7964094 8 "852432" "USA" .7964094 8 "852439" "MEX" .20564187 8 "852439" "COL" .20564187 8 "852439" "USA" .20564187 end

The only way I thought would be to preserve, delete the column that varies across observations (i.e j), get the unique values exp_shareUS_k within the subgroups using collapse (i.e group_exp + product), sum those within group_exp, and then merging that back to the original data. I was looking for something easier than that

Last edited by Arthur Carvalho Brito; 24 Sep 2023, 20:56.
Tags: collapse, data
Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#2

24 Sep 2023, 22:11

Taking literally what you say you want, the following will do:

Code:

egen tag = tag(group_exp exp_shareUS_k) by group_exp, sort: egen ss = total(cond(tag, exp_shareUS_k, .))

But is that really what you want? In the example data there is a 1:1 correspondence between product and exp_shareUS_k. But it is not hard to imagine that in the full set of data, there could be two (or more) products that have the same value of exp_shareUS_k. In that case, this code would count that value of exp_shareUS_k only once. Is that what you want? Or do you really want to sum the values of exp_shareUS-k for distinct values of product? Your proposed method following the example data, wherein you propose to collapse on group_exp and product suggests that is really distinct products you are interested in, not distinct values of exp_shareUS_k. So if I'm right and you really want to sum the values of exp_shareUS_k for distinct products within groups:

Code:

egen tag = tag(group_exp product) by group_exp, sort: egen ss = total(cond(tag, exp_shareUS_k, .))
1 like
Comment
Tahira Khan

Join Date: Sep 2023

Posts: 2
#3

24 Sep 2023, 22:21

Hello I have a question related to heckman model and how it run on stata?
Comment
Tahira Khan

Join Date: Sep 2023

Posts: 2
#4

24 Sep 2023, 22:31

Hi how to create a participation variable in Stata?
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35775
#5

25 Sep 2023, 02:24

Tahira Khan Please start a new thread for each new question that you have.
Comment
Arthur Carvalho Brito

Join Date: Jan 2021

Posts: 45
#6

25 Sep 2023, 09:32

Clyde Schechter thank you!

Your second approach is more robust, and it does the job. While highly unlikely, it could be that different products in the same group have the same exp_share_US_k
Comment

Announcement

Add column with the sum of unique value of subgroups

Comment

Comment

Comment

Comment

Comment