Summarizing first row data within subgroups conditioned on ID

gusst max

Join Date: Mar 2021

Posts: 2
#1

Summarizing first row data within subgroups conditioned on ID

02 Mar 2021, 07:36

Hello!

I want to accumulate values from the first occurrences or the first row given a specific round and a specific ID. E.g. see Table below. X represents the accumulated investment from each round. Round 1 equals 100 and round 2 equals 250, hence the accumulated value for round 2 is 250. The problem is that we have duplicate values due to time variables (e.g. round 1 has investment = 100 twice, but the total investment in round 1 equals 100 and not 100+100).

I have tried the following solution: bysort ID (Round): gen x = sum(Investment). Although it resulted in that all the X: es were summarized together. Furthermore, I tried the following method: bysort ID(Round): egen x = sum(mean(Investment)) but it did not work as I hoped in Stata.

To summarize: I want to either summarize the average of each x condition on its round and ID until Round and ID ... n. Or summarize only the first observation for each round for every id. The result should give me information on the accumulation of investment amount (x) in round 2....n and be connected to the ID.

Any help is much appreciated, thank you in advance!

ID Round Investment (observation) x

1 1 100 100

1 1 100 100

1 2 150 250

1 2 150 250

2 1 200 200

2 1 200 200

2 1 200 200

2 2 300 500

2 2 300 500
Tags: None

Ali Atia

Join Date: May 2020
Posts: 737

02 Mar 2021, 07:42

Code:

clear
input ID    Round    Investment    x
1    1    100    100
1    1    100    100
1    2    150    250
1    2    150    250
2    1    200    200
2    1    200    200
2    1    200    200
2    2    300    500
2    2    300    500
end
collapse (first) Investment,by(ID Round)
bys ID (Round): gen x = sum(Investment)

. list,sepby(ID) noobs

  +-----------------------------+
  | ID   Round   Invest~t     x |
  |-----------------------------|
  |  1       1        100   100 |
  |  1       2        150   250 |
  |-----------------------------|
  |  2       1        200   200 |
  |  2       2        300   500 |
  +-----------------------------+

Or if you want to keep the duplicates for whatever reason:

Code:

clear
input ID    Round    Investment    x
1    1    100    100
1    1    100    100
1    2    150    250
1    2    150    250
2    1    200    200
2    1    200    200
2    1    200    200
2    2    300    500
2    2    300    500
end

bysort ID Round: gen y = _n==1
bys ID (Round): gen wanted = sum(Investment) if y
bysort ID Round:replace wanted =wanted[1]
drop y
. list,sepby(ID Round) noobs

  +--------------------------------------+
  | ID   Round   Invest~t     x   wanted |
  |--------------------------------------|
  |  1       1        100   100      100 |
  |  1       1        100   100      100 |
  |--------------------------------------|
  |  1       2        150   250      250 |
  |  1       2        150   250      250 |
  |--------------------------------------|
  |  2       1        200   200      200 |
  |  2       1        200   200      200 |
  |  2       1        200   200      200 |
  |--------------------------------------|
  |  2       2        300   500      500 |
  |  2       2        300   500      500 |
  +--------------------------------------+

Last edited by Ali Atia; 02 Mar 2021, 07:49.

Comment

Nick Cox

Join Date: Mar 2014
Posts: 35605

02 Mar 2021, 07:49

You can't nest egen functions. Would be attractive, sometimes, but it isn't allowed.

The following exemplifies the use of dataex as we request and also gives you one line of code as a token gesture. sum() -- which would possibly be called runningsum() if Stata were introduced today -- gives cumulative sums and the needed trick is just to make summation conditional on an observation being first in its round for the same identifier.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(ID Round) int(Investment x)
1 1 100 100
1 1 100 100
1 2 150 250
1 2 150 250
2 1 200 200
2 1 200 200
2 1 200 200
2 2 300 500
2 2 300 500
end

. bysort ID (Round) : gen wanted = sum((Round != Round[_n-1]) * Investment)

. list, sepby(ID Round)

     +--------------------------------------+
     | ID   Round   Invest~t     x   wanted |
     |--------------------------------------|
  1. |  1       1        100   100      100 |
  2. |  1       1        100   100      100 |
     |--------------------------------------|
  3. |  1       2        150   250      250 |
  4. |  1       2        150   250      250 |
     |--------------------------------------|
  5. |  2       1        200   200      200 |
  6. |  2       1        200   200      200 |
  7. |  2       1        200   200      200 |
     |--------------------------------------|
  8. |  2       2        300   500      500 |
  9. |  2       2        300   500      500 |
     +--------------------------------------+

Note: @Ali Atia's code is similar in spirit but not in detail. (Round != Round[_n-1]) is (evaluates to) 1 if true and 0 if false and multiplying anything by zero before summation is ignoring it as a contribution to that sum.

Last edited by Nick Cox; 02 Mar 2021, 07:52.

Comment

gusst max

Join Date: Mar 2021

Posts: 2
#4

02 Mar 2021, 08:29

Thank you so much Nick Cox and Ali Atia. Both of your solutions work. I really appreciate it!

Last edited by gusst max; 02 Mar 2021, 08:32.
Comment

Announcement