Generating group variables under changing group constellation

Barbara Voe

Join Date: Nov 2019

Posts: 6
#1

Generating group variables under changing group constellation

19 Nov 2019, 05:15

Hello Statalisters,

I have a dataset with each row equaling one team member (see below). These team members enter their team at different times (Member_Entry). Thus, team constellation changes over time (e.g. team A has 2 members until 1.1.2018, then have 5 members until 1.1.2019, then they have 6).
I want to calculate age diversity (using SD/mean) for each group constellation. That means, in the first and second row, the generated variable would be age diversity for team A with two members aged 30 and 50. In the third, forth, fifth row, the output would be age diversity for team A with five members aged 30,50,40,16 and 54. In the sixth row, it would be age diversity for team A with now six member aged 30,50,40,16,54 and 29.

I just could not find a code that enables this calculation, I suspect it should be some kind of loop but I am not sure. Can you help? Thank you already.

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input str1 Team byte Member str10 Member_Entry byte(Member_Age Team_Age_Diversity) "A" 1 "1.1.2017" 30 . "A" 2 "1.1.2017" 50 . "A" 3 "1.1.2018" 40 . "A" 4 "1.1.2018" 16 . "A" 5 "1.1.2018" 54 . "A" 6 "1.1.2019" 29 . "B" 1 "13.01.2018" 87 . "B" 2 "20.02.2018" 15 . "B" 3 "20.02.2018" 11 . "B" 4 "23.03.2019" 65 . end
Tags: diversity, group, loop
Wouter Wakker

Join Date: Nov 2018

Posts: 621
#2

19 Nov 2019, 05:41

You'll probably want to have a look at rangestat from SSC (ssc install rangestat). I think you want something like this.

Code:

gen date = date(Member_Entry, "DMY") rangestat (mean) mean_age=Member_Age (sd) sd_age=Member_Age, interval(date, ., date) by(Team)
1 like
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35697

19 Nov 2019, 05:50

Thanks for the clear question. String dates are not fit for many Stata purposes, certainly not this one. So, first we need a numeric daily date variable. Then rangestat (SSC) can calculate the SD and mean of ages at each distinct date and their ratio (often called the coefficient of variation or CV).

It gets messier if members can leave the team but neither your exposition nor your example makes that explicit. Also, you are neglecting age change since joining!

Expressing CV in percent terms is by comparison a trivially different convention.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str1 Team byte Member str10 Member_Entry byte(Member_Age Team_Age_Diversity)
"A" 1 "1.1.2017"   30 .
"A" 2 "1.1.2017"   50 .
"A" 3 "1.1.2018"   40 .
"A" 4 "1.1.2018"   16 .
"A" 5 "1.1.2018"   54 .
"A" 6 "1.1.2019"   29 .
"B" 1 "13.01.2018" 87 .
"B" 2 "20.02.2018" 15 .
"B" 3 "20.02.2018" 11 .
"B" 4 "23.03.2019" 65 .
end

gen Date = daily(Member_Entry, "DMY")
format Date %td

rangestat (sd) SD=Member_Age (mean) Mean=Member_Age, interval(Date . 0) by(Team)  

gen CV = SD/Mean

list Team Date SD Mean CV, sepby(Team Date)

     +-----------------------------------------------------+
     | Team        Date          SD        Mean         CV |
     |-----------------------------------------------------|
  1. |    A   01jan2017   14.142136          40   .3535534 |
  2. |    A   01jan2017   14.142136          40   .3535534 |
     |-----------------------------------------------------|
  3. |    A   01jan2018   15.427249          38   .4059802 |
  4. |    A   01jan2018   15.427249          38   .4059802 |
  5. |    A   01jan2018   15.427249          38   .4059802 |
     |-----------------------------------------------------|
  6. |    A   01jan2019   14.279356        36.5   .3912152 |
     |-----------------------------------------------------|
  7. |    B   13jan2018           .          87          . |
     |-----------------------------------------------------|
  8. |    B   20feb2018   42.770706   37.666667   1.135505 |
  9. |    B   20feb2018   42.770706   37.666667   1.135505 |
     |-----------------------------------------------------|
 10. |    B   23mar2019   37.501111        44.5   .8427216 |
     +-----------------------------------------------------+

I am not sure that I regard the CV as especially attractive here. A personal rule is that CVs are natural if and only if working on logarithmic scale makes sense, which doesn't seem established here. More on that at https://stats.stackexchange.com/ques...t-of-variation

EDIT: Did not see @Wouter Wakker's equivalent suggestion until I had posted mine.

Comment

Barbara Voe

Join Date: Nov 2019

Posts: 6
#4

19 Nov 2019, 13:36

Dear @Wouter, dear @Nick,

thank you very much for your fast and comprehensive answers! It works perfect with rangestat. However, I follow you, Nick, that I neglect the age change since joining. I need to think of a solution for this. Leaving members fortunately can be neglected.
Also, I follow you, that the CV is not especially top in this setting. I think I will reframe to age categories and use the Blau/Gini Index.

As I would also like to calculategender diversity in the different stagesof the team just as age diversity, I wanted to use the Blau Index in addition for gender. From searching the forum, I found the "divcat Member_Gender , gv gen_gv(H_new)" command to calculate the index for gender. I tried to combine it with rangestat as this: "rangestat (divcat Member_Gender , gv gen_gv(H_new)), interval(date, ., date) by(Team)", but the error message "(divcat member_gender , gv gen_gv(H_new))" is not a valid stat" shows up.

It would be great if you could give a hint on this issue as well (concrete: how to calculate the Blau Index for Gender for the changing team compositions). Gender is a dummy variable with 0=male. Thank you very much again!

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input str1 Team byte Member float Member_Entry byte(Member_Age Member_Gender) "A" 1 20820 30 1 "A" 2 20820 50 1 "A" 3 21185 40 0 "A" 4 21185 16 1 "A" 5 21185 54 1 "A" 6 21550 29 0 "B" 1 21197 87 1 "B" 2 21235 15 0 "B" 3 21235 11 0 "B" 4 21631 65 0 end format %td Member_Entry

Last edited by Barbara Voe; 19 Nov 2019, 13:39.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35697
#5

19 Nov 2019, 13:50

The mean of gender gives you the fraction of females, say f. Isn’t the Blau index then something like f squared + (1 - f) squared? So I can’t see any need for anything other than what rangestat will provide without further programming.

Aging members in place is already messy. Unless, or indeed if, you know birthdays it is messier still. I am going to set that as a project for extra credit.
1 like
Comment
Barbara Voe

Join Date: Nov 2019

Posts: 6
#6

19 Nov 2019, 14:53

Thank you very much, you are right! Very much appreciated!
Comment

Announcement