Calculate disparity measure (between group members)

Sophie Dibbern

Join Date: Sep 2017

Posts: 23
#1

11 Sep 2017, 06:14

Dear all,

I am relatively new to Stata and I am having trouble translating the following formula to Stata code.

I am trying to calculate a disparity measure that captures the square root of the mean squared distance in a demographic characteristic of a team member i from all other team members. The measure is expressed as follows (O'Reilly III et al., 1989):

S_i is the demographic characteristic of team member i, and S_j is the characteristic of the jth team member of a group with n team members.
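
The formula itself is missing from the post as it appears here. Going by the verbal description above and the sum of squares worked through in #2, it is presumably the following (a reconstruction under those assumptions, not a copy of the original image):

```latex
d_i = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(S_i - S_j\right)^2}
```

The sum runs over all n members of the team; the term with j = i contributes zero, so including it changes nothing in the numerator.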

I have a data set with 65 teams, each with between 2 and 5 team members. There are a few missing values for the demographic characteristics I want to use. I am using Stata 14.

Can anyone help me calculate this measure in Stata? I would highly appreciate any advice.
Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#2

11 Sep 2017, 08:36

This problem yields to some algebra. Let's set aside the square root, as we can just apply that at the end. Let's look at that sum of squares.

Code:

d_i² = Sum_j (S_i - S_j)² / n
     = Sum_j (S_i² - 2*S_i*S_j + S_j²) / n
     = [n*S_i² - 2*S_i*Sum_j(S_j) + Sum_j(S_j²)] / n
     = S_i² - 2*S_i*Mean(S_j) + Mean(S_j²)

So we do not actually need to calculate all of the pairwise differences S_i - S_j to do this: all we need are the team means of S_j and S_j², both of which are available from -egen-.

So, assuming your data are laid out long:

Code:

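* Team-level means of S and S² (long layout: one row per team member)
by team, sort: egen mean_S = mean(S)
by team: egen mean_S2 = mean(S*S)
* Disparity for each member, from the expanded sum of squares
gen d = sqrt(S^2 - 2*S*mean_S + mean_S2)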

Note: Because you did not provide example data, this code is not tested. Beware of typos. Also, if your data are not laid out long, the code is not applicable to your data: use -reshape- first.
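
One way to gain some confidence in the shortcut is to compare it with a brute-force calculation on a few made-up rows. The following is only a sketch: the data values are invented, the variable names team and S are assumed to match the code above, and n, ssq, and d_check are hypothetical helper names.

```stata
* Toy data (hypothetical values): two teams in long layout
clear
input team S
1 30
1 40
1 50
2 25
2 35
end

* Shortcut from the algebra above
by team, sort: egen mean_S  = mean(S)
by team:       egen mean_S2 = mean(S*S)
gen d = sqrt(S^2 - 2*S*mean_S + mean_S2)

* Brute force: for each position j within the team, add (S_i - S_j)^2.
* Under -by-, S[`j'] refers to the j-th observation of the current team.
by team: gen n = _N
gen double ssq = 0
summarize n, meanonly
forvalues j = 1/`r(max)' {
    by team: replace ssq = ssq + (S - S[`j'])^2 if `j' <= _N
}
gen d_check = sqrt(ssq/n)

list team S d d_check, sepby(team)
```

The two columns d and d_check should agree up to rounding. If rounding ever pushes the argument of sqrt() slightly below zero (which could happen when everyone in a team has the same value), creating the means with -egen double- may help.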
Sophie Dibbern

Join Date: Sep 2017

Posts: 23
#3

11 Sep 2017, 09:43

Dear Clyde, thank you very much. My data is laid out long and the code gave me the correct results.
Constantinos Mammassis

Join Date: Feb 2018

Posts: 7
#4

26 Feb 2018, 09:05

Dear all,

This is an interesting conversation. The code seems to be fine for a binary characteristic (e.g., gender).

But what if we have more than 2 categories, as with ethnicity? For instance, in a group of 4 individuals where 1 is German, 1 is Italian, and 2 are American, the code above does not apply properly.
Also, what about a continuous variable (e.g., age)?

Could you please help?

Thanks in advance.
Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#5

26 Feb 2018, 09:33

Actually, that formula is best suited to a continuous variable such as age. It can be applied to a dichotomy such as sex as well, although it's not necessarily the best measure of heterogeneity within a team. For a multi-level categorical variable it would be completely unsuitable. In that case, the first thing that comes to my mind is the negative of the entropy of the distribution. There are, no doubt, many other possibilities.
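
As one possible reading of the entropy suggestion, the Shannon entropy of a categorical variable within each team could be computed along these lines. This is a sketch with invented data. The names ethnicity, n_cat, p, tag, plogp, and entropy are hypothetical. Note that it yields one value per team rather than one per member, and that it computes the entropy itself, which is larger for more mixed teams; the negative mentioned above is just -entropy.

```stata
* Toy data (hypothetical): team 1 is mixed, team 2 is homogeneous
clear
input team str10 ethnicity
1 "German"
1 "Italian"
1 "American"
1 "American"
2 "German"
2 "German"
end

* Share of each category within its team
bysort team ethnicity: gen n_cat = _N
bysort team: gen double p = n_cat / _N

* Tag one row per team-category pair, then sum -p*ln(p) over categories
egen byte tag = tag(team ethnicity)
gen double plogp = -p * ln(p) if tag
bysort team: egen double entropy = total(plogp)

list team ethnicity p entropy, sepby(team)
```

A team in which everyone shares the same category gets an entropy of 0, and the value grows as the categories become more evenly spread.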