  • Calculate disparity measure (between group members)

    Dear all,

    I am relatively new to Stata and I am having trouble translating the following formula to Stata code.

    I am trying to calculate a disparity measure that captures the square root of the mean squared distance in a demographic characteristic of a team member i from all other team members. The measure is expressed as follows (O'Reilly III et al., 1989):
    [Image: Disparity.PNG, showing the formula di = sqrt( Sum((Si - Sj)^2) / n ), with the sum taken over team members j]

    Si is the demographic characteristic of team member i, and Sj is the characteristic of the jth team member of a group with n team members.

    I have a data set with 65 teams that each have between 2 and 5 team members. There are few missing values for the demographic characteristics I want to use. I am using Stata 14.

    Can anyone help me calculate this measure in Stata? I would highly appreciate any advice.

  • #2
    This problem yields to some algebra. Let's set aside the square root, as we can just apply that at the end. Let's look at that sum of squares.

    Code:
    di^2 = Sum((Si - Sj)^2)/n = Sum(Si^2 - 2*Si*Sj + Sj^2)/n

    = [n*Si^2 - 2*Si*Sum(Sj) + Sum(Sj^2)] / n

    = Si^2 - 2*Si*Mean(Sj) + Mean(Sj^2)
    So we do not actually need to calculate all of the different Si - Sj to do this: all we need are the means of Sj and Sj^2 within each team, both of which are available from -egen-.
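
    As a quick check of the algebra by hand, take a hypothetical team of three with S = 30, 40, 50 (made-up values, not from the original question) and look at the member with Si = 30:

    Code:
    direct:    [(30-30)^2 + (30-40)^2 + (30-50)^2] / 3 = (0 + 100 + 400) / 3 = 166.67

    shortcut:  Mean(Sj) = 40,  Mean(Sj^2) = (900 + 1600 + 2500) / 3 = 1666.67
               Si^2 - 2*Si*Mean(Sj) + Mean(Sj^2) = 900 - 2400 + 1666.67 = 166.67
    Both routes give di = sqrt(166.67), which is about 12.91.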

    So, assuming your data are laid out long:

    Code:
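    * team-level means of S and S^2 (egen's mean() ignores missing values)
    by team, sort: egen mean_S = mean(S)
    by team: egen mean_S2 = mean(S*S)
    * d_i = sqrt(Si^2 - 2*Si*Mean(Sj) + Mean(Sj^2)); note the explicit * after the 2
    gen d = sqrt(S^2 - 2*S*mean_S + mean_S2)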
    Note: Because you did not provide example data, this code is not tested. Beware of typos. Also, if your data are not laid out long, the code is not applicable to your data: use -reshape- first.
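
    A minimal sketch for checking the shortcut against the brute-force definition on a toy data set. The values below are made up for illustration, the variable names team and S are the ones assumed in the code above, and ss and d_check are hypothetical helper names. The loop bound of 5 is the largest team size mentioned in the question.

    Code:
    clear
    input team S
    1 30
    1 40
    1 50
    2 25
    2 35
    end
    * shortcut: team-level means of S and S^2
    by team, sort: egen mean_S = mean(S)
    by team: egen mean_S2 = mean(S*S)
    gen d = sqrt(S^2 - 2*S*mean_S + mean_S2)
    * brute force: add up (Si - Sj)^2 over members j = 1, ..., _N of each team
    gen double ss = 0
    forvalues j = 1/5 {
        by team: replace ss = ss + (S - S[`j'])^2 if `j' <= _N
    }
    by team: gen double d_check = sqrt(ss / _N)
    * loose tolerance because egen stores the means as float by default
    assert reldif(d, d_check) < 1e-4
    list team S d d_check, sepby(team)
    In this toy example d should come out near 12.91, 8.16 and 12.91 for team 1 and near 7.07 for both members of team 2. The brute-force loop is only meant as a check: it would return missing for a whole team if any member had a missing S, whereas the -egen- version simply leaves the missing members out of the means.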

    • #3
      Dear Clyde, thank you very much. My data is laid out long and the code gave me the correct results.

      • #4
        Dear all,

        this is an interesting conversation. The code seems to be fine for a binary outcome (e.g. gender).

        But what if we have more than 2 categories, as with ethnicity? For instance, in a group of 4 individuals where 1 is German, 1 is Italian, and 2 are American, the code above does not apply properly.
        Also, what about a continuous variable (e.g., age)?

        Could you please help?

        Thanks in advance.

        • #5
          Actually, that formula is best suited to a continuous variable such as age. It can be applied to a dichotomy such as sex as well, although it is not necessarily the best measure of heterogeneity within a team. For a multi-level categorical variable it would be completely unsuitable. In that case, the first thing that comes to my mind is the entropy of the distribution of categories within the team, that is, the negative of Sum(p*ln(p)) over the category proportions p. There are, no doubt, many other possibilities.
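
          A rough sketch of how such an entropy could be computed per team, using the German/Italian/American example from #4. The data are made up, and the names ethnicity, n_team, n_cat, p and entropy are assumptions for illustration. This uses the Shannon form -Sum(p*ln(p)) over the category proportions p; other bases or normalizations are possible.

          Code:
          clear
          input team str10 ethnicity
          1 "German"
          1 "Italian"
          1 "American"
          1 "American"
          2 "German"
          2 "German"
          end
          * team size, and number of members in each team-by-category cell
          by team, sort: gen n_team = _N
          by team ethnicity, sort: gen n_cat = _N
          gen double p = n_cat / n_team
          * a category with n_cat members contributes n_cat * (-ln(p)/n_team) = -p*ln(p)
          by team, sort: egen double entropy = total(-ln(p) / n_team)
          list team ethnicity p entropy, sepby(team)
          Team 1 should come out at about 1.04 and team 2, where everyone is in the same category, at 0. Note that an empty string would be counted as a category of its own here, so observations with missing ethnicity would need to be dropped or handled first.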
