
  • Calculation of Blau Index / Simpson Index

    Hello everyone!

    I am trying to analyze the degree of national and gender diversity of the TMTs of German companies over a period of 10 years. The Blau Index seems to be the ideal measure, but after trying several approaches, I have not found a way to compute it.

    The formula for the Blau Index is the following:

    B = 1 - \sum_{k=1}^{K} p_k^2

    where p_k is the proportion of team members in category k and K is the number of categories.
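    As a quick check of the formula B = 1 - Σ p_k², here is a worked example for a three-person team with one Australian, one German, and one Dutch member, all male (the same composition as company 2 in 2013 in the data set below):

```latex
\begin{align*}
B_{\text{nationality}} &= 1 - \left[\left(\tfrac{1}{3}\right)^2 + \left(\tfrac{1}{3}\right)^2 + \left(\tfrac{1}{3}\right)^2\right]
                        = 1 - \tfrac{1}{3} = \tfrac{2}{3} \approx 0.667 \\
B_{\text{gender}}      &= 1 - 1^2 = 0
\end{align*}
```

    By the same formula, a team of one man and one woman gives B = 1 - (0.5² + 0.5²) = 0.5, which is the maximum possible value when there are only two categories.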


    The data set:


    Company_ID  year  gender  nationality  BlauGender  BlauNationality
    1           2010  male    German
    1           2010  female  German
    1           2011  male    Austrian
    1           2011  female  German
    2           2013  male    Australian
    2           2013  male    German
    2           2013  male    Dutch
    2           2014  female  Dutch
    2           2014  female  Dutch
    The data have to be aggregated at the company-year level: for every company and every year, the gender diversity and national diversity scores should appear in the columns "BlauGender" and "BlauNationality".
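    The aggregation itself can be illustrated with a short Python sketch. It is not Stata; it only shows the arithmetic that any Stata solution has to reproduce. It assumes each row shown without a Company_ID belongs to the company listed above it, groups rows by (Company_ID, year), and applies B = 1 - Σ p_k² separately to gender and nationality. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def blau(values):
    """Blau index 1 - sum(p_k^2) for a list of category labels."""
    n = len(values)
    counts = Counter(values)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Rows from the question as (Company_ID, year, gender, nationality),
# with the company ID filled in on every row (an assumption).
rows = [
    (1, 2010, "male", "German"),
    (1, 2010, "female", "German"),
    (1, 2011, "male", "Austrian"),
    (1, 2011, "female", "German"),
    (2, 2013, "male", "Australian"),
    (2, 2013, "male", "German"),
    (2, 2013, "male", "Dutch"),
    (2, 2014, "female", "Dutch"),
    (2, 2014, "female", "Dutch"),
]


def blau_by_company_year(rows):
    """Return {(company, year): (BlauGender, BlauNationality)}."""
    groups = defaultdict(lambda: ([], []))
    for company, year, gender, nationality in rows:
        genders, nats = groups[(company, year)]
        genders.append(gender)
        nats.append(nationality)
    return {key: (blau(g), blau(n)) for key, (g, n) in sorted(groups.items())}


for (company, year), (bg, bn) in blau_by_company_year(rows).items():
    print(company, year, round(bg, 3), round(bn, 3))
```

    For these rows, BlauGender is 0.5 in both years for company 1 and 0 in both years for company 2, while BlauNationality is 0, 0.5, 0.667, and 0 for the four company-years in order.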

    I would appreciate any tips or suggestions on how to solve this problem. Thank you very much!

  • #2
    The measure of variation for a nominal variable that you want has a different name in almost every discipline, and sometimes several names within the same discipline. (Sociologists, for example, know it both as the Blau Index and, in a normalized form, as the Index of Qualitative Variation.) "Blau Index" is not a very common name, which would explain why something like -search blau index- doesn't help. I'd also note that if the acronym "TMT" matters to the interpretation of your question, you should spell it out: list members come from a wide range of disciplines and won't necessarily know it. (I don't.)
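    To see how the Blau index and the Index of Qualitative Variation relate, here is a short Python sketch (illustrative only, not taken from either Stata program). It assumes the usual normalization IQV = B x K/(K - 1), where K is the number of possible categories, so that a perfectly even spread across all K categories scores 1. The function names are hypothetical.

```python
from collections import Counter


def blau(values):
    """Blau index 1 - sum(p_k^2) for a list of category labels."""
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())


def iqv(values, k):
    """Index of Qualitative Variation: the Blau index rescaled by K/(K-1).

    k is the number of *possible* categories (k >= 2), not just those observed.
    """
    if k < 2:
        raise ValueError("IQV needs at least two possible categories")
    return blau(values) * k / (k - 1)


team = ["male", "female"]
print(blau(team), iqv(team, k=2))  # 0.5 1.0
```

    The rescaling matters mainly when comparing variables with different numbers of categories: a two-category variable such as gender can never exceed B = 0.5, whereas its IQV can reach 1.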

    Two relevant user-contributed Stata programs are -divcat- and -entropyetc-, which compute the measure of interest or its complement. Both are available from SSC; see, e.g., -ssc describe entropyetc-. You'll need to read the help files carefully to get the measure in the form you want. Note that -entropyetc- allows a by() list, and if I understand you correctly, you'll want something like ... by(company year).
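    Because a program may report the Blau index itself, its complement (the sum of squared proportions, often called Simpson's concentration), or a transformation such as the reciprocal, converting between forms is simple arithmetic. The Python sketch below is purely illustrative and shows the three quantities for one group; the function names are hypothetical, and which quantity -divcat- or -entropyetc- actually reports should be checked in their help files.

```python
from collections import Counter


def concentration(values):
    """Sum of squared category proportions (Simpson's concentration)."""
    n = len(values)
    return sum((c / n) ** 2 for c in Counter(values).values())


def blau_from_concentration(conc):
    """The Blau index is the complement of the concentration."""
    return 1.0 - conc


def effective_categories(conc):
    """Reciprocal of the concentration: the number of equally common
    categories that would give the same concentration."""
    return 1.0 / conc


nats = ["Australian", "German", "Dutch"]
conc = concentration(nats)
print(round(conc, 3), round(blau_from_concentration(conc), 3),
      round(effective_categories(conc), 3))  # 0.333 0.667 3.0
```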

    I would also expect string values for gender and nationality to be a source of trouble, since spelling errors or variant names would split one category into several. If it were me, I'd create new numeric versions of those variables, cleaning up such inconsistencies along the way.
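    A minimal sketch of that clean-then-code step, written in Python for illustration (in Stata the final string-to-number step would typically be done with -encode-): each label is trimmed, internal whitespace is collapsed, and case is folded, then it passes through an alias map before integer codes are assigned in order of first appearance. The ALIASES entries and function names are hypothetical; a real alias map would come from inspecting the actual data.

```python
# Hypothetical spelling-variant map; build the real one from your data.
ALIASES = {"deutsch": "german", "netherlands": "dutch"}


def normalize_label(label):
    """Trim, collapse internal whitespace, and casefold a string label."""
    return " ".join(label.split()).casefold()


def encode(labels, aliases=None):
    """Map cleaned labels to integer codes in order of first appearance."""
    aliases = aliases or {}
    code_of = {}
    result = []
    for label in labels:
        key = normalize_label(label)
        key = aliases.get(key, key)  # merge known spelling variants
        if key not in code_of:
            code_of[key] = len(code_of) + 1
        result.append(code_of[key])
    return result, code_of


raw = ["German", "german ", "Dutch", "Deutsch", "Austrian"]
encoded, mapping = encode(raw, ALIASES)
print(encoded)  # [1, 1, 2, 1, 3]
print(mapping)  # {'german': 1, 'dutch': 2, 'austrian': 3}
```

    Without the cleaning step, "German" and "german " would count as two nationalities and inflate the diversity score.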

    If you find you need more help with this, posting a good data example with -dataex-, as described in the Statalist FAQ for new list members, would be useful.
