Calculation of Blau Index / Simpson Index

Sascha Mueller

Join Date: Jul 2022
Posts: 2


26 Jul 2022, 13:40

Hello everyone!

I am trying to analyze the degree of national and gender diversity of TMTs of German companies over a 10-year period. The Blau Index seems to be the ideal measure, but after trying several approaches I have not found a way to compute it.

The formula for the Blau Index is the following:

[Attached image: Unbenannt.JPG, showing the Blau Index formula]
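The attached image is not reproduced in this text. For reference, the Blau Index is usually written as below, assuming the image shows the standard definition. Here $p_k$ is the share of team members in category $k$ and $K$ is the number of categories. The sum $\sum_k p_k^2$ is Simpson's index, so the Blau Index is its complement.

```latex
B = 1 - \sum_{k=1}^{K} p_k^{2}
```

For example, a team of one man and one woman gives $B = 1 - (0.5^2 + 0.5^2) = 0.5$, and an all-male team gives $B = 0$.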

The data set:

Company_ID	year	gender	nationality	BlauGender	BlauNationality
1	2010	male	German
	2010	female	German
	2011	male	Austrian
	2011	female	German
2	2013	male	Australian
	2013	male	German
	2013	male	Dutch
	2014	female	Dutch
	2014	female	Dutch

The data have to be aggregated at the company-year level: the gender diversity and the national diversity of each company in each year should appear in the columns "BlauGender" and "BlauNationality".
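A minimal sketch of the company-year aggregation, written in Python rather than Stata. It assumes that each blank Company_ID cell belongs to the company listed above it, which is how the example table reads. The function names `blau` and `blau_by_company_year` are hypothetical. In Stata itself, a user-written command with a by() option would be the more natural route.

```python
from collections import Counter, defaultdict

# Example rows from the post. ASSUMPTION: blank Company_ID cells belong
# to the company listed above them, so the IDs are filled in here.
rows = [
    (1, 2010, "male", "German"),
    (1, 2010, "female", "German"),
    (1, 2011, "male", "Austrian"),
    (1, 2011, "female", "German"),
    (2, 2013, "male", "Australian"),
    (2, 2013, "male", "German"),
    (2, 2013, "male", "Dutch"),
    (2, 2014, "female", "Dutch"),
    (2, 2014, "female", "Dutch"),
]


def blau(values):
    """Blau index 1 - sum(p_k^2) over the categories present in `values`."""
    counts = Counter(values)
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def blau_by_company_year(rows):
    """Return {(company, year): (BlauGender, BlauNationality)}."""
    groups = defaultdict(list)
    for company, year, gender, nationality in rows:
        groups[(company, year)].append((gender, nationality))
    result = {}
    for key, members in groups.items():
        genders = [g for g, _ in members]
        nationalities = [nat for _, nat in members]
        result[key] = (blau(genders), blau(nationalities))
    return result


if __name__ == "__main__":
    # One line per company-year: company, year, BlauGender, BlauNationality
    for (company, year), (bg, bn) in sorted(blau_by_company_year(rows).items()):
        print(company, year, round(bg, 3), round(bn, 3))
```

Run on the example, this prints `1 2010 0.5 0.0`, `1 2011 0.5 0.5`, `2 2013 0.0 0.667` and `2 2014 0.0 0.0`. If every member row should carry its team's value, as the table layout suggests, each row can look up its own `(company, year)` key in the returned dictionary.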

I would appreciate any tips or suggestions on how to solve this problem. Thank you very much!


Mike Lacy

Join Date: Apr 2014

Posts: 2449
#2

26 Jul 2022, 21:22

The measure of nominal-variable variation you want has a different name in almost every discipline, and sometimes several names within the same discipline. (Sociologists, for example, know it both as the Blau Index and, in a normalized form, as the Index of Qualitative Variation.) "Blau Index" is not a very common name, which would explain why something like -search blau index- doesn't help. I'd also note that if the acronym "TMT" matters to the interpretation of your question, you'd want to explain it, since list members come from a wide range of disciplines and won't necessarily know it. (I don't.)
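To illustrate the normalized form: the Index of Qualitative Variation divides the Blau Index by its largest attainable value, (K - 1)/K, where K is the number of possible categories, so a perfectly even spread scores 1. A small Python sketch, assuming the user supplies K (for gender in this data set, K = 2); the helper names are hypothetical:

```python
from collections import Counter


def blau(values):
    """Blau index: 1 minus Simpson's sum of squared category shares."""
    counts = Counter(values)
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def iqv(values, k):
    """Index of Qualitative Variation: Blau index rescaled by its maximum (k - 1) / k.

    k is the number of POSSIBLE categories (an assumption the analyst must
    choose), not merely the number observed; it must be at least 2.
    """
    if k < 2:
        raise ValueError("IQV needs at least two possible categories")
    return blau(values) * k / (k - 1)


print(iqv(["male", "female"], k=2))        # 1.0: an even split is maximal
print(iqv(["male", "male", "male"], k=2))  # 0.0: no variation
```

The choice of K matters most for nationality, where the set of possible categories is open-ended; the unnormalized Blau Index avoids that choice.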

Two relevant user-contributed Stata programs are -divcat- and -entropyetc-, which compute the measure of interest or its complement. Both are available from SSC; see, e.g., -ssc describe entropyetc-. You'll need to read the related help files carefully to get the measure in the form you want. Note that -entropyetc- allows a by() list, and if I understand you correctly, you'll want something like ... by(company year).

I would also think that having string values for gender and nationality could be a source of trouble, since spelling errors or variations in names would cause problems. If it were me, I'd create new numeric versions of those variables, doing something along the way to handle such difficulties.
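One way to follow this advice, sketched in Python rather than Stata: trim and case-fold each string, map known variant spellings through a hand-maintained lookup table, then assign integer codes with a code-to-label mapping, similar in spirit to Stata's -encode-. The `ALIASES` table and the function names are hypothetical examples, not part of any package.

```python
# Hypothetical lookup of variant spellings -> canonical label; extend as needed.
ALIASES = {
    "deutsch": "german",
    "netherlands": "dutch",
}


def clean_label(raw):
    """Collapse whitespace, case-fold, and map known variants to one spelling."""
    key = " ".join(raw.split()).casefold()
    return ALIASES.get(key, key)


def encode(values):
    """Return integer codes (starting at 1, alphabetical) and a code -> label map."""
    labels = sorted({clean_label(v) for v in values})
    code_of = {label: i for i, label in enumerate(labels, start=1)}
    codes = [code_of[clean_label(v)] for v in values]
    return codes, {i: label for label, i in code_of.items()}


codes, value_labels = encode(["German", " german", "Deutsch", "Dutch", "Netherlands"])
print(codes)         # [2, 2, 2, 1, 1]
print(value_labels)  # {1: 'dutch', 2: 'german'}
```

Anything not covered by the lookup table still passes through as its own category, so it is worth tabulating the cleaned labels before computing diversity.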

If you find you need more help with this, posting a good data example with -dataex-, as described in the StataList FAQ for new list members, would be useful.