Measuring HHI for datasets with lots of observations

Robin Naessens

19 Mar 2023, 11:18

Hello everyone

I am a student from Belgium writing my thesis on labour market concentration. For my research, I need to calculate the HHI for many groups of observations. Below you will find an example of the data I have to use. The NACE code identifies the sector. I need to calculate the HHI for each NACE code and each year, so for code 1 I need a result for every year from 1985 to 2014, and so on. Each row in the table represents one firm, so my dataset contains multiple rows for the same NACE code and year.

NACE Code   YEAR   EMPLOYEES
1           1985   14
1           2010   5
2           1999   88
14          2000   79
15          1987   3
27          1992   45
45          2014   777
66          1998   23
97          2000   14

I have already read some helpful posts on this forum about the HHI. For example, I tried to use the entropyetc command. However, I have a lot of data: in total, my dataset consists of 7.5 million rows. If I use the collapse command to reduce the number of rows, I run into the same problem of having too many rows of data.
One solution I was considering is some kind of loop, so that I can run the same command over the whole dataset. However, I don't really know how to do this. I've tried a few things so far, but none of them has given me the desired result. Can anyone help me out with this problem?

Thank you in advance!
Kind Regards
Robin Naessens

Nick Cox

19 Mar 2023, 12:24

It's best not to assume that every reader knows what HHI is, any more than an economist should be expected to know about immunoglobulin or drainage density. Your data example is helpful, but it includes only (code, year) singletons, so I messed with it to show two lines of code that calculate the sum of squared proportions in a more challenging case too. If you want the HHI in some other persona, the change in code should be straightforward.

Naming this beast for Herfindahl and Hirschman is historically perverse, as it was in use twenty or more years before either economist thought about it.
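
For readers who have not met it, the sum of squared proportions referred to in this post can be written out as follows. The notation is assumed here purely for illustration and does not come from the thread: e_{ist} is the number of employees of firm i in sector (NACE code) s and year t, and N_{st} is the number of firms in that sector and year.

```latex
\[
p_{ist} = \frac{e_{ist}}{\sum_{j=1}^{N_{st}} e_{jst}},
\qquad
\mathrm{HHI}_{st} = \sum_{i=1}^{N_{st}} p_{ist}^{2}
\]
```

As a check on the modified example data in this post, the four firms with NACE code 2 in 1999 have shares 0.5, 0.25, 0.125 and 0.125, so the index is 0.25 + 0.0625 + 0.015625 + 0.015625 = 0.34375. A group with a single firm always gives 1.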

Code:

clear
input NACE_Code YEAR EMPLOYEES
1  1985 14
1  2010 5
2  1999 44
2  1999 22
2  1999 11
2  1999 11
14 2000 79
15 1987 3
27 1992 45
45 2014 777
66 1998 23
97 2000 14
end

* share of each firm in total employment within its (NACE code, year) group
bysort NACE_Code YEAR : egen p = pc(EMPLOYEES), prop

* HHI as the sum of squared shares within each group
by NACE_Code YEAR : egen HHI = total(p^2)

list, sepby(NACE_Code YEAR)

     +--------------------------------------------+
     | NACE_C~e   YEAR   EMPLOY~S      p      HHI |
     |--------------------------------------------|
  1. |        1   1985         14      1        1 |
     |--------------------------------------------|
  2. |        1   2010          5      1        1 |
     |--------------------------------------------|
  3. |        2   1999         44     .5   .34375 |
  4. |        2   1999         22    .25   .34375 |
  5. |        2   1999         11   .125   .34375 |
  6. |        2   1999         11   .125   .34375 |
     |--------------------------------------------|
  7. |       14   2000         79      1        1 |
     |--------------------------------------------|
  8. |       15   1987          3      1        1 |
     |--------------------------------------------|
  9. |       27   1992         45      1        1 |
     |--------------------------------------------|
 10. |       45   2014        777      1        1 |
     |--------------------------------------------|
 11. |       66   1998         23      1        1 |
     |--------------------------------------------|
 12. |       97   2000         14      1        1 |
     +--------------------------------------------+
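
With millions of firm rows, it may be more convenient to end up with one row per (NACE code, year) instead of repeating the HHI on every firm row. What follows is a minimal sketch, not a tested solution for the full dataset. It assumes the variable names used in the example above, that EMPLOYEES is never missing or negative, and that every group has a positive total. The variable name group_total is introduced here for illustration only.

```stata
* Sketch only: assumes NACE_Code, YEAR and EMPLOYEES as in the example above

* total employment in each (NACE code, year) group, held in double precision
bysort NACE_Code YEAR : egen double group_total = total(EMPLOYEES)

* HHI as the sum of squared employment shares within each group
bysort NACE_Code YEAR : egen double HHI = total((EMPLOYEES / group_total)^2)

* keep one row per group and only the variables needed afterwards
bysort NACE_Code YEAR : keep if _n == 1
keep NACE_Code YEAR HHI

list, sepby(NACE_Code)
```

No explicit loop is needed: the by-group prefix applies the same calculation to every (NACE code, year) combination in one pass. On the example data this should reproduce the HHI column shown above, with a single row for NACE code 2 in 1999.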
