  • Measuring HHI for datasets with lots of observations

    Hello everyone

    I am a student from Belgium working on my thesis about labour market concentration. For my research, I need to calculate the HHI for many groups of observations. Below is an example of the data I have to use. The NACE Code represents the sector. I need to calculate the HHI for each NACE code and for each year. So for code 1 I need a result for every year from 1985 until 2014, and so on. Every row in the table represents one firm, so my dataset has multiple rows of data for the same NACE Code and year.
    NACE Code YEAR EMPLOYEES
    1 1985 14
    1 2010 5
    2 1999 88
    14 2000 79
    15 1987 3
    27 1992 45
    45 2014 777
    66 1998 23
    97 2000 14
    I have already read some helpful posts on this forum about the HHI. For example, I tried to use the entropyetc command. However, I have a lot of data: in total, my dataset consists of 7.5 million rows. If I use the collapse command to decrease the number of rows, I still have the same problem of too many rows of data.
    One solution I considered was some kind of loop, so that I can repeat the same command across the whole dataset. However, I don't really know how to do this. I've tried some things so far, but none of them has given me the desired result. Can anyone help me out with this problem?

    Thank you in advance!
    Kind Regards
    Robin Naessens

  • #2
    It's best not to assume that every reader knows what HHI is, any more than an economist should be expected to know about immunoglobulin or drainage density. Your data example is helpful, but includes only (code, year) singletons, so I messed with it to show two lines of code that calculate the sum of squared proportions in a more challenging case too. If you want the HHI in some other persona, the change in code should be straightforward.

    Naming this beast for Herfindahl and Hirschman is historically perverse, as it was in use twenty or more years before either economist thought about it.

    Code:
    clear
    input NACE_Code    YEAR    EMPLOYEES
    1    1985    14
    1    2010    5
    2    1999    44
    2   1999    22
    2   1999    11
    2   1999    11 
    14    2000    79
    15    1987    3
    27    1992    45
    45    2014    777
    66    1998    23
    97    2000    14
    end 
    
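    * p = each firm's share of employment within its (NACE_Code, YEAR) group
    bysort NACE_Code YEAR : egen p = pc(EMPLOYEES), prop
    
    * HHI = sum of squared shares within each (NACE_Code, YEAR) group
    by NACE_Code YEAR : egen HHI = total(p^2)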
    
    list, sepby(NACE_Code YEAR)
    
         +--------------------------------------------+
         | NACE_C~e   YEAR   EMPLOY~S      p      HHI |
         |--------------------------------------------|
      1. |        1   1985         14      1        1 |
         |--------------------------------------------|
      2. |        1   2010          5      1        1 |
         |--------------------------------------------|
      3. |        2   1999         44     .5   .34375 |
      4. |        2   1999         22    .25   .34375 |
      5. |        2   1999         11   .125   .34375 |
      6. |        2   1999         11   .125   .34375 |
         |--------------------------------------------|
      7. |       14   2000         79      1        1 |
         |--------------------------------------------|
      8. |       15   1987          3      1        1 |
         |--------------------------------------------|
      9. |       27   1992         45      1        1 |
         |--------------------------------------------|
     10. |       45   2014        777      1        1 |
         |--------------------------------------------|
     11. |       66   1998         23      1        1 |
         |--------------------------------------------|
     12. |       97   2000         14      1        1 |
         +--------------------------------------------+
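
    Note that by: does the grouping, so no loop over codes and years is needed. As a rough sketch only, assuming p and HHI exist as created above, you could check the arithmetic for the (2, 1999) group by hand, put the index on the 0 to 10,000 scale that comes from using percent shares, and list one row per group. The names HHI_10000 and tag are hypothetical, chosen just for illustration.

    Code:
    * check by hand: the shares in the (2, 1999) group are .5, .25, .125, .125
    display .5^2 + .25^2 + .125^2 + .125^2
    
    * rescale to 0-10,000 (shares as percents); HHI_10000 is a hypothetical name
    generate HHI_10000 = 10000 * HHI
    
    * tag one observation per (NACE_Code, YEAR) group; tag is a hypothetical name
    egen tag = tag(NACE_Code YEAR)
    list NACE_Code YEAR HHI HHI_10000 if tag, noobs

    The display line should show .34375, matching the HHI listed for that group. If you want the dataset itself reduced to one observation per group, keep if tag is one way to do that.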
