Herfindahl - Hirschmann Index calculation with 2 variables

Filipp Sabitzer

Join Date: Jun 2018

Posts: 18
#1

20 Jun 2018, 09:05

Hi all,

this is my first post here. I am currently writing my master's thesis. As part of it, I would like to calculate the Herfindahl-Hirschman Index (HHI) for the following scenario, illustrated by the table below. The first column holds IDs; for example, the first five rows share the same ID. The second column gives the region for each observation of that ID. I would like to calculate the HHI from these two columns, with a separate HHI_region for each ID. I computed the HHI manually and added it as a third column to show the results I would like to get. My real data set, however, has more than 5000 rows.

So far I have tried different commands, such as "hhi Region, by(ID)", however the results I get are wrong.

If anyone can help me out, this would be great.

Thank you for your help.

Best regards,
Filipp

ID    Region    HHI_region
1     1         0.36
1     2         0.36
1     2         0.36
1     3         0.36
1     3         0.36
2     1         0.55
2     1         0.55
2     2         0.55
3     1         1
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17749
#2

20 Jun 2018, 09:19

Filipp:
welcome to this forum.
-search Herfindahl-Hirschman- will take you to some promising user-written commands.

Kind regards,
Carlo
(Stata 19.0)

Nick Cox

Join Date: Mar 2014
Posts: 35809

20 Jun 2018, 09:21

There are many such commands that are community-contributed.

See e.g. entropyetc (SSC). https://www.statalist.org/forums/for...lable-from-ssc

The Herfindahl-Hirschman [not Hirschmann] measure is named for Simpson (the same Simpson as is honoured with the name Simpson's paradox) in that program output, partly to remind economists that they didn't invent everything and partly because it was invented by Gini. I get 0.556 not 0.55, but that's your rounding: (2/3)^2 + (1/3)^2 = 5/9, without doubt.

Code:

clear
input ID    Region    HHI_region
1    1    0.36
1    2    0.36
1    2    0.36
1    3    0.36
1    3    0.36
2    1    0.55
2    1    0.55
2    2    0.55
3    1    1
end

entropyetc Region, by(ID)

----------------------------------------------------------------------
    Group |  Shannon H      exp(H)     Simpson   1/Simpson     dissim.
----------+-----------------------------------------------------------
        1 |      1.055       2.872       0.360       2.778       0.133
        2 |      0.637       1.890       0.556       1.800       0.333
        3 |      0.000       1.000       1.000       1.000       0.667
----------------------------------------------------------------------

entropyetc has a generate() option, etc.
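
As a cross-check on the Simpson column, the same numbers can be reproduced by hand from the category counts. The following is only a minimal sketch using official commands (`contract`, `egen`, `collapse`); the variable names `total` and `sqshare` are made up for illustration, and it replaces the data in memory with one row per ID.

```stata
clear
input ID Region
1 1
1 2
1 2
1 3
1 3
2 1
2 1
2 2
3 1
end

* one row per ID-Region pair; _freq holds the number of observations
contract ID Region

* share of each region within its ID, squared
bysort ID: egen double total = total(_freq)
generate double sqshare = (_freq / total)^2

* HHI = sum of squared shares within each ID
collapse (sum) HHI = sqshare, by(ID)
list, noobs
```

For ID 1 this is (1/5)^2 + (2/5)^2 + (2/5)^2 = 9/25 = 0.36, and for ID 2 it is 5/9, matching the table above.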

Last edited by Nick Cox; 20 Jun 2018, 09:31.


daniel klein

Join Date: Mar 2014
Posts: 3890

20 Jun 2018, 09:25

Download entropyetc from SSC, then

Code:

clear
input ID Region    HHI_region
1    1    0.36
1    2    0.36
1    2    0.36
1    3    0.36
1    3    0.36
2    1    0.55
2    1    0.55
2    2    0.55
3    1    1
end

* ssc install entropyetc
entropyetc Region , by(ID)

seems to yield the desired result

Code:

. entropyetc Region , by(ID)

----------------------------------------------------------------------
    Group |  Shannon H      exp(H)     Simpson   1/Simpson     dissim.
----------+-----------------------------------------------------------
        1 |      1.055       2.872       0.360       2.778       0.133
        2 |      0.637       1.890       0.556       1.800       0.333
        3 |      0.000       1.000       1.000       1.000       0.667
----------------------------------------------------------------------

Note that Herfindahl-Hirschman is called Simpson here (see the help for more information).

Best
Daniel

Edit: Nick was quicker.


daniel klein

Join Date: Mar 2014

Posts: 3890
#5

20 Jun 2018, 10:44

Originally posted by Filipp Sabitzer

Do you know if it is also possible to get the answer in the table format I posted instead of an overview of indices for each ID?

Nick already pointed to the generate() option. Try

Code:

entropyetc Region , by(ID) generate(3=HHI)

Note no spaces around the equals sign.

Best
Daniel
Nick Cox

Join Date: Mar 2014

Posts: 35809
#6

20 Jun 2018, 10:46

Daniel picked up the major point in #5, but on a very minor point 5/9 to 2 d.p. is not 0.55.
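
The rounding point can be checked directly with `display` and a numeric format; this is just an illustrative sketch of the arithmetic already given in the thread.

```stata
* 5/9 shown to 2 decimal places rounds up to 0.56, not 0.55
display %4.2f 5/9

* the same quantity built from the shares for ID 2, to 3 decimal places
display %5.3f (2/3)^2 + (1/3)^2
```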
Filipp Sabitzer

Join Date: Jun 2018

Posts: 18
#7

05 Jul 2018, 06:30

Thank you. Is it possible to do the command without generating the whole table each time? I would only like to generate a new variable in my dataset instead of printing out the entire entropy table. Do you know if this is possible?
Nick Cox

Join Date: Mar 2014

Posts: 35809
#8

05 Jul 2018, 07:07

If "the command" means entropyetc (SSC) then

Code:

quietly

as a prefix will suppress the display.
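
Put together with the generate() call from #5, this might look like the sketch below. It assumes entropyetc has been installed from SSC; the variable name HHI is the one used in #5, and the `list` line is only there to inspect the result.

```stata
clear
input ID Region
1 1
1 2
1 2
1 3
1 3
2 1
2 1
2 2
3 1
end

* ssc install entropyetc
* quietly suppresses the table; generate() still creates the variable
quietly entropyetc Region, by(ID) generate(3=HHI)

list ID Region HHI, sepby(ID) noobs
```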
Filipp Sabitzer

Join Date: Jun 2018

Posts: 18
#9

05 Jul 2018, 08:52

Great, thank you. That is what I was looking for.
Filipp Sabitzer

Join Date: Jun 2018

Posts: 18
#10

19 Jul 2018, 12:34

Hi all,

I extended my dataset so that I now have roughly 25000 observations for each of my variables. As a result, if I try to use the entropyetc command for exactly the same purpose as described in this post, I receive an error message saying "too many values". Does anyone know a workaround for this issue? I would greatly appreciate any help.

Thank you.
Nick Cox

Join Date: Mar 2014

Posts: 35809
#11

19 Jul 2018, 13:01

25000 observations ("for each of my variables" is redundant) shouldn't bite so far as I can see. I'd need to know where and why the command was failing. You could use set trace for that purpose.
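
A minimal sketch of tracing the failing call, assuming entropyetc is installed and the data are already in memory (Region and ID are the variable names from #1). `capture noisily` is an addition here so that tracing is switched off again even if the command stops with an error.

```stata
* limit the trace to the top-level program to keep the output readable
set tracedepth 1
set trace on

* the last lines echoed before the error show where the command fails
capture noisily entropyetc Region, by(ID)

set trace off
```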

Otherwise look at the code and write your own program. That's an option.
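
One way such a program could look is sketched below. `hhi_by` is a hypothetical helper written for this illustration; it is not part of entropyetc or of any SSC package, and it has only been reasoned through on the small example from #1. It counts observations with `bysort` and does not tabulate, so it should not run into a limit on the number of distinct values, if that is indeed what triggers "too many values" (the thread does not confirm where the command fails).

```stata
capture program drop hhi_by
program define hhi_by
    version 12
    * varname = categorical variable, by() = grouping variable(s),
    * generate() = name of the new variable holding the index
    syntax varname [if] [in], BY(varlist) GENerate(name)
    confirm new variable `generate'
    marksample touse, strok
    markout `touse' `by', strok
    tempvar n first N
    quietly {
        * observations in each group-by-category cell
        bysort `touse' `by' `varlist': generate double `n' = _N
        * flag one observation per cell so each category counts once
        by `touse' `by' `varlist': generate byte `first' = (_n == 1)
        * observations in each group
        by `touse' `by': generate double `N' = _N
        * sum of squared category shares within each group
        by `touse' `by': egen double `generate' = total(`first' * (`n' / `N')^2)
        replace `generate' = . if !`touse'
    }
end

* usage on the example data from #1
clear
input ID Region
1 1
1 2
1 2
1 3
1 3
2 1
2 1
2 2
3 1
end
hhi_by Region, by(ID) generate(HHI)
list, sepby(ID) noobs
```

The result is stored as a variable repeated within each ID, which is the table layout asked for in #1.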

Warning: I won't be active here from soon for about a week.

Last edited by Nick Cox; 19 Jul 2018, 13:03.
mohina saxena

Join Date: Mar 2016

Posts: 61
#12

22 Oct 2018, 05:57

Hello all, I am posting a query regarding HHI which might sound trivial to many of you, yet I have not been able to solve it even after going through many threads on this topic. It would be of great help if you could let me know where I am going wrong. Below is a dataset similar to the one I am working on. Each firm makes certain sales in a particular year, and I need to estimate HHI year-wise. The HHI I expect for this dataset is shown in the second table.

Firm_id Firm Year NetSales
1 A 2000 2
2 B 2000 3
3 C 2000 4
1 A 2001 5
2 B 2001 6
3 C 2001 7

Firm_id Firm Year NetSales HHI
1 A 2000 2 3.22
2 B 2000 3 3.22
3 C 2000 4 3.22
1 A 2001 5 6.11
2 B 2001 6 6.11
3 C 2001 7 6.11

To get this I use the following code: hhi netsales by( year, firm_id) (apologies for not formatting the code the way it has been done elsewhere in the thread).
This gives me the error "factor variables and time-series operators not allowed".

Next, when I use the code: hhi netsales, it gives me the same HHI for all years, as shown below (please note that the calculated HHI is also not what I was aiming for).

firm_id firm year netsales hhi_netsales
1 A 2000 2 .1906721
2 B 2000 3 .1906721
3 C 2000 4 .1906721
1 A 2001 5 .1906721
2 B 2001 6 .1906721
3 C 2001 7 .1906721

I am not able to see where I am going wrong.

Please help me in resolving this.
Nick Cox

Join Date: Mar 2014

Posts: 35809
#13

22 Oct 2018, 06:11

I think you just got the comma in the wrong place:

Code:

hhi netsales, by(year)

If I understand your data correctly it makes no sense to ask

Code:

hhi netsales, by(year firm_id)

as that just works on singleton observations.
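
For comparison, a share-based HHI by year can also be computed by hand with `egen`. This is a minimal sketch on the data from #12; the names `total` and `hhi_manual` are made up for illustration, and what `hhi` (SSC) itself reports is not checked here.

```stata
clear
input firm_id str1 firm year netsales
1 A 2000 2
2 B 2000 3
3 C 2000 4
1 A 2001 5
2 B 2001 6
3 C 2001 7
end

* total sales in each year
bysort year: egen double total = total(netsales)

* sum of squared market shares in each year
bysort year: egen double hhi_manual = total((netsales / total)^2)

list, sepby(year) noobs
```

On these data the sum of squared shares is 29/81 (about 0.358) for 2000 and 110/324 (about 0.340) for 2001. The 3.22 and 6.11 in #12 match the sum of squared sales divided by total sales (29/9 and 110/18), which is a different quantity.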

hhi has been mentioned already in the thread (#1) but you're asked to explain that it comes from SSC.
mohina saxena

Join Date: Mar 2016

Posts: 61
#14

22 Oct 2018, 12:46

Thanks for the quick feedback. I tried

Code:

hhi netsales, by(year)

and it worked so well.
Farid Mammadaliyev

Join Date: Aug 2018

Posts: 33
#15

08 Jan 2019, 07:27

Dear Nick,

I am also calculating the HHI for patent classes. My dataset contains around 2 million observations (patents), and the command reports "too many values". How can I deal with this problem?

Best regards, Farid