Herfindahl-Hirschman Index calculation with 2 variables

Farid Mammadaliyev

Join Date: Aug 2018

Posts: 33
#31

20 Feb 2019, 08:21

Dear Nick,

I meant that I was very close to finding the HHI for my class variable. I did it with the tag() function. But tag() does not report how many times a distinct value is repeated; it only reports whether or not a value appears in the group before the focal observation, which made my HHI results wrong. I thought that the logic of this approach was close to Daniel's, so I found your warning ("If all categories are represented in all groups, then that would work") very relevant to my approach.

What tag() gives:

Firm	year	class	tag
Firm A	2000	105	1
Firm A	2000	107	1
Firm A	2000	107	0
Firm A	2000	107	0

What I need in order to find the HHI:

Firm	year	class	Any function
Firm A	2000	105	1
Firm A	2000	107	3
Firm A	2000	107	0
Firm A	2000	107	0

My question is: is there any function in Stata that counts how many times a distinct value is repeated within a group?

Best regards, Farid
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#32

20 Feb 2019, 08:50

Code:

help tabulate
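Besides tabulate, the count can also be stored as a variable. A minimal sketch, assuming lowercase variable names firm, year and class (hypothetical; adjust to your data): _N under by: gives the size of each firm-year-class cell, and egen's tag() keeps that count on just one observation per cell, which reproduces the "Any function" column in #31. The last two commands turn the counts into one HHI per firm-year.

```stata
* Toy data copied from #31; variable names are assumptions
clear
input str6 firm int year int class
"Firm A" 2000 105
"Firm A" 2000 107
"Firm A" 2000 107
"Firm A" 2000 107
end

* number of observations in each firm-year-class cell
bysort firm year class: gen n_class = _N

* flag one observation per cell so that each class is counted once
egen first = tag(firm year class)
gen wanted = cond(first, n_class, 0)

* share of each class within its firm-year, squared and summed once per class
bysort firm year: gen n_total = _N
egen HHI = total(cond(first, (n_class / n_total)^2, 0)), by(firm year)

list, noobs
```

Here class 105 has share 0.25 and class 107 has share 0.75, so the HHI for Firm A in 2000 is 0.0625 + 0.5625 = 0.625.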
Comment
Leo Schmidt

Join Date: Apr 2019

Posts: 1
#33

06 Apr 2019, 17:10

Dear all,

I am using the HHI as a measure of the degree of specialization of individual employees.
For each individual i, I observe the number of years he or she previously worked in each industry category, based on two or three employments.

Based on your earlier posts, I understand that the Herfindahl score should be calculated by summing the squared shares of all experiences over the time period for each individual. However, I am stuck with the execution.

person	industry 1	tenure 1	industry 2	tenure 2	industry 3	tenure 3
1	8881	3	8881	6
2	7123	2	8881	5	9911	3
3	6982	7	7493	6	3352	2

Your help is highly appreciated!
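A sketch under stated assumptions: the wide layout is reshaped to one row per person and employment, tenure is pooled when the same industry code recurs (as 8881 does for person 1), and the HHI per person is the sum of squared tenure shares. The names industry1-industry3 and tenure1-tenure3 are assumed stand-ins for the columns shown above.

```stata
* Toy data copied from #33; variable names are assumptions
clear
input byte person int(industry1 industry2 industry3) byte(tenure1 tenure2 tenure3)
1 8881 8881    . 3 6 .
2 7123 8881 9911 2 5 3
3 6982 7493 3352 7 6 2
end

* one row per person and employment spell
reshape long industry tenure, i(person) j(spell)
drop if missing(industry)

* pool tenure when a person returns to the same industry
collapse (sum) tenure, by(person industry)

* share of each person's total tenure spent in each industry
egen total_tenure = total(tenure), by(person)
gen share = tenure / total_tenure

* HHI = sum of squared shares, one value per person
egen HHI = total(share^2), by(person)
egen first = tag(person)
list person HHI if first, noobs
```

Pooling is a design choice: without it, person 1 would get 0.333^2 + 0.667^2 rather than 1, even though all of that tenure is in one industry.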
Comment

carlton durrant

Join Date: May 2020
Posts: 10

#34

09 Jun 2020, 11:01

Dear Nick
I am seeking similar assistance with an HHI based on total income and total assets per credit union in each country, similar to that of Mohina. I would like to compute the HHI yearly for each credit union per island country. I would like to compute the concentration ratio per island for each credit union for each year. I am attaching the sample for St Vincent only below, but the entire file for all seven islands is attached.

Island	ID	CU	YEAR	TOTINC	TOTAST
St Vincent	1	GECCU	2009	10,354,327	142,998,310
St Vincent	1	GECCU	2010	10,398,687	149,047,458
St Vincent	1	GECCU	2011	12,138,542	152,551,061
St Vincent	1	GECCU	2012	11,270,081	159,944,699
St Vincent	1	GECCU	2013	11,174,077	166,839,901
St Vincent	1	GECCU	2014	11,797,644	179,320,697
St Vincent	1	GECCU	2015	12,428,461	190,855,037
St Vincent	1	GECCU	2016	13,578,058	210,379,156
St Vincent	1	GECCU	2017	15,236,572	231,914,947
St Vincent	1	GECCU	2018	16,442,983	258,826,523
St Vincent	2	Kccu	2009	4,515,787	55,226,160
St Vincent	2	Kccu	2010	4,709,019	56,533,034
St Vincent	2	Kccu	2011	5,081,558	60,829,313
St Vincent	2	Kccu	2012	5,342,034	65,331,538
St Vincent	2	Kccu	2013	5,323,653	69,052,753
St Vincent	2	Kccu	2014	5,763,040	78,659,710
St Vincent	2	Kccu	2015	6,412,253	84,194,080
St Vincent	2	Kccu	2016	6,606,917	91,959,291
St Vincent	2	Kccu	2017	7,013,308	99,081,220
St Vincent	2	Kccu	2018	7,391,305	103,275,280

Attached Files

HHI index.xlsx (38.5 KB, 1 view)

Comment

Nick Cox

Join Date: Mar 2014

Posts: 35698
#35

09 Jun 2020, 11:16

#34 Good that you have found a relevant thread, but spreadsheet attachments are a no-no here (FAQ Advice #12). Code for your problem follows from other answers in this thread. That was the implicit answer to #33 too.
1 like
Comment

carlton durrant

Join Date: May 2020
Posts: 10

#36

09 Jun 2020, 16:05

YEAR	CUID	COID	hhi_TOTAST	hhi_TOTINC
2009	1	1	1	1
2010	1	1	1	1
2011	1	1	1	1
2012	1	1	1	1
2013	1	1	1	1
2014	1	1	1	1
2015	1	1	1	1
2016	1	1	1	1
2017	1	1	1	1
2018	1	1	1	1
2009	2	1	1	1
2010	2	1	1	1
2011	2	1	1	1
2012	2	1	1	1
2013	2	1	1	1
2014	2	1	1	1
2015	2	1	1	1
2016	2	1	1	1
2017	2	1	1	1
2018	2	1	1	1

hhi TOTAST TOTINC, by(YEAR CUID COID) outfile replace

The command above is what I used, but I got all ones. What am I doing wrong?

Comment

Nick Cox

Join Date: Mar 2014

Posts: 35698
#37

10 Jun 2020, 05:24

carlton durrant You're struggling here but none of this is rocket surgery or brain science.

The key advice here is simple. Please do read https://www.statalist.org/forums/help#stata. The idea is just: show us a data example we can use (easily), or else we're entitled to shrug our shoulders and get back to the day job.

Going against my personal rule I tried to look at your spreadsheet but my copy of Excel refuses to read it. That is the sort of experience that puts many people here off trying to look at someone else's spreadsheet files.

Your #36 just shows results, but the problem there in using hhi (from SSC) was already explained in #13 of this thread. hhi will necessarily return 1 for single observations, as a single value is 100% of its own total and a proportion of 1, squared, is nothing but 1 again.
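A toy illustration of that point, using egen's pc() and total() functions (the group labels and values are made up): group a has a single observation, so its proportion is 1 and its HHI is 1, whereas group b has shares 0.75 and 0.25 and so an HHI of 0.625.

```stata
* Made-up data: group "a" has one observation, group "b" has two
clear
input str1 grp double x
"a" 5
"b" 3
"b" 1
end

* proportion of each value within its group
egen p = pc(x), by(grp) prop

* sum of squared proportions per group
egen HHI = total(p^2), by(grp)
list, noobs
```

Grouping by YEAR CUID COID as in #36 puts each credit union-year in its own group, which is the group "a" situation for every observation.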

I don't know much about hhi, which I didn't write. I do know more about entropyetc (SSC), which I did write. But at most they are convenience commands for a simple calculation, given that this measure and most like it are defined by one line of algebra.

I took your earlier data display -- which looks to me like copy-and-paste from Excel rather than a Stata listing -- and with some editing turned it into a listing of the kind we ask for here. The calculation is then divisible into (1) calculate the proportions you want (2) square them and add up the squares. That's all it is.
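In symbols, for units i = 1, ..., n within one group (for example, the credit unions in one island and year) with values x_i such as totinc, the two steps are the proportions s_i and the sum of their squares. The HHI then lies between 1/n (equal shares) and 1 (a single unit).

```latex
s_i = \frac{x_i}{\sum_{j=1}^{n} x_j}, \qquad
\mathrm{HHI} = \sum_{i=1}^{n} s_i^{2}, \qquad
\frac{1}{n} \le \mathrm{HHI} \le 1
```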

If your variable names are different, then your code needs to be different accordingly.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str10 island str5 cu int year long(totinc totast)
"St Vincent" "GECCU" 2009 10354327 142998310
"St Vincent" "GECCU" 2010 10398687 149047458
"St Vincent" "GECCU" 2011 12138542 152551061
"St Vincent" "GECCU" 2012 11270081 159944699
"St Vincent" "GECCU" 2013 11174077 166839901
"St Vincent" "GECCU" 2014 11797644 179320697
"St Vincent" "GECCU" 2015 12428461 190855037
"St Vincent" "GECCU" 2016 13578058 210379156
"St Vincent" "GECCU" 2017 15236572 231914947
"St Vincent" "GECCU" 2018 16442983 258826523
"St Vincent" "Kccu" 2009 4515787 55226160
"St Vincent" "Kccu" 2010 4709019 56533034
"St Vincent" "Kccu" 2011 5081558 60829313
"St Vincent" "Kccu" 2012 5342034 65331538
"St Vincent" "Kccu" 2013 5323653 69052753
"St Vincent" "Kccu" 2014 5763040 78659710
"St Vincent" "Kccu" 2015 6412253 84194080
"St Vincent" "Kccu" 2016 6606917 91959291
"St Vincent" "Kccu" 2017 7013308 99081220
"St Vincent" "Kccu" 2018 7391305 103275280
end

egen p = pc(totinc), by(island year) prop
egen HHI = total(p^2), by(island year)
tabdisp year island, c(HHI) format(%4.3f)

----------------------
          |   Island
     YEAR | St Vincent
----------+-----------
     2009 |      0.577
     2010 |      0.571
     2011 |      0.584
     2012 |      0.564
     2013 |      0.563
     2014 |      0.559
     2015 |      0.551
     2016 |      0.560
     2017 |      0.568
     2018 |      0.572
----------------------

Last edited by Nick Cox; 10 Jun 2020, 06:19.
1 like
Comment
carlton durrant

Join Date: May 2020

Posts: 10
#38

10 Jun 2020, 20:47

Thanks, Nick. I tried the command for all seven islands and it worked well. The only hurdle was converting the string data for income and assets to long format.

. egen p = pc(totast), by(island year) prop

. egen HHI = total(p^2), by(island year)

. tabdisp year island, c(HHI) format(%4.3f)

--------------------------------------------------------------------------------------------------------------------
          |                                                  ISLAND
     YEAR |        Antigua       Dominica        Grenada    Montserratt St Kitts Nevis     St Vincent      St. Lucia
----------+---------------------------------------------------------------------------------------------------------
     2009 |          0.466          0.516          0.309          1.000          0.415          0.389          0.225
     2010 |          0.462          0.551          0.313          1.000          0.415          0.387          0.221
     2011 |          0.471          0.557          0.314          1.000          0.424          0.380          0.215
     2012 |          0.481          0.567          0.316          1.000          0.426          0.376          0.210
     2013 |          0.493          0.568          0.314          1.000          0.429          0.371          0.209
     2014 |          0.495          0.556          0.314          1.000          0.425          0.364          0.207
     2015 |          0.489          0.556          0.296          1.000          0.422          0.362          0.201
     2016 |          0.500          0.552          0.295          1.000          0.415          0.358          0.192
     2017 |          0.508          0.556          0.295          1.000          0.411          0.365          0.186
     2018 |          0.527          0.540          0.297          1.000          0.399          0.366          0.181
--------------------------------------------------------------------------------------------------------------------
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#39

11 Jun 2020, 00:14

Good, and thanks for the report.

Had you shown a Stata example of the kind requested, we would certainly have explained the need to destring.
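A sketch of that step, assuming the amounts arrived from the spreadsheet as strings with comma thousands separators, as in the display in #34 (the two values below are taken from there): destring with ignore(",") strips the commas and converts the variables to numeric in place.

```stata
* Amounts read in as strings with thousands separators
clear
input str12 totinc str12 totast
"10,354,327" "142,998,310"
"4,515,787" "55,226,160"
end

* remove the commas and convert to numeric in place
destring totinc totast, replace ignore(",")
describe totinc totast
```

destring creates the new variables as double and then compresses them, so whole-number amounts like these end up as long without loss of precision.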

For the record, in Stata long is a variable or storage type, not a (display) format. What terms are in use elsewhere is a different and small question.
1 like
Comment
Huthayfa Nabeel

Join Date: Apr 2020

Posts: 16
#40

12 Jul 2020, 07:07

Thank you all for the information in this valuable thread.

Last edited by Huthayfa Nabeel; 12 Jul 2020, 07:15.
1 like
Comment
Huthayfa Nabeel

Join Date: Apr 2020

Posts: 16
#41

12 Jul 2020, 08:02

Dear Nick,

I have the same problem as #15. I followed the answers in this thread and in other threads. Unfortunately, the problem has not been solved yet.

I have panel data (n = 1260, t = 9). I've tried to run

Code:

entropyetc ta , by( year country_code)

The variable country_code classifies ta into 26 categories. Stata still gives me

Code:

too many values
r(134);

Any suggested solution will be highly appreciated.
Comment
Sanne Jansen

Join Date: Jun 2021

Posts: 4
#42

08 Jun 2021, 03:52

Hi all,

This is my first post, and I am working on my master's thesis. To analyse my data, I would like to calculate the Herfindahl-Hirschman Index for the following scenario: I want to know the market concentration of certain healthcare companies on the basis of their number of clients. An example of the data is shown below:

Firm_name	Region	Numberofclients
Aafje Burghsluissingel	Rotterdam	58
Aafje De Nieuwe Plantage	Rotterdam	75
't Verlaet	Zeeland	10
't Vonder	Drenthe	29

Currently I use this code, but the only result I get is the same number (1) for all rows, and 0 where data are missing:

Code:
ssc install hhi
hhi Numberofclients, by(Region Firm_name)

I hope someone can help me find the right command, so that I get the market concentration of every firm per region, based on the number of clients.

Best regards,

Sanne Jansen
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#43

08 Jun 2021, 04:18

The data appear to contain one observation for each pair of Region and Firm_name, so the sum of squared proportions within each such combination is identically 1 for non-missing values.

You can look at concentration by region or by firm, but not both.
Comment
Sanne Jansen

Join Date: Jun 2021

Posts: 4
#44

08 Jun 2021, 05:52

Thank you, Nick, this explains a lot. But what would you recommend I do to calculate the market concentration of the firms per region, based on the number of clients? Should I make a new Stata file per Region, or are there other (smarter and faster) ways?
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#45

08 Jun 2021, 06:13

I didn't write hhi but my understanding is that your choices are limited to

Code:

hhi Numberofclients, by(Region)
hhi Numberofclients, by(Firm_name)

and I don't see why a different dataset is thought to be needed.
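A minimal sketch of the egen route used earlier in this thread, applied to the four rows shown in #42 (toy data; the variable names follow that post): share is each firm's proportion of the clients in its region, and HHI is constant within each region. Zeeland and Drenthe have only one firm each in this extract, so their HHI is 1 by construction.

```stata
* Toy data from #42
clear
input str24 Firm_name str9 Region int Numberofclients
"Aafje Burghsluissingel" "Rotterdam" 58
"Aafje De Nieuwe Plantage" "Rotterdam" 75
"'t Verlaet" "Zeeland" 10
"'t Vonder" "Drenthe" 29
end

* each firm's share of the clients in its region
egen share = pc(Numberofclients), by(Region) prop

* HHI per region: sum of squared shares
egen HHI = total(share^2), by(Region)
list, noobs sepby(Region)
```

For Rotterdam this gives (58/133)^2 + (75/133)^2, about 0.508. Everything stays in one dataset; only the by() grouping changes.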
Comment
