
  • Herfindahl - Hirschmann Index calculation with 2 variables

    Hi all,

    This is my first post here. I am currently writing my master's thesis, and as part of it I would like to calculate the Herfindahl-Hirschman Index (HHI) for the scenario below. To illustrate, I have attached a table. The first column holds IDs (e.g. the first five rows share the same ID); the second column holds the region of each observation. Based on these two columns, I would like to calculate a separate HHI of regions (HHI_region) for each ID. I computed the index by hand and added it as a third column to show the result I am after. My real dataset, however, has more than 5000 rows.

    So far I have tried various commands, such as "hhi Region, by(ID)", but the results I get are wrong.

    If anyone can help me out, that would be great.

    Thank you for your help.

    Best regards,
    Filipp
    ID Region HHI_region
    1 1 0.36
    1 2 0.36
    1 2 0.36
    1 3 0.36
    1 3 0.36
    2 1 0.55
    2 1 0.55
    2 2 0.55
    3 1 1

  • #2
    Filipp:
    welcome to this forum.
    -search Herfindahl-Hirschman- will take you to some promising user-written commands.
    Kind regards,
    Carlo
    (Stata 18.0 SE)



    • #3
      There are many such commands that are community-contributed.

      See e.g. entropyetc (SSC). https://www.statalist.org/forums/for...lable-from-ssc

      The Herfindahl-Hirschman [not Hirschmann] measure is labelled Simpson in that program's output (the same Simpson as is honoured in the name Simpson's paradox), partly to remind economists that they didn't invent everything and partly because it was invented by Gini. I get 0.556, not 0.55, but that's your rounding: (2/3)^2 + (1/3)^2 = 5/9 without doubt.

      Code:
      clear
      input ID    Region    HHI_region
      1    1    0.36
      1    2    0.36
      1    2    0.36
      1    3    0.36
      1    3    0.36
      2    1    0.55
      2    1    0.55
      2    2    0.55
      3    1    1
      end
      
      entropyetc Region, by(ID)
      
      ----------------------------------------------------------------------
          Group |  Shannon H      exp(H)     Simpson   1/Simpson     dissim.
      ----------+-----------------------------------------------------------
              1 |      1.055       2.872       0.360       2.778       0.133
              2 |      0.637       1.890       0.556       1.800       0.333
              3 |      0.000       1.000       1.000       1.000       0.667
      ----------------------------------------------------------------------
      entropyetc has a generate() option, etc.
      Last edited by Nick Cox; 20 Jun 2018, 09:31.
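The hand calculation behind the Simpson column can be written out as a worked formula. As a sketch, with notation introduced here for illustration only (n_{gr} is the number of observations of group g in region r, n_g the total for group g), the index for each ID is the sum of squared regional shares:

```latex
\begin{aligned}
H_g &= \sum_r s_{gr}^2, \qquad s_{gr} = \frac{n_{gr}}{n_g} \\
H_1 &= \left(\tfrac{1}{5}\right)^2 + \left(\tfrac{2}{5}\right)^2 + \left(\tfrac{2}{5}\right)^2 = \tfrac{9}{25} = 0.36 \\
H_2 &= \left(\tfrac{2}{3}\right)^2 + \left(\tfrac{1}{3}\right)^2 = \tfrac{5}{9} \approx 0.556 \\
H_3 &= 1^2 = 1
\end{aligned}
```

The reciprocals 25/9 ≈ 2.778, 9/5 = 1.8 and 1 agree with the 1/Simpson column in the entropyetc output.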



      • #4
        Download entropyetc from SSC, then

        Code:
        clear
        input ID Region    HHI_region
        1    1    0.36
        1    2    0.36
        1    2    0.36
        1    3    0.36
        1    3    0.36
        2    1    0.55
        2    1    0.55
        2    2    0.55
        3    1    1
        end
        
        * ssc install entropyetc
        entropyetc Region , by(ID)
        This seems to yield the desired result:

        Code:
        . entropyetc Region , by(ID)
        
        ----------------------------------------------------------------------
            Group |  Shannon H      exp(H)     Simpson   1/Simpson     dissim.
        ----------+-----------------------------------------------------------
                1 |      1.055       2.872       0.360       2.778       0.133
                2 |      0.637       1.890       0.556       1.800       0.333
                3 |      0.000       1.000       1.000       1.000       0.667
        ----------------------------------------------------------------------
        Note that Herfindahl-Hirschman is called Simpson here (see the help for more information).

        Best
        Daniel

        Edit: Nick was quicker.



        • #5
          Originally posted by Filipp Sabitzer
          Do you know if it is also possible to get the answer in the table format I posted instead of an overview of indices for each ID?
          Nick already pointed to the generate() option. Try

          Code:
          entropyetc Region , by(ID) generate(3=HHI)
          Note that there are no spaces around the equals sign.

          Best
          Daniel



          • #6
            Daniel picked up the major point in #5, but on a very minor point: 5/9 to 2 d.p. is 0.56, not 0.55.



            • #7
              Thank you. Is it possible to run the command without displaying the whole table each time? I would only like to generate a new variable in my dataset rather than print the entire entropy table.



              • #8
                If "the command" means entropyetc (SSC) then

                Code:
                quietly 
                as a prefix will suppress the display.



                • #9
                  Great, thank you. That is what I was looking for.



                  • #10
                    Hi all,

                    I have extended my dataset so that I now have roughly 25000 observations for each of my variables. As a result, when I try to use the entropyetc command for exactly the same purpose as described in this thread, I receive the error message "too many values". Does anyone know a workaround for this issue? I would greatly appreciate any help.

                    Thank you.



                    • #11
                      25000 observations ("for each of my variables" is redundant) shouldn't bite so far as I can see. I'd need to know where and why the command was failing. You could use set trace for that purpose.

                      Otherwise look at the code and write your own program. That's an option.

                      Warning: I won't be active here for about a week, starting soon.
                      Last edited by Nick Cox; 19 Jul 2018, 13:03.
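Following the suggestion above to write your own code, here is a minimal sketch in official Stata that needs no community-contributed command. It counts observations in each ID-Region cell with bysort, turns the counts into shares within ID, and sums the squared shares with egen's total() function, so it relies only on sorting and counting. The helper variables share and first are hypothetical names introduced for illustration and are dropped at the end; the example data are those from #1.

```stata
* Sketch: HHI of Region shares within each ID, official Stata only.
* Example data from #1; replace with your own dataset.
clear
input ID Region
1 1
1 2
1 2
1 3
1 3
2 1
2 1
2 2
3 1
end

* Number of observations in each ID-Region cell
bysort ID Region: gen double share = _N

* Flag the first row of each cell so each share is counted once
by ID Region: gen byte first = (_n == 1)

* Convert cell counts to shares of the ID total (data are sorted by ID Region)
by ID: replace share = share / _N

* HHI = sum of squared shares within ID, copied to every row of that ID
by ID: egen double HHI_region = total(first * share^2)

drop share first
list, sepby(ID) noobs
```

For the example data the new variable should agree with the hand calculation in #1 and the Simpson column of entropyetc: 0.36 for ID 1, 5/9 for ID 2 and 1 for ID 3.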



                      • #12
                        Hello all, I am posting a query about HHI which might sound trivial to many of you, yet I have not been able to solve it even after going through many threads on this topic. It would be of great help if you could tell me where I am going wrong. Below is a dataset similar to the one I am working with: each firm makes certain sales in each year, and I need to compute the HHI year by year. The HHI I expect for this dataset is shown in the second table.

                        Firm_id Firm Year NetSales
                        1 A 2000 2
                        2 B 2000 3
                        3 C 2000 4
                        1 A 2001 5
                        2 B 2001 6
                        3 C 2001 7
                        Firm_id Firm Year NetSales HHI
                        1 A 2000 2 3.22
                        2 B 2000 3 3.22
                        3 C 2000 4 3.22
                        1 A 2001 5 6.11
                        2 B 2001 6 6.11
                        3 C 2001 7 6.11
                        To get this I used the following code: hhi netsales by( year, firm_id) (apologies for not formatting the code the way others have in this thread).
                        This gives me the error "factor variables and time-series operators not allowed".

                        Next, when I use the code hhi netsales, it gives me the same HHI for all years, as shown below (note that the calculated HHI is also not what I was aiming for):

                        firm_id firm year netsales hhi_netsales
                        1 A 2000 2 .1906721
                        2 B 2000 3 .1906721
                        3 C 2000 4 .1906721
                        1 A 2001 5 .1906721
                        2 B 2001 6 .1906721
                        3 C 2001 7 .1906721


                        I cannot work out where I am going wrong.

                        Please help me in resolving this.



                        • #13
                          I think you just got the comma in the wrong place:

                          Code:
                          hhi netsales, by(year)
                          If I understand your data correctly it makes no sense to ask


                          Code:
                          hhi netsales, by(year firm_id)
                          as that just works on singleton observations.

                          hhi has been mentioned already in the thread (#1) but you're asked to explain that it comes from SSC.
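For comparison, the same index can be computed by hand in official Stata. This is a sketch assuming one row per firm-year, as in the data from #12; the helper names total_sales, share and HHI are hypothetical. With the usual definition (sum of squared sales shares) the year values are 29/81 ≈ 0.358 for 2000 and 110/324 ≈ 0.340 for 2001. The 3.22 and 6.11 in the expected table equal the sum of squared sales divided by total sales (29/9 and 110/18), not an HHI. Pooling both years gives 139/729 ≈ 0.1907, which is the value hhi netsales reported without by().

```stata
* Sketch: HHI of sales shares within each year, official Stata only.
* Example data from #12 (lower-case names as in the hhi output there).
clear
input firm_id str1 firm year netsales
1 "A" 2000 2
2 "B" 2000 3
3 "C" 2000 4
1 "A" 2001 5
2 "B" 2001 6
3 "C" 2001 7
end

* Total sales in each year, then each firm's share of that total
bysort year: egen double total_sales = total(netsales)
gen double share = netsales / total_sales

* HHI = sum of squared shares within year (assumes one row per firm-year)
bysort year: egen double HHI = total(share^2)

list year firm netsales HHI, sepby(year) noobs
```

If a firm can appear more than once in a year, its sales would need to be summed first (for example with collapse) so that each firm contributes a single share.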



                          • #14
                            Thanks for the quick feedback. I tried

                            Code:
                            hhi netsales, by(year)

                            and it worked well.



                            • #15
                              Dear Nick,

                              I am also computing an HHI for patent classes. My database contains around 2 million observations (patents), and I get the error "too many values". How can I deal with this problem?

                              Best regards, Farid
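One possible workaround for large data, sketched here under assumptions rather than as a diagnosis of the error: reduce the data to one row per group-category cell with contract, compute shares and the HHI on those cell counts, and merge the result back. ID and Region stand in for, say, a firm and a patent class (hypothetical mapping); n_cell, n_total, share and the tempfile hhi are names introduced for illustration. The example data are those from #1.

```stata
* Sketch: compute the HHI on cell counts instead of raw observations.
* ID and Region stand in for e.g. firm and patent class (assumption).
clear
input ID Region
1 1
1 2
1 2
1 3
1 3
2 1
2 1
2 2
3 1
end

preserve
contract ID Region, freq(n_cell)          // one row per ID-Region cell
bysort ID: egen double n_total = total(n_cell)
gen double share = n_cell / n_total
bysort ID: egen double HHI_region = total(share^2)
keep ID HHI_region
bysort ID: keep if _n == 1                // one row per ID
tempfile hhi
save `hhi'
restore

* Attach the HHI back to every original observation
merge m:1 ID using `hhi', nogenerate
```

The design choice is that the squared-share step only ever sees one row per cell, and attaching the result back needs only a merge on ID.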

