
  • Taking the mean of irregular data

    Dear Statalist participants,
    Below is part of my dataset. It covers sectors in the NACE Rev. 2 classification of industrial sectors (variable nace2) and their HHIs (market concentration levels). Unfortunately, the HHIs are published at the 4-digit level (e.g. 0610), but I need values at the 3-digit level (e.g. 061). I am trying to convert the HHIs of subsectors (e.g. 0811 and 0812) into values for their main sectors (e.g. 081) by averaging them. As you can see, the number of subsectors per 3-digit main sector is irregular: some main sectors have two subsectors, others only one, others seven, and so on. I attempted to compute the averages with for loops but could not manage it. How can I do this for my large dataset without calculating and entering the values by hand, which would take a huge amount of time? Thank you in advance for your interest.

    Demet.


    nace2   HHI
    061
    0610    .5854457791
    062
    0620    .4798563569
    071
    0710    .2199852131
    072
    0729    .1103154682
    081
    0811    .0111453791
    0812    .0102862714
    089
    0891    .6843208716
    0892    .1855210554
    0893    .3512744743
    0899    .0748194056

  • #2
    HHI presumably means something like Herfindahl-Hirschman index. I wouldn't average HHIs unless forced to. If you're aggregating then the proportions are defined by totals at coarser levels. Also, it is evident from examples like 089 that finer levels can be highly variable in concentration within a coarser level.
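The point about totals at coarser levels can be made concrete under an added assumption (not stated in the thread): each firm belongs to exactly one 4-digit subsector j, s_ij is firm i's share within subsector j, S_j is the total size (e.g. sales) of subsector j, and S is the sum of the S_j. A firm's share of the 3-digit group is then s_ij S_j / S, so the 3-digit HHI is a weighted combination of the subsector HHIs rather than their simple mean:

```latex
\mathrm{HHI}_{(3)}
  = \sum_{j} \sum_{i \in j} \left( s_{ij} \, \frac{S_j}{S} \right)^{2}
  = \sum_{j} \left( \frac{S_j}{S} \right)^{2} \mathrm{HHI}_j ,
\qquad
\mathrm{HHI}_j = \sum_{i \in j} s_{ij}^{2} .
```

With J subsectors of equal size, S_j / S = 1/J and the right-hand side reduces to (1/J) times the simple mean of the HHI_j, so even in that case the simple mean overstates concentration whenever J > 1. Subsector sizes are not in the data shown, so this is only an illustration of why averaging HHIs is a rough device.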

    All that said, and with yet more reservations, I guess wildly that nace2 is string. (You don't give a data example in the form we request; so that is why I have to guess.) If so then

    Code:
    gen NACE = substr(nace2, 1, 3) 
    egen meanHHI = mean(HHI), by(NACE) 
    tabdisp NACE, c(meanHHI) format(%4.3f)
    may be what you are looking for.
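A variant of the code above, offered as a sketch: it keeps only the 4-digit rows (the 2- and 3-digit header rows carry no HHI) and uses -collapse- to produce one row per 3-digit group, together with the number of subsectors behind each mean. It assumes, as guessed above, that nace2 is a string; the names nace3, meanHHI and nsub are illustrative, and the input rows are copied from the example in the question.

```stata
* Sketch: one row per 3-digit group, averaging only the 4-digit rows.
* Assumes nace2 is a string holding codes such as "0811", "081", "08".
clear
input str4 nace2 double HHI
"081"  .
"0811" .0111453791
"0812" .0102862714
"089"  .
"0891" .6843208716
"0892" .1855210554
"0893" .3512744743
"0899" .0748194056
end

* keep only 4-digit subsector rows; header rows have missing HHI anyway
keep if strlen(strtrim(nace2)) == 4
* the 3-digit group is the first three characters of the 4-digit code
gen nace3 = substr(strtrim(nace2), 1, 3)
* unweighted mean plus the number of nonmissing subsector values per group
collapse (mean) meanHHI=HHI (count) nsub=HHI, by(nace3)
list, noobs
```

Note that -collapse- replaces the data in memory, so wrap it in -preserve- and -restore- (or run it on a copy) if the original rows are still needed.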



    • #3

      Thank you for your reservations, with which I definitely agree. I just want to make some comparisons and look at the variation across subsectors. Below is my dataset, formatted with the dataex command; I hope it works. My original dataset also contains 2-digit sectors, but I can omit them if they make the calculation more complicated. I tried your commands, but they calculated the mean wrongly for some sectors.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str21 nace2 double HHI
      "05"             .
      "051"            .
      "0510" .0929000861
      "052"            .
      "0520" .0978579125
      "06"             .
      "061"            .
      "0610" .5854457791
      "062"            .
      "0620" .4798563569
      "07"             .
      "071"            .
      "0710" .2199852131
      "072"            .
      "0729" .1103154682
      "08"             .
      "081"            .
      "0811" .0111453791
      "0812" .0102862714
      "089"            .
      "0891" .6843208716
      "0892" .1855210554
      "0893" .3512744743
      "0899" .0748194056
      "09"             .
      end
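If the goal is to keep the layout shown, a sketch along these lines writes the subsector mean into each empty 3-digit row and records how many subsectors it is based on, which makes each result easy to check by hand. It assumes nace2 is a string, as in the -dataex- output; trimming blanks is only a precaution in case some codes carry stray spaces, not a diagnosis of the reported problem. The names len, nace3, meanHHI and nsub are illustrative, and the input rows are a subset of the example above.

```stata
* Sketch: keep the original layout and fill each 3-digit header row
* (e.g. "081") with the mean of its 4-digit subsectors.
clear
input str21 nace2 double HHI
"08"             .
"081"            .
"0811" .0111453791
"0812" .0102862714
"089"            .
"0891" .6843208716
"0892" .1855210554
"0893" .3512744743
"0899" .0748194056
"09"             .
end

* strip any stray blanks so the length test is reliable
replace nace2 = strtrim(nace2)
gen len = strlen(nace2)
* 3-digit group for 3- and 4-digit rows; empty for 2-digit rows
gen nace3 = substr(nace2, 1, 3) if len >= 3
* mean and count over the 4-digit rows only (double keeps full precision)
egen double meanHHI = mean(cond(len == 4, HHI, .)), by(nace3)
egen nsub = total(len == 4 & !missing(HHI)), by(nace3)
* copy the group mean into the empty 3-digit header rows
replace HHI = meanHHI if len == 3 & missing(HHI)
list nace2 HHI nsub if len == 3, noobs
```

The 2-digit rows are left missing; the same pattern with substr(nace2, 1, 2) would give 2-digit means if those are wanted.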



      • #4
        Give me an example of a wrong result explaining why it is wrong.

