taking mean of irregular data

Demet Korkut

Join Date: Aug 2018

Posts: 16
#1

01 Aug 2018, 03:13

Dear Statalist participants,
Below I give part of my dataset. It relates to the nace2 classification of industrial sectors and their HHIs (market concentration levels). Unfortunately, the HHIs are published at the four-digit level (like 0610), and I need values at the three-digit level (like 061). I am trying to convert (take the average of) the HHIs of subsectors (e.g., 0811 and 0812) to their main sectors (e.g., 081). As you can see, the number of subsectors per three-digit main sector is irregular: some main sectors have two subsectors, others have only one, others have seven, etc. I attempted to take the average with for loops but could not manage it. How can I do this for my large dataset without calculating and entering the values manually, which would take a huge amount of time? Thank you in advance for your interest.

Demet.

nace2 HHI
061
0610 .5854457791
062
0620 .4798563569
071
0710 .2199852131
072
0729 .1103154682
081
0811 .0111453791
0812 .0102862714
089
0891 .6843208716
0892 .1855210554
0893 .3512744743
0899 .0748194056
Nick Cox

Join Date: Mar 2014

Posts: 35636
#2

01 Aug 2018, 05:09

HHI presumably means something like Herfindahl-Hirschman index. I wouldn't average HHIs unless forced to. If you're aggregating then the proportions are defined by totals at coarser levels. Also, it is evident from examples like 089 that finer levels can be highly variable in concentration within a coarser level.
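A short derivation makes the point about totals concrete. It is a sketch under stated assumptions, not something from the data: each firm belongs to exactly one four-digit subsector k of the three-digit parent, shares are fractions, and the subsector totals S_k (sales or output, not in the posted data) are available.

```latex
% Assumptions: firm i belongs to exactly one subsector k of the parent sector;
% S_k is the total (e.g. sales) of subsector k; s_{ik} is firm i's share within k.
w_k = \frac{S_k}{\sum_j S_j}, \qquad
\mathrm{HHI}_k = \sum_i s_{ik}^2

% Firm i's share of the parent sector is w_k s_{ik}, so
\mathrm{HHI}_{\text{parent}}
  = \sum_k \sum_i \left( w_k \, s_{ik} \right)^2
  = \sum_k w_k^2 \sum_i s_{ik}^2
  = \sum_k w_k^2 \, \mathrm{HHI}_k

% Hypothetical equal weights for 081 (w_{0811} = w_{0812} = 1/2):
\mathrm{HHI}_{081} = \tfrac{1}{4}\,(0.0111453791 + 0.0102862714) \approx 0.00536,
\qquad \text{unweighted mean} \approx 0.01072
```

So the coarser HHI is a sum weighted by squared size shares, and it reduces to the plain mean of the subsector HHIs only in special cases such as a single subsector.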

All that said, and with yet more reservations, I guess wildly that nace2 is a string variable. (You don't give a data example in the form we request, which is why I have to guess.) If so, then

Code:

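gen NACE = substr(nace2, 1, 3)           // first three characters of the code
egen meanHHI = mean(HHI), by(NACE)       // mean of nonmissing HHI within each code
tabdisp NACE, c(meanHHI) format(%4.3f)   // one row per code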

may be what you are looking for.
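A minimal sketch of a variant, assuming nace2 is a string variable as guessed above. Instead of attaching the mean to every observation, collapse produces one row per three-digit sector and also counts how many four-digit subsectors went into each mean, which makes suspicious groups easier to spot. The rows are copied from the dataex example later in the thread; NACE3 and nsub are illustrative names, and the strtrim() step is a precaution, not something the example shows to be necessary.

```stata
* Sketch: one row per three-digit sector (rows copied from the dataex example)
clear
input str21 nace2 double HHI
"08"   .
"081"  .
"0811" .0111453791
"0812" .0102862714
"089"  .
"0891" .6843208716
"0892" .1855210554
"0893" .3512744743
"0899" .0748194056
end

* precaution (assumption): remove any leading or trailing spaces in the codes
replace nace2 = strtrim(nace2)

* keep only four-digit subsectors; the two- and three-digit rows have no HHI here
keep if strlen(nace2) == 4
generate NACE3 = substr(nace2, 1, 3)

* unweighted mean of subsector HHIs, plus the number of subsectors used
collapse (mean) meanHHI=HHI (count) nsub=HHI, by(NACE3)
list, noobs
```

Note that collapse replaces the data in memory, so wrap it in preserve and restore (or save first) if the original dataset is still needed.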
Demet Korkut

Join Date: Aug 2018

Posts: 16
#3

02 Aug 2018, 04:01


Thank you for your reservations, with which I definitely agree. I just want to make some comparisons and see the variation across subsectors, etc. Below is my dataset, produced with the dataex command; I hope it works. My original dataset also contains two-digit sectors, but I can omit them if they make the calculation more complicated. I tried your code, but it calculated the mean wrongly for some sectors.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str21 nace2 double HHI
"05"   .
"051"  .
"0510" .0929000861
"052"  .
"0520" .0978579125
"06"   .
"061"  .
"0610" .5854457791
"062"  .
"0620" .4798563569
"07"   .
"071"  .
"0710" .2199852131
"072"  .
"0729" .1103154682
"08"   .
"081"  .
"0811" .0111453791
"0812" .0102862714
"089"  .
"0891" .6843208716
"0892" .1855210554
"0893" .3512744743
"0899" .0748194056
"09"   .
end
Nick Cox

Join Date: Mar 2014

Posts: 35636
#4

02 Aug 2018, 05:38

Give me an example of a wrong result, explaining why it is wrong.