  • entropy

    Hi.
    I have a dataset which shows the share of low-, high-, and middle-educated people in each neighborhood. Now I want to know to what extent each neighborhood is homogeneous or heterogeneous in terms of educational level. Which method is better? I tried to use entropyetc but it said too many variables defined.
    My data look like this:
    Id Place   share of high educated   share of low educated   share of middle educated
    1          0.5                      0.3                      0.1
    2          0.1                      0.2                      0.5
    3          0.3                      0.4                      0.2
    4          0.3                      0.1                      0.5
    5          0.2                      0.4                      0.5
    6          0.5                      0.4                      0.3
    7          0.3                      0.4                      0.4

  • #2
    Your three share variables add to 0.9, 0.8, 0.9, 0.9, 1.1, 1.2, 1.1. How are we supposed to treat these shares?



    • #3
      Sorry, something went wrong: it should be like this:
      Id Place   share of high educated   share of low educated   share of middle educated
      1          0.5                      0.3                      0.2
      2          0.1                      0.2                      0.7
      3          0.3                      0.4                      0.3
      4          0.3                      0.2                      0.5
      5          0.2                      0.3                      0.5
      6          0.5                      0.1                      0.4
      7          0.3                      0.2                      0.5

      This shows the share of the population in each educational level out of the total population of the neighborhood.



      • #4
        Thanks. For this data structure, the entropy calculation could just be one direct command line working across variables, but a moderately long one. Although it doesn't bite with your example, the main need is to ensure that 0 ln(1/0) gets treated as zero, not missing.
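        For reference, what the code below computes is the Shannon entropy of the three class shares, H = Σ p ln(1/p) = -Σ p ln(p), summed over the classes, with the convention that any term with p = 0 counts as zero.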

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input float(id high low middle)
        1 .5 .3 .2
        2 .1 .2 .7
        3 .3 .4 .3
        4 .3 .2 .5
        5 .2 .3 .5
        6 .5 .1 .4
        7 .3 .2 .5
        end
        gen entropy = 0 
        
        * sum p * ln(1/p) over the three shares; a zero share contributes 0, not missing
        quietly foreach v in high low middle { 
            replace entropy = entropy + cond(`v' == 0, 0, `v' * ln(1/`v')) 
        } 
        
        list, sep(0) 
             +-------------------------------------+
             | id   high   low   middle    entropy |
             |-------------------------------------|
          1. |  1     .5    .3       .2   1.029653 |
          2. |  2     .1    .2       .7   .8018185 |
          3. |  3     .3    .4       .3     1.0889 |
          4. |  4     .3    .2       .5   1.029653 |
          5. |  5     .2    .3       .5   1.029653 |
          6. |  6     .5    .1       .4   .9433484 |
          7. |  7     .3    .2       .5   1.029653 |
             +-------------------------------------+
        You could do it with entropyetc (SSC, as you are asked to explain) -- but, as the help tells you, it expects one categorical variable as input, plus optionally a set of weights, so you need to reshape first.

        Code:
        * stack the three shares into long form: one observation per id-class pair
        rename (high low middle) p= 
        
        reshape long p, i(id) j(class) string 
        
        * entropy and related measures by neighborhood, weighting each class by its share
        entropyetc class [aw=p], by(id) 
        
        ----------------------------------------------------------------------
            Group |  Shannon H      exp(H)     Simpson   1/Simpson     dissim.
        ----------+-----------------------------------------------------------
                1 |      1.030       2.800       0.380       2.632       0.167
                2 |      0.802       2.230       0.540       1.852       0.367
                3 |      1.089       2.971       0.340       2.941       0.067
                4 |      1.030       2.800       0.380       2.632       0.167
                5 |      1.030       2.800       0.380       2.632       0.167
                6 |      0.943       2.569       0.420       2.381       0.233
                7 |      1.030       2.800       0.380       2.632       0.167
        ----------------------------------------------------------------------
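        As a check on the first group (id 1): Shannon H = 1.030 agrees with the entropy variable computed directly above, Simpson here is the sum of squared shares, 0.5^2 + 0.3^2 + 0.2^2 = 0.38, and 1/Simpson = 1/0.38 = 2.632.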



        • #5
          Thanks Nick! It helped a lot. To interpret: does it mean that places with higher entropy have a greater mix of education classes and so are more heterogeneous in terms of education, while places with smaller entropy are more homogeneous? Also, is there any way to make some classes based on the entropy result, for instance a classification on 5 scales?
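          One possible sketch for the second question, not from this thread: assuming five equal-frequency groups of the entropy variable from the direct calculation above are wanted, something like this could be tried.

          Code:
          * sketch only: split neighborhoods into 5 groups by quantiles of entropy
          xtile entropy_class = entropy, nquantiles(5)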



          • #6
            I also faced this error:

            entropyetc class [aw=p], by(Id)

            matsize too small
                You have attempted to create a matrix with too many rows or columns or attempted to fit a
                model with too many variables. You need to increase matsize; it is currently 400. Use
                set matsize; see help matsize.

                If you are using factor variables and included an interaction that has lots of missing
                cells, either increase matsize or set emptycells drop to reduce the required matrix size;
                see help set emptycells.

                If you are using factor variables, you might have accidentally treated a continuous
                variable as a categorical, resulting in lots of categories. Use the c. operator on such
                variables.
            r(908);

            end of do-file

            r(908);
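            As the error message itself suggests, one possible fix (a sketch; 800 is just an arbitrary larger value, for Stata versions that still use matsize) is to raise matsize before rerunning:

            Code:
            * increase the maximum matrix size above the default of 400, then rerun
            set matsize 800
            entropyetc class [aw=p], by(Id)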
