  • entropy

    Hi.
    I have a dataset which shows the share of low-, high-, and middle-educated people in each neighborhood. Now I want to know to what extent each neighborhood is homogeneous or heterogeneous in terms of educational level. Which method is better? I tried to use entropyetc but it said too many variables defined.
    My data look like this:
    Id Place   share of high educated   share of low educated   share of middle educated
    1          0.5                      0.3                      0.1
    2          0.1                      0.2                      0.5
    3          0.3                      0.4                      0.2
    4          0.3                      0.1                      0.5
    5          0.2                      0.4                      0.5
    6          0.5                      0.4                      0.3
    7          0.3                      0.4                      0.4

  • #2
    Your three share variables add to 0.9, 0.8, 0.9, 0.9, 1.1, 1.2, 1.1. How are we supposed to treat these shares?



    • #3
      Sorry, something went wrong: it should be like this:
      Id Place   share of high educated   share of low educated   share of middle educated
      1          0.5                      0.3                      0.2
      2          0.1                      0.2                      0.7
      3          0.3                      0.4                      0.3
      4          0.3                      0.2                      0.5
      5          0.2                      0.3                      0.5
      6          0.5                      0.1                      0.4
      7          0.3                      0.2                      0.5

      This shows the share of the population in each educational level out of the total population of the neighborhood.



      • #4
        Thanks. For this data structure, the entropy calculation could just be one direct command line working across variables, but a moderately long one. Although it doesn't bite with your example, the main need is to ensure that 0 ln(1/0) gets treated as zero, not missing.
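        For reference, what the code below computes is the Shannon entropy of the three class shares, H = Σ p ln(1/p) = -Σ p ln(p), summed over the classes, with the convention that any term with p = 0 counts as zero.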

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input float(id high low middle)
        1 .5 .3 .2
        2 .1 .2 .7
        3 .3 .4 .3
        4 .3 .2 .5
        5 .2 .3 .5
        6 .5 .1 .4
        7 .3 .2 .5
        end
        gen entropy = 0 
        
        * sum p * ln(1/p) over the three shares; a zero share contributes 0, not missing
        quietly foreach v in high low middle { 
            replace entropy = entropy + cond(`v' == 0, 0, `v' * ln(1/`v')) 
        } 
        
        list, sep(0) 
             +-------------------------------------+
             | id   high   low   middle    entropy |
             |-------------------------------------|
          1. |  1     .5    .3       .2   1.029653 |
          2. |  2     .1    .2       .7   .8018185 |
          3. |  3     .3    .4       .3     1.0889 |
          4. |  4     .3    .2       .5   1.029653 |
          5. |  5     .2    .3       .5   1.029653 |
          6. |  6     .5    .1       .4   .9433484 |
          7. |  7     .3    .2       .5   1.029653 |
             +-------------------------------------+
        You could do it with entropyetc (SSC, as you are asked to explain) -- but, as the help tells you, it expects one categorical variable as input, plus optionally a set of weights, so you need to reshape first.

        Code:
        * stack the three shares into long form: one observation per id-class pair
        rename (high low middle) p= 
        
        reshape long p, i(id) j(class) string 
        
        * entropy and related measures by neighborhood, weighting each class by its share
        entropyetc class [aw=p], by(id) 
        
        ----------------------------------------------------------------------
            Group |  Shannon H      exp(H)     Simpson   1/Simpson     dissim.
        ----------+-----------------------------------------------------------
                1 |      1.030       2.800       0.380       2.632       0.167
                2 |      0.802       2.230       0.540       1.852       0.367
                3 |      1.089       2.971       0.340       2.941       0.067
                4 |      1.030       2.800       0.380       2.632       0.167
                5 |      1.030       2.800       0.380       2.632       0.167
                6 |      0.943       2.569       0.420       2.381       0.233
                7 |      1.030       2.800       0.380       2.632       0.167
        ----------------------------------------------------------------------
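        As a check on the first group (id 1): Shannon H = 1.030 agrees with the entropy variable computed directly above, Simpson here is the sum of squared shares, 0.5^2 + 0.3^2 + 0.2^2 = 0.38, and 1/Simpson = 1/0.38 = 2.632.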



        • #5
          Thanks Nick! It helped a lot. To interpret: does it mean that places with higher entropy have a greater mix of education classes and so are more heterogeneous in terms of education, while places with smaller entropy are more homogeneous? Also, is there any way to make some classes based on the entropy result, for instance a classification on 5 scales?
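          One possible sketch for the second question, not from this thread: assuming five equal-frequency groups of the entropy variable from the direct calculation above are wanted, something like this could be tried.

          Code:
          * sketch only: split neighborhoods into 5 groups by quantiles of entropy
          xtile entropy_class = entropy, nquantiles(5)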



          • #6
            I also faced this error:

            entropyetc class [aw=p], by(Id)

            matsize too small
                You have attempted to create a matrix with too many rows or columns or attempted to fit a
                model with too many variables. You need to increase matsize; it is currently 400. Use
                set matsize; see help matsize.

                If you are using factor variables and included an interaction that has lots of missing
                cells, either increase matsize or set emptycells drop to reduce the required matrix size;
                see help set emptycells.

                If you are using factor variables, you might have accidentally treated a continuous
                variable as a categorical, resulting in lots of categories. Use the c. operator on such
                variables.
            r(908);

            end of do-file

            r(908);
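            As the error message itself suggests, one possible fix (a sketch; 800 is just an arbitrary larger value, for Stata versions that still use matsize) is to raise matsize before rerunning:

            Code:
            * increase the maximum matrix size above the default of 400, then rerun
            set matsize 800
            entropyetc class [aw=p], by(Id)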
