
  • Entropy Method of forming Composite Index

    Hi,
    I was wondering how to form a composite index using the entropy method in Stata.

  • #2
    That could mean several different things, but I will first mention two sources familiar to me: check out entropyetc from SSC or https://journals.sagepub.com/doi/pdf...6867X241276115

    Code:
    search entropy
    in Stata will point to much else.

    If that doesn't answer your question, please give more detail.
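
    If it helps, a minimal sketch of installing and reading about entropyetc (assuming your Stata is allowed to install from SSC):

    Code:
    ssc install entropyetc
    help entropyetc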



    • #3
      Also, I am not able to save an ado-file to the personal directory of Stata; the administrator has denied me permission to do so. Can someone help?



      • #4
        Originally posted by Nick Cox
        That could mean several different things, but I will first mention two sources familiar to me: check out entropyetc from SSC or https://journals.sagepub.com/doi/pdf...6867X241276115

        Code:
        search entropy
        in Stata will point to much else.

        If that doesn't answer your question, please give more detail.
        Thanks for your reply, Nick!
        I am referring to the following: Entropy Index Program for Stata



        • #5

          Also, I am not able to save an ado-file to the personal directory of Stata; the administrator has denied me permission to do so. Can someone help?
          If you are referring to your local IT administrator, how are we expected to help?

          I am referring to the following: Entropy Index Program for Stata
          Sorry, but that sounds like the same question to me.



          • #6
            Originally posted by Niti Khandelwal
            I am referring to the following: Entropy Index Program for Stata
            Terminology is not as standardized as you seem to think. The words you have written could mean many different things. Can you tell us more about what you want that program to do, or can you give a reference?
            ---------------------------------
            Maarten L. Buis
            University of Konstanz
            Department of history and sociology
            box 40
            78457 Konstanz
            Germany
            http://www.maartenbuis.nl
            ---------------------------------



            • #7
              Originally posted by Maarten Buis

              Terminology is not as standardized as you seem to think. The words you have written could mean many different things. Can you tell us more about what you want that program to do, or can you give a reference?
              Thank you, Maarten! My objective is to form a composite index from various measures of a variable. I came across some references that use the entropy method to do this:
              Zhang, W.M.; An, J.W.; Han, C. The application of the entropy method in the evaluation of urban sustainable development. J. Quant. Tech. Econ. 2003, 6, 115–118.
              Jin, H.; Qian, X.; Chin, T.; Zhang, H. (2020). A global assessment of sustainable development based on modification of the human development index via the entropy method. Sustainability, 12(8), 3251.

              When I searched for Stata code to do this, I came across the following:
              https://github.com/ammari1986/entropy-index-stata
              Ammari has written and shared code for this, but I can't load it into my ado directory.
              Could you please help?



              • #8
                OK, so you have the program you want (it is in the GitHub repository). You just don't know where to save that file, as your administrator does not allow you access to the standard directory.

                Your first choice is to talk to the administrator again and be as convincing as you possibly can (without breaking the law).

                If that does not work, then you can store the .ado and .sthlp files in your working directory. The disadvantage is that the program is only available for that project, and, as you use more community-contributed programs, you fill up that folder quite quickly. The advantage is that you don't need administrator approval to store files there, and you store exactly the version of that program in your project, making it easier to create a replication package afterwards.

                If you do all your work in one .do file, then at the top of that do file you add a line

                Code:
                cd h:\where\ever\I\work
                This "h:\where\ever\Iwork" is now the working directory for that project (obviously you need to change that to something that will work on your machine). In that directory you store all your .ado files that you want to download, and Stata will find them.

                You can be fancier and have a main .do file that calls several sub-files, in which case you would include the cd command only in that main.do file. You can also work with sysdir set in that main.do file to store community-contributed packages in a separate folder; see the sketch below. But the first solution will work just fine for small, simple projects. Remember that each project should have its own folder.
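
                For concreteness, here is a minimal sketch of such a main.do; the paths and sub-file names are placeholders, not part of any real project:

                Code:
                * main.do -- sketch only; adjust paths to your own machine
                cd "h:\where\ever\I\work"

                * keep community-contributed programs in a project-local folder
                sysdir set PLUS "h:\where\ever\I\work\ado"

                * run the parts of the project
                do 01_prepare.do
                do 02_analysis.do
                With PLUS redirected like this, ssc install and net install put their files in the project's ado folder rather than in the system-wide directory.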
                ---------------------------------
                Maarten L. Buis
                University of Konstanz
                Department of history and sociology
                box 40
                78457 Konstanz
                Germany
                http://www.maartenbuis.nl
                ---------------------------------



                • #9
                  This is https://github.com/ammari1986/entrop...ropy_index.ado, which is the code for what Ammari is calling an entropy index.

                  It's not equivalent to what is calculated in the code referred to in #2. If you want this program, then you could copy and paste it directly into your Stata, but I don't vouch for it or even my own code suggestions as being rock solid reliable. I haven't chased up the literature cited in #7.

                  Code:
                  capture program drop entropy_index
                  program define entropy_index
                      syntax varlist(min=2 numeric) [if] [in], GENerate(name)
                  
                      // Mark sample
                      marksample touse
                  
                      // Step 1: Normalize Indicators (Min-Max Normalization)
                      foreach var of local varlist {
                          quietly summarize `var' if `touse'
                          local min_`var' = r(min)  // Store minimum value
                          local max_`var' = r(max)  // Store maximum value
                          gen norm_`var' = (`var' - `min_`var'') / (`max_`var'' - `min_`var'') if `touse'
                      }
                  
                      // Step 2: Compute Proportions
                      foreach var of local varlist {
                          egen total_norm_`var' = total(norm_`var') if `touse'
                          gen prop_`var' = norm_`var' / total_norm_`var' if `touse'
                      }
                  
                      // Step 3: Calculate Entropy
                      egen n = total(1) if `touse'  // Total number of observations
                      gen k = 1 / ln(n) if `touse'  // Scaling constant
                  
                      foreach var of local varlist {
                          gen ln_prop_`var' = ln(prop_`var' + 1e-6) if `touse'  // Add small constant to avoid log(0)
                          gen entropy_`var' = -k * prop_`var' * ln_prop_`var' if `touse'
                          egen e_`var' = total(entropy_`var') if `touse'
                      }
                  
                      // Step 4: Compute Divergence and Weights
                      foreach var of local varlist {
                          gen divergence_`var' = 1 - e_`var' if `touse'
                      }
                  
                      egen total_divergence = rowtotal(`=subinstr("`varlist'", " ", " divergence_", .)') if `touse'
                  
                      foreach var of local varlist {
                          gen weight_`var' = divergence_`var' / total_divergence if `touse'
                      }
                  
                      // Step 5: Construct the Composite Index
                      gen `generate' = 0 if `touse'
                      foreach var of local varlist {
                          replace `generate' = `generate' + weight_`var' * norm_`var' if `touse'
                      }
                  end
                  The .sthlp file at the same place isn't, hmm, very helpful. It's best to look at the code.

                  Here is the code again, with my translation (ignoring some details) and some commentary.

                  Code:
                  capture program drop entropy_index
                  program define entropy_index
                      syntax varlist(min=2 numeric) [if] [in], GENerate(name)
                  Two or more numeric variables are input.

                  Code:
                      // Mark sample
                      marksample touse
                  The program is going to ignore any observations that the user didn't select OR
                  that have any missing values.

                  Code:
                      // Step 1: Normalize Indicators (Min-Max Normalization)
                      foreach var of local varlist {
                          quietly summarize `var' if `touse'
                          local min_`var' = r(min)  // Store minimum value
                          local max_`var' = r(max)  // Store maximum value
                          gen norm_`var' = (`var' - `min_`var'') / (`max_`var'' - `min_`var'') if `touse'
                      }
                  We scale each variable to [0, 1] using (value MINUS minimum) / (maximum MINUS minimum).

                  Comment 1: summarize, meanonly would be a smidgen more efficient.

                  Comment 2: Putting r(min) and r(max) into local macros and then taking them out again is pointless. Just use r(min) and r(max) directly.

                  Comment 3: New variables with the prefix norm_ are created. In principle, they might clash with the names of variables you already have.
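
                  Putting Comments 1 and 2 together, a sketch of how Step 1 might look (I keep the norm_ prefix for clarity, so the name-clash caveat in Comment 3 still applies; temporary variables would avoid it):

                  Code:
                      // Step 1 (sketch): normalize using r(min) and r(max) directly
                      foreach var of local varlist {
                          summarize `var' if `touse', meanonly
                          gen double norm_`var' = (`var' - r(min)) / (r(max) - r(min)) if `touse'
                      }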


                  Code:
                      // Step 2: Compute Proportions
                      foreach var of local varlist {
                          egen total_norm_`var' = total(norm_`var') if `touse'
                          gen prop_`var' = norm_`var' / total_norm_`var' if `touse'
                      }
                  Now we scale each of those variables to be a proportion of its total.

                  Comment 4: Again, new variables are created that might clash with yours.

                  Comment 5: There are various tacit assumptions there, essentially that this makes sense substantively.
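
                  Related to that, and anticipating Comment 9 below: the totals here do not need egen or extra variables, because summarize leaves r(sum) behind. A sketch of Step 2 in that style:

                  Code:
                      // Step 2 (sketch): proportions via summarize rather than egen, total()
                      foreach var of local varlist {
                          summarize norm_`var' if `touse', meanonly
                          gen double prop_`var' = norm_`var' / r(sum) if `touse'
                      }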

                  Code:
                      // Step 3: Calculate Entropy
                      egen n = total(1) if `touse'  // Total number of observations
                      gen k = 1 / ln(n) if `touse'  // Scaling constant
                  Comment 6: The programmer wants ln(#observations used) as a scaling factor. The #observations used was calculated in Step 1 by summarize, which left r(N) in its wake.

                  Comment 7: Same point about new variables.
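
                  Since the scaling constant is the same for every observation, a local macro is enough; there is no need for the variables n and k at all. A sketch:

                  Code:
                      // Step 3 (sketch): scaling constant as a local, not a variable
                      count if `touse'
                      local k = 1 / ln(r(N))
                  Later lines would then use `k' in place of the variable k.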

                  Code:
                      foreach var of local varlist {
                          gen ln_prop_`var' = ln(prop_`var' + 1e-6) if `touse'  // Add small constant to avoid log(0)
                          gen entropy_`var' = -k * prop_`var' * ln_prop_`var' if `touse'
                          egen e_`var' = total(entropy_`var') if `touse'
                      }
                  Comment 8: Whoa there! The comment may seem innocuous but this is a fudge undocumented in the help.
                  Backing up, the standard way to insist that p = 0 results in p ln p being 0 too is just to trap that condition, so

                  Code:
                  cond(p == 0, 0, p * ln(p))
                  is Stata code for a probability p.

                  So, it would have been simpler (and more accurate) to replace these two generate statements with

                  Code:
                  gen entropy_`var' = cond(prop_`var' == 0, 0, -k * prop_`var' * ln(prop_`var')) if `touse'
                  Comment 9: If you need a total, then egen, total() is overkill compared with using summarize directly, which leaves r(sum) in its wake. The same point also applies earlier.

                  Comment 10: Same point about new variables.

                  Code:
                      // Step 4: Compute Divergence and Weights
                      foreach var of local varlist {
                          gen divergence_`var' = 1 - e_`var' if `touse'
                      }
                  
                      egen total_divergence = rowtotal(`=subinstr("`varlist'", " ", " divergence_", .)') if `touse'
                  
                      foreach var of local varlist {
                          gen weight_`var' = divergence_`var' / total_divergence if `touse'
                      }
                  
                      // Step 5: Construct the Composite Index
                      gen `generate' = 0 if `touse'
                      foreach var of local varlist {
                          replace `generate' = `generate' + weight_`var' * norm_`var' if `touse'
                      }
                  end
                  These are further calculations that may interest you, and it seems that some other paper may be needed for the motivation.
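
                  One detail in Step 4 worth checking: subinstr() only replaces the spaces in the varlist, so the first variable appears in the rowtotal() call without the divergence_ prefix, which looks wrong. A sketch of Steps 4 and 5 that loops over the varlist instead, keeping the divergences and weights in locals (it assumes the entropy_ variables from Step 3 exist):

                  Code:
                      // Step 4 (sketch): divergences and total divergence in locals
                      local total_divergence = 0
                      foreach var of local varlist {
                          summarize entropy_`var' if `touse', meanonly
                          local d_`var' = 1 - r(sum)
                          local total_divergence = `total_divergence' + `d_`var''
                      }

                      // Step 5 (sketch): weighted composite index
                      gen double `generate' = 0 if `touse'
                      foreach var of local varlist {
                          replace `generate' = `generate' + (`d_`var'' / `total_divergence') * norm_`var' if `touse'
                      }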


