
  • Entropy Method of forming Composite Index

    Hi,
    I was wondering how to form a composite index using the entropy method in Stata.

  • #2
    That could mean several different things, but I will first mention two sources familiar to me: check out entropyetc from SSC or https://journals.sagepub.com/doi/pdf...6867X241276115

    Code:
    search entropy
    in Stata will point to much else.

    If that doesn't answer your question, please give more detail.
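
    If it helps, a minimal sketch of installing and reading about entropyetc (assuming your Stata is allowed to install from SSC):

    Code:
    ssc install entropyetc
    help entropyetc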



    • #3
      Also, I am not able to save an ado-file to the personal directory of Stata; the administrator has denied me permission to do so. Can someone help?



      • #4
        Originally posted by Nick Cox
        That could mean several different things, but I will first mention two sources familiar to me: check out entropyetc from SSC or https://journals.sagepub.com/doi/pdf...6867X241276115

        Code:
        search entropy
        in Stata will point to much else.

        If that doesn't answer your question, please give more detail.
        Thanks for your reply, Nick!
        I am referring to the following: Entropy Index Program for Stata



        • #5

          Also, I am not able to save an ado-file to the personal directory of Stata; the administrator has denied me permission to do so. Can someone help?
          If you are referring to your local IT administrator, how are we expected to help?

          I am referring to the following: Entropy Index Program for Stata
          Sorry, but that sounds like the same question to me.



          • #6
            Originally posted by Niti Khandelwal
            I am referring to the following: Entropy Index Program for Stata
            Terminology is not as standardized as you seem to think. The words you have written could mean many different things. Can you tell us more about what you want that program to do, or can you give a reference?
            ---------------------------------
            Maarten L. Buis
            University of Konstanz
            Department of history and sociology
            box 40
            78457 Konstanz
            Germany
            http://www.maartenbuis.nl
            ---------------------------------



            • #7
              Originally posted by Maarten Buis

              Terminology is not as standardized as you seem to think. The words you have written could mean many different things. Can you tell us more about what you want that program to do, or can you give a reference?
              Thank you, Maarten! My objective is to form a composite index from various measures of a variable. I came across some references that use the entropy method to do this:
              Zhang, W.M.; An, J.W.; Han, C. The application of the entropy method in the evaluation of urban sustainable development. J. Quant. Tech. Econ. 2003, 6, 115–118.
              Jin, H.; Qian, X.; Chin, T.; Zhang, H. (2020). A global assessment of sustainable development based on modification of the human development index via the entropy method. Sustainability, 12(8), 3251.

              When I searched for Stata code to do this, I came across the following:
              https://github.com/ammari1986/entropy-index-stata
              Ammari has written and shared code for this, but I can't load it into my ado directory.
              Could you please help?



              • #8
                OK, so you have the program you want (it is in the GitHub repository). You just don't know where to save that file, as your administrator does not allow you access to the standard directory.

                Your first choice is to talk to the administrator again and be as convincing as you possibly can (without breaking the law).

                If that does not work, then you can store the .ado and .sthlp files in your working directory. The disadvantage is that the program is only available for that project, and, as you use more community-contributed programs, you fill up that folder quite quickly. The advantage is that you don't need administrator approval to store files there, and you store exactly the version of that program in your project, making it easier to create a replication package afterwards.

                If you do all your work in one .do file, then at the top of that do file you add a line

                Code:
                cd h:\where\ever\I\work
                This "h:\where\ever\Iwork" is now the working directory for that project (obviously you need to change that to something that will work on your machine). In that directory you store all your .ado files that you want to download, and Stata will find them.

                You can be fancier and have a main .do file that calls several sub-files, in which case you would include the cd command only in that main.do file. You can also work with sysdir set in that main.do file to store community-contributed packages in a separate folder; see the sketch below. But the first solution will work just fine for small, simple projects. Remember that each project should have its own folder.
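
                For concreteness, here is a minimal sketch of such a main.do; the paths and sub-file names are placeholders, not part of any real project:

                Code:
                * main.do -- sketch only; adjust paths to your own machine
                cd "h:\where\ever\I\work"

                * keep community-contributed programs in a project-local folder
                sysdir set PLUS "h:\where\ever\I\work\ado"

                * run the parts of the project
                do 01_prepare.do
                do 02_analysis.do
                With PLUS redirected like this, ssc install and net install put their files in the project's ado folder rather than in the system-wide directory.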
                ---------------------------------
                Maarten L. Buis
                University of Konstanz
                Department of history and sociology
                box 40
                78457 Konstanz
                Germany
                http://www.maartenbuis.nl
                ---------------------------------



                • #9
                  This is https://github.com/ammari1986/entrop...ropy_index.ado, which is the code for what Ammari is calling an entropy index.

                  It's not equivalent to what is calculated in the code referred to in #2. If you want this program, then you could copy and paste it directly into your Stata, but I don't vouch for it or even my own code suggestions as being rock solid reliable. I haven't chased up the literature cited in #7.

                  Code:
                  capture program drop entropy_index
                  program define entropy_index
                      syntax varlist(min=2 numeric) [if] [in], GENerate(name)
                  
                      // Mark sample
                      marksample touse
                  
                      // Step 1: Normalize Indicators (Min-Max Normalization)
                      foreach var of local varlist {
                          quietly summarize `var' if `touse'
                          local min_`var' = r(min)  // Store minimum value
                          local max_`var' = r(max)  // Store maximum value
                          gen norm_`var' = (`var' - `min_`var'') / (`max_`var'' - `min_`var'') if `touse'
                      }
                  
                      // Step 2: Compute Proportions
                      foreach var of local varlist {
                          egen total_norm_`var' = total(norm_`var') if `touse'
                          gen prop_`var' = norm_`var' / total_norm_`var' if `touse'
                      }
                  
                      // Step 3: Calculate Entropy
                      egen n = total(1) if `touse'  // Total number of observations
                      gen k = 1 / ln(n) if `touse'  // Scaling constant
                  
                      foreach var of local varlist {
                          gen ln_prop_`var' = ln(prop_`var' + 1e-6) if `touse'  // Add small constant to avoid log(0)
                          gen entropy_`var' = -k * prop_`var' * ln_prop_`var' if `touse'
                          egen e_`var' = total(entropy_`var') if `touse'
                      }
                  
                      // Step 4: Compute Divergence and Weights
                      foreach var of local varlist {
                          gen divergence_`var' = 1 - e_`var' if `touse'
                      }
                  
                      egen total_divergence = rowtotal(`=subinstr("`varlist'", " ", " divergence_", .)') if `touse'
                  
                      foreach var of local varlist {
                          gen weight_`var' = divergence_`var' / total_divergence if `touse'
                      }
                  
                      // Step 5: Construct the Composite Index
                      gen `generate' = 0 if `touse'
                      foreach var of local varlist {
                          replace `generate' = `generate' + weight_`var' * norm_`var' if `touse'
                      }
                  end
                  The .sthlp file at the same place isn't, hmm, very helpful. It's best to look at the code.

                  Here is the code again, with my translation (ignoring some details) and some commentary.

                  Code:
                  capture program drop entropy_index
                  program define entropy_index
                      syntax varlist(min=2 numeric) [if] [in], GENerate(name)
                  Two or more numeric variables are input.

                  Code:
                      // Mark sample
                      marksample touse
                  The program is going to ignore any observations that the user didn't select OR
                  that have any missing values.

                  Code:
                      // Step 1: Normalize Indicators (Min-Max Normalization)
                      foreach var of local varlist {
                          quietly summarize `var' if `touse'
                          local min_`var' = r(min)  // Store minimum value
                          local max_`var' = r(max)  // Store maximum value
                          gen norm_`var' = (`var' - `min_`var'') / (`max_`var'' - `min_`var'') if `touse'
                      }
                  We scale each variable to [0, 1] using (value MINUS minimum) / (maximum MINUS minimum).

                  Comment 1: summarize, meanonly would be a smidgen more efficient.

                  Comment 2: Putting r(min) and r(max) into local macros and then taking them out again is pointless. Just use r(min) and r(max) directly.

                  Comment 3: New variables with the prefix norm_ are created. In principle, they might clash with the names of variables you already have.
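
                  Putting Comments 1 and 2 together, a sketch of how Step 1 might look (I keep the norm_ prefix for clarity, so the name-clash caveat in Comment 3 still applies; temporary variables would avoid it):

                  Code:
                      // Step 1 (sketch): normalize using r(min) and r(max) directly
                      foreach var of local varlist {
                          summarize `var' if `touse', meanonly
                          gen double norm_`var' = (`var' - r(min)) / (r(max) - r(min)) if `touse'
                      }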


                  Code:
                      // Step 2: Compute Proportions
                      foreach var of local varlist {
                          egen total_norm_`var' = total(norm_`var') if `touse'
                          gen prop_`var' = norm_`var' / total_norm_`var' if `touse'
                      }
                  Now we scale each of those variables to be a proportion of its total.

                  Comment 4: Again, new variables are created that might clash with yours.

                  Comment 5: There are various tacit assumptions there, essentially that this makes sense substantively.
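
                  Related to that, and anticipating Comment 9 below: the totals here do not need egen or extra variables, because summarize leaves r(sum) behind. A sketch of Step 2 in that style:

                  Code:
                      // Step 2 (sketch): proportions via summarize rather than egen, total()
                      foreach var of local varlist {
                          summarize norm_`var' if `touse', meanonly
                          gen double prop_`var' = norm_`var' / r(sum) if `touse'
                      }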

                  Code:
                      // Step 3: Calculate Entropy
                      egen n = total(1) if `touse'  // Total number of observations
                      gen k = 1 / ln(n) if `touse'  // Scaling constant
                  Comment 6: The programmer wants ln(#observations used) as a scaling factor. The #observations used was calculated in Step 1 by summarize, which left r(N) in its wake.

                  Comment 7: Same point about new variables.
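
                  Since the scaling constant is the same for every observation, a local macro is enough; there is no need for the variables n and k at all. A sketch:

                  Code:
                      // Step 3 (sketch): scaling constant as a local, not a variable
                      count if `touse'
                      local k = 1 / ln(r(N))
                  Later lines would then use `k' in place of the variable k.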

                  Code:
                      foreach var of local varlist {
                          gen ln_prop_`var' = ln(prop_`var' + 1e-6) if `touse'  // Add small constant to avoid log(0)
                          gen entropy_`var' = -k * prop_`var' * ln_prop_`var' if `touse'
                          egen e_`var' = total(entropy_`var') if `touse'
                      }
                  Comment 8: Whoa there! The comment may seem innocuous but this is a fudge undocumented in the help.
                  Backing up, the standard way to insist that p = 0 results in p ln p being 0 too is just to trap that condition, so

                  Code:
                  cond(p == 0, 0, p * ln(p))
                  is Stata code for a probability p.

                  So, it would have been simpler (and more accurate) to replace these two generate statements with

                  Code:
                  gen entropy_`var' = cond(prop_`var' == 0, 0, -k * prop_`var' * ln(prop_`var')) if `touse'
                  Comment 9: If you need a total, then egen, total() is overkill compared with using summarize directly, which leaves r(sum) in its wake. The same point also applies earlier.

                  Comment 10: Same point about new variables.

                  Code:
                      // Step 4: Compute Divergence and Weights
                      foreach var of local varlist {
                          gen divergence_`var' = 1 - e_`var' if `touse'
                      }
                  
                      egen total_divergence = rowtotal(`=subinstr("`varlist'", " ", " divergence_", .)') if `touse'
                  
                      foreach var of local varlist {
                          gen weight_`var' = divergence_`var' / total_divergence if `touse'
                      }
                  
                      // Step 5: Construct the Composite Index
                      gen `generate' = 0 if `touse'
                      foreach var of local varlist {
                          replace `generate' = `generate' + weight_`var' * norm_`var' if `touse'
                      }
                  end
                  These are further calculations that may interest you, and it seems that some other paper may be needed for the motivation.
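
                  One detail in Step 4 worth checking: subinstr() only replaces the spaces in the varlist, so the first variable appears in the rowtotal() call without the divergence_ prefix, which looks wrong. A sketch of Steps 4 and 5 that loops over the varlist instead, keeping the divergences and weights in locals (it assumes the entropy_ variables from Step 3 exist):

                  Code:
                      // Step 4 (sketch): divergences and total divergence in locals
                      local total_divergence = 0
                      foreach var of local varlist {
                          summarize entropy_`var' if `touse', meanonly
                          local d_`var' = 1 - r(sum)
                          local total_divergence = `total_divergence' + `d_`var''
                      }

                      // Step 5 (sketch): weighted composite index
                      gen double `generate' = 0 if `touse'
                      foreach var of local varlist {
                          replace `generate' = `generate' + (`d_`var'' / `total_divergence') * norm_`var' if `touse'
                      }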


