
  • Calculating entropy for LCA (latent class analysis) in Stata 15

    Hi

    I'm trying out the new latent class analysis feature of Stata 15's -gsem- command with Stata/IC 15. I can get AIC and BIC values for different numbers of latent classes, as shown in the Stata 15 SEM manual (sem.pdf), but I cannot calculate the entropy of the model.

    Is there a command to do that? Or other ways to evaluate the model?

    Thank you!

  • #2
    So, I believe you are referring to the entropy defined as the sum of -p log p, where p is the class probability, and we sum over classes. I'm not aware of a command to do this (if somebody else knows of one, please chime in), but it is easy enough to calculate:

    Code:
    webuse gsem_lca1
    
    gsem (accident play insurance stock <- ), logit lclass(C 2)
    
    estat lcprob
    
    matrix M = r(b)
    
    local entropy = 0
    forvalues i = 1/2 {
        local entropy = `entropy' + M[1, `i']*log(M[1, `i'])
    }
    local entropy = -`entropy'
    display `entropy'
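For anyone who wants to sanity-check the arithmetic outside Stata, here is a minimal Python sketch of the same class-distribution entropy, -sum_k p_k*ln(p_k) (the prevalences below are illustrative, not Stata output):

```python
import math

# Entropy of a class-prevalence distribution: -sum_k p_k * ln(p_k).
def class_entropy(probs):
    return -sum(p * math.log(p) for p in probs)

# Two-class example; 0.721/0.279 are the prevalences discussed later
# in this thread, and they give an entropy of about 0.592.
print(class_entropy([0.721, 0.279]))  # -> approx 0.592
```

Note that Stata's log() is the natural log, so the two calculations agree term by term.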



    • #3
      Alessandro,

      Are you using the postestimation command estat lcgof after estimating the latent class model?

      In addition to AIC and BIC, estat lcgof provides a G-squared likelihood ratio chi-square, which is an indicator of the goodness of fit. If you're not familiar with this statistic, you can read John Uebersax's discussion of it at http://www.john-uebersax.com/stat/faq.htm#fit .

      Hope that helps.

      Red Owl
      Stata/IC 15.0



      • #4
        Dear Clyde Schechter,
        does your formula for entropy also apply to latent profile models? I am looking to quantify how distinct my estimated profiles are from one another.

        Best regards,
        Anton



        • #5
          I have only a small amount of experience with these models, but my understanding is that it's the same.



          • #6
            With respect to Clyde, the calculation for entropy in this context may be incorrect. The contribution of each unit to the overall entropy of the model is correct: it is p * ln(p). However, we actually need the sum of each observation's contribution to entropy. Consider: in this case, the LCA estimated that 72.1% of the sample are in class 1 and 27.9% are in class 2. Now, say each respondent's predicted probabilities of being in class 1 and 2 were 99% and 1%, or vice versa. That would mean we are very sure the classes are strongly separated, and the (normalized) entropy should be near 1. In contrast, say the LCA model gave everyone a 72.1% chance of being in class 1 and a 27.9% chance of being in class 2. Then we have no certainty about who is in which class, and the normalized entropy should be near 0.

            Also, it appears to be helpful to normalize entropy. In this form, entropy has a maximum of N ln(K), where K is the number of classes, so dividing by N ln(K) rescales it to the 0-1 range. I am not a real statistician, so please correct me if there are any errors. That said, I came up with some code, likely inefficient, to produce the overall normalized model entropy. I do not believe the data are in the correct form for Nick Cox's entropyetc command (on SSC) to calculate entropy, and similarly for Dirk Enzmann's divcat command (also on SSC).

            Code:
            quietly predict classpost*, classposteriorpr
            forvalues k = 1/2 {
                gen p_lnp_k`k' = classpost`k'*ln(classpost`k')
            }
            egen sum_p_lnp = rowtotal(p_lnp_k?)
            total sum_p_lnp
            drop classpost? p_lnp_k? sum_p_lnp
            matrix a = e(b)
            scalar E = 1 + a[1,1]/(e(N)*ln(2))
            di E
            Clyde's code gives an entropy of 0.592; my code gives a normalized entropy of 0.719. If someone can simplify this code, I'd be grateful. If anyone can find a programming or conceptual error, I'll be mildly embarrassed in the short term, but very grateful in the long term.
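To make the normalization concrete, here is a small Python sketch (not Stata) of E = 1 + [sum_i sum_k p_ik*ln(p_ik)] / (N*ln(K)), applied to the two hypothetical scenarios described above:

```python
import math

# Normalized entropy: E = 1 + sum_i sum_k p_ik*ln(p_ik) / (N*ln(K)),
# where p_ik is observation i's posterior probability of class k.
def normalized_entropy(posteriors, K=2):
    N = len(posteriors)
    s = sum(p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1 + s / (N * math.log(K))

# Well-separated classes: posteriors near 0 or 1, so E is near 1.
sharp = [[0.99, 0.01]] * 721 + [[0.01, 0.99]] * 279
# No separation: every observation just gets the marginal prevalences,
# so E is low.
flat = [[0.721, 0.279]] * 1000

print(normalized_entropy(sharp))  # high, above 0.9
print(normalized_entropy(flat))   # low, below 0.2
```

The `if p > 0` guard anticipates the 0*ln(0) issue discussed further down the thread.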
            Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

            When presenting code or results, please use code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



            • #7
              Weiwen makes an interesting point and distinction here. The code I offered in #2 is, in fact, the entropy of the class distribution, not the entropy of the classification model in the data. The latter is given, I believe, by his code in #6.

              As for the efficiency of that code, unless one is doing a very large series of these calculations, it would be hard to noticeably improve the performance of what he wrote. In the case of very large data sets or a loop with a very large series of entropy calculations, one could slightly streamline it as follows:

              Code:
              quietly predict classpost*, classposteriorpr
              gen sum_p_lnp = 0
              forvalues k = 1/2 {
                  replace sum_p_lnp = sum_p_lnp + classpost`k'*ln(classpost`k')
              }
              summ sum_p_lnp, meanonly
              scalar E = 1 + `r(sum)'/(e(N)*ln(2))
              drop classpost? sum_p_lnp
              di E
              Note: Not tested, beware of typos.



              • #8
                Originally posted by Clyde Schechter View Post
                ...

                Note: Not tested, beware of typos.
                I did test that, and it works. Thanks!




                • #9
                  Updating my prior post. I recently ran into a situation where one of my class prevalences was about 6%. That class was very distinctive indeed, and for many observations the predicted probability of membership was exactly zero. As you can imagine, this broke the entropy calculation, because ln(0) is undefined.

                  I did a quick internet search, but I couldn't find any solid recommendations on how to handle this situation. If you plot the function p * ln(p) from p = 0 to 1, you'll see that it approaches 0 as p tends to 0, and that it equals 0 at p = 1.

                  Code:
                  graph twoway function y = x * ln(x), range(0 1)
                  Hence, I believe the principled thing to do is to treat the p = 0 terms as contributing 0 to sum_p_lnp. After all, if p = 0 or p = 1, we are absolutely certain which class the observation belongs to, and both cases should contribute to the overall model entropy in the same way. As the code in #7 is written, if you come to an observation where one of the posterior class probabilities is 0, sum_p_lnp will be replaced by a missing value. This produces incorrect entropy calculations, and you won't know unless you notice the log saying that some observations were changed to missing values. Here is an update that will not break in this fashion, since -egen, rowtotal()- treats the missing p*ln(p) terms as zero.

                  Code:
                  quietly predict classpost*, classposteriorpr
                  forvalues k = 1/2 {
                      gen sum_p_lnp_`k' = classpost`k'*ln(classpost`k')
                  }
                  egen sum_p_lnp = rowtotal(sum_p_lnp_*)
                  summ sum_p_lnp, meanonly
                  scalar E = 1 + `r(sum)'/(e(N)*ln(2))
                  drop classpost? sum_p_lnp*
                  di E
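The convention being applied here, 0*ln(0) = 0, can be written explicitly in any language; a quick Python sketch of the same guard (hypothetical posterior values):

```python
import math

# Summand with the convention 0*ln(0) = 0, so that certain
# classifications (p = 0 or p = 1) contribute no entropy.
def p_ln_p(p):
    return p * math.log(p) if p > 0 else 0.0

row = [0.0, 1.0]  # an observation classified with certainty
print(sum(p_ln_p(p) for p in row))  # -> 0.0, not a missing value
```

This is what -egen, rowtotal()- achieves implicitly, since it treats the missing 0*ln(0) terms as zero.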



                  • #10
                    I would just add that it's always a bit hazardous to rely on a graph to get the limiting behavior of a function at a point where it is undefined. But it is easy to show that lim(p->0) p ln p = 0 (which fully justifies everything Weiwen did in #9):

                    Code:
                    Let f(p) = p ln(p).
                    
                    Then f'(p) = 1 + ln p, and f''(p) = 1/p.
                    
                    From that it follows that f(p) has a local minimum at p = 1/e, and then
                    increases monotonically from there up to 0 at p = 1.  And since f(p) is monotone
                    (decreasing) and bounded on (0, 1/e), the limit as p -> 0+ exists, and
                    lim (p->0) p ln p = lim (n -> infinity) (1/n) ln (1/n), n integer >= 3.
                    
                    But ln(1/n) = - ln(n).  So we are looking at lim(n -> infinity) -ln(n)/n, and this
                    limit is well known to be zero.  QED, as they say.
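A quick numerical illustration of the same limit (not a proof, just a check that p*ln(p) shrinks toward 0):

```python
import math

# |p * ln(p)| shrinks toward 0 as p -> 0+.
for p in (1e-2, 1e-4, 1e-8, 1e-16):
    print(p, p * math.log(p))
```

Each successive term is smaller in magnitude, consistent with lim(p->0) p ln p = 0.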



                    • #11
                      Originally posted by Clyde Schechter View Post
                      ...
                      Helpful and on point as always. My calculus is a bit rusty! Also worth noting, the Wikipedia entry on Shannon entropy states that "the value of the corresponding summand 0 log(0) is taken to be 0, consistent with the limit."

                      I'll just add that in my data, this doesn't appear to be an issue of floating-point precision. I re-predicted the class probabilities as doubles (i.e., double-precision floating point), and I still got some observations with predicted class probabilities of 0. I can describe the data further, but I am confident that the latent class in question really is very distinct from the others.



                      • #12
                        Originally posted by Weiwen Ng View Post
                        ...
                        Hi Weiwen,

                        Thank you for sharing the method for calculating normalized entropy. May I clarify whether the "N" in "e(N)" in the denominator refers to the sample size?

                        Best,
                        Mengmeng



                        • #13
                          Hi everyone,

                          I am also running into a situation where the LCA model was not concave, so I constrained _cons@-15 for each variable whose _cons was more negative than -15. After re-running the model with the constraints applied, I did not need to specify -nonrtolerance- for the model to converge, but I still got "." for the standard errors and 95% CIs of those problematic variables. Is that normal? Can I still report the conditional probability (e.g., 3.06e-07) for each variable based on class membership? Do these numbers suggest the model is unreliable?

                          Many thanks in advance.

                          Best,
                          Mengmeng



                          • #14
                            Originally posted by Mengmeng Li View Post
                            May I clarify with you if "N" noted in "e(N)" in the denominator indicated the sample size?
                            It does. After an estimation command, Stata stores the sample size in a scalar (think of it as a 1x1 matrix) called e(N). You can type
                            Code:
                            display e(N)
                            after any estimation command and you'll see this. Of course, I should also mention that e(N) gets overwritten every time you run a new estimation command, so if you want to calculate entropy, you want to run that set of commands immediately after the corresponding LCA.

                            Originally posted by Mengmeng Li
                            ...
                            It might help if you posted the appropriate section of the results (within code delimiters). That said, once you constrain a variable to some value, I believe this means that it is treated as a known quantity rather than a statistical estimate. So, the standard error will be missing. In that case, I think you can indeed report the conditional probability - you'll note that this is pretty close to 0. I'd explain this in whatever article you write.

                            If you have further questions on this topic, it would be better to start a new thread. The current thread relates to model entropy. Starting a new thread will make it easier for others who have this question in the future to search for and find the appropriate information.



                            • #15
                              Originally posted by Weiwen Ng View Post

                              ...
                              Thank you so much for your quick reply. It is very helpful. As suggested, I will start a new thread to ask for advice on LCA modeling.

                              Best,
                              Mengmeng
