
  • Calculating entropy for LCA (latent class analysis) in Stata 15

    Hi

    I'm trying out the new latent class analysis feature of Stata 15's -gsem- command with Stata/IC 15. I can get AIC and BIC values for different numbers of latent classes, as shown in the Stata 15 SEM manual (sem.pdf), but I cannot calculate the entropy of the model.

    Is there a command to do that? Or other ways to evaluate the model?

    Thank you!

  • #2
    So, I believe you are referring to the entropy defined as the sum of -p log p, where p is the class probability, and we sum over classes. I'm not aware of a command to do this (if somebody else knows of one, please chime in), but it is easy enough to calculate:

    Code:
    webuse gsem_lca1
    
    gsem (accident play insurance stock <- ), logit lclass(C 2)
    
    estat lcprob
    
    matrix M = r(b)
    
    local entropy = 0
    forvalues i = 1/2 {
        local entropy = `entropy' + M[1, `i']*log(M[1, `i'])
    }
    local entropy = -`entropy'
    display `entropy'
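For anyone who wants to sanity-check the arithmetic outside Stata, here is a minimal Python sketch of the same class-distribution entropy, -sum_k p_k*ln(p_k) (the prevalences below are illustrative, not Stata output):

```python
import math

# Entropy of a class-prevalence distribution: -sum_k p_k * ln(p_k).
def class_entropy(probs):
    return -sum(p * math.log(p) for p in probs)

# Two-class example; 0.721/0.279 are the prevalences discussed later
# in this thread, and they give an entropy of about 0.592.
print(class_entropy([0.721, 0.279]))  # -> approx 0.592
```

Note that Stata's log() is the natural log, so the two calculations agree term by term.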



    • #3
      Alessandro,

      Are you using the postestimation command estat lcgof after estimating the latent class model?

      In addition to AIC and BIC, estat lcgof provides a G-squared likelihood ratio chi-square, which is an indicator of the goodness of fit. If you're not familiar with this statistic, you can read John Uebersax's discussion of it at http://www.john-uebersax.com/stat/faq.htm#fit .

      Hope that helps.

      Red Owl
      Stata/IC 15.0



      • #4
        Dear Clyde Schechter,
        does your formula for entropy also apply to latent profile models? I am looking to quantify how distinct my estimated profiles are from one another.

        Best regards,
        Anton



        • #5
          I have only a small amount of experience with these models, but my understanding is that it's the same.



          • #6
            With respect to Clyde, the calculation for entropy in this context may be incorrect. The contribution of each unit to the overall entropy of the model is correct: it is p * ln(p). However, we actually need the sum of each observation's contribution to entropy. Consider: in this case, the LCA estimated that 72.1% of the sample are in class 1 and 27.9% are in class 2. Now, say each respondent's predicted probabilities of being in class 1 and 2 were 99% and 1%, or vice versa. That would mean we are very sure the classes are strongly separated, and the (normalized) entropy should be near 1. In contrast, say the LCA model gave everyone a 72.1% chance of being in class 1 and a 27.9% chance of being in class 2. Then we have no certainty about who is in which class, and the normalized entropy should be near 0.

            Also, it appears to be helpful to normalize entropy. In this form, entropy has a maximum of N ln(K), where K is the number of classes, so dividing by N ln(K) rescales it to the 0-1 range. I am not a real statistician, so please correct me if there are any errors. That said, I came up with some code, likely inefficient, to produce the overall normalized model entropy. I do not believe the data are in the correct form for Nick Cox's entropyetc command (on SSC) to calculate entropy, and similarly for Dirk Enzmann's divcat command (also on SSC).

            Code:
            quietly predict classpost*, classposteriorpr
            forvalues k = 1/2 {
                gen p_lnp_k`k' = classpost`k'*ln(classpost`k')
            }
            egen sum_p_lnp = rowtotal(p_lnp_k?)
            total sum_p_lnp
            drop classpost? p_lnp_k? sum_p_lnp
            matrix a = e(b)
            scalar E = 1 + a[1,1]/(e(N)*ln(2))
            di E
            Clyde's code gives an entropy of 0.592; my code gives a normalized entropy of 0.719. If someone can simplify this code, I'd be grateful. If anyone can find a programming or conceptual error, I'll be mildly embarrassed in the short term, but very grateful in the long term.
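To make the normalization concrete, here is a small Python sketch (not Stata) of E = 1 + [sum_i sum_k p_ik*ln(p_ik)] / (N*ln(K)), applied to the two hypothetical scenarios described above:

```python
import math

# Normalized entropy: E = 1 + sum_i sum_k p_ik*ln(p_ik) / (N*ln(K)),
# where p_ik is observation i's posterior probability of class k.
def normalized_entropy(posteriors, K=2):
    N = len(posteriors)
    s = sum(p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1 + s / (N * math.log(K))

# Well-separated classes: posteriors near 0 or 1, so E is near 1.
sharp = [[0.99, 0.01]] * 721 + [[0.01, 0.99]] * 279
# No separation: every observation just gets the marginal prevalences,
# so E is low.
flat = [[0.721, 0.279]] * 1000

print(normalized_entropy(sharp))  # high, above 0.9
print(normalized_entropy(flat))   # low, below 0.2
```

The `if p > 0` guard anticipates the 0*ln(0) issue discussed further down the thread.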
            Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

            When presenting code or results, please use code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



            • #7
              Weiwen makes an interesting point and distinction here. The code I offered in #2 is, in fact, the entropy of the class distribution, not the entropy of the classification model in the data. The latter is given, I believe, by his code in #6.

              As for the efficiency of that code, unless one is doing a very large series of these calculations, it would be hard to noticeably improve the performance of what he wrote. In the case of very large data sets or a loop with a very large series of entropy calculations, one could slightly streamline it as follows:

              Code:
              quietly predict classpost*, classposteriorpr
              gen sum_p_lnp = 0
              forvalues k = 1/2 {
                  replace sum_p_lnp = sum_p_lnp + classpost`k'*ln(classpost`k')
              }
              summ sum_p_lnp, meanonly
              scalar E = 1 + `r(sum)'/(e(N)*ln(2))
              drop classpost? sum_p_lnp
              di E
              Note: Not tested, beware of typos.



              • #8
                Originally posted by Clyde Schechter View Post
                ...

                Note: Not tested, beware of typos.
                I did test that, and it works. Thanks!




                • #9
                  Updating my prior post. I recently ran into a situation where one of my class prevalences was about 6%. That class was very distinctive indeed, and for many observations the predicted probability of membership was exactly zero. As you can imagine, this broke the entropy calculation, because ln(0) is undefined.

                  I did a quick internet search, but I couldn't find any solid recommendations on how to handle this situation. If you plot the function p * ln(p) from p = 0 to 1, you'll see that it approaches 0 as p tends to 0, and that it equals 0 at p = 1.

                  Code:
                  graph twoway function y = x * ln(x), range(0 1)
                  Hence, I believe the principled thing to do is to treat the p = 0 terms as contributing 0 to sum_p_lnp. After all, if p = 0 or p = 1, we are absolutely certain which class the observation belongs to, and both cases should contribute to the overall model entropy in the same way. As the code in #7 is written, if you come to an observation where one of the posterior class probabilities is 0, sum_p_lnp will be replaced by a missing value. This produces incorrect entropy calculations, and you won't know unless you notice the log saying that some observations were changed to missing values. Here is an update that will not break in this fashion, since -egen, rowtotal()- treats the missing p*ln(p) terms as zero.

                  Code:
                  quietly predict classpost*, classposteriorpr
                  forvalues k = 1/2 {
                      gen sum_p_lnp_`k' = classpost`k'*ln(classpost`k')
                  }
                  egen sum_p_lnp = rowtotal(sum_p_lnp_*)
                  summ sum_p_lnp, meanonly
                  scalar E = 1 + `r(sum)'/(e(N)*ln(2))
                  drop classpost? sum_p_lnp*
                  di E
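The convention being applied here, 0*ln(0) = 0, can be written explicitly in any language; a quick Python sketch of the same guard (hypothetical posterior values):

```python
import math

# Summand with the convention 0*ln(0) = 0, so that certain
# classifications (p = 0 or p = 1) contribute no entropy.
def p_ln_p(p):
    return p * math.log(p) if p > 0 else 0.0

row = [0.0, 1.0]  # an observation classified with certainty
print(sum(p_ln_p(p) for p in row))  # -> 0.0, not a missing value
```

This is what -egen, rowtotal()- achieves implicitly, since it treats the missing 0*ln(0) terms as zero.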



                  • #10
                    I would just add that it's always a bit hazardous to rely on a graph to get the limiting behavior of a function at a point where it is undefined. But it is easy to show that lim(p->0) p ln p = 0 (which fully justifies everything Weiwen did in #9):

                    Code:
                    Let f(p) = p ln(p).
                    
                    Then f'(p) = 1 + ln p, and f''(p) = 1/p.
                    
                    From that it follows that f(p) has a local minimum at p = 1/e, and then
                    increases monotonically from there up to 0 at p = 1.  And since f(p) is monotone
                    (decreasing) and bounded on (0, 1/e), the limit as p -> 0+ exists, and
                    lim (p->0) p ln p = lim (n -> infinity) (1/n) ln (1/n), n integer >= 3.
                    
                    But ln(1/n) = - ln(n).  So we are looking at lim(n -> infinity) -ln(n)/n, and this
                    limit is well known to be zero.  QED, as they say.
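A quick numerical illustration of the same limit (not a proof, just a check that p*ln(p) shrinks toward 0):

```python
import math

# |p * ln(p)| shrinks toward 0 as p -> 0+.
for p in (1e-2, 1e-4, 1e-8, 1e-16):
    print(p, p * math.log(p))
```

Each successive term is smaller in magnitude, consistent with lim(p->0) p ln p = 0.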



                    • #11
                      Originally posted by Clyde Schechter View Post
                      ...
                      Helpful and on point as always. My calculus is a bit rusty! Also worth noting, the Wikipedia entry on Shannon entropy states that "the value of the corresponding summand 0 log(0) is taken to be 0, consistent with the limit."

                      I'll just add that in my data, this doesn't appear to be an issue of floating-point precision. I re-predicted the class probabilities as doubles (i.e., double-precision floating point), and I still got some observations with predicted class probabilities of 0. I can describe the data further, but I am confident that the latent class in question really is very distinct from the others.



                      • #12
                        Originally posted by Weiwen Ng View Post
                        ...
                        Hi Weiwen,

                        Thank you for sharing the method for calculating normalized entropy. May I clarify whether the "N" in "e(N)" in the denominator refers to the sample size?

                        Best,
                        Mengmeng



                        • #13
                          Hi everyone,

                          I am also running into a situation where the LCA model was not concave, so I constrained _cons@-15 for each variable whose _cons was more negative than -15. After re-running the model with the constraints applied, I did not need to specify -nonrtolerance- for the model to converge, but I still got "." for the standard errors and 95% CIs of those problematic variables. Is that normal? Can I still report the conditional probability (e.g., 3.06e-07) for each variable based on class membership? Do these numbers suggest the model is unreliable?

                          Many thanks in advance.

                          Best,
                          Mengmeng



                          • #14
                            Originally posted by Mengmeng Li View Post
                            May I clarify with you if "N" noted in "e(N)" in the denominator indicated the sample size?
                            It does. After an estimation command, Stata stores the sample size in a scalar (think of it as a 1x1 matrix) called e(N). You can type
                            Code:
                            display e(N)
                            after any estimation command and you'll see this. Of course, I should also mention that e(N) gets overwritten every time you run a new estimation command, so if you want to calculate entropy, you want to run that set of commands immediately after the corresponding LCA.

                            Originally posted by Mengmeng Li
                            ...
                            It might help if you posted the appropriate section of the results (within code delimiters). That said, once you constrain a variable to some value, I believe this means that it is treated as a known quantity rather than a statistical estimate. So, the standard error will be missing. In that case, I think you can indeed report the conditional probability - you'll note that this is pretty close to 0. I'd explain this in whatever article you write.

                            If you have further questions on this topic, it would be better to start a new thread. The current thread relates to model entropy. Starting a new thread will make it easier for others who have this question in the future to search for and find the appropriate information.



                            • #15
                              Originally posted by Weiwen Ng View Post

                              ...
                              Thank you so much for your quick reply. It is very helpful. As suggested, I will start a new thread to ask for advice on LCA modeling.

                              Best,
                              Mengmeng
