FMM Class Probabilities

Iman Kent

Join Date: Dec 2018
Posts: 4

04 Dec 2018, 17:57

Hello Stata experts:

I have a large dataset on which I have fit a finite mixture model (FMM). The model finds two latent classes in my data, and I wanted to know what proportion of my observations falls into each class. Following the Stata documentation, I used this command:

Code:

estat lcprob

These were the results:

Latent class marginal probabilities             Number of obs = 2,382,941

--------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       Class |
          1  |   .7623763   .0021933      .7580509    .7666482
          2  |   .2376237   .0021933      .2333518    .2419491
--------------------------------------------------------------

This implies that about 76% of the observations fall into Class 1 and the rest fall into Class 2.

Then I computed the class posterior probabilities with the following code, to see how likely each observation is to belong to a class:

Code:

 predict P, classposteriorpr

Next, I used that probability to assign each observation to a class with the code below.

Code:

generate lgroup = P < 0.5
tab lgroup

The result looks like this:

     lgroup |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |  2,217,252       93.05       93.05
          1 |    165,694        6.95      100.00
------------+-----------------------------------
      Total |  2,382,946      100.00

Do you see the issue? The percentages I get here do not match what I got from the estat lcprob command, which is strange. A small difference would be fine, but not this much!

I need help!

Thanks in advance

Tags: finite mixture model, fmm

Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#2

04 Dec 2018, 18:25

Iman,

FMMs and latent class models predict each observation's group membership with uncertainty. For example, say you have a 2-class FMM or LCA, and some observation has characteristics that are intermediate between the two groups. We aren't very certain about which group that observation belongs to.

What you did was modal class assignment, where you say that you know an observation belongs to a group if it has a membership probability of 0.5 or greater. In many cases, this will be quite close to the truth. In cases where the classes are closer together, the modal class approach will be further from the truth. In any case, the modal class probabilities should never exactly equal the model-estimated class probabilities.
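
To make the distinction concrete, here is a minimal sketch, assuming the two-class fmm fit from post #1 is still the active estimation result. The names pr1, pr2, and modal2 are placeholders of mine, not anything from the original post. Averaging the posterior probabilities over the estimation sample should come close to the estat lcprob margins, whereas the share of observations whose posterior exceeds 0.5 is a different quantity and can differ a lot when the classes overlap.

Code:

* pr1 and pr2 are hypothetical names; the stub creates one variable per class
predict pr*, classposteriorpr
* mean posterior probability per class: compare with estat lcprob
summarize pr1 pr2 if e(sample)
* modal assignment to class 2, skipping observations with no prediction
generate byte modal2 = pr2 > 0.5 if !missing(pr2)
tabulate modal2

The if !missing() qualifier matters because Stata treats a missing value as larger than any number, so an expression such as P < 0.5 evaluates to 0, not missing, when P is missing. That may be why the tabulation in post #1 counts 2,382,946 observations while estat lcprob reports 2,382,941.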

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#3

04 Dec 2018, 19:12

Weiwen's advice is absolutely right. I'll just add that if you had a model with 3 or more latent classes, it would be completely useless to use predicted probability > 0.5 as a way of assigning each observation to a predicted class. First of all, there might well be no such class at all for a given observation. Moreover, you couldn't get around that problem by using, say, predicted probability > 0.25 as the criterion in a 4-class model, because there could easily be up to 3 such classes for a given observation. The point is that, in general, this kind of prediction approach really is not effective and should not be used, even in a 2-class model where its failure is not blatantly obvious.
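
As a rough way to see this in practice, the sketch below assumes a hypothetical 3-class model has been fit and that its posterior probabilities are stored in pr1 through pr3 (for example via predict pr*, classposteriorpr). Neither that model nor the variable names come from this thread. It counts the observations for which no class reaches a posterior probability above 0.5, and summarizes the largest posterior as a gauge of how clearly observations are classified.

Code:

* pr1-pr3 are assumed to hold class posterior probabilities from a 3-class fit
egen double prmax = rowmax(pr1 pr2 pr3)
* observations that the > 0.5 rule cannot assign to any class
count if prmax <= 0.5
* distribution of the largest posterior; values near 1 mean clear separation
summarize prmax, detail

If many observations have a largest posterior well below 1, a hard class assignment throws away a lot of information, and working with the probabilities themselves is likely the safer choice.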