Finite mixture model

yuling Wu

#1

26 Jul 2023, 05:11

I am currently using Stata 17 to learn about finite mixture models. However, I have run into some difficulties and would greatly appreciate your help.
Here is the issue: after fitting a finite mixture of linear regressions with three classes, I assigned each observation to a cluster using the posterior probabilities. Now, I would like to match the clustering results with latent class and calculate the accuracy of the model. Unfortunately, I am unsure what code is required for this, even after consulting the Stata manual and the textbook “Microeconometrics Using Stata”.
Any guidance on this would be much appreciated. Thank you very much in advance.
Supplement: I found the following code online:
collapse (median)pr*, by(Class)
list
recode Cluster (2 = 1) (3 = 2) (1 = 3), gen(Class_pred)
gen True_pred = 0
replace True_pred = 1 if Class == Class_pred
collapse (sum)True_pred
local Accuracy = True_pred / 178
display `Accuracy'
However, this code does not work: when I run it, Stata reports that the variable "Class" was not found. As I understand it, the finite mixture regression does not automatically generate a variable Class representing the latent class.
Here is the code I have already run:
use https://www.stata-press.com/data/r17/mus03sub
qui fmm 3, lcprob(totchr): regress lmedexp income c.age##c.age totchr i.sex
predict pr*, classposteriorpr
gen Cluster = 0
replace Cluster = 1 if pr1 > pr2 & pr1 > pr3
replace Cluster = 2 if pr2 > pr1 & pr2 > pr3
replace Cluster = 3 if pr3 > pr1 & pr3 > pr2
tabulate Cluster
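The assignment step above can also be sketched with egen's rowmax(), assuming the same model and the pr1-pr3 variables created by predict. In this variant, observations with missing posterior probabilities stay missing instead of being coded 0, and ties go to the lowest-numbered class. The names prmax and Cluster2 are illustrative only.

```stata
* Sketch: modal (highest posterior probability) class assignment
use https://www.stata-press.com/data/r17/mus03sub, clear
quietly fmm 3, lcprob(totchr): regress lmedexp income c.age##c.age totchr i.sex
predict double pr*, classposteriorpr

* Largest posterior probability in each row (missing only if all are missing)
egen double prmax = rowmax(pr1 pr2 pr3)

* Assign the first class whose posterior equals the row maximum
generate byte Cluster2 = .
forvalues k = 1/3 {
    replace Cluster2 = `k' if pr`k' == prmax & missing(Cluster2) & !missing(prmax)
}
tabulate Cluster2, missing
```

The class numbers produced this way are only labels from the fitted model; they say nothing by themselves about any "true" grouping.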
Daniel Schaefer

#2

26 Jul 2023, 14:04

Welcome to the forum.

"Now, I would like to match the clustering results with latent class and calculate the accuracy of the model."

The problem is that you don't know the true latent class. The Class variable in that code should contain the true value of the latent class, and it would have to come as a column in the original dataset. However, the mus03sub dataset doesn't have a column indicating the "true" latent class. You say you found this code somewhere online. Are you sure you aren't mixing and matching code from different resources? The data you are using appear to come from the Stata 17 reference manual, but the reference material doesn't seem to make the comparison to a true class that you have outlined above.
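A quick way to check this for yourself, as a minimal sketch: load the data and ask Stata whether a variable with that exact name exists. The message text is my own; the rest uses only standard commands.

```stata
* Sketch: confirm that the dataset has no variable named Class
use https://www.stata-press.com/data/r17/mus03sub, clear
describe, short

* exact rules out variable-name abbreviation
capture confirm variable Class, exact
if _rc {
    display as text "No variable named Class in this dataset"
}
```

A nonzero return code from confirm means the variable is absent, which is why the collapse ..., by(Class) line fails.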

It actually makes sense that there isn't a true label in this dataset: if you already knew the true label, the class wouldn't be "latent" but "observed," and you wouldn't need the model in the first place. You should only expect to have true labels when you have a subset of labels (perhaps hand-coded by a human) and want to automate coding the rest by predicting those labels with a model, or when you want to evaluate the accuracy of the model under different statistical conditions, often with simulated data. Otherwise, you should use theory to guide your expectations about the predicted classes.
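To illustrate the simulated-data case, here is a rough sketch in which the true class is known only because we generate it ourselves. Everything in it (the seed, sample size, coefficients, and the variable names Class, x, y, and Cluster) is made up for illustration and has nothing to do with mus03sub.

```stata
* Hypothetical simulation: two well-separated regression components
clear
set seed 12345
set obs 1000
generate byte Class = 1 + (runiform() > 0.5)    // true class, 1 or 2
generate double x = rnormal()
generate double y = cond(Class == 1, -2 + 0.5*x, 4 + 1.5*x) + rnormal()

* Fit a two-class mixture and assign each observation to its modal class
quietly fmm 2: regress y x
predict double pr*, classposteriorpr
generate byte Cluster = cond(pr1 >= pr2, 1, 2)

* Class numbering in fmm is arbitrary, so look at the cross-tabulation first
tabulate Class Cluster

* With two classes there are only two possible labelings; keep the better one
quietly count if Class == Cluster
local agree = r(N)
local accuracy = max(`agree', _N - `agree') / _N
display "Accuracy: " %5.3f `accuracy'
```

With three or more classes you would need to read the matching off the cross-tabulation (or try all permutations) rather than use the two-way max() shortcut. For real data with no true label, estat lcprob and estat lcmean after fmm report the estimated class probabilities and class-specific means, which is where theory comes in when interpreting the classes.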

Finally, please place any code within code tags (see the # symbol in the editor).