  • Finite mixture model

    I am currently using Stata 17 to learn about finite mixture models. I have run into some difficulties and would greatly appreciate your help.
    Here is the issue I am facing: after fitting a finite mixture linear regression with three classes, I clustered the observations using the posterior class probabilities. Now I would like to match the clustering results with the latent class and calculate the accuracy of the model. I could not work out the code for this, even after consulting the Stata manual and the textbook “Microeconometrics Using Stata”.
    It would be immensely appreciated if you could provide me with guidance on this matter. Thank you very much in advance.
    Supplement: I found the following code online:
    collapse (median)pr*, by(Class)
    list
    recode Cluster (2 = 1) (3 = 2) (1 = 3), gen(Class_pred)
    gen True_pred = 0
    replace True_pred = 1 if Class == Class_pred
    collapse (sum)True_pred
    local Accuracy = True_pred / 178
    display `Accuracy'
    However, this code does not run: Stata reports that the variable "Class" was not found. As I understand it, finite mixture regression does not automatically generate a variable Class representing the latent class.
    Here is the code I have already run:
    use https://www.stata-press.com/data/r17/mus03sub
    qui fmm 3, lcprob(totchr): regress lmedexp income c.age##c.age totchr i.sex
    predict pr*, classposteriorpr
    gen Cluster = 0
    replace Cluster = 1 if pr1 > pr2 & pr1 > pr3
    replace Cluster = 2 if pr2 > pr1 & pr2 > pr3
    replace Cluster = 3 if pr3 > pr1 & pr3 > pr2
    tabulate Cluster

  • #2
    Welcome to the forum.

    Now, I would like to match the clustering results with latent class and calculate the accuracy of the model.
    The problem is that you don't know the true latent class. The Class variable should contain the true value of the latent class, and it should come as a column in the original dataset. However, the mus03sub dataset doesn't have a column indicating the "true" latent class. You say you got this code somewhere online. Are you sure you aren't mixing and matching code from different resources? The data you use appear in the Stata 17 reference manual, but as far as I can tell the reference material doesn't do the comparison to the true class that you have outlined above.
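
    A quick way to confirm this, as a minimal sketch that assumes the same dataset URL you used, is to ask Stata whether a variable named exactly Class exists and then list what is actually in the file:

    ```stata
    use https://www.stata-press.com/data/r17/mus03sub, clear

    * confirm sets a nonzero return code in _rc when the variable is missing
    capture confirm variable Class, exact
    if _rc != 0 {
        display as text "No variable named Class in this dataset"
    }

    * list the variables that are actually available
    describe, simple
    ```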

    It actually makes sense that there isn't a true label in this dataset: if you already knew the true label, the class wouldn't be "latent," it would be "observed," and you wouldn't need the model in the first place. You should only expect to have true labels when you have a subset of labels (perhaps hand-coded by a human) and want to automate coding the rest by predicting those labels with a model, or when you want to evaluate the accuracy of the model under different statistical conditions, often with simulated data. Otherwise, you should use theory to guide your expectations about the predicted classes.
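
    To illustrate the simulated-data case, here is a rough sketch rather than a recipe. Everything in it is made up for illustration: the variable names (u, x, y, Class, prmax, Class_pred, correct), the class shares, and the regression coefficients. Because the data are generated with a known Class, the accuracy calculation becomes possible. Note that fmm numbers its classes arbitrarily, so each predicted cluster is mapped to the true class that is most common within it before counting matches. That mapping is a simple heuristic and could assign two clusters to the same class if the components overlap heavily.

    ```stata
    clear
    set seed 12345
    set obs 1500

    * true class, known only because we generate the data ourselves
    gen double u = runiform()
    gen byte Class = cond(u < 0.3, 1, cond(u < 0.7, 2, 3))

    * class-specific intercepts and slopes (arbitrary illustrative values)
    gen double x = rnormal()
    gen double y = cond(Class == 1, -4 + 0.5*x, ///
                   cond(Class == 2,  0 + 1.0*x, ///
                                     4 + 1.5*x)) + rnormal(0, 0.5)

    * fit a three-class mixture and get posterior class probabilities
    fmm 3: regress y x
    predict pr*, classposteriorpr

    * assign each observation to the class with the largest posterior
    egen double prmax = rowmax(pr1 pr2 pr3)
    gen byte Cluster = cond(pr1 == prmax, 1, cond(pr2 == prmax, 2, 3))

    * cross-tabulate to see how fmm's labels line up with the true ones
    tabulate Class Cluster

    * map each cluster to its most common true class, then score matches
    bysort Cluster: egen Class_pred = mode(Class), minmode
    gen byte correct = (Class == Class_pred)
    summarize correct
    display "Accuracy = " %5.3f r(mean)
    ```

    With real data such as mus03sub there is no Class column to compare against, so this last step has no counterpart there.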

    Finally, please place any code within code tags (see the # symbol in the editor).
