Known groups in LCA

David Isaksson

Join Date: Mar 2020

Posts: 14
#1

28 Mar 2020, 09:51

Hi all,

I am still quite new to this, so maybe it is just a simple problem. I am using Stata 16 on a Mac.

I have done an LCA with 6 variables and 3 classes.

Code:

gsem (ancestry culture lang jobs pride civic <-), logit lclass(C 3)

I am now trying to do the same thing, but testing the results across different regions. I keep receiving this error message:

Code:

quietly gsem (ancestry culture lang jobs pride civic <-), over(land) logit lclass(C 3)
option over() not allowed
r(198);

Where am I supposed to add the categorical variable? What am I doing wrong?

All help is very appreciated!

Last edited by David Isaksson; 28 Mar 2020, 09:53.
Tags: Latent Class Analyses, lca
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#2

28 Mar 2020, 10:04

Subjectively, it seems to me like there are a bunch of people who are on lockdown and are turning to LCA, perhaps to relieve boredom? But this could be surveillance bias on my part.

More seriously, define "testing the results over different regions." I assume you mean that you want to see whether the coefficients vary across regions? If so, you actually want the group() option; there is no over() option in gsem. However, you will need to tell Stata which parameters are allowed to vary across groups, as described in the manual here. If this is what you actually want, then I believe that, as per this post, you'd want to allow the constants to vary. The default is that constants, coefficients, and loadings are constrained equal across groups. I think that in this context, the constants are the class-specific intercepts for each item, i.e. the log-odds of each response in each class (for example, the log-odds of endorsing ancestry in each of the 3 latent classes), which map one-to-one onto the class-specific response probabilities. So, if I'm right, you'd type this:

Code:

gsem (ancestry culture lang jobs pride civic <-), group(land) ginvariant(coef loading) logit lclass(C 3)

Then, you should be able to inspect the parameters by each group.
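To make the role of the constants concrete, here is my reading of the model the command above fits. This is an assumption on my part, not checked against the manual, and the notation is my own: y_ij is item j for respondent i, C_i is the latent class, and G_i is the region (land). Under ginvariant(coef loading), only the constants are left free, so both sets of intercepts get a separate value for each region g:

```latex
\begin{aligned}
\Pr(y_{ij}=1 \mid C_i=k,\, G_i=g) &= \frac{\exp(\alpha_{jkg})}{1+\exp(\alpha_{jkg})},
  && j = 1,\dots,6,\quad k = 1,2,3 \\
\Pr(C_i=k \mid G_i=g) &= \frac{\exp(\gamma_{kg})}{\sum_{m=1}^{3}\exp(\gamma_{mg})},
  && \gamma_{1g} = 0
\end{aligned}
```

Here alpha_jkg is the constant for item j in class k and region g (the log-odds of endorsing that item). gamma_kg is the constant of the multinomial logit for class k, with class 1 taken as the base.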

Be aware, however, that in your model you are fitting 6 parameters per latent class (one for each of the 6 response variables), plus K - 1 parameters for the multinomial model describing the probability of being in each latent class (here, 3 classes minus 1 = 2 parameters). If you try this for x groups, I think that you're multiplying the number of parameters by x. This could rapidly produce identification or convergence issues.
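Spelled out with J = 6 items and K = 3 classes, and assuming that all of these constants are estimated separately in each of the x groups, the count works out as follows:

```latex
\begin{aligned}
p_{\text{per group}} &= K \cdot J + (K - 1) = 3 \cdot 6 + (3 - 1) = 20 \\
p_{\text{total}}     &= x \cdot p_{\text{per group}} = 20x
\end{aligned}
```

For example, with a hypothetical 4 regions, that would already be 80 parameters.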

I've got no direct experience with this technique, and I haven't done any reading on it yet, so I can't offer any sound, theoretically based guidance, just my gut instincts.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
David Isaksson

Join Date: Mar 2020

Posts: 14
#3

28 Mar 2020, 15:50

Thank you, Weiwen! It seems to work, and you interpreted my question correctly. However, when I try to compare the latent class probabilities between the regions, I get this error message:

Code:

estat lcprob, group(land)
option group() not allowed
r(198);

How is this supposed to be coded? I cannot find any information about this issue anywhere. I am very thankful for any help.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#4

28 Mar 2020, 16:25

What does estat lcprob alone do?

David Isaksson

Join Date: Mar 2020

Posts: 14
#5

29 Mar 2020, 02:54

It reports the latent class probabilities, i.e. how the observations are distributed over the classes. I want to compare that distribution between the regions.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#6

30 Mar 2020, 10:00

Hmm. This is not what I experienced. I have Stata 16 on my machine. I asked it to revert to version 15 using the version command, and under version 15, estat lcprob still reported the class prevalences separately for each value of the group variable.

Going off my old code in the other thread, we're going to create a fictitious version of the dataset used in SEM example 50. That dataset has 4 binary indicators (accident, play, insurance, and stock) that basically ask whether you'd favor your friend's interest over broader societal or legal interests (e.g. would you testify against your friend in an accident case?). My fictitious adaptation adds gender to the equation. In this randomly generated dataset, 40% of the sample are men. There are 2 latent classes: particularistic (i.e. you favor your friend) and universalistic (i.e. you favor society's interest in honesty). Men are less likely to belong to the universalistic latent class (28% of men are in this class, vs. 42% of women). I'm modifying things vis-à-vis my example data in the other post to simplify matters (in that post, women also had different response probabilities).
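The numbers above pin down the data-generating process. As a quick check on what the simulation should produce, the implied overall share of the universalistic class (class 2) follows from the law of total probability, using only the percentages just stated:

```latex
\begin{aligned}
\Pr(C=2) &= \Pr(\text{female})\,\Pr(C=2 \mid \text{female}) + \Pr(\text{male})\,\Pr(C=2 \mid \text{male}) \\
         &= 0.6 \times 0.42 + 0.4 \times 0.28 = 0.252 + 0.112 = 0.364
\end{aligned}
```

These are population values. Any single simulated sample of 1,000 will differ from them somewhat by chance.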

Code:

clear
set seed 4142
set obs 1000
gen male = rbinomial(1,0.4)
gen class = rbinomial(1,0.42) + 1 if male == 0
replace class = rbinomial(1,0.28) + 1 if male == 1
gen accident = .
gen play = .
gen insurance = .
gen stock = .
/*Latent class #1, particularistic*/
replace accident = rbinomial(1,0.71) if class == 1
replace play = rbinomial(1,0.33) if class == 1
replace insurance = rbinomial(1,0.35) if class == 1
replace stock = rbinomial(1,0.13) if class == 1
/*Latent class #2, universalistic*/
replace accident = rbinomial(1,0.92) if class == 2
replace play = rbinomial(1,0.71) if class == 2
replace insurance = rbinomial(1,0.93) if class == 2
replace stock = rbinomial(1,0.82) if class == 2
/*Optional if you're on version 15 of Stata*/
version 15
quietly gsem (accident play insurance stock <-, logit), lclass(C 2) group(male) byparm ginvariant(coef loading) nonrtolerance
estat lcprob, nose

Latent class marginal probabilities             Number of obs     =      1,000

--------------------------------------------------------------
        male |     Margin
-------------+------------------------------------------------
0            |
           C |
           1 |    .623417
           2 |    .376583
-------------+------------------------------------------------
1            |
           C |
           1 |    .728638
           2 |    .271362
--------------------------------------------------------------

So, for each of your known groups, Stata should report the distribution of latent classes; e.g. 37.7% of women (male = 0) and 27.1% of men (male = 1) are in latent class #2, the universalistic class. If you delete the nose option, you'll see that the confidence intervals do contain the simulated probabilities of 42% for women and 28% for men.

Please clarify if this isn't what you were looking for.

David Isaksson

Join Date: Mar 2020

Posts: 14
#7

03 Apr 2020, 07:37

Thank you so much! This was exactly what I needed and it worked just fine. Problem solved.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#8

03 Apr 2020, 17:14

You're welcome!

Just for the information of others who read this: the term "known groups" refers to the fact that the grouping variable here is known/observed (whereas the latent class is categorical, but not directly observed). This page on the UCLA site discusses this method in the context of Mplus, and it does verify that I correctly deduced David's intentions (that is, he wanted to know whether the response probabilities vary over the values of the known group).

Another thing you could do with latent classes and an observed variable is to say: I don't have any strong a priori reason to think that the response probabilities vary, but I would like to know whether the observed variable is associated with membership in some of the latent classes. For example, you might wonder whether men are more or less likely to belong to the universalistic class (sticking with the fictitious example above). In the example, we know this to be the case, because that's how I simulated the data. But in real life, we don't know this. We might use latent class regression to examine this question.
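For reference, a latent class regression of this kind replaces the single class-prevalence intercept with a logit model for class membership. The following is a sketch in my own notation. It takes class 1 as the base class, which I believe is gsem's default, and the item equations are unchanged:

```latex
\Pr(C_i = 2 \mid \text{male}_i)
  = \frac{\exp(\gamma_0 + \gamma_1\,\text{male}_i)}{1 + \exp(\gamma_0 + \gamma_1\,\text{male}_i)}
```

Under this parameterization, exp(gamma_1) is the odds ratio of being in class 2 for men versus women. A value below 1 would mean men are less likely to be universalistic.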

Code:

gsem (accident play insurance stock <-, logit) (C <- male), lclass(C 2)
margins, predict(classpr class(1)) predict(classpr class(2)) over(male)

Predictive margins                              Number of obs     =      1,000
Model VCE    : OIM

over         : male

1._predict   : Predicted probability (1.C), predict(classpr class(1))
2._predict   : Predicted probability (2.C), predict(classpr class(2))

-------------------------------------------------------------------------------
              |            Delta-method
              |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
_predict#male |
          1 0 |   .5989598   .0413331    14.49   0.000     .5179485    .6799712
          1 1 |   .7381489   .0372952    19.79   0.000     .6650517    .8112461
          2 0 |   .4010402   .0413331     9.70   0.000     .3200288    .4820515
          2 1 |   .2618511   .0372952     7.02   0.000     .1887539    .3349483
-------------------------------------------------------------------------------

To reiterate, this model estimated that women had a 40.1% probability of being in class 2, and men had a 26.2% probability of being in class 2. The confidence intervals for these probabilities contain the simulated probabilities.
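Because male is the only covariate, the two class-2 margins can also be turned into the odds ratio for men versus women of being in class 2, i.e. the exponentiated coefficient on male in the class-membership equation. The arithmetic below is my own calculation on the output above, rounded, and is set next to the value implied by the simulated 28% and 42%:

```latex
\begin{aligned}
\widehat{\mathrm{OR}} &= \frac{0.2618511 / 0.7381489}{0.4010402 / 0.5989598}
  \approx \frac{0.3547}{0.6696} \approx 0.53 \\
\mathrm{OR}_{\text{simulated}} &= \frac{0.28 / 0.72}{0.42 / 0.58}
  \approx \frac{0.3889}{0.7241} \approx 0.54
\end{aligned}
```

Up to rounding, the first value should match exp() of the coefficient on male reported in the C equation of the gsem output.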
