Local dependency test for Latent Class Analysis

Wossenseged Jemberie

Join Date: Aug 2017

Posts: 17
#1

25 Feb 2019, 10:03

Dear Stata users,
I was looking for Stata materials on methods to test for local dependence between items (observed variables) when doing latent class analysis, but I was not able to find any. I would appreciate some advice or an example on this issue, for example on

Code:

use http://www.stata-press.com/data/r15/gsem_lca1

.
Thank you

Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#2

26 Feb 2019, 09:59

This is a very interesting question. For background information, latent class analysis requires the assumption of local independence: the indicator variables are independent within each latent class, i.e. they are independent conditional on latent class, and all the dependency among the observed indicators is explained by the latent class structure; NB this assumption is also required in item response theory.

In latent profile analysis, with continuous Gaussian indicators, you can quite easily relax the local independence assumption by allowing the error terms to be correlated within each class (specify the covstructure option as indicated in SEM example 52). Indeed, when doing LPA, you most definitely should vary not only the number of classes, but also the model structure (that includes correlated vs uncorrelated error terms, and class-variant vs -invariant variances with the lcinvariant(none) option). However, because Wossenseged pointed us to the data for latent class analysis with binary indicators, I will assume that the question pertains to binary or categorical indicators.

I have not seen any options to test for local dependence in either the Penn State University Stata plugin or Drew Linzer's R package poLCA. I do see that Latent GOLD and Mplus both calculate some sort of standardized bivariate residual between all pairs of binary items (I am uncertain how categorical items are treated). Residuals with absolute values over 1.96 are a possible indication that local independence is violated. It looks like Linda Muthén gave the formula for the bivariate residual (maybe) in her Feb 1, 2009 post. So, it should be possible to calculate the statistics involved, but I don't know of an automated program to do this in Stata. To be honest, I haven't paid attention to this in my work, and the LCA papers I've reviewed haven't done so either.
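To make the bivariate-residual idea concrete, here is one generic way to write it for a pair of binary items j and j'. This is a sketch in my own notation, not necessarily the exact formula that Latent GOLD or Mplus reports: K is the number of classes, \hat{\gamma}_k is the estimated class probability, \hat{\rho}_{j|k} is the estimated probability that item j equals 1 in class k, N is the sample size, and O_{ab} is the observed count in cell (a, b) of the 2x2 table for the two items.

```latex
% Model-implied joint probability under local independence
\hat{\pi}_{ab} = \sum_{k=1}^{K} \hat{\gamma}_k \,
  \hat{\rho}_{j|k}^{\,a} (1-\hat{\rho}_{j|k})^{1-a} \,
  \hat{\rho}_{j'|k}^{\,b} (1-\hat{\rho}_{j'|k})^{1-b},
  \qquad a, b \in \{0, 1\}

% Expected counts and a Pearson-type statistic for the item pair
E_{ab} = N \hat{\pi}_{ab}, \qquad
X^2_{jj'} = \sum_{a=0}^{1} \sum_{b=0}^{1} \frac{(O_{ab} - E_{ab})^2}{E_{ab}}
```

In Stata, the class probabilities and class-specific item probabilities should be available from estat lcprob and estat lcmean after gsem, and the observed counts from tabulate. A large value for a pair suggests the classes do not fully account for the association between those two items.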

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Wossenseged Jemberie

Join Date: Aug 2017

Posts: 17
#3

26 Feb 2019, 15:48

Thank you, Weiwen Ng. I do have categorical indicators, as you correctly assumed. I was worried because the AIC and BIC kept decreasing as I added classes (even though some of the added classes contained less than 2% of the sample and had no theoretical support), and I wondered whether this pointed to a problem. I have also posted another question elsewhere which might be related to this. Thank you again!
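For readers who want to reproduce this kind of comparison, here is a minimal sketch using the example data from post #1 rather than my own data. The variable names (accident, play, insurance, stock) are those of that example dataset, and the stored-estimate names lca2 and lca3 are arbitrary labels chosen for illustration.

```stata
* Sketch only: uses the gsem_lca1 example data from post #1
use http://www.stata-press.com/data/r15/gsem_lca1, clear

forvalues k = 2/3 {
    * k-class model with four binary (logit) indicators
    gsem (accident play insurance stock <- ), logit lclass(C `k')
    estimates store lca`k'
    * Estimated share of the sample in each class
    estat lcprob
}

* AIC and BIC side by side for the stored models
estimates stats lca2 lca3
```

Looking at the class shares from estat lcprob next to the information criteria makes it easier to spot a solution that improves fit only by splitting off a very small class.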
Madu Abuchi

Join Date: Sep 2017

Posts: 143
#4

23 Feb 2020, 21:41

Has anyone worked out how to test the assumption of local independence in Stata for a latent class analysis?
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#5

24 Feb 2020, 11:17

Originally posted by Madu Abuchi:

Has anyone worked up how to test for assumption of local independence in Stata for a latent class analysis?

Madu, I'm going to give my thoughts on your other post here. In general, the FAQ asks us not to repeatedly bump a post. Also, your other post doesn't seem related to the topic.

There, you asked this:

Can anyone with some experience in latent class analysis in Stata share possible ways of dealing with a violation of the local independence assumption? Does incorporating a covariance structure for all error variables associated with observed endogenous variables in your model like:

, covstructure(e._OEn, unstructured)

solve the problem?

I alluded to that in post #2 on this thread.

If you have some continuous items/indicators, and you assume that they are distributed Gaussian (i.e. you're doing latent profile analysis with Gaussian indicators), you are making an assumption like this (where _k indexes the latent class, and you have 2 indicators, y1 and y2):

y1_k = mean_1k + error_1k

y2_k = mean_2k + error_2k

By default, the syntax you'd use would assume that within each class, y1 and y2 have zero correlation. If you include the covariance structure option you cited,

Code:

, covstructure(e._OEn, unstructured)

that tells Stata to allow the indicators' error terms to be correlated within each class.

Refer to figure 6 in this article. (That's an article on the flexmix package in R; I realize this is the Stata forum, but it's a good illustration.)

In the figure, the authors are fitting a latent profile model to data. There are two variables of interest. Each point in the scatterplot is an observation. In the figure on the left, the authors assumed uncorrelated indicators within each class (more correctly, the error terms above are uncorrelated). There's a group of points that, in the left figure, the model assigned to latent classes 1 and 4. However, maybe it's more substantively correct to say that that group of points is one latent class, not two - it's just that within that class, the indicators are correlated. That's the model they fit for the figure on the right.

(Side note: I sometimes liken latent profile analysis to you taking a magic cookie cutter to the data. The cookie cutter can cut out k circles or ellipses out of the data. If you assume uncorrelated indicators, your cookie cutter can't tilt at all. If you assume correlated indicators, the cookie cutter can tilt at an angle. And naturally, if you have more than 3 dimensions, it's really a magic cookie cutter, but I believe the principle still holds.)

Now, what does that have to do with local independence? Local independence means that you assume that conditional on latent class, all the indicators are independent of each other. I'd suggest you skim the Kathryn Masyn chapter that the Stata manual references. I believe that she explains why LPA models can relax the requirement of conditional independence. How would you test this? Well, you can simply compare BIC among the LPA models with and without correlated errors.
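The BIC comparison just described might look like the sketch below. I am assuming the latent profile example data from the manual is gsem_lca2 with indicators glucose, insulin, and sspg; please check those names against the SEM example before running. The choice of two classes and the stored-estimate names lpa_uncorr and lpa_corr are arbitrary, for illustration only.

```stata
* Sketch only: dataset and variable names assumed from the manual's LPA example
use http://www.stata-press.com/data/r15/gsem_lca2, clear

* Model A: class-specific variances, errors uncorrelated within class
gsem (glucose insulin sspg <- _cons), lclass(C 2) lcinvariant(none)
estimates store lpa_uncorr

* Model B: same, but errors are allowed to covary within class
gsem (glucose insulin sspg <- _cons), lclass(C 2) lcinvariant(none) ///
    covstructure(e._OEn, unstructured)
estimates store lpa_corr

* Compare AIC and BIC; the lower BIC is the preferred model
estimates stats lpa_uncorr lpa_corr
```

In practice you would repeat this for each candidate number of classes, since the preferred covariance structure can change with the number of classes.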

However, you asked about latent class analysis, which makes me wonder if you are asking about binary indicators. If you have binary indicators, the covariance structure option will not do anything. I don't know how to test for violations of conditional independence when we have binary indicators, and it does not seem like that's been addressed.

Steev Loyola

Join Date: Jul 2020

Posts: 3
#6

07 Mar 2023, 20:28

Hello everyone. I appreciate the discussion on the subject. It serves as a guide and starting point for me. Since this discussion is already a couple of years old, I wonder if by now (March 2023) you have any specific extension/command or guide for Local dependency test for Latent Class Analysis. Looking forward to hearing from you!