Latent Class Growth Analysis in Stata?

Zelina Stevens

Join Date: May 2024

Posts: 5
#1

05 Aug 2025, 22:03

Hello,

I am conducting an analysis in which the first step is to identify latent classes based on 10 items related to childhood health (all binary variables). These items were asked retrospectively of older adults once in the survey (2006). After determining the latent classes via gsem and posterior probabilities, I would like to use the latent classes to predict later-life health (2014, 2016, 2018). Apparently, this type of analysis is referred to as latent class growth analysis and/or growth mixture modeling. However, it doesn't seem like these models can be run in Stata. Thus, I used the posterior probabilities from the LCA to create a categorical variable that represents each person's most likely latent class (3 classes were identified from gsem). Next, I used this variable to predict changes in health over time (2014, 2016, 2018) using mixed. Is this approach appropriate, or do I need to use other software? I am unable to share the data, but an example of the steps I've taken is below:

gsem (item1 item2 item3 item4 item5 item6 item7 item8 item9 item10 <-), logit lclass(C 3) ///
startvalues(randompr, draws(5) seed(15) difficult) ///
emopts(iterate(30) difficult)

estat lcmean

predict cpost*, classposteriorpr
egen max = rowmax(cpost*)
generate predclass=1 if cpost1==max
replace predclass=2 if cpost2==max
replace predclass=3 if cpost3==max
label var predclass "Prediction Class - 3 Class LCA Solution"
label define predclass 1 "Class 1" 2 "Class 2" 3 "Class 3"
label values predclass predclass
tab predclass

Next, I reshaped the data to long format and used the following code:

mixed cvd i.predclass covariates time || id: time, cov(uns)
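For readers following along, the reshape step mentioned above might look something like the sketch below. The wide-format variable names (cvd2014, cvd2016, cvd2018) and the coding of time as 0, 1, 2 are assumptions made for illustration; the original post does not show them.

```stata
* Hypothetical wide layout: one row per id, with cvd2014 cvd2016 cvd2018
* (these names are assumed, not taken from the original post)
reshape long cvd, i(id) j(year)

* Code time as waves since 2014 (0, 1, 2) so the intercept refers to 2014
generate time = (year - 2014)/2
```

Any time-varying covariates stored in the same year-suffixed form would be added to the stub list after cvd.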

I would like to confirm whether this approach is appropriate and appreciate any help you can provide.
Erik Ruzek

Join Date: Oct 2017

Posts: 442
#2

06 Aug 2025, 07:13

Zelina,

Technically, this is not a latent class growth analysis, as such a model uses information about the outcome over time to categorize people into groups. See here. What you are doing is a little more straightforward in that you do the classification into groups based on a putatively exogenous variable (relative to the outcome measurement period) and then run a growth curve model with the group identifier as a covariate. As a side note, you should interact the predclass variable with time.
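A minimal sketch of the interaction suggested above, reusing the variable names from post #1 (covariates is the original poster's placeholder for a covariate list, and time is assumed to be continuous):

```stata
* Class-specific intercepts and class-specific time slopes
* (covariates stands in for the poster's own covariate list)
mixed cvd i.predclass##c.time covariates || id: time, cov(uns)

* Estimated time slope within each class, based on the fixed portion
margins predclass, dydx(time)

* Joint test of whether the time slopes differ across classes
testparm i.predclass#c.time
```

With the ## operator, the main effects of predclass and time are included automatically, so time does not need to be listed separately.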

However, one thing that is important to account for in these types of analyses is the uncertainty surrounding the assignment of an individual to a group. That is, the latent class analysis in step 1 (gsem, then predict) has uncertainty baked into it. As evidence of this, you will see that each individual has probabilities associated with each of the latent classes. The latent class analysis will be more certain of the grouping assignment for some individuals than others. You can get a sense of the global prediction accuracy by estimating the entropy of the final solution. See here. Stata 19 calculates entropy automatically for you when using lcstats.
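For versions without a built-in entropy statistic, the relative (normalized) entropy can be computed by hand from the posterior probabilities as E = 1 - [sum over i and k of -p_ik*ln(p_ik)] / (N*ln(K)), where values near 1 indicate clean separation. The sketch below assumes the cpost1-cpost3 and predclass variables created in post #1 and a 3-class solution; ent_i and relent are names introduced here for illustration.

```stata
* Per-person entropy contribution; stays missing where posteriors are missing
generate double ent_i = 0 if !missing(cpost1, cpost2, cpost3)
forvalues k = 1/3 {
    * Skip p = 0, since 0*ln(0) is taken to be 0
    replace ent_i = ent_i - cpost`k'*ln(cpost`k') if cpost`k' > 0 & !missing(ent_i)
}

* Relative entropy: 1 minus total entropy over its maximum, N*ln(K), with K = 3
quietly summarize ent_i
scalar relent = 1 - r(sum)/(r(N)*ln(3))
display "Relative entropy = " %5.3f relent

* Average posterior probabilities by assigned class (diagonal near 1 is good)
tabstat cpost1 cpost2 cpost3, by(predclass) statistics(mean)
```

This should be run on the wide data, one row per person, before reshaping to long.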

Unfortunately, when you include the predclass variable in mixed (or regress, logit, whatever), the model assumes no measurement error in that variable. Methodologists in the mixture modeling area have recognized this problem and proposed a number of solutions - this paper goes through each solution and its acceptability. Take a look at that and decide which works for you and your software setup. I will say that the best software for these types of analyses is Mplus or Latent Gold, although gsem in Stata is good. There are also good options in R, which is freely available.
Zelina Stevens

Join Date: May 2024

Posts: 5
#3

06 Aug 2025, 10:03

Erik,

Thank you so much for taking time to answer my questions and provide resources for further edification. I appreciate you!

Zelina