  • Problem with generating class membership variable using latent profile analysis

    Hello all:

I have a longitudinal dataset of adolescent relationships (N=683), each of which has one continuous measure of relationship length in months (MosTotalTog).

I have been successful in generating a three-class solution as the best fit, using this code:
    gsem (MosTotalTog <- _cons), lclass(C 3) nocapslatent nonrtolerance

    I would like to generate a "predicted class membership" variable (predclass) to use in other analyses. My code is this:
    predict cpost*, classposteriorpr
    egen max = rowmax(cpost*)
    generate predclass = 1 if cpost1==max
    replace predclass = 2 if cpost2==max
    replace predclass = 3 if cpost3==max

When I run a frequency distribution on this variable, however, I only see two classes: 2 and 3, which together account for all 683 observations. I am clearly "losing" class 1 somewhere, but I can't figure out where.

    Many thanks!
    Devon



  • #2
    Devon,

This means that nobody has a modal class of 1. That could happen if the 3 classes aren't well separated, i.e. their mean values of MosTotalTog are quite close together. Basically, class 1 is not distinct from class 2 or 3. This leads to a question about how well identified the model is, discussed below. Also note that with your code as written, any observations with missing values on MosTotalTog would end up coded as class 3, because Stata treats missing numeric values as larger than any nonmissing value when it evaluates expressions in commands like -replace-. I don't think this caused your error, but it's worth noting.
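If missing values in MosTotalTog are a possibility, a more defensive version of the class assignment code would assign a class only when the posteriors are non-missing. This is just a sketch using the variable names from the original post; note that -egen- creates a float variable by default, so declaring the rowmax as double guards against precision mismatches with the predicted probabilities:

Code:
predict cpost*, classposteriorpr
egen double max = rowmax(cpost1-cpost3)
generate predclass = 1 if cpost1==max & !missing(max)
replace predclass = 2 if cpost2==max & !missing(max)
replace predclass = 3 if cpost3==max & !missing(max)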

    More generally, I see you are running a latent profile analysis with only one indicator variable, and that you used the -nonrtolerance- option. I do worry that one variable isn't a lot of information to identify 3 separate classes. In general, Stata has advised that we turn off the -nonrtolerance- option and re-fit the model from the identified parameters, e.g.:

    Code:
    gsem (MosTotalTog <- _cons), lclass(C 3) nocapslatent nonrtolerance
    matrix b = e(b)
    gsem (MosTotalTog <- _cons), lclass(C 3) nocapslatent from(b)
The -nonrtolerance- option allows the maximizer to converge in a non-concave part of the parameter space by turning off the convergence criterion based on the second derivatives (the Hessian) of the log likelihood. My impression is that if you do this, you are not guaranteed to be at a global maximum of the log likelihood. I would see if your model converges with the code above: the second line saves the estimated parameters in a matrix named b, and in the third line, you tell -gsem- to estimate using the saved parameters as a starting point. If the model is identifiable, my experience has been that it will converge quickly.

Of note, other programs seem to favor running a large number of random starting parameter values, but I'm not sure if they impose something equivalent to Stata's default -rtolerance- criterion. As your model gets more complex, I'd advise using the -startvalues(randomid, draws(k))- option as documented in the -gsem- documentation. Other programs seem to run as many as 100 sets of start values with 5 or more classes, and I'd probably advise doing this. (This Stata option gives each observation a randomly drawn class, and then iterates from the starting parameter values formed by the randomly assigned classes.)
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



    • #3
      That makes much more sense now - thank you very much for the advice.



      • #4
Originally posted by Weiwen Ng View Post
...


        Hi,
        Thank you very much for your information.
        My question is:
If the model still cannot converge after using from(b), what should I do?

        Thanks.





        • #5
          Originally posted by bamboo Zhu View Post

If the model still cannot converge after using from(b), what should I do?
          Bamboo,

Other software programs (e.g. MPlus, the Stata LCA plugin from Penn State University, the poLCA package in R by Linzer and Lewis, and Latent GOLD) appear to run as many as 100 random start values, and accept the highest-likelihood solution that converged consistently. After some discussion with Stata, I think that Stata may use different convergence criteria than at least some of these programs. In particular, Stata's usual convergence criteria check both the first and second derivatives of the log likelihood function. -nonrtolerance- turns the second criterion off, so it allows the maximizer to converge in a non-concave part of the likelihood function. This runs the risk that you are not at the global maximum. I think that the other programs may not perform this check, but they deal with the problem of landing in a local maximum by using multiple random starting parameter values.

          Here is the simplest explanation I can come up with. I hope it is clear. In maximum likelihood estimation, we have a multidimensional likelihood function (one dimension for each parameter estimated), and we try to maximize it. With OLS regression, this likelihood function is usually very well-behaved, even if you have many parameters (provided they aren't too collinear).

Some likelihood functions are known to be poorly behaved, and the latent class likelihood function is one of those. I suspect that because we are hypothesizing several distinct classes, the likelihood function is inherently prone to local maxima. Hence, other programs appear to take the stance that multiple randomly selected starting parameter values are the way to go.

          You can approximate the multiple random starts process in Stata:

          Code:
          gsem (MosTotalTog <- _cons), lclass(C 3) nocapslatent nonrtolerance startvalues(randomid, draws(20)) emopts(iterate(10))
          matrix b = e(b)
          gsem (MosTotalTog <- _cons), lclass(C 3) nocapslatent from(b)
This will make Stata do the following:

1. It assigns each observation a latent class chosen at random.

2. I think it then calculates the mean parameter values in each class.

3. Whatever it does exactly, it then runs 10 iterations of the expectation maximization (EM) algorithm for each random draw (the default is 20 iterations; you can reduce it to 10 for this purpose, as EM is a bit slower than the usual maximization routine).

4. It then takes the draw with the highest likelihood and runs its usual maximization algorithm from there.

5. I have been in situations where it did all this but still failed to converge, or where I couldn't get the model to converge even with -nonrtolerance-. So be careful!

If you do this, and the model fails to converge after many random starts, you should check the output for logit parameters that are around +/-15 or larger in magnitude. These correspond to a class-specific probability of nearly 1 or nearly 0 (you can verify the math yourself). You can constrain those parameters at +/-15 and re-check for convergence. The manual shows you how to do this; it doesn't apply to the example above because that example used a continuous indicator.
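As a quick sanity check on the logit-to-probability claim, Stata's built-in -invlogit()- function shows that a logit of +/-15 corresponds to a probability essentially indistinguishable from 1 or 0:

Code:
display invlogit(15)
display invlogit(-15)
The first is about 0.9999997 and the second about 0.0000003.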

          If appropriate parameter constraints don't solve the problem, it could be that the model is not identifiable, and you can stop there and compare BIC for the models you successfully identified. At some point, as you increase the number of classes, this is likely to happen to you.
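One way to make that BIC comparison in Stata is with -estimates store- and -estimates stats-, which tabulates AIC and BIC for the stored models. A sketch using the model from earlier in the thread, assuming each fit converges:

Code:
gsem (MosTotalTog <- _cons), lclass(C 2) nocapslatent
estimates store c2
gsem (MosTotalTog <- _cons), lclass(C 3) nocapslatent
estimates store c3
estimates stats c2 c3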

          The -randomid- option doesn't give you the ability to save and replicate the other solutions. I posted below about how to replicate that, but it can take a long time.
          https://www.statalist.org/forums/for...m-lca-stata-15



          • #6
            #5

            Thank you very very much for your detailed information. That's really helpful.
            Could you provide more information about how to constrain the parameters for this example?
My model still cannot converge. I may try other methods.
            Thank you very much.


            Last edited by bamboo Zhu; 03 Apr 2018, 14:13.



            • #7
Originally posted by bamboo Zhu View Post
Could you provide more information about how to constrain the parameters for this example?

Imagine that the model had 3 binary indicators instead. Upon inspecting the output with -nonrtolerance-, you discover that for the first latent class, the intercept for X1 was -17. You would type:

              Code:
              gsem (X1 X2 X3 <-, logit) (1: X1<- _cons@-15), lclass(C 3)
              More info below.
              https://www.statalist.org/forums/for...5-gsem-problem

              Again, you should inspect your output for any parameters that may make sense to constrain. I cannot help you for non-logit parameters; this discussion only applies to logit parameters for binary indicators.

Last, you might consider the Penn State University plugin, although it can only take binary or ordinal indicators.

              If you have any further questions, please start a different thread; this one is drifting off topic (and someone searching for your question will not be able to easily find the relevant thread).



              • #8
Originally posted by Weiwen Ng View Post
...

You are great! I learned a lot from you.
                My model has converged.
                Thank you very very much for all your help.




                • #9
                  #7
                  Thank you very very much for your example and explanation.
                  I have got my model to converge by constraining the parameter.
                  Thank you very much.

                  Zhu



                  • #10
                    #7
May I ask one more question? Do you know whether LCA can report the adjusted BIC in Stata?
"estat lcgof" can report AIC and BIC. I saw an example from Mplus where the adjusted BIC was also reported.
                    Thank you very much.
                    Zhu



                    • #11
Originally posted by bamboo Zhu View Post
Do you know whether LCA can report the adjusted BIC in Stata?
The sample size adjusted BIC referred to in this post on the MPlus forum? You have to calculate it yourself, but it's not that hard. Do note that both Drs. Muthén in the MPlus post were not that concerned about the sample size adjusted BIC over the standard BIC, and they would probably know better than you or I. Personally, I did read the article they cited, but in my own work with a sample of about 11,000, the sample size adjustment didn't appear to do much.

                      Stata also has no provision for the bootstrap likelihood ratio test for model selection. Just stick with BIC.



                      • #12
                        #11
I found this: "Sample size adjusted BIC (Adj BIC) = -2*logLikelihood + p*ln((n+2)/24). Smaller is better. Muthén reports that simulation studies indicate this is superior to BIC."
                        The bootstrap likelihood ratio test is also my concern, thank you for mentioning it.
                        Many many thanks!



                        • #13
Originally posted by bamboo Zhu View Post
...
My point is that in the post I linked, both Bengt and Linda Muthén said they couldn't really recommend the sample size adjusted BIC over the normal BIC. They said so despite the fact that at least Bengt Muthén was an author on one of those simulation studies, and both of them definitely have the theoretical background to understand what they are talking about. One known issue with simulation studies is that the data are simulated. We generate data with a fixed data-generating process: we know how many classes there are, we know the class means, and we know the distributions of the variables (e.g. we might generate normal data, whereas normality may only be a rough approximation to something in real life). In a real dataset, you know none of this.

I would not lose too much sleep over which tests Stata has versus doesn't have. It's easy enough to calculate the sample size adjusted BIC. And yes, this tone contradicts what I wrote in my own post that I linked, but I've done some more reading since then.
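For instance, a sketch of the calculation after fitting a model, assuming -e(rank)- is an appropriate count of the free parameters (worth double-checking against your output):

Code:
scalar ll = e(ll)
scalar p = e(rank)
scalar n = e(N)
scalar adjbic = -2*ll + p*ln((n+2)/24)
display "Sample size adjusted BIC = " adjbic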



                          • #14
                            #13
                            Thank you very much for pointing it out more clearly.
                            That's really helpful.
                            Thank you very much.

