Step-by-step latent class problem with convergence

Kristina Eisfeld

Join Date: Apr 2017

Posts: 4
#1

02 Jun 2020, 11:22

Hello dear Stataforum members,

I am new to latent class analysis (the seeds, draws, and iterations in particular are new to me), and I am struggling with convergence. I am also unsure whether I made mistakes in my analysis (I fear so). I am using LCA to distinguish classes of energy curtailment behaviours (coping in the household). In the end I want to see whether there are group differences (e.g. whether older people or families with kids cope less). I have several items (4-point Likert scales of agreement/disagreement, recoded to binary 0/1). After reading the literature on local and global maxima in this forum and elsewhere (e.g. Stata presentations such as Mantegazzini 2019 or Canette 2017; Geiser 2011; Masyn 2013; Nylund-Gibson and Choi 2018), I suspect I am making mistakes in some steps, and I wanted to cross-check this with you. I'm sorry for the long post. This is how I proceeded; I hope someone can help.

Code:

gsem (flipclone_U6_SQ001 flipclone_U5_SQ006 flipclone_U6_SQ003 flipclone_U5_SQ003 flipclone_U5_SQ002 flipclone_U5_SQ001 <-) if F1==2, logit lclass(C1) startvalues(randomid, seed(12356)) iterate(100)
estimates store classone

gsem (flipclone_U6_SQ001 flipclone_U6_SQ003 flipclone_U5_SQ003 flipclone_U5_SQ002 flipclone_U5_SQ001 <-) if F1==2, logit lclass(C 2) startvalues(randomid, draws(50) seed(12356)) iterate(100)
estimates store classtwo

gsem (flipclone_U6_SQ001 flipclone_U6_SQ003 flipclone_U5_SQ003 flipclone_U5_SQ002 flipclone_U5_SQ001 <-) if F1==2, logit lclass(C 3) difficult startvalues(randomid, draws(50) seed(12356)) iterate(100)
estimates store classthree

Here one variable in class 3 does not show standard errors, which is obviously not a good sign. The model does not produce an error, but it does not converge. So my model is not identified.
Output for the full model:

Code:

Iteration 100: log likelihood = -608.23402 (not concave)
convergence not achieved
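A fit that stops at the iteration limit can still end up saved by `estimates store`, so it may help to check the convergence flag before storing anything. The following is a minimal sketch, assuming it is run from a do-file (the `///` line continuation only works there). The item list and options are copied from the 3-class command above; `capture noisily` is there so that the do-file keeps running if `gsem` exits with a nonzero return code.

```stata
* Sketch: store the 3-class fit only if the optimizer reports convergence
capture noisily gsem (flipclone_U6_SQ001 flipclone_U6_SQ003 flipclone_U5_SQ003 ///
      flipclone_U5_SQ002 flipclone_U5_SQ001 <-) if F1==2, ///
    logit lclass(C 3) difficult ///
    startvalues(randomid, draws(50) seed(12356)) iterate(100)

* e(converged) is 1 when the model converged and 0 otherwise
if _rc == 0 & e(converged) == 1 {
    estimates store classthree
}
else {
    display as error "3-class model did not converge; nothing stored"
}
```

With this guard, a later `estimates stats` only ever compares fits that actually converged.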

Trial and error: I tried not to use the option nonrtolerance because: "Never treat results obtained with the -nonrtolerance- option as final. Also, be aware that best practice in the field is to use many random draws and choose the one with the highest consistently-converged likelihood"
--> So I increased the draws as well as the iterations to 150 and 200 (as Weiwen Ng once suggested), but it did not converge either. I then reduced them, and interestingly it sometimes converges and sometimes does not (why is that?). I changed the seed, and now it converges with this syntax:

Code:

gsem (flipclone_U6_SQ001 flipclone_U6_SQ003 flipclone_U5_SQ003 flipclone_U5_SQ002 flipclone_U5_SQ001 <-) if F1==2, logit lclass(C 3) difficult startvalues(randomid, draws(50) seed(4567)) iterate(100)
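The advice quoted above (many random starts, then keep the highest log likelihood that is reached consistently) can be sketched as a loop over seeds. This is only an illustration under stated assumptions: the seed values are arbitrary, the local macro `items` is a hypothetical shorthand for the five items used above, and the code is meant to be run as a do-file.

```stata
* Sketch: refit the 3-class model under several seeds (seed values are arbitrary)
local items flipclone_U6_SQ001 flipclone_U6_SQ003 flipclone_U5_SQ003 ///
            flipclone_U5_SQ002 flipclone_U5_SQ001

foreach s in 101 202 303 404 505 {
    * capture keeps the loop going if one run stops with an error
    capture gsem (`items' <-) if F1==2, logit lclass(C 3) ///
        startvalues(randomid, draws(50) seed(`s')) iterate(100)
    if _rc == 0 & e(converged) == 1 {
        display "seed `s': ll = " %10.4f e(ll)
    }
    else {
        display "seed `s': no converged solution (return code " _rc ")"
    }
}
```

If most seeds converge to the same log likelihood, that value is a plausible candidate for the global maximum. If the converged values differ across seeds, the runs are ending at different local maxima, which would explain why convergence appears to depend on the seed.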

Furthermore, I fitted a 4-class model and the same problem arises. I checked when the full model starts to show problems and it is after the 8th iteration.

Code:

Fitting full model:

Iteration 0: log likelihood = -606.8057 (not concave)
Iteration 1: log likelihood = -606.29641 (not concave)
Iteration 2: log likelihood = -606.13592 (not concave)
Iteration 3: log likelihood = -606.04871 (not concave)
Iteration 4: log likelihood = -605.96887
Iteration 5: log likelihood = -605.89014
Iteration 6: log likelihood = -605.83681
Iteration 7: log likelihood = -605.80531
Iteration 8: log likelihood = -605.79682 (not concave)

So I increased the number of draws and it converges with this:

Code:

gsem (flipclone_U6_SQ001 flipclone_U6_SQ003 flipclone_U5_SQ003 flipclone_U5_SQ002 flipclone_U5_SQ001 <-) if F1==2, logit lclass(C 4) startvalues(randomid, draws(80) seed(12345)) iterate(100)

Then I checked the BIC and AIC output with estat lcgof (by the way, the LR test statistics are not working for me either) and decided to choose the classtwo model.

Code:

estimates stats classone classtwo classthree classthreenew classfour

Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------
       Model |        Obs  ll(null)  ll(model)      df         AIC        BIC
-------------+---------------------------------------------------------------
    classone |        212         .   -608.234      16    1248.468   1302.173
    classtwo |        212         .  -613.4098      11     1248.82   1285.742
  classthree |        212         .   -608.234      16    1248.468   1302.173
classthree~w |        212         .   -608.234      17    1250.468    1307.53
   classfour |        212         .   -605.777      23    1257.554   1334.755
-----------------------------------------------------------------------------
Note: N=Obs used in calculating BIC; see [R] BIC note.
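As a quick check on how the table is built: AIC = -2*ll + 2*df and BIC = -2*ll + df*ln(N). The short sketch below recomputes both for the classtwo row from the printed values. The scalar names are hypothetical and only used for illustration.

```stata
* Recompute AIC and BIC for classtwo from the values printed in the table
* ll(model), df (number of estimated parameters), and Obs
scalar ll_2 = -613.4098
scalar df_2 = 11
scalar n_2  = 212

display "AIC = " %9.3f (-2*scalar(ll_2) + 2*scalar(df_2))
display "BIC = " %9.3f (-2*scalar(ll_2) + scalar(df_2)*ln(scalar(n_2)))
```

Up to rounding, this should reproduce the 1248.82 and 1285.742 shown for classtwo. Because ln(212) is about 5.36, each extra parameter costs more under BIC than under AIC, which is why BIC favours the 11-parameter model more clearly than AIC does.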

After obtaining estat lcgof, estat lcmean and estat lcprob, I predicted the prior and posterior class probabilities and created a marginsplot (see attachment).
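For reference, the post-estimation steps described here might look roughly like the sketch below. It assumes the two-class fit is still stored as classtwo in the current session; the variable names `cpost`, `maxpost`, and `predclass` are hypothetical.

```stata
* Sketch: post-estimation for the stored two-class model
estimates restore classtwo

* Marginal class probabilities and class-specific item means
estat lcprob
estat lcmean

* Posterior class membership probabilities, one variable per class
predict cpost*, classposteriorpr

* Modal assignment: the class with the largest posterior probability
egen double maxpost = rowmax(cpost*)
generate byte predclass = 1 if cpost1 == maxpost & !missing(maxpost)
replace predclass = 2 if cpost2 == maxpost & !missing(maxpost)
tabulate predclass
```

Modal assignment ignores classification uncertainty, so the tabulated class sizes need not match the probabilities reported by `estat lcprob` exactly.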

My question is: did I do the analysis completely wrong, or was it all right? How would you improve it? Once I have got my head around this analysis, I would like to try latent class regression. But one step at a time.
Remarks: I am using an up-to-date Stata 15.1, and my sample size for this subsample is N=220.
I would highly appreciate some hints and/or help. Thank you very much in advance!

Best,
Krissy

Last edited by Kristina Eisfeld; 02 Jun 2020, 11:52.
Kristina Eisfeld

Join Date: Apr 2017

Posts: 4
#2

02 Jun 2020, 11:32

Here is the marginsplot.
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#3

04 Jun 2020, 12:30

While you provide Stata code and what appears to be output, it would be much easier to help you if we could replicate your problem.

You didn't get a quick answer, partly because you have a very long, complicated posting. I know that you would like to get detailed coaching on this process throughout, but Statalist is really not set up to do that. We respond much better to more focused questions.

Often when you try a new estimator it's a good idea to start by running the data and program that you've seen in somebody else's posting or paper. That at least will give you some feeling for what's going on. It is also a good idea to start with a simple model and build up.
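Building up from a simple model could be sketched as follows: fit one class first, then add classes one at a time, always on the same item list. This is an illustration only. The local macro `items`, the stored names `lca_k1` to `lca_k4`, and the seed are hypothetical choices, the variable names are taken from the original post, and the code assumes a do-file.

```stata
* Sketch: fit 1 to 4 classes on one fixed item list and compare the fits
local items flipclone_U6_SQ001 flipclone_U6_SQ003 flipclone_U5_SQ003 ///
            flipclone_U5_SQ002 flipclone_U5_SQ001

* One class: no random starting values needed
gsem (`items' <-) if F1==2, logit lclass(C 1)
estimates store lca_k1

forvalues k = 2/4 {
    capture noisily gsem (`items' <-) if F1==2, logit lclass(C `k') ///
        startvalues(randomid, draws(50) seed(12345)) iterate(100)
    * Store only converged fits so stale results are never saved
    if _rc == 0 & e(converged) == 1 {
        estimates store lca_k`k'
    }
}

* Compare whichever fits were stored
estimates stats lca_k*
```

Keeping the item list in a single macro makes it harder for the models to differ by accident in which variables they include.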

I'm not sure exactly how this estimator works, but some estimators use some kind of simulated procedure that will be dependent on the random number generator so it is possible to get slightly different results in different runs. This should not be a problem if it converges since the point of convergence should be the same.

Just glancing at your results, there seems to be something funny about the different models – it seems odd that you would have three models with the same log likelihood. Are these really different variables?
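One way to follow up on this is to look at what each stored result actually contains. The sketch below uses the stored names from the original post. One possible explanation, which the thread does not confirm, is that `estimates store` was run after a command that did not complete, so the previously active results were saved again under a new name.

```stata
* Sketch: inspect what each stored result actually contains
estimates dir

* Show the command line that produced each stored fit
estimates describe classone
estimates describe classthree

* Side-by-side coefficients; identical columns suggest the same fit was stored twice
estimates table classone classthree, stats(N ll)
```

If `estimates describe` reports the same `gsem` command for both names, the matching log likelihoods come from a single fit rather than from two different models.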