  • Step-by-step latent class problem with convergence


    Hello dear Stataforum members,

    I am new to latent class analysis (the seeds, draws and iterations in particular are quite new to me). I am struggling with convergence, and I am also unsure whether I made mistakes in my analysis (I fear so). I am using LCA to distinguish classes of energy curtailment behaviours (coping in the household). In the end I want to see whether there are group differences (e.g. whether older people or families with kids cope less). I have several items (4-point Likert agreement/disagreement scales, recoded to binary 0/1). After reading the literature on local and global maxima in this forum and elsewhere (e.g. Stata slides such as Mantegazzini 2019 or Canette 2017; Geiser 2011; Masyn 2013; Nylund-Gibson and Choi 2018), I have the feeling that I am making mistakes in some steps, and I wanted to cross-check this with you. I'm sorry for the long post. This is how I proceeded. I hope someone can help.
    Code:
    gsem (flipclone_U6_SQ001 flipclone_U5_SQ006  flipclone_U6_SQ003 flipclone_U5_SQ003 flipclone_U5_SQ002 flipclone_U5_SQ001 <-) if F1==2, logit lclass(C1) startvalues(randomid,  seed(12356) ) iterate(100)
    estimates store classone
    gsem (flipclone_U6_SQ001 flipclone_U6_SQ003 flipclone_U5_SQ003 flipclone_U5_SQ002 flipclone_U5_SQ001 <-) if F1==2, logit  lclass(C 2)  startvalues(randomid, draws(50)  seed(12356)) iterate(100)
    estimates store classtwo
    gsem (flipclone_U6_SQ001  flipclone_U6_SQ003 flipclone_U5_SQ003 flipclone_U5_SQ002 flipclone_U5_SQ001 <-) if F1==2, logit lclass(C 3)  difficult startvalues(randomid, draws(50) seed(12356)) iterate(100)
    estimates store classthree
    Here one variable in class 3 does not show standard errors, which is obviously not a good sign. The model does not produce an error but does not converge. So my model is not identified.
    Output for the full model:
    Code:
     Iteration 100: log likelihood = -608.23402  (not concave)
    convergence not achieved
    Trial and error: I avoided the nonrtolerance option because of this advice: "Never treat results obtained with the -nonrtolerance- option as final. Also, be aware that best practice in the field is to use many random draws and choose the one with the highest consistently-converged likelihood."
    --> So I increased both the draws and the iterations to 150 and 200 (as Weiwen Ng once suggested), but the model did not converge either. I then reduced them again and, interestingly, it sometimes converges and sometimes does not (why is that?). I changed the seed, and now it converges with this syntax:
    Code:
    gsem (flipclone_U6_SQ001  flipclone_U6_SQ003 flipclone_U5_SQ003 flipclone_U5_SQ002 flipclone_U5_SQ001 <-) if F1==2, logit lclass(C 3) difficult  startvalues(randomid, draws(50) seed(4567)) iterate(100)
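    The "many random draws" advice quoted above can also be checked across seeds. The following is only a minimal sketch, assuming the same variables, subsample and options as in the command above; the seeds other than 12356 and 4567 are arbitrary illustrations, not recommended values. If several seeds converge to the same highest log likelihood, that solution is less likely to be a local maximum.

```stata
* Sketch: refit the 3-class model under several seeds and compare the results.
* Seeds 2020 and 31415 are arbitrary, hypothetical choices for illustration.
foreach s in 12356 4567 2020 31415 {
    capture noisily gsem (flipclone_U6_SQ001 flipclone_U6_SQ003 flipclone_U5_SQ003 ///
        flipclone_U5_SQ002 flipclone_U5_SQ001 <-) if F1==2, logit lclass(C 3) ///
        startvalues(randomid, draws(50) seed(`s')) iterate(100)
    * Report the log likelihood and convergence flag only if gsem ran without error
    if _rc == 0 {
        display "seed `s': ll = " e(ll) "  converged = " e(converged)
    }
}
```

    A run that stops at iterate(100) without converging still returns results, so e(converged) has to be checked before comparing log likelihoods.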
    Furthermore, I fitted a 4-class model and the same problem arises. I checked when the full model starts to show problems and it is after the 8th iteration.
    Code:
    Fitting full model:
    Iteration 0:   log likelihood =  -606.8057  (not concave)
    Iteration 1:   log likelihood = -606.29641  (not concave)
    Iteration 2:   log likelihood = -606.13592  (not concave)
    Iteration 3:   log likelihood = -606.04871  (not concave)
    Iteration 4:   log likelihood = -605.96887
    Iteration 5:   log likelihood = -605.89014
    Iteration 6:   log likelihood = -605.83681
    Iteration 7:   log likelihood = -605.80531
    Iteration 8:   log likelihood = -605.79682  (not concave)
    So I increased the number of draws and it converges with this:
    Code:
    gsem (flipclone_U6_SQ001  flipclone_U6_SQ003 flipclone_U5_SQ003 flipclone_U5_SQ002 flipclone_U5_SQ001 <-) if F1==2, logit lclass(C 4)     startvalues(randomid, draws(80) seed(12345)) iterate(100)
    Then I checked the BIC and AIC output with estat lcgof (by the way, the LR test statistics are not working for me either) and decided to choose the classtwo model.
    Code:
    estimates stats classone classtwo classthree classthreenew classfour
     Akaike's information criterion and Bayesian information criterion
     -----------------------------------------------------------------------------
           Model |        Obs  ll(null)  ll(model)      df         AIC        BIC
    -------------+---------------------------------------------------------------
        classone |        212         .   -608.234      16    1248.468   1302.173
        classtwo |        212         .  -613.4098      11     1248.82   1285.742
      classthree |        212         .   -608.234      16    1248.468   1302.173
    classthree~w |        212         .   -608.234      17    1250.468    1307.53
       classfour |        212         .   -605.777      23    1257.554   1334.755
    -----------------------------------------------------------------------------
                   Note: N=Obs used in calculating BIC; see [R] BIC note.
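    For reference, the AIC and BIC columns in this table follow from the standard definitions AIC = -2*ll + 2*df and BIC = -2*ll + df*ln(N). A minimal sketch that recomputes the classtwo row by hand, using only the values printed above (lower values indicate the preferred model):

```stata
* Values copied from the classtwo row: ll = -613.4098, df = 11, N = 212
display "AIC = " -2*(-613.4098) + 2*11
display "BIC = " -2*(-613.4098) + 11*ln(212)
```

    The same two lines with another row's ll and df can be used to confirm that each stored model reports the number of parameters one would expect for its number of classes.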
    After obtaining estat lcgof, estat lcmean and estat lcprob, I predicted the prior and posterior class probabilities and created a marginsplot, which looks like this (see attachment):

    My question is: did I do the analysis completely wrong, or was it all right? How would you improve it? Once I have got my head around this analysis, I would like to try latent class regression. But one step at a time.
    Remarks: I am using an updated Stata 15.1, and my sample size for this subsample is N=220.
    I would highly appreciate any hints or help. Thank you very much in advance!

    Best Krissy
    Last edited by Kristina Eisfeld; 02 Jun 2020, 11:52.

  • #2
    [Attachment: Graph.png]
    Here is the marginsplot.



    • #3
      While you provide Stata code and what appears to be output, it would be much easier to help you if we could replicate your problem.

      You didn't get a quick answer, partly because you have a very long, complicated post. I know that you would like detailed coaching throughout this process, but Statalist is really not set up to do that. We respond much better to more focused questions.

      Often when you try a new estimator it's a good idea to start by running the data and program that you've seen in somebody else's posting or paper. That at least will give you some feeling for what's going on. It is also a good idea to start with a simple model and build up.

      I'm not sure exactly how this estimator works, but some estimators use some kind of simulated procedure that will be dependent on the random number generator so it is possible to get slightly different results in different runs. This should not be a problem if it converges since the point of convergence should be the same.

      Just glancing at your results, there seems to be something funny about the different models – it seems odd that you would have three models with the same log likelihood. Are these really different variables?
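      One way to check this is to ask Stata which command produced each stored result. This is only a sketch, assuming the five sets of estimates are still stored in memory under the names shown in the estimates stats call:

```stata
* Show the command line behind each stored set of estimates
foreach m in classone classtwo classthree classthreenew classfour {
    estimates describe `m'
}
```

      If two names turn out to describe the same command, the earlier model probably did not run, and estimates store saved whatever results were already in memory.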
