Hello. With some digging on the forum, and with some help, I am now able to get a gsem LCA model to converge with my data. My questions concern which estimation options matter most for producing a stable fit at the global maximum of the likelihood.
I have been using the startvalues(randomid, draws(#), seed(#)) option. What is a reasonable number of draws? I'm fine with letting Stata run for a while...
I have been using the emopts(iterate(#)) option. Same question as above. Is emopts(iterate(20)) "better" than emopts(iterate(10))? What is actually being iterated?
What are reasonable ways to combine these options? Which is more important?
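For concreteness, here is a hedged sketch of how I have been combining the two options. The outcome variables (y1-y5), the 3-class specification, and the particular numbers of draws and EM iterations are placeholders, not a recommendation:

```stata
* Hypothetical 3-class LCA on binary indicators y1-y5.
* draws(50): try 50 sets of random starting values.
* emopts(iterate(20)): run up to 20 EM iterations on each set of
* starting values before the best candidate is passed on to the
* gradient-based maximizer.
gsem (y1 y2 y3 y4 y5 <- _cons), logit lclass(C 3) ///
    startvalues(randomid, draws(50) seed(12345)) ///
    emopts(iterate(20))
```

My (possibly wrong) understanding is that draws() controls how many random starting points are tried, while emopts(iterate()) controls how far the EM algorithm refines each starting point, so the two trade off against each other in run time.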
After I obtain model convergence, I would like to evaluate whether I have found the global best fit. Lanza and Rhoades (e.g., doi: 10.1007/s11121-011-0201-1) advocate: "For each LCA model under consideration, multiple sets of random starting values should be specified in order to confirm that a solution does not reflect...a local...mode. If one solution yielding the maximum value of the likelihood function is found for the majority of the sets of starting values, then one can have confidence that the maximum-likelihood solution has been identified. If instead the different random starting values all lead to different modes, the model is unidentified. Model fit should be assessed only for models where the maximum likelihood has been identified."
How would you suggest evaluating this in Stata 15? Is there any way to see how many of the randomid draws lead to the same maximum likelihood solution? In addition to goodness-of-fit statistics, I have seen authors show the percentage of "seeds associated with best fit" for each model and reject class solutions that don't have 100%.
Would that mean specifying startvalues(randomid, draws(1), seed(12345)) and calling the model multiple times with different seeds? Is there a simple way to check whether the solutions from the multiple calls are the same, besides comparing all of the parameters?
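In case it helps to make the question concrete, here is a rough sketch of what I have in mind. The loop bounds, seeds, tolerance, and again the model itself are all illustrative assumptions; I am comparing only the converged log likelihoods e(ll) rather than every parameter:

```stata
* Refit the model from 10 independent random starts and record each
* converged log likelihood; starts whose log likelihood is within a
* small tolerance of the best value are counted as the same mode.
matrix LL = J(10, 1, .)
forvalues i = 1/10 {
    quietly gsem (y1 y2 y3 y4 y5 <- _cons), logit lclass(C 3) ///
        startvalues(randomid, draws(1) seed(`i')) ///
        emopts(iterate(20))
    matrix LL[`i', 1] = e(ll)
}
* Inspect the column of log likelihoods; the share of seeds reaching
* the maximum would correspond to the "% of seeds at best fit"
* statistic reported in the literature.
matrix list LL
```

Is something along these lines the intended workflow, or is there a built-in way to see how many of the randomid draws reached the same maximum?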