  • Stable solutions gsem LCA Stata 15

    Hello. With some digging on the forum and help, I am able to get gsem lca to converge with my data. My questions concern which estimation options are most important to produce a stable globally maximum likelihood model fit.

    I have been using the startvalues(randomid, draws(#), seed(#)) option. What is a reasonable number of draws? I'm fine with letting Stata run for a while...

    I have been using the emopts(iterate(#)) option. Same as above. Is emopts(iterate(20)) "better" than emopts(iterate(10))? What is actually being iterated?

    What are reasonable ways to combine these options? Which is more important?

    After I obtain model convergence, I would like to evaluate if I have found the globally best fit. Lanza and Rhoades (doi: 10.1007/s11121-011-0201-1 for example), advocate: "For each LCA model under consideration, multiple sets of random starting values should be specified in order to confirm that a solution does not reflect...a local...mode. If one solution yielding the maximum value of the likelihood function is found for the majority of the sets of starting values, then one can have confidence that the maximum-likelihood solution has been identified. If instead the different random starting values all lead to different modes, the model is unidentified. Model fit should be assessed only for models where the maximum likelihood has been identified."

    How would you suggest evaluating this in Stata 15? Is there any way to see how many of the randomid draws lead to the same maximum likelihood solution? In addition to goodness-of-fit statistics, I have seen authors show the percentage of "seeds associated with best fit" for each model and reject class solutions that don't have 100%.

    Would that mean having startvalues(randomid, draws(1), seed(12345)) and calling the model with different seeds multiple times? Is there a simple way to evaluate if the resulting solutions of the multiple model calls are the same? (besides comparing all of the parameters???)

  • #2
    I have been wondering about this myself. I have some answers, but also some questions.

    Originally posted by Melvin Donaldson View Post
    I have been using the startvalues(randomid, draws(#), seed(#)) option. What is a reasonable number of draws? I'm fine with letting Stata run for a while...

    I have been using the emopts(iterate(#)) option. Same as above. Is emopts(iterate(20)) "better" than emopts(iterate(10))? What is actually being iterated?

    What are reasonable ways to combine these options? Which is more important?
    The -startvalues(randomid, draws(n))- option is explained in [SEM] intro 12 and example 52. It specifies n random draws of each observation's latent class. For each draw, Stata runs the expectation-maximization (EM) algorithm for a certain number of iterations. It then selects the set of starting latent class values that produced the highest log likelihood after the specified number of EM iterations.

    I am going to guess that 20 iterations is "better" than 10. I have no clue how many is optimal.

    I am under the impression that after however many EM iterations you asked it for, Stata switches to its usual quadrature-based method of likelihood maximization. In other lit on latent class/profile methods, I am getting the impression that other software use EM all the way through. I'd be interested to know if that's correct.

    After I obtain model convergence, I would like to evaluate if I have found the globally best fit. Lanza and Rhoades (doi: 10.1007/s11121-011-0201-1 for example), advocate: "For each LCA model under consideration, multiple sets of random starting values should be specified in order to confirm that a solution does not reflect...a local...mode. If one solution yielding the maximum value of the likelihood function is found for the majority of the sets of starting values, then one can have confidence that the maximum-likelihood solution has been identified. If instead the different random starting values all lead to different modes, the model is unidentified. Model fit should be assessed only for models where the maximum likelihood has been identified."

    How would you suggest evaluating this in Stata 15? Is there any way to see how many of the randomid draws lead to the same maximum likelihood solution? In addition to goodness-of-fit statistics, I have seen authors show the percentage of "seeds associated with best fit" for each model and reject class solutions that don't have 100%.

    Would that mean having startvalues(randomid, draws(1), seed(12345)) and calling the model with different seeds multiple times? Is there a simple way to evaluate if the resulting solutions of the multiple model calls are the same? (besides comparing all of the parameters???)

    I believe that to implement this, you would indeed run several models with different starting values. For replicability, you would use specific random seeds. One simple way to evaluate whether the resulting model solutions are the same is to compare the values of the log likelihood, which should be stored in e(ll). If the likelihood function has a bunch of local maxima, you can bet that you will get a bunch of different values of e(ll).
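A minimal sketch of that approach, assuming a hypothetical three-indicator model (y1, y2, y3 are placeholder variable names; the gsem options themselves are documented syntax):

```stata
* Hypothetical sketch: refit the same latent class model under several
* seeds and collect the final log likelihoods for comparison.
capture matrix drop ll
forvalues seed = 1/10 {
    gsem (y1 y2 y3 <- _cons), lclass(C 3) ///
        startvalues(randomid, draws(1) seed(`seed'))
    matrix ll = (nullmat(ll) \ e(ll))
}
* identical (or near-identical) entries suggest the same maximum was found
matrix list ll
```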

    That said, starting values of what? In SEM example 52, which is a latent profile analysis replicating Masyn (2013), I am pretty sure that Stata is guessing randomly at each observation's latent class, then going from there. Masyn's article (cited in that example) seems to say we should be using random starting parameter values, which I take to mean the means, variances, etc. of each indicator in each latent class. Did I misread her? Or are the two techniques (random latent class starting values vs. random starting parameter values) equivalent? Or are they not equivalent but either acceptable? Any insight would be appreciated.
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



    • #3
      A couple of points of information. First, I read through Penn State's LCA Stata plugin manual. It appears that they do allow users to request random starting values for the rho values, i.e. the item response parameters. This is distinct from random starts for what they call the gamma parameters, which are starting values for each observation's latent class membership.

      I believe that Stata's option -randomid- requests that each observation be randomly assigned to one latent class, with Stata then estimating all parameters from there. The -randompr- option requests that each observation get a random vector of latent class membership probabilities, which appears to be consistent with the Penn State LCA plugin's option for randomly assigned gamma parameters. Lastly, the -jitter- option will "randomly [perturb] the values from a Gaussian approximation to each outcome", and I'm pretty sure this is consistent with the LCA plugin's option for random rho parameters.

      NB: the jitter option has a suboption that allows you to specify how much to perturb the coefficients, intercepts, and scale parameters, with a default of 1. There is a second suboption to specify how much to perturb the variances for the outcomes, also with a default of 1. When you're using mixed indicator types, the widely differing scales might or might not be a problem - I am personally not clear on this.
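      If I'm reading [SEM] intro 12 correctly, those two suboptions are supplied as numbers after the jitter keyword. A hedged sketch using the example 52 data (the specific magnitudes are arbitrary choices of mine):

```stata
* jitter 2 5: perturb coefficients/intercepts/scale parameters with
* magnitude 2, and the Gaussian outcome variances with magnitude 5
gsem (glucose insulin sspg <- _cons), lclass(C 4) ///
    startvalues(jitter 2 5, draws(10) seed(15)) emopts(iterate(20))
```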

      The following code is based on SEM example 52. In the example, we fit 2, 3, 4, and 5 class latent profile models (with all errors uncorrelated, and all parameter variances constrained equal across classes). The 4- and 5-class estimates requested random starting class ID values. I re-wrote the code to request randomly jittered item response parameters across classes, and I requested 20 runs. In the code, the matrix a shows the final log likelihoods. I started this with the random number seed 15 to reproduce the results offered in the example.

      Code:
      use http://www.stata-press.com/data/r15/gsem_lca2
      gsem (glucose insulin sspg <- _cons), lclass(C 5) ///
      startvalues(jitter, draws(5) seed(15)) emopts(iter(20))
      estat lcgof
      di e(ll)
      matrix a = e(ll)
      estimates store c5inv_null
      
      forvalues seed = 16/35 {
          gsem (glucose insulin sspg <- _cons), lclass(C 5) ///
          startvalues(jitter, draws(1) seed(`seed')) emopts(iter(20))
          matrix a = a \ e(ll)
          estimates store c5inv_`seed'
          }
      matrix list a
      
      estat lcmean, nose
      estimates restore c5inv_null
      estat lcmean, nose
      If you compare the results of the first model run (estimates named c5inv_null), you'll see that the estimates, AIC, BIC, and log likelihood are exactly equal to the results in the manual, despite requesting randomly jittered start values instead of random latent class starting values.

      The next section of the code has Stata conjure one set of random starting parameters for each specified seed, re-fitting the model 20 times. It stores the final log likelihoods in the matrix called a. When you list matrix a at the end, you should see that a lot of the final log likelihood values are equal to -1619.0104, and a few are -1578.207. Hence, there is at least one other local maximum, and in fact most of our runs ended up converging on it. However, it would appear that in this case, the jittered starting values saved the first run from stopping at that lower maximum. And you can see from the -estat lcmean- commands on the null model (highest log likelihood) and on the last model (seed = 35, which finished at the lower local maximum) that the estimated class means are quite different.

      It's worth noting that Masyn (2013; example 52 is based on this very article) recommended at least 50 - 100 runs with random starting values. Here, it does look like Stata wound up at the global maximum, so I think I would rest easy if I got this result.

      Per a link to an earlier discussion on maximization in LCA, you may benefit from adding the -nonrtolerance- option to the gsem command to aid convergence with nonconcave likelihoods. This might help if your estimation log is converging slowly. Normally, if I understand things correctly, Stata requires both that the relative change in the log likelihood after an iteration be less than a certain amount (default 1e-7, i.e. 1 × 10^-7) and that the gradient vector (the first derivative of the log likelihood function) be close to 0. The link suggests that the -nonrtolerance- option makes Stata ignore the second criterion, and that it can allow gsem to iterate to a solution "even if that solution is in a non-concave region of the parameter space." However, I suspect that this may hide the fact that the model didn't truly converge. That would be a big problem if the likelihood is flat (your log likelihood reaches a point where it is barely changing at all but Stata keeps iterating, often a sign of non-identifiability), rather than just multimodal.
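      For concreteness, a sketch of the option in context, using the example 52 model (as noted, any solution obtained this way deserves extra suspicion):

```stata
* Relax the scaled-gradient convergence check; verify the result by
* refitting from many different random starting values afterward.
gsem (glucose insulin sspg <- _cons), lclass(C 5) ///
    startvalues(jitter, draws(5) seed(15)) emopts(iterate(20)) ///
    nonrtolerance
di e(converged)   // Stata may still report convergence here
```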
      Last edited by Weiwen Ng; 15 Dec 2017, 16:32.



      • #4
        That is very helpful and makes a lot of sense. I have spent the day exploring the Penn State LCA plugin, running the same analysis with my analytic dataset. Thanks to the nonrtolerance option from that other post, I don't have problems with model convergence.

        With my data, in a 5-class solution, I can reach the same model with Stata's gsem and the Penn State plugin (log likelihoods within 1e-7 of each other, which seems within tolerance). The class frequencies are very similar too, so I am comfortable that they are essentially the same solution. However, with the Penn State plugin, I can easily see that only 40% of the seeds were associated with the best-fit model; I wouldn't have a way to see this in gsem. In fact, some authors have used a figure like 40% as grounds to reject the model in favor of a "more stable" solution. I've tried with jitter and randompr. I guess I wouldn't have any sense of the stability problem using gsem vs. the Penn State plugin.

        A separate problem is the AIC/BIC that gsem returns is way off what the Penn State plugin returns.

        Stata
        Code:
        ll:  -8067.3082
        N:   1376
        df:  71
        AIC: 16276.62
        BIC: 16647.73
        Penn State
        Code:
        ll:  -8067.3081
        N:   1376
        df:  16309
        AIC: 2207.2285
        BIC: 2594.0218
        So something goofy is going on there. Bug?



        • #5
          Originally posted by Melvin Donaldson View Post
          ... In fact, some authors have used the 40% as a grounds to reject the model in favor of a "more stable" solution. I've tried with jitter and randompr. I guess I wouldn't have any sense of the problem with stability using gsem vs. Penn State.

          A separate problem is the AIC/BIC that gsem returns is way off what the Penn State plugin returns.
          Then I guess the code I wrote will give you that information (i.e. how many runs reached the maximum log likelihood value). It may be worth adding as a postestimation option to gsem. There are some other fit statistics that Stata doesn't present that appear to be emerging as best practices. I've written elsewhere about the bootstrapped likelihood ratio test for k vs. k-1 classes, which I've seen mentioned in multiple articles as a best practice. Masyn (2013) also says that presenting the condition number of the information matrix estimated when maximizing the likelihood can help identify whether a model is empirically unidentified (and I suspect I have a model bordering on that case); I can see no easy way to extract the required information in Stata without programming the likelihood function myself (which is well beyond my capabilities). This also seems desirable to know.
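          That said, one possible shortcut occurs to me (an assumption on my part, not something from the manuals): for a symmetric positive-definite matrix, the condition number of e(V) equals the condition number of its inverse, the estimated information matrix, so Mata's symeigenvalues() might get you there without reprogramming the likelihood:

```stata
* Hypothetical sketch: condition number of the information matrix,
* computed from e(V) after fitting the model with the default OIM VCE.
* (cond(V) = cond(inv(V)) for symmetric positive-definite matrices.)
matrix V = e(V)
mata:
    ev = symeigenvalues(st_matrix("V"))   // row vector of eigenvalues
    st_numscalar("condnum", max(ev) / min(ev))
end
di condnum
```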

          If authors were genuinely running 50-100 attempts, and 40% of their solutions converged to a higher log likelihood than the remaining 60%, I would be a little surprised that they would reject the solution with the higher log likelihood. 40% is a considerable share of attempts, and as we saw above, it looks like a majority of random start values will hit a lower maximum of the likelihood function in the example given. If it were 4% of attempts, I'd reconsider.

          Originally posted by Melvin Donaldson View Post
          Stata
          Code:
          ll: -8067.3082
          N: 1376
          df: 71
          AIC: 16276.62
          BIC: 16647.73
          Penn State
          Code:
          ll: -8067.3081
          N: 1376
          df: 16309
          AIC: 2207.2285
          BIC: 2594.0218
          So something goofy is going on there. Bug?
          Check the difference in the degrees of freedom. PSU has a df of 16,309. That is a lot of df! Someone is defining something very differently from someone else, or there's a bug. I thought it was something like 1 df per parameter estimated; there is no way the model is estimating 16k+ parameters, surely.



          • #6
            df: The Penn State people say "In models with no covariates, this is the number of cells in the contingency table, minus the number of parameters that are freely estimated, minus 1."



            • #7
              Originally posted by Melvin Donaldson View Post
              df: The Penn State people say "In models with no covariates, this is the number of cells in the contingency table, minus the number of parameters that are freely estimated, minus 1."
              The square root of 16,309 is 127.7. Does your contingency table have roughly that many cells? It's hard to see how this could happen if the items were binary. If the items are ordinal, then it's another story.



              • #8
                For this model, the items are ordinal: 14 variables with 5 levels each. For 5 classes, that's hundreds of parameters. So which BIC should I believe?
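                For what it's worth, a back-of-the-envelope count (my own rough arithmetic, assuming each 5-level ordinal item contributes 4 free cutpoints per class):

```stata
* 14 ordinal items with 5 levels each: cells in the full contingency table
di %15.0f 5^14        // about 6.1 billion cells
* rough free-parameter count for a 5-class model:
* 14 items x 4 cutpoints x 5 classes, plus 4 free class probabilities
di 14*4*5 + 4         // 284
```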



                • #9
                  Originally posted by Weiwen Ng View Post
                  ...
                  It's worth noting that Masyn (2013; example 52 is based on this very article) recommended at least 50 - 100 runs with random starting values. Here, it does look like Stata wound up at the global maximum, so I think I would rest easy if I got this result.

                  Per a link to an earlier discussion on maximization in LCA, you may benefit by adding the -nonrtolerance- option to the gsem command to aid convergence with nonconcave likelihoods. This might help if your estimation log is not converging fast. Normally, if I understand things correctly, Stata requires both the relative change in the log likelihood after an iteration to be less than a certain amount (default is 1e-7, i.e. 1x 10^-7), and the gradient vector (i.e. the gradient, or first derivative, of the log likelihood function) to be close to 0. The link suggests that the -nonrtolerance- option makes Stata ignore the second criteria, and that it can allow gsem to iterate to a solution "even if that solution is in a non-concave region of the parameter space." However, I suspect that this won't show you that the model didn't converge. This would be a big problem if the likelihood is flat (your log likelihood gets to a point where it is barely changing at all but Stata keeps iterating, often a sign of non-identifiability), rather than just multimodal.
                  I've been doing some exploring and I would like to update my prior post.

                  I am an applied statistician, and I don't have the theoretical knowledge that some have. Nonetheless, when Stata maximizes a likelihood, it requires that the first derivative (gradient) of the log likelihood with respect to the parameters be close to zero and that the matrix of second derivatives be well behaved. As I understand it, requiring both criteria before declaring victory guarantees that you've reached a maximum, though only a local one; it cannot by itself distinguish a local maximum from the global one.

                  The -nonrtolerance- option appears to ask Stata to skip the scaled-gradient check, which uses the (estimated) second-derivative matrix. With that check turned off, you can no longer guarantee that you are even at a local maximum. However, Stata will still declare convergence (as in, the scalar e(converged) = 1), so I was wrong there. On the other hand, with the ill-behaved likelihoods that are typical of many mixture models, you also converge much faster.

                  It appears to me, then, that if you invoke -nonrtolerance-, you should also make many estimations from widely varied starting parameters. The code I posted should do that, although the advice I am seeing is that you want something like 50 to 100 attempts (I believe for 3 to 6 classes or thereabouts - more classes may require more attempts). You want to be able to consistently hit the highest log likelihood. If you have a big sample, your smallest class still has a decent sample size, and you hit the apparent global maximum 20% of the time, I am going to hazard a guess that you are OK.

                  The question is, how do you get Stata to vary the parameters widely enough? The SEM examples propose that we use a random assignment to a particular class ID. I experimented with this until I found the -jitter- option. I have actually not investigated using random class assignment in the context of running the model a large number of times. It may work just as well; I will report back.

                  In the -jitter- option, the first optional number refers to the magnitude for perturbing the coefficients, intercepts, cutpoints, and scale parameters. The default is 1. I believe that these are all logit cutpoints, so you are working on the log-odds scale. I have settled on values of 2 or 3, which is quite a significant perturbation (I think). The second optional number refers to the magnitude for perturbing variances of Gaussian outcomes. Here, you have to use values that make sense in the context of whatever Gaussian indicators you used. In the particular example, the variances for glucose, insulin, and SSPG are about 1004, 144, and 100 respectively. You do not appear to be able to select how much to perturb each indicator; it looks like you have to choose a scale that makes sense for all the indicators (unless you convert them all to Z-scores and perturb them by 1 or 2).



                  • #10
                    This is very interesting. I must admit this is totally out of my wheelhouse, but I feel much more confident using these methods now that I understand this. I think invoking the nonrtolerance option to relax the optimization criteria is a good example of when understanding what an option actually does is important.

                    To summarize your discussion so far (for my comprehension): LCA is a type of finite mixture modeling, and the likelihood functions of mixture models are frequently poorly behaved. There may be many local maxima. The Stata gsem stepping algorithm can get stuck in these local maxima, so it can be helpful to allow Stata to converge in non-concave regions of the parameter space. Since such a solution may be a local maximum instead of the global maximum, it is useful to repeat the maximization many times with random starting values and let Stata use the best solution. As long as many of those random starting values converge to the same log likelihood, we can be reasonably confident that Stata indeed found the global maximum, even if it occurred in a non-concave region of the likelihood function.

                    I wrote a little Mata code to count the number of tested starting values that converged to the same log likelihood (within a relative tolerance of 1e-7, which seems consistent with the algorithm's default).

                    Code:
                    mata
                    A = st_matrix("a")
                    // count runs whose log likelihood is within relative tolerance of the best
                    successes = sum(abs((A :/ max(A)) :- 1) :< 1e-7)
                    st_numscalar("successes", successes)
                    end
                    di successes



                    • #11
                      Originally posted by Melvin Donaldson View Post
                      To summarize your discussion so far (for my comprehension): LCA is a type of finite mixture modeling, and the likelihood functions of mixture models are frequently poorly behaved. There may be many local maxima. The Stata gsem stepping algorithm can get stuck in these local maxima, so it can be helpful to allow Stata to converge in non-concave regions of the parameter space. Since such a solution may be a local maximum instead of the global maximum, it is useful to repeat the maximization many times with random starting values and let Stata use the best solution. As long as many of those random starting values converge to the same log likelihood, we can be reasonably confident that Stata indeed found the global maximum, even if it occurred in a non-concave region of the likelihood function.
                      Yes, that's what I intended to convey. I hope it's accurate!

                      I will add that in some instances, I've fit a bunch of models with the -nonrtolerance- option, retrieved the estimates, saved the parameters to a matrix, and then successfully re-fit the model with the gradient tolerance back to normal, using the saved parameters as start values. I have also had a few instances where the model failed to fit this way. I'm going to guess that the first case is good. I don't know what, exactly, to think about solutions in the second case - should I discard them entirely?
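                      A sketch of that refitting workflow (the model is the example 52 placeholder; from() and nonrtolerance are documented gsem options):

```stata
* Step 1: fit with the relaxed gradient check
gsem (glucose insulin sspg <- _cons), lclass(C 5) ///
    startvalues(jitter, draws(5) seed(15)) nonrtolerance
matrix b = e(b)                 // save the parameter estimates
* Step 2: refit from those estimates with the default convergence criteria
gsem (glucose insulin sspg <- _cons), lclass(C 5) from(b)
di e(converged)
```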

                      Originally posted by Melvin Donaldson View Post
                      I wrote a little mata code to count the number of tested starting values that converge to the same log likelihood (within a tolerance of 1e-7, which seems consistent with the algorithm).

                      Code:
                      mata
                      A = st_matrix("a")
                      successes = sum(abs((A :/ max(A)) :- 1) :< 1e-7)
                      st_numscalar("successes", successes)
                      end
                      di successes
                      Yep, and you can actually also get Mata to sort the matrix. That said, you can also do all that in Excel. For what it's worth, here's the code I came up with, which should put all the converged iterations before the non-converged ones, then sort each set from highest to lowest log likelihood. Mata's sort() sorts in ascending order, which is why the code may look a bit weird: it multiplies the log likelihood (and the indicator for convergence) by -1 so the best values come first. This will definitely blow up if your log likelihood is positive (but that shouldn't happen often).

                      Code:
                      forvalues seed = 16/35 {
                          gsem (glucose insulin sspg <- _cons), lclass(C 5) ///
                          startvalues(jitter, draws(1) seed(`seed')) emopts(iter(20))
                          matrix a = (nullmat(a) \ e(ll), e(converged), e(ic))
                          estimates store c5inv_`seed'
                      }
                      matrix colnames a = LL converged iterations
                      mata: b = st_matrix("a")
                      mata: b = b, b[.,1] :* -1, b[.,2] :* -1
                      mata: b = sort(b, (5,4))
                      mata: st_replacematrix("a", b[.,1..3])
                      matrix list a
                      Last, I missed that you said you had 14 variables that were 5-point scales. There should be 5^14 cells, which works out to over 6 billion. At this point, I am fundamentally out of my league, and I unfortunately have no advice on which df to trust.
                      Last edited by Weiwen Ng; 28 Dec 2017, 17:36.



                      • #12
                        Originally posted by Weiwen Ng View Post
                        ...

                        Hi Weiwen,

                        I am currently working on projects with LCA analysis. Your posts are very helpful in clarifying my understanding about how to proceed with Stata for stable/reliable results. Thanks so much for sharing.

                        But I am still not sure how to specify different start values to get the global maximum likelihood. Take the code "startvalues(randomid, draws(1), seed(12345))" as an example. Will the number of draws automate the process? For example, if I specify draws(5), does this ask Stata to run with 5 different sets of start values and return the solution that yields the maximum likelihood?

                        Thank you so much.



                        • #13
                          Originally posted by Mengmeng Li View Post
                          ...

                          Hi Weiwen,

                          I am currently working on projects with LCA analysis. Your posts are very helpful in clarifying my understanding about how to proceed with Stata for stable/reliable results. Thanks so much for sharing.

                          But I am still not sure how to specify different start values to get the global maximum likelihood. Take the code startvalues(randomid, draws(1), seed(12345)) as an example. Will the number of draws automate the process? For example, if I specify draws(5), does this ask Stata to run with 5 different sets of start values and return the solution that yields the maximum likelihood?

                          Thank you so much.
                          Mengmeng,

                          If you specify draws(k), Stata will draw a set of random start values k times. Each time, it will run the EM algorithm for the default 20 iterations (unless you specified fewer or more iterations), then pause and go to the next set of start values. That is, it only runs its EM algorithm. In practice, I don't think that most models will converge after the EM iterations alone. After k sets of random draws, it will choose the (interim) solution with the highest log-likelihood value, then continue its normal maximization procedure from there. So, the answer to your question is yes, but not 100% yes.
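In code, that behavior looks like the following sketch (again borrowing the variables from SEM example 52; the class count, number of draws, and seed are placeholders):

```stata
* Try 5 random sets of start values; Stata runs up to 20 EM iterations
* on each, keeps the interim solution with the highest log likelihood,
* then continues its normal maximization from there
gsem (glucose insulin sspg <- _cons), lclass(C 3) ///
    startvalues(randomid, draws(5) seed(12345)) ///
    emopts(iterate(20))
display e(ll)
```

Note that only the final, fully maximized solution is returned; the interim results from the other four draws are discarded.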

                          I think that with a complex model (e.g. 5 or more latent classes), it may be better to set k to as many as 100 draws. I believe that Mplus and the R package poLCA both default to 100 draws (when you invoke the random draws option), and if you read Masyn's chapter, she does recommend "a minimum of 50-100 sets of extensively, randomly varied starting values".

                          The code I wrote above saves the final parameter estimates and the log likelihoods of all 100 random draws. This enables you to go back and inspect the results, and to verify how many random draws end up at the highest log likelihood. I believe that Mplus, poLCA, and the Penn State University Stata plugin enable this. Stata will not save the results if you just use the draws(k) option. As described above, Stata won't fully fit the models it tries under the random draws option (it only runs the EM algorithm, which rarely results in full convergence). I do not know how important it is to inspect how many sets of start values converged at the highest log likelihood. Masyn clearly recommends it (see page 565 of her chapter).

                          Side note: you didn't ask, but when you say startvalues(randomid, ...), I think that Stata assigns each observation to a random latent class, and I think that its start values are defined by the randomly-generated latent classes. You could also specify the jitter option - I think this obtains start values under the default option, then adds random noise specified by the jitter option - or the randompr option (assigns each observation to each class with a randomly chosen probability). I do not know which option produces higher variance in the start values.
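To make that concrete, here is one possible loop (a sketch only; the class count, seed range, and EM iteration count are placeholders, and the model is again the one from SEM example 52). It fits the model once per seed with a single draw each, so every run's e(ll) is observable, then lists the stored values so you can count how many runs reached the best one:

```stata
* Store one log likelihood per seed
matrix ll = J(50, 1, .)
forvalues s = 1/50 {
    capture quietly gsem (glucose insulin sspg <- _cons), lclass(C 5) ///
        startvalues(randomid, draws(1) seed(`s')) emopts(iterate(20))
    * Record e(ll) only if the run converged
    if _rc == 0 matrix ll[`s', 1] = e(ll)
}
matrix list ll
```

Counting the runs within a small tolerance of the largest stored value gives the "percentage of seeds associated with best fit" that the original question describes.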
                          Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

                          When presenting code or results, please use code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



                          • #14
                            Hi Weiwen,

                            Is this the code you referred to that saves e(ll) from the 100 draws?

                            forvalues seed = 16/35 {
                                gsem (glucose insulin sspg <- _cons), lclass(C 5) ///
                                    startvalues(jitter, draws(1) seed(`seed')) emopts(iter(20))
                                matrix a = a \ e(ll)
                                estimates store c5inv_`seed'
                            }
                            matrix list a

                            To me this saves 20 values of e(ll), rather than 100. Did I miss something here?

                            Best, Mengmeng



                            • #15
                              Originally posted by Weiwen Ng View Post

                              Mengmeng,

                              If you specify draws(k), Stata will draw a set of random start values k times. Each time, it will run the EM algorithm for the default 20 iterations (unless you specified fewer or more iterations), then pause and go to the next set of start values. That is, it only runs its EM algorithm. In practice, I don't think that most models will converge after the EM iterations alone. After k sets of random draws, it will choose the (interim) solution with the highest log-likelihood value, then continue its normal maximization procedure from there. So, the answer to your question is yes, but not 100% yes.

                              I think that with a complex model (e.g. 5 or more latent classes), it may be better to set k to as many as 100 draws. I believe that Mplus and the R package poLCA both default to 100 draws (when you invoke the random draws option), and if you read Masyn's chapter, she does recommend "a minimum of 50-100 sets of extensively, randomly varied starting values".

                              The code I wrote above saves the final parameter estimates and the log likelihoods of all 100 random draws. This enables you to go back and inspect the results, and to verify how many random draws end up at the highest log likelihood. I believe that Mplus, poLCA, and the Penn State University Stata plugin enable this. Stata will not save the results if you just use the draws(k) option. As described above, Stata won't fully fit the models it tries under the random draws option (it only runs the EM algorithm, which rarely results in full convergence). I do not know how important it is to inspect how many sets of start values converged at the highest log likelihood. Masyn clearly recommends it (see page 565 of her chapter).

                              Side note: you didn't ask, but when you say startvalues(randomid, ...), I think that Stata assigns each observation to a random latent class, and I think that its start values are defined by the randomly-generated latent classes. You could also specify the jitter option - I think this obtains start values under the default option, then adds random noise specified by the jitter option - or the randompr option (assigns each observation to each class with a randomly chosen probability). I do not know which option produces higher variance in the start values.
                              I just tried the code, and it seems that if I specify 100 different seed values, like the following:

                              forvalues seed = 12345/12444 {
                                  gsem (glucose insulin sspg <- _cons), lclass(C 5) ///
                                      startvalues(jitter, draws(100) seed(`seed')) emopts(iter(20))
                                  matrix a = a \ e(ll)
                                  estimates store c5_`seed'
                              }
                              matrix list a

                              The matrix will pull out 100 values of e(ll). But it seems those 100 values come from 100 different seeds, not necessarily from the 100 sets of start values drawn within a single call. Am I correct? So I am still confused about how to write code that stores e(ll) from every random draw.

                              Last edited by Mengmeng Li; 23 Feb 2019, 13:04.

