Stable solutions gsem LCA Stata 15

Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#16

24 Feb 2019, 11:34

Originally posted by Mengmeng Li

I just tried the code, and it seems that if I specify 100 different seed values, like the following:

forvalue seed = 12345/12446 {
    gsem (glucose insulin sspg <- _cons), lclass(C 5) ///
        startvalues(jitter, draws(100) seed(`seed')) emopts(iter(20))
    matrix a = a \ e(ll)
    estimates store c5_`seed'
}
matrix list a

The matrix will collect all 100 e(ll) values. But it seems these 100 e(ll) values come from 100 different seed values, not necessarily from 100 sets of start values. Am I correct? So I am still confused about how to write code that stores e(ll) from every random draw.

Your last sentence is correct.

To store the 100 log-likelihoods in the matrix a, you need the loop to cover 100 random seeds, e.g.

Code:

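* start from an empty matrix so the loop can be re-run
capture matrix drop a
forvalues seed = 1/100 {
    * one random draw per seed, so each e(ll) comes from one set of start values
    gsem (glucose insulin sspg <- _cons), lclass(C 5) ///
        startvalues(jitter, draws(1) seed(`seed')) emopts(iter(20))
    * nullmat() lets the first pass append to a matrix that does not exist yet
    matrix a = nullmat(a) \ e(ll)
    estimates store c5_`seed'
}
matrix list a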

That way, on each pass through the loop, the seed is passed to the gsem command, which makes one draw with that random-number seed.

The way you wrote your command, you pass a random-number seed to the command 100 times, but each time it also makes 100 random draws: it runs 20 EM iterations from each draw, then switches to Newton-Raphson from the draw with the highest log likelihood. So your command actually runs 100^2 sets of start values but stores only 100 log likelihoods.
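Once the loop has finished, the stored log likelihoods can be scanned for the best run. A minimal sketch, assuming a ended up as a 100 x 1 column whose row i holds e(ll) for seed i, and that the estimates were stored as c5_1 to c5_100 (the local macro best is a hypothetical name introduced here):

```stata
* Assumption: a is 100 x 1, row i = e(ll) from seed i;
* estimates are stored as c5_1 ... c5_100
local best = 1
forvalues seed = 2/100 {
    if a[`seed',1] > a[`best',1] {
        local best = `seed'
    }
}
display "highest log likelihood: " a[`best',1] " (seed `best')"
estimates restore c5_`best'
```

If several seeds arrive at the same highest log likelihood, that is some reassurance that the solution is stable. It is also worth checking e(converged) after each run before trusting its log likelihood.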

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Mengmeng Li

Join Date: Sep 2016

Posts: 62
#17

27 Feb 2019, 09:02

Hi Weiwen,

You are correct. My code ran through 100^2 sets of start values but returned only 100 log-likelihood values, each being the highest among the 100 random draws for the seed fed into that pass of the loop. It seems there is no way to get the log-likelihood value for each random draw? So if I specify draws(100) for each seed value, will I also get 100 log-likelihood values?

I have another question: do you know how to relax the local independence assumption when running LCA in Stata? This assumption matters for specifying a correct model, but to my knowledge there is no way in Stata to modify the model to allow for local dependence and still get valid results. I have read posts here and there on the internet, and it seems Latent GOLD has this feature, but I haven't seen the issue discussed among Stata users. Would you share your experience with this?

Thanks so much.

Kindly,
Mengmeng
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#18

27 Feb 2019, 10:07

Originally posted by Mengmeng Li

Hi Weiwen,

You are correct. My code ran through 100^2 sets of start values but returned only 100 log-likelihood values, each being the highest among the 100 random draws for the seed fed into that pass of the loop. It seems there is no way to get the log-likelihood value for each random draw? So if I specify draws(100) for each seed value, will I also get 100 log-likelihood values?

The thing is, you get new random draws whether you change the seed in the loop or increase the number in the draws(n) option. If you must save the log likelihood from every random draw, use the former approach with draws(1). Just change the start and end seed numbers to get the number of draws you want.

Originally posted by Mengmeng Li

I have another question: do you know how to relax the local independence assumption when running LCA in Stata? This assumption matters for specifying a correct model, but to my knowledge there is no way in Stata to modify the model to allow for local dependence and still get valid results. I have read posts here and there on the internet, and it seems Latent GOLD has this feature, but I haven't seen the issue discussed among Stata users. Would you share your experience with this?

Thanks so much.

Kindly,
Mengmeng

A similar question was asked and answered. That one was about testing for local dependence: there is no automated way I know of in Stata, but you can do an observed vs. expected test manually for each combination of binary indicators. I do not know of any way to relax the local independence assumption in Stata. I don't know of any implementations of this in other software; if you can point us to one, then there's a non-zero chance someone here could say how to program it in Stata.
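One way to set up the observed vs. expected comparison for a pair of binary indicators, sketched under the assumption that the fitted model supplies class proportions and class-specific item probabilities. The symbols are notation introduced here, not Stata output names: N is the sample size, \pi_c the proportion in class c, and O_{jk}(a,b) the observed count from a two-way tabulation of items j and k.

```latex
% Expected count in cell (a, b) under local independence
\[
E_{jk}(a,b) = N \sum_{c=1}^{C} \pi_c \, P(y_j = a \mid c) \, P(y_k = b \mid c),
\qquad a, b \in \{0, 1\}
\]
% Pearson-type discrepancy for the pair (j, k)
\[
X^2_{jk} = \sum_{a=0}^{1} \sum_{b=0}^{1}
\frac{\bigl( O_{jk}(a,b) - E_{jk}(a,b) \bigr)^2}{E_{jk}(a,b)}
\]
```

A large value for a pair suggests those two items remain associated within class. This is one common form of the comparison and may differ in detail from the test described in the linked answer.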

If you are operating with continuous indicators, then fortunately the answer is simple - use the covstructure options to relax that assumption (outlined in the other link), then you could compare BIC between models with the same number of classes but different structure.
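For continuous indicators, the comparison might look like the sketch below. It reuses the indicators from earlier in this thread; the two-class choice and the stored-estimate names c2_indep and c2_unstr are assumptions made for illustration.

```stata
* Same number of classes, two covariance structures for the indicator errors

* (1) default structure: indicator errors uncorrelated within class
gsem (glucose insulin sspg <- _cons), lclass(C 2)
estimates store c2_indep

* (2) errors allowed to covary within class
gsem (glucose insulin sspg <- _cons), lclass(C 2) ///
    lcinvariant(none) covstructure(e._OEn, unstructured)
estimates store c2_unstr

* AIC and BIC side by side
estimates stats c2_indep c2_unstr
```

Here lcinvariant(none) additionally lets the error variances and covariances differ across classes, so the second model is considerably more heavily parameterized and may need more careful start values.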

Mengmeng Li

Join Date: Sep 2016

Posts: 62
#19

27 Feb 2019, 10:30

Originally posted by Weiwen Ng

The thing is, you get new random draws whether you change the seed in the loop or increase the number in the draws(n) option. If you must save the log likelihood from every random draw, use the former approach with draws(1). Just change the start and end seed numbers to get the number of draws you want.

Thank you for your further explanation. This makes sense to me now. :-)

Originally posted by Weiwen Ng

I don't know of any implementations of this in other software; if you can point us to one, then there's a non-zero chance someone here could say how to program it in Stata.

This is the source I referred to: https://www.john-uebersax.com/stat/condep.htm. It says, "The new Latent GOLD program (Vermunt & Magidson, 2000) has special features for handling conditional dependence in LCMs". The latter part of the document introduces three ways to handle local dependence. I think I understand the first method, as it is meant for the simplest violation of the assumption (only a minimal number of item pairs are correlated with each other within class). But in my case multiple pairs of items cause the violation. I don't quite understand the second and third methods discussed.

Originally posted by Weiwen Ng

If you are operating with continuous indicators, then fortunately the answer is simple - use the covstructure options to relax that assumption (outlined in the other link), then you could compare BIC between models with the same number of classes but different structure.

My indicators are binary, unfortunately.

Also, for model selection Stata only provides AIC and BIC. The literature in the field recommends the bootstrapped likelihood ratio test (BLRT). Does this suggest that Stata's model selection criteria are limited and give a lower chance of finding the best model?

Thank you,
Mengmeng
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#20

28 Feb 2019, 10:10

Originally posted by Mengmeng Li

...

This is the source I referred to: https://www.john-uebersax.com/stat/condep.htm. It says, "The new Latent GOLD program (Vermunt & Magidson, 2000) has special features for handling conditional dependence in LCMs". The latter part of the document introduces three ways to handle local dependence. I think I understand the first method, as it is meant for the simplest violation of the assumption (only a minimal number of item pairs are correlated with each other within class). But in my case multiple pairs of items cause the violation. I don't quite understand the second and third methods discussed.

Looks like the first way to handle conditional dependence (after detecting it in testing I described above) was to create a joint item out of a pair of conditionally dependent items. I can see that as being practical if you have only one pair of conditionally dependent items. The more pairs, the more unwieldy this becomes.
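The joint-item step can be sketched as follows, assuming two conditionally dependent binary indicators with the hypothetical names y1 and y2, each coded 0/1:

```stata
* y1 and y2 are hypothetical 0/1 indicators found to be conditionally dependent
* Combine them into one four-category item: 0, 1, 2, 3
generate byte y12 = 2*y1 + y2 if !missing(y1, y2)

label define y12lbl 0 "y1=0, y2=0" 1 "y1=0, y2=1" 2 "y1=1, y2=0" 3 "y1=1, y2=1"
label values y12 y12lbl

* Check the cell counts; sparse cells will make the joint item hard to estimate
tabulate y12
```

The four-category y12 would then stand in for y1 and y2 in the model as a single nominal indicator.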

The second solution involves using the conditionally dependent items to model another latent class, then including that second latent class as an indicator of the higher-level latent class (i.e. the one you're interested in). This should be doable, as Stata's syntax does permit you to specify multiple latent classes. Identification and maximization are probably going to be the bigger issue.

The last solution was something about a loglinear specification of the latent class model. This one went over my head entirely.

Originally posted by Mengmeng Li

...

Also, for model selection Stata only provides AIC and BIC. The literature in the field recommends the bootstrapped likelihood ratio test (BLRT). Does this suggest that Stata's model selection criteria are limited and give a lower chance of finding the best model?
...

On re-reading the 2007 simulation study comparing two LR tests and several information criteria, I see its conclusion was that the BLRT was the best single test in the simulated settings, followed by BIC. However, I haven't searched for subsequent research on this topic. So, yes, Stata's lack of the BLRT may be a limitation. I'm not sure how big an issue it is.
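For reference, the BLRT compares a model with k-1 classes against one with k classes. A sketch of the usual parametric-bootstrap recipe, with notation introduced here rather than taken from the study: \ell_{k-1} and \ell_k are the maximized log likelihoods, and LR^*_b is the same statistic recomputed on the b-th of B data sets simulated from the fitted (k-1)-class model.

```latex
\[
LR_{\mathrm{obs}} = -2 \bigl( \ell_{k-1} - \ell_{k} \bigr),
\qquad
\hat{p} = \frac{1 + \#\{\, b : LR^{*}_{b} \ge LR_{\mathrm{obs}} \,\}}{B + 1}
\]
```

The (1 + count)/(B + 1) form is one common convention; some implementations report the plain proportion instead. A small p-value favors the k-class model.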
