Lo–Mendell–Rubin Adjusted Likelihood Ratio Test and wald test

Liyuwork Dana

Join Date: Apr 2018

Posts: 27
#1

Lo–Mendell–Rubin Adjusted Likelihood Ratio Test and wald test

10 Apr 2018, 20:27

Hello everyone,

Can you, please, advice me how to conduct Lo–Mendell–Rubin Adjusted Likelihood Ratio Test in latent profile analysis (LPA) to compare a k model with the preceding one and Wald test to compare the means in the identified profile classes (to see the difference between identified profile classes?

Appreciate you comments
Tags: None
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#2

11 Apr 2018, 06:48

For other readers' information, in latent class and latent profile models, we are interested in testing which number of classes fit the data the best. Stata provides the BIC. This is fine, but my understanding is that there is still a lot of theoretical debate on how to test for the appropriate number of classes. You might think, why not use a likelihood ratio test? People far smarter than I have shown that an LR test for k vs k-1 class does not have a chi-2 distributed test statistic because of something about boundary of the parameter space blah blah blah.

The Lo-Mendell-Rubin LR test is an analytical correction to the test statistic derived by Yungtai Lo, Nancy Mendell, and Donald Rubin, and published in Biometrika (2001). In another source I cannot recall, I think someone challenged the theoretical work, but Nylund, Asparouhov, and Bength Muthen (2007) reviewed the literature and performed a simulation study, and showed that the LMR test was adequate (but the bootstrapped LR test performed better - only thing is, Stata cannot do the BLRT - will explain later).

Lo et al's paper (my first link) is very dense, but after reading that and a few other sources, I believe that their equation 15 describes the adjustment to the LR test statistic. The terms p and q in their equation are described on page 769, under equation 1.

If that reading is correct, then I believe that this ado file I wrote for personal use will calculate the LMR test statistic. I hope the community will point out any errors if they notice them! Liuwork, you will have to save this in your personal ado file directory (on my machine, C:\ado\personal - this will vary depending on where you installed Stata).

Code:

program lmrtest, rclass version 15.0 args estimates_null estimates_alternative tempname n ll_null ll_alt parms_null parms_alt a classes_null classes_alt lr_test_stat modlr_test_stat p q quietly { estimates restore `1' scalar `n' = e(N) scalar `ll_null' = e(ll) scalar `parms_null' = e(k) matrix `a' = e(lclass_k_levels) scalar `classes_null' = `a'[1,1] estimates restore `2' scalar `ll_alt' = e(ll) scalar `parms_alt' = e(k) matrix `a' = e(lclass_k_levels) scalar `classes_alt' = `a'[1,1] scalar `p' = (3 * `classes_alt' - 1) scalar `q' = (3 * `classes_null' - 1) scalar `lr_test_stat' = -2 * (`ll_null' - `ll_alt') scalar `modlr_test_stat' = `lr_test_stat' / (1 + ((`p' - `q') * ln(`n')) ^ -1) } return scalar lmrt_p = chi2tail(`parms_alt' - `parms_null',`modlr_test_stat') end

The bootstrap LR test cannot be done by Stata, because Stata does not permit bootstrapping in gsem with categorical latent variables. Moreover, you could run into a situation where some bootstrap runs fail to converge; what do you do then? I'm not sure how other programs handle that situation. Because latent class models have frequent convergence problems, especially with higher numbers of classes, this is going to happen. I've expressed to Stata that I would like them to implement the BLRT, because it seems like it is the test that is considered most highly by practitioners, but it is not trivial to implement, and the BIC is adequate - the Nylund et al paper said that "we conclude that the BIC is superior to all other ICs", although more so for latent profile analyses than latent class analyses. Of course, the whole paper is based on simulation data, in which you simulate a known number of classes with known data generating processes ... reality is messier.

Last edited by Weiwen Ng; 11 Apr 2018, 06:53. Reason: Formatting.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
2 likes
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#3

11 Apr 2018, 09:07

Part 2 of your question should be fairly simple. You want to test if the means for each indicator among the latent classes are equal. I think this can be done using the -test- command. One minor complication is that you have to run -estat lcmean, post-, and importantly, you have to tell it to post the variance-covariance matrix. I have not seen people do this, but in principle, this is how I would do it (and if my approach is wrong, I'd love to hear it).

Code:

use http://www.stata-press.com/data/r15/gsem_lca1 gsem (accident play insurance stock <- ), logit lclass(C 2) estat lcmean, post test [1]accident = [2]accident

You can extend this to more than 2 classes and simultaneously test for equality. I believe that code is:

Code:

use http://www.stata-press.com/data/r15/gsem_lca1 gsem (accident play insurance stock <- ), logit lclass(C 3) estat lcmean, post test [1]accident = [2]accident, notest accum test [2]accident = [3]accident, accum

When I ran that code and I inspected the confidence intervals of the estat lcmeans output for the 3-class model ... they looked very, very strange compared to the 2-class model. Meaning the CIs run from 0 to 1. That is in very stark contrast to the 2 class model. I am running Stata 15.1, revised April 2, 2018, 64 bits for Windows. I replicated this behavior under version 15.0. If anyone gets different results from me for the 3-class model, please comment. If this is the wrong approach, I'd love to hear it as well.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#4

11 Apr 2018, 14:07

Originally posted by Weiwen Ng View Post

...
The Lo-Mendell-Rubin LR test is an analytical correction to the test statistic derived by Yungtai Lo, Nancy Mendell, and Donald Rubin, and published in Biometrika (2001). ....

This is the correct link to Lo et al. Apologies for the bad link.

Readers should also note that in Nylund et al's simulation study, the LMR LR test had type I error rates around 0.2, compared to the BLRT which had type I error rates around 0.6 (albeit they tested sample sizes of 200, 500, and 1,000; your situation may differ). I know that simulation data aren't real data, but I'd still rely more on the BIC, and we already get that in Stata.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment
Liyuwork Dana

Join Date: Apr 2018

Posts: 27
#5

12 Apr 2018, 00:42

Thank you Weiwen for your response. Very useful. I have tried the syntax you provided me to calculate Lo–Mendell–Rubin Adjusted Likelihood Ratio Test, it doesn't show me error message but not giving me the test result I am not sure why.

Any comment, thanks
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#6

12 Apr 2018, 15:37

Originally posted by Liyuwork Dana View Post

Thank you Weiwen for your response. Very useful. I have tried the syntax you provided me to calculate Lo–Mendell–Rubin Adjusted Likelihood Ratio Test, it doesn't show me error message but not giving me the test result I am not sure why.

Any comment, thanks

Oh dear, this is a very elementary error on my part. I wrote this command for my own use, but it presumes you had saved this file as an ado file, and stored the estimates from the k and k-1 class model using -estimates store-, and finally typed -lmrtest- in the command line. In assuming that the operations would be obvious to everyone, I committed a basic theory of mind error. My apologies for the lack of clarity!

I would suggest you do the following.

Code:

gsem (x1 x2 x3 <- , logit), lclass(C 2) estimates store class2 scalar n = e(N) scalar ll_2 = e(ll) scalar parms_2 = e(k) scalar classes_2 = 2 gsem (x1 x2 x3 <- , logit), lclass(C 3) estimates store class3 scalar n = e(N) scalar ll_3 = e(ll) scalar parms_3 = e(k) scalar classes_3 = 3 scalar p = (3 * `classes_3' - 1) scalar q = (3 * `classes_2' - 1) scalar lr_test_stat = -2 * (ll_2 - ll_3) scalar modlr_test_stat = lr_test_stat / (1 + ((p - q) * ln(n)) ^ -1) scalar list modlr_test_stat

This requires more manual work but it will get you your LMR test p-values. If you want to automate the process, then I would copy my original block of code to a do file and save it as an ado called lmrtest.ado in your personal ado file directory.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
1 like
Comment
Liyuwork Dana

Join Date: Apr 2018

Posts: 27
#7

23 Apr 2018, 19:46

Thank you Weiwen.

This requires more manual work but it will get you your LMR test p-values. If you want to automate the process, then I would copy my original block of code to a do file and save it as an ado called lmrtest.ado in your personal ado file directory.[/QUOTE]

I would appreciate if you copy your original block. I am not such familiar person on LPA. Many thanks once again.
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#8

24 Apr 2018, 16:12

Originally posted by Liyuwork Dana View Post

Thank you Weiwen.

I would appreciate if you copy your original block. I am not such familiar person on LPA. Many thanks once again.

Take the code I posted in post #2. Save it as lmrtest.ado in your personal ado directory, which is probably c:\ado\personal on a Windows machine.

Then, you'd use the code like this:

Code:

gsem (x1 x2 x3 <- , logit), lclass(C 2) estimates store class2 gsem (x1 x2 x3 <- , logit), lclass(C 3) estimates store class3 lmrtest class2 class3

Remember, the sequence of arguments is null model, then alternate model. The null model is the one with fewer parameters (i.e. it is the more parsimonious model, and the alternate model is nested within it, just like a regular likelihood ratio test). I did not write a help file for this because I wrote it for my personal use. However, once it's saved in your personal ado directory, Stata will treat it like a regular estimation command.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#9

25 Apr 2018, 10:44

Code should read as follows.

Code:

gsem (x1 x2 x3 <- , logit), lclass(C 2) estimates store class2 gsem (x1 x2 x3 <- , logit), lclass(C 3) estimates store class3 lmrtest class2 class3

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment
Liyuwork Dana

Join Date: Apr 2018

Posts: 27
#10

29 Apr 2018, 20:45

Thank you indeed Weiwen.
Comment
Liyuwork Dana

Join Date: Apr 2018

Posts: 27
#11

18 May 2018, 02:08

Hi Scholars,

syntax on Lo–Mendell–Rubin Adjusted Likelihood Ratio Test

Can someone comment me what is wrong with my today's syntax? After I copied the syntax from post #2 and saved as -lmrtest- (like this: save "C:\ado\Personal\lmrtest.ado") then used the following syntax: .

gsem (q6_1 q6_2 q6_3 q6_4 <- _cons), family(gaussian) link(identity) lclass(A 2)
estimates store class2
gsem (q6_1 q6_2 q6_3 q6_4 <- _cons), family(gaussian) link(identity) lclass(A 3)
estimates store class3
lmrtest class2 class3

the Stata is not resulting the p-value but not also showing me" invalid syntax" message.

Please, help me on this.
Many thanks
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#12

18 May 2018, 06:48

Originally posted by Liyuwork Dana View Post

Hi Scholars,

syntax on Lo–Mendell–Rubin Adjusted Likelihood Ratio Test

Can someone comment me what is wrong with my today's syntax? After I copied the syntax from post #2 and saved as -lmrtest- (like this: save "C:\ado\Personal\lmrtest.ado") then used the following syntax: .

gsem (q6_1 q6_2 q6_3 q6_4 <- _cons), family(gaussian) link(identity) lclass(A 2)
estimates store class2
gsem (q6_1 q6_2 q6_3 q6_4 <- _cons), family(gaussian) link(identity) lclass(A 3)
estimates store class3
lmrtest class2 class3

the Stata is not resulting the p-value but not also showing me" invalid syntax" message.

Please, help me on this.
Many thanks

I wrote the code above to return the p-value as a scalar that can be exported to regression results. The command, as written, doesn't display anything, which I forgot to mention - I did say this was written for my own use! If you want to display the scalar, you can type:

Code:

scalar list lmrt_p

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment
Liyuwork Dana

Join Date: Apr 2018

Posts: 27
#13

20 May 2018, 21:00

OMG, now I give up with this STATA staff and calculating Lo–Mendell–Rubin Adjusted Likelihood Ratio Test. So annoying using STATA for LPA. Anyways, I appreciated your help Weiwen Ng.
Comment
Harshit Aneja

Join Date: Jul 2018

Posts: 2
#14

01 Jul 2018, 07:12

Originally posted by Weiwen Ng View Post

For other readers' information, in latent class and latent profile models, we are interested in testing which number of classes fit the data the best. Stata provides the BIC. This is fine, but my understanding is that there is still a lot of theoretical debate on how to test for the appropriate number of classes. You might think, why not use a likelihood ratio test? People far smarter than I have shown that an LR test for k vs k-1 class does not have a chi-2 distributed test statistic because of something about boundary of the parameter space blah blah blah.

The Lo-Mendell-Rubin LR test is an analytical correction to the test statistic derived by Yungtai Lo, Nancy Mendell, and Donald Rubin, and published in Biometrika (2001). In another source I cannot recall, I think someone challenged the theoretical work, but Nylund, Asparouhov, and Bength Muthen (2007) reviewed the literature and performed a simulation study, and showed that the LMR test was adequate (but the bootstrapped LR test performed better - only thing is, Stata cannot do the BLRT - will explain later).

Lo et al's paper (my first link) is very dense, but after reading that and a few other sources, I believe that their equation 15 describes the adjustment to the LR test statistic. The terms p and q in their equation are described on page 769, under equation 1.

If that reading is correct, then I believe that this ado file I wrote for personal use will calculate the LMR test statistic. I hope the community will point out any errors if they notice them! Liuwork, you will have to save this in your personal ado file directory (on my machine, C:\ado\personal - this will vary depending on where you installed Stata).

Code:

program lmrtest, rclass version 15.0 args estimates_null estimates_alternative tempname n ll_null ll_alt parms_null parms_alt a classes_null classes_alt lr_test_stat modlr_test_stat p q quietly { estimates restore `1' scalar `n' = e(N) scalar `ll_null' = e(ll) scalar `parms_null' = e(k) matrix `a' = e(lclass_k_levels) scalar `classes_null' = `a'[1,1] estimates restore `2' scalar `ll_alt' = e(ll) scalar `parms_alt' = e(k) matrix `a' = e(lclass_k_levels) scalar `classes_alt' = `a'[1,1] scalar `p' = (3 * `classes_alt' - 1) scalar `q' = (3 * `classes_null' - 1) scalar `lr_test_stat' = -2 * (`ll_null' - `ll_alt') scalar `modlr_test_stat' = `lr_test_stat' / (1 + ((`p' - `q') * ln(`n')) ^ -1) } return scalar lmrt_p = chi2tail(`parms_alt' - `parms_null',`modlr_test_stat') end

The bootstrap LR test cannot be done by Stata, because Stata does not permit bootstrapping in gsem with categorical latent variables. Moreover, you could run into a situation where some bootstrap runs fail to converge; what do you do then? I'm not sure how other programs handle that situation. Because latent class models have frequent convergence problems, especially with higher numbers of classes, this is going to happen. I've expressed to Stata that I would like them to implement the BLRT, because it seems like it is the test that is considered most highly by practitioners, but it is not trivial to implement, and the BIC is adequate - the Nylund et al paper said that "we conclude that the BIC is superior to all other ICs", although more so for latent profile analyses than latent class analyses. Of course, the whole paper is based on simulation data, in which you simulate a known number of classes with known data generating processes ... reality is messier.

I am not able to understand what parms_null and parms_alt contain. I ran it on my system and it's not df which in my 3 and 4 class study turn out as 18 and 23. Number of observed variables is 4 in my case. params_null and params_alt are 27 and 36 respectively for me. params in general is 9*number of classes.
I am able to compute results thanks to you but without understanding it, I won't be able to submit my findings.
Btw thanks a lot for your code and help.
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#15

01 Jul 2018, 09:37

Originally posted by Harshit Aneja View Post

I am not able to understand what parms_null and parms_alt contain. I ran it on my system and it's not df which in my 3 and 4 class study turn out as 18 and 23. Number of observed variables is 4 in my case. params_null and params_alt are 27 and 36 respectively for me. params in general is 9*number of classes.
I am able to compute results thanks to you but without understanding it, I won't be able to submit my findings.
Btw thanks a lot for your code and help.

Harshit, params_null is the number of parameters in the null model, and params_alt is the number of parameters in the alternative (i.e. less parsimonious) model. In a regular LR test, these would be the difference in degrees of freedom in each model.

Now, you do raise an interesting point. I assigned the number of parameters as e(k), which is returned by gsem. That's the number of parameters in the model.

However, in Stata's gsem example data from example 50, which I used in my sample code above, one parameter is always zero - that's the parameter concerning the base class in the multinomial logit model for the classes. It is zero because it has to be.

Maybe I misinterpreted what the number of parameters means! I do apologize if this is my misunderstanding. I believe the degrees of freedom should correspond to the scalar e(rank). That's the rank of the variance-covariance matrix, which is returned as e(V). Can you check if e(rank) corresponds to your reported degrees of freedom? After running a -gsem- command, you can do that by typing:

Code:

display e(rank)

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment

Announcement

Lo–Mendell–Rubin Adjusted Likelihood Ratio Test and wald test

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment