
  • Latent class logit with lclogit: problem with choosing the number of classes

    Hi Statalist,

    I have run a latent class model on discrete choice experiment data (9,882 observations) using the lclogit package written by Pacifico and Yoo (http://www.stata-journal.com/article.html?artic).

    I would like to ask two questions:
    1. Why does Stata show me the model fit indicators for the 2- to 7-class models multiple times? (I have a few guesses, but I'd like to confirm.)
    Should I choose the 4-class model? (See the code below.) Also, if I remove all the membership variables, the best model is the 3-class model.
    2. I have another model that does not converge when the number of classes is more than 5. In this case, should I use the 4-class model if it is the best-fitting model?
    By the way, I have also tried lclogit2, specifying all attributes as random coefficients, but it does not converge if the number of classes is more than 3. So I would prefer to stay with lclogit.

    Any thoughts and suggestions are greatly appreciated.

    Many thanks!

    Best,
    Gengyang

    Code:
    . forvalues c = 2/7 {
      2.     quietly lclogit choice price attribute2 attribute3 attribute4 attribute5 attribute6, group(group) id(id) nclasses(`c') membership(x1 x2 x3 x4 x5 x6)
      3.     matrix b = e(b)
      4.     matrix ic = nullmat(ic) \ `e(nclasses)', `e(ll)', `=colsof(b)', `e(aic)', `e(caic)', `e(bic)'
      5. }
    Warning:  variance matrix is nonsymmetric or highly singular
      (this warning was issued 21 times over the course of the loop)
    
    .
    . matrix colnames ic = "Classes" "LLF" "Nparam" "AIC" "CAIC" "BIC"
    
    .
    . matlist ic, name(columns)
    
      Classes        LLF     Nparam        AIC       CAIC        BIC
    -----------------------------------------------------------------
            2  -2856.544         19   5751.088   5851.941   5832.941
            3   -2809.69         31   5681.381   5845.932   5814.932
            4   -2734.25         43   5554.501   5782.749   5739.749
            5  -2703.635         55   5517.269   5809.215   5754.215
            2  -2856.544         19   5751.088   5851.941   5832.941
            3  -2809.691         31   5681.381   5845.932   5814.932
            4   -2758.73         43   5603.461   5831.709   5788.709
            5  -2705.848         55   5521.696   5813.642   5758.642
            6  -2681.539         67   5497.077    5852.72    5785.72
            7  -2657.916         79   5473.832   5893.172   5814.172
            2  -2856.544         19   5751.088   5851.941   5832.941
            3  -2809.691         31   5681.381   5845.932   5814.932
            4   -2758.73         43   5603.461   5831.709   5788.709
            5  -2705.848         55   5521.696   5813.642   5758.642
            6  -2681.539         67   5497.077    5852.72    5785.72
            7  -2657.914         79   5473.828   5893.168   5814.168
            2  -2848.066         20   5736.131   5842.293   5822.293
            3  -2799.096         33   5664.193    5839.36    5806.36
            4  -2746.507         46   5585.015   5829.187   5783.187
            5  -2692.474         59   5502.948   5816.126   5757.126
            6  -2670.848         72   5485.697    5867.88    5795.88
            7  -2641.101         85   5452.203   5903.391   5818.391
            2  -2846.607         24   5741.214   5868.608   5844.608
            3  -2796.342         41   5674.683   5892.316   5851.316
            4  -2740.002         58   5596.004   5903.874   5845.874
            5  -2684.407         75   5518.814   5916.922   5841.922
            6      -2659         92       5502   5990.346   5898.346
            7  -2627.486        109   5472.972   6051.555   5942.555
    .

  • #2
    You must have already had a matrix named -ic- before starting the loop in question. You can -capture matrix drop ic- before starting the loop to avoid the issue that you have reported.
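
    For concreteness, here is a sketch of the loop from #1 with that extra line added at the top; the variable and option names are simply reused from your post, so treat it as an illustration rather than tested code:

    Code:
    capture matrix drop ic    // discard any leftover matrix from a previous run
    forvalues c = 2/7 {
        quietly lclogit choice price attribute2 attribute3 attribute4 attribute5 attribute6, ///
            group(group) id(id) nclasses(`c') membership(x1 x2 x3 x4 x5 x6)
        matrix b = e(b)
        matrix ic = nullmat(ic) \ `e(nclasses)', `e(ll)', `=colsof(b)', `e(aic)', `e(caic)', `e(bic)'
    }
    matrix colnames ic = "Classes" "LLF" "Nparam" "AIC" "CAIC" "BIC"
    matlist ic, name(columns)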

    That -lclogit- returns some results while -lclogit2- does not converge is a very bad reason to prefer -lclogit- to -lclogit2-. Based on the warning messages that you have shared with us, I suspect that you're working with a model that may not be empirically identified given your data. My suggestion here is that you modify the -seed()- option to estimate the model in question from several different sets of starting values, and check whether -lclogit2- still fails to achieve convergence. If it fails to achieve convergence from all starting values, you may consider reducing the number of classes.
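
    For example, here is a rough sketch of that kind of check, assuming the -lclogit2- specification with all six attributes in -rand()- that you describe in #1; the class count and the seed values are arbitrary placeholders that you would adjust:

    Code:
    * re-estimate the same specification from several different starting values
    foreach s in 1234 20210101 777 424242 {
        capture lclogit2 choice, rand(price attribute2 attribute3 attribute4 attribute5 attribute6) ///
            group(group) id(id) nclasses(4) membership(x1 x2 x3 x4 x5 x6) seed(`s')
        if _rc == 0 display "seed `s': converged, ll = " e(ll)
        else display "seed `s': convergence not achieved, r(" _rc ")"
    }

    A nonzero return code caught by -capture- means that the run failed; 430 is Stata's return code for "convergence not achieved".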



    • #3
      Originally posted by Hong Il Yoo
      You must have already had a matrix named -ic- before starting the loop in question. You can -capture matrix drop ic- before starting the loop to avoid the issue that you have reported.

      That -lclogit- returns some results while -lclogit2- does not converge is a very bad reason to prefer -lclogit- to -lclogit2-. Based on the warning messages that you have shared with us, I suspect that you're working with a model that may not be empirically identified given your data. My suggestion here is that you modify the -seed()- option to estimate the model in question from several different sets of starting values, and check whether -lclogit2- still fails to achieve convergence. If it fails to achieve convergence from all starting values, you may consider reducing the number of classes.
      Dear Professor Yoo,

      Thank you very much for your reply. Now I get the correct matrix.

      The lclogit2 package works well. It shows that my model does not converge with more than 4 classes. The matrix shows the model fit indices for the 2- to 4-class models, too. I guess that I can either use the 4-class model or reduce the number of variables predicting class membership.

      Best regards,
      Gengyang

      Code:
      . forvalues c = 2/7 {
        2.     quietly lclogit2 choice, rand(price attribute2 attribute3 attribute4 attribute5 attribute6) group(group) id(id) nclasses(`c') membership(x1 x2 x3 x4 x5 x6) seed(1234)
        3.     matrix b = e(b)
        4.     matrix ic = nullmat(ic) \ `e(nclasses)', `e(ll)', `=colsof(b)', `e(aic)', `e(caic)', `e(bic)'
        5. }
      convergence not achieved
      convergence not achieved
      r(430);
      . 
      . 
      . matrix colnames ic = "Classes" "LLF" "Nparam" "AIC" "CAIC" "BIC"
      . 
      . matlist ic, name(columns)
      
        Classes        LLF     Nparam        AIC       CAIC        BIC 
      -----------------------------------------------------------------
              2  -2846.848         22   5737.697   5854.475   5832.475 
              3  -2797.124         37   5668.247   5864.647   5827.647 
              4   -2741.52         52    5587.04   5863.061   5811.061 
      
      . 
      .



      • #4
        If you'd like to double-check that the 5-class model is indeed empirically unidentified, you can attempt to estimate the model outside the loop, by writing out a separate command line for it; the value of -seed()- that leads to a (local) maximum may be class-specific.
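
        For example, a minimal stand-alone command line for the 5-class model, reusing the syntax from #3; the seed value here is only a placeholder to vary across runs:

        Code:
        * one-off run of the 5-class model; change seed() and re-run to try different starting values
        lclogit2 choice, rand(price attribute2 attribute3 attribute4 attribute5 attribute6) ///
            group(group) id(id) nclasses(5) membership(x1 x2 x3 x4 x5 x6) seed(98765)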



        • #5
          Originally posted by Hong Il Yoo
          If you'd like to double-check that the 5-class model is indeed empirically unidentified, you can attempt to estimate the model outside the loop, by writing out a separate command line for it; the value of -seed()- that leads to a (local) maximum may be class-specific.
          Thank you for your advice.

          I have changed the value of -seed()-, and the 5-class model now converges. I would still like to know whether there is a shortcut for finding the right seed value.

          Thanks again!



          • #6
            Gengyang TU: I've been dreaming of such a shortcut for 8 years, but unfortunately I have not found one. If you require a source of inspiration, you can go to just-eat.co.uk or your local equivalent and pick any restaurant that you see on the first page. Then you can combine their menu prices to construct a lot of random number seeds that you can experiment with. For example, the first three items for "We Wings" are priced at £3.90, £4.50 and £6.00, so you may try -seed(3904506)- and proceed in a similar manner if you need more numbers.



            • #7
              Originally posted by Hong Il Yoo
              Gengyang TU: I've been dreaming of such a shortcut for 8 years, but unfortunately I have not found one. If you require a source of inspiration, you can go to just-eat.co.uk or your local equivalent and pick any restaurant that you see on the first page. Then you can combine their menu prices to construct a lot of random number seeds that you can experiment with. For example, the first three items for "We Wings" are priced at £3.90, £4.50 and £6.00, so you may try -seed(3904506)- and proceed in a similar manner if you need more numbers.
              Many thanks!

              Gengyang
