
  • Parameter interpretation in lclogit

    Hi statalist,

    I have run a latent class model with some discrete choice experiment (DCE) data, using lclogit. lclogit is a user-written package by Pacifico and Yoo (http://www.stata-journal.com/article.html?artic).
    The DCE asked students to choose between 2 jobs based on several attributes (salary, location, etc.).
    I estimated the model with different numbers of classes and, using the BIC and AIC criteria, eventually settled on a model with 4 classes and a membership variable that is the student's score on a test (e.g., the GRE).
    I used lclogitml and got the coefficients for job attributes in each of the 4 classes.
    I have some doubts about interpreting the coefficients on the membership variable (score): I got 3 membership-variable coefficients (for a 4-class model). What do these actually mean? And how can I work out the average score in each class? At this point I don't even know whether class 1 contains the students with the highest or the lowest scores, so any help is greatly appreciated.

    (the bottom half of the output of the lclogitml command, which shows the membership coefficients)

    Code:
    --------------+----------------------------------------------------------------
                  |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    --------------+----------------------------------------------------------------
    share1        |
            score |   .0262831   .0135368     1.94   0.052    -.0002486    .0528148
            _cons |  -1.699223   .8519358    -1.99   0.046    -3.368987   -.0294598
    --------------+----------------------------------------------------------------
    share2        |
            score |   .0289432   .0116016     2.49   0.013     .0062044     .051682
            _cons |  -1.439014   .7379209    -1.95   0.051    -2.885312    .0072847
    --------------+----------------------------------------------------------------
    share3        |
            score |   .0895226   .0134564     6.65   0.000     .0631485    .1158967
            _cons |  -5.536603   .9503466    -5.83   0.000    -7.399248   -3.673958
    -------------------------------------------------------------------------------


    Thank you
    Best,
    Pedro



  • #2
    Pedro Ramos I'm not sure if there is any good literature available related to interpreting the coefficients of Latent Class Models directly beyond quantifying the relationship between class membership and manifest variables. Once you've fitted your model and have predicted class membership, you can use something like:

    Code:
    tabstat variables_of_interest, by(class_membership)
    This gives you the observed summary statistics for the variables of interest. From there you can start to develop a "profile" of sorts to explain the characteristics of the classes. I'm not sure how well this generalizes to discrete choice experiments, but it is a relatively typical approach to interpreting/inferring some type of meaning about the classes that Muthen and others have discussed.

    Comment


    • #3
      wbuchanan Thank you for your reply.
      I have found some papers that use LCM and give some idea of how to understand my results (e.g. http://www.sciencedirect.com/science...98301511014197). At first glance, the likelihood of being a member of class 3 increases with an increasing score, and overall I have a group of "high achievers" (class 3), a group of "low achievers" (class 4), and 2 intermediate groups, all with different preference structures and responses to the DCE. That is OK and is a priori what I thought I would find.

      I still, however, have not found a way to calculate the average score (the class-membership variable) in each class.
      I tried your code, but I am probably doing something wrong:

      tabstat score, by(???)

      Thank you for your help!

      Comment


      • #4
        Hi Pedro, I think you missed the step of predicting the class membership (this is well documented in the paper). After you estimate several models with different numbers of classes and choose the "best" number (based on AIC or BIC, for example), you should predict the class membership based on the conditional probabilities. This will let you compare your variable as wbuchanan suggested, or plot the effect of the predictors on the probability of class membership (since each id has a positive probability of being in every class).
        Code:
        // predict the posterior class probabilities cp1-cp4 (help lclogit postestimation)
        lclogitpr cp, cp

        // find each respondent's highest class probability
        egen double cpmax = rowmax(cp1-cp4)

        // assign class membership based on the highest probability
        gen byte class = .
        forval c = 1/4 {
            replace class = `c' if cpmax == cp`c'
        }
        Also, you should refresh the broken link in #1, and next time please share your output in [CODE] delimiters.
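
        Putting the two suggestions together, the average score in each class (the question raised in #3) can then be tabulated. This is a minimal sketch, assuming a class variable has been generated from the posterior probabilities as above:

        Code:
        tabstat score, by(class) statistics(mean sd n)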

        Comment


        • #5
          Pedro Ramos Oded Mcdossi helped to clarify what I was trying to suggest. Aside from their effect on classification (e.g., the probability of membership in class n vs class m), the coefficients don't have any meaningful interpretation, since the classes themselves are essentially arbitrary (the latent classes are nominal in nature, so the values lack any inherent meaning unto themselves). That's why descriptive statistics are used after fitting the model, in an attempt to describe the "meaning" - or label - that could be attached to each class (e.g., "low achievers" vs "high achievers").

          Comment


          • #6
            Hi Statalist,

            I ran a latent class logit model in Stata 13 with discrete choice experiment (DCE) data, using lclogit written by Pacifico and Yoo.
            To fix ideas about my research, here is what I am working on.
            My choice experiment measures tourists' willingness to pay for attributes of an ecotourism trip (village accommodation, craft market, village tour, and price).
            110 tourists were each presented with 7 choice sets, each with three alternatives (i.e. 2 alternatives with ecotourism attributes and a status quo equivalent to their current trip).
            Tourists could choose Trip A or Trip B, or stick with their current trip (the status quo, or opt-out).
            Socio-demographic characteristics such as gender, age, years of education, nationality, and income were used as determinants of class membership.

            The challenge I have encountered is that some parameter estimates for a particular class are missing. I do not know why these parameters are missing, or what the implications are for the applicability of the results to my data. Any assistance will be much appreciated. Thank you in advance for your comments!

            Below is the command I ran:
            Code:
            *** Estimation of asymptotic standard errors and z-values of estimates from lclogit through gllamm
            lclogitml, iterate(5)
            After sending the above command to Stata, I got the result below:

            Code:
             *** Estimation of asymptotic standard errors and z-values of estimates from lclogit through gllamm
            . lclogitml, iterate(5)
            -gllamm- is initializing. This process may take a few minutes.
            
            numerical derivatives are approximate
            flat or discontinuous region encountered
            numerical derivatives are approximate
            flat or discontinuous region encountered
            numerical derivatives are approximate
            flat or discontinuous region encountered
            Iteration 0:   log likelihood = -441.06276  (not concave)
            numerical derivatives are approximate
            flat or discontinuous region encountered
            numerical derivatives are approximate
            flat or discontinuous region encountered
            numerical derivatives are approximate
            flat or discontinuous region encountered
            Iteration 1:   log likelihood = -441.06276  (not concave)
            numerical derivatives are approximate
            flat or discontinuous region encountered
            numerical derivatives are approximate
            flat or discontinuous region encountered
            numerical derivatives are approximate
            flat or discontinuous region encountered
            Iteration 2:   log likelihood = -441.06276  (not concave)
            numerical derivatives are approximate
            flat or discontinuous region encountered
            numerical derivatives are approximate
            flat or discontinuous region encountered
            numerical derivatives are approximate
            flat or discontinuous region encountered
            Iteration 3:   log likelihood = -441.06276  (not concave)
            numerical derivatives are approximate
            flat or discontinuous region encountered
            numerical derivatives are approximate
            flat or discontinuous region encountered
            numerical derivatives are approximate
            flat or discontinuous region encountered
            Iteration 4:   log likelihood = -441.06276  (not concave)
            numerical derivatives are approximate
            flat or discontinuous region encountered
            numerical derivatives are approximate
            flat or discontinuous region encountered
            numerical derivatives are approximate
            flat or discontinuous region encountered
            Iteration 5:   log likelihood = -441.06276  (not concave)
            convergence not achieved
            
            Latent class model with 3 latent classes
            -------------------------------------------------------------------------------
                   Choice |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            --------------+----------------------------------------------------------------
            choice1       |
                   Price2 |   .0659244   .0107037     6.16   0.000     .0449455    .0869032
            Accommodation |  -.7282424   .2369424    -3.07   0.002    -1.192641   -.2638437
                   Market |  -.0795953   .2947011    -0.27   0.787    -.6571989    .4980083
                     Tour |   .2114257   .2318396     0.91   0.362    -.2429715    .6658229
            --------------+----------------------------------------------------------------
            choice2       |
                   Price2 |  -2.520002   1.075836    -2.34   0.019    -4.628602   -.4114024
            Accommodation |   33.41135   17.16425     1.95   0.052    -.2299641    67.05266
                   Market |   17.96667          .        .       .            .           .
                     Tour |  -1.970766          .        .       .            .           .
            --------------+----------------------------------------------------------------
            choice3       |
                   Price2 |  -.0027144   .0100404    -0.27   0.787    -.0223932    .0169644
            Accommodation |  -.9510399   .1919325    -4.96   0.000    -1.327221    -.574859
                   Market |   -.406724   .1761886    -2.31   0.021    -.7520474   -.0614006
                     Tour |  -.4181075   .1473625    -2.84   0.005    -.7069328   -.1292822
            --------------+----------------------------------------------------------------
            share1        |
                     male |  -.0801244   .6545101    -0.12   0.903    -1.362941    1.202692
                      Age |   .0322988   .0250557     1.29   0.197    -.0168093     .081407
                 Eduyears |  -.1015891   .1269271    -0.80   0.423    -.3503617    .1471835
                 national |  -.8126188   .7034051    -1.16   0.248    -2.191267    .5660298
                   Income |   .5675667   .3120162     1.82   0.069    -.0439738    1.179107
                    _cons |  -1.547812    2.07203    -0.75   0.455    -5.608915    2.513292
            --------------+----------------------------------------------------------------
            share2        |
                     male |    .510867   .4993067     1.02   0.306    -.4677561     1.48949
                      Age |   .0517979   .0193701     2.67   0.007     .0138332    .0897626
                 Eduyears |   -.189617   .1004979    -1.89   0.059    -.3865893    .0073553
                 national |  -.8631777    .544691    -1.58   0.113    -1.930752    .2043971
                   Income |    .231292   .2207554     1.05   0.295    -.2013807    .6639647
                    _cons |   .6208119   1.523915     0.41   0.684    -2.366006     3.60763
            -------------------------------------------------------------------------------
            As you can see, the standard errors, z statistics, p-values, and 95% confidence intervals for Market and Tour in choice2 are missing.

            I look forward to your comments.

            Babatope Akinyemi

            Comment


            • #7
              This is usually an issue with identification/estimation. The "flat or discontinuous region" message is telling you that the likelihood function isn't a smooth surface, and is a way of indicating that you may be finding solutions based on a local rather than the global maximum. Were you able to successfully fit any simpler models to the data?

              Comment


              • #8
                Oded Mcdossi


                A little bit late, but thank you for your help!!
                BR,
                Pedro

                Comment


                • #9
                  wbuchanan thank you for your comment on my post. I have not yet succeeded in fixing the identification/estimation issues. Any suggestions or advice will be much appreciated.

                  Kind Regards!
                  Babatope

                  Comment


                  • #10
                    wbuchanan again, thanks for your comment. I was able to fit conditional logit and mixed logit models to the data. Please clarify what you mean by "finding solutions based on local vs global maxima".

                    Kind regards!

                    Babatope

                    Comment


                    • #11
                      Babatope:
                      William's helpful comment can probably be translated into the following (usual) recipe: start with a simpler model, add one predictor at a time, and see where Stata starts to choke.
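                      As a concrete sketch of that recipe (the dependent variable choice and the group/id variables gid and pid are hypothetical placeholders to be replaced with your own; the attribute names are taken from the output in #6):

                      Code:
                      * start small: 2 classes, no membership covariates
                      lclogit choice Price2 Accommodation Market Tour, group(gid) id(pid) nclasses(2)

                      * then add one membership predictor at a time
                      lclogit choice Price2 Accommodation Market Tour, group(gid) id(pid) nclasses(2) membership(male)
                      lclogit choice Price2 Accommodation Market Tour, group(gid) id(pid) nclasses(2) membership(male Age)

                      * only then increase nclasses(), watching where the "not concave" messages start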
                      Kind regards,
                      Carlo
                      (Stata 18.0 SE)

                      Comment


                      • #12
                        Carlo,

                        Oh thanks for interpreting William's comment to me. I will try that and keep you posted of my result.

                        Kindest regards,

                        Babatope

                        Comment


                        • #13
                          Babatope Akinyemi sorry for the delay; the next-door neighbors thought it was a good time to set the building on fire the other night, and the wife and I have been trying to figure things out since then. Maximum likelihood estimators try to maximize the likelihood function of your model. If the model is overly complex, there can be non-smooth points along the likelihood. In some cases this results in error/warning messages about the likelihood function not being concave, and can lead to the model not converging. If the likelihood function isn't smooth, it can cause the estimation algorithm to believe it has found the global maximum when really it has found something analogous to a saddle point in the middle of the likelihood function (e.g., the model converged on a bump in the likelihood function instead of the maximum of the function).
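
                          One way to check for this, sketched here with hypothetical variable names (choice, gid, pid) and assuming your version of lclogit supports the seed() option (see help lclogit), is to re-run the EM algorithm from several random starting points and compare the final log likelihoods:

                          Code:
                          forvalues s = 1/5 {
                              lclogit choice Price2 Accommodation Market Tour, group(gid) id(pid) nclasses(3) seed(`s')
                              display "seed `s': log likelihood = " e(ll)
                          }
                          * if the log likelihoods differ across seeds, the estimates depend on
                          * the starting values and the model is likely converging to local maxima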
                          Last edited by wbuchanan; 04 Jun 2016, 19:40.

                          Comment


                          • #14
                            William,
                            Thanks a lot for your detailed response. I am already working through Carlo's clarification of your earlier suggestion in #7, and the additional detailed explanation you have provided will definitely help me figure out the issues with the model and correct them. I will get back to you with my results.

                            Kindest regards,

                            Babatope

                            Comment


                            • #15
                              Carlo's suggestion is definitely a way to find problematic predictors. In the end you may find issues regardless of how parsimonious the model is, in which case you may want to reconsider the model itself and then work from the simplest to the most complex model that allows you to test your hypothesis.

                              Comment
