
  • Parameter interpretation in lclogit

    Hi statalist,

    I have run a latent class model with some discrete choice experiment (DCE) data, using lclogit. lclogit is a user-written package by Pacifico and Yoo (http://www.stata-journal.com/article.html?artic).
    The DCE asked students to choose between 2 jobs based on several attributes (salary, location, etc.).
    I estimated the model with different numbers of classes and, using the BIC and AIC criteria, eventually settled on a model with 4 classes and a membership variable that is the student's score on a test (e.g., the GRE).
    I used lclogitml and got the coefficients for job attributes in each of the 4 classes.
    I have some doubts about interpreting the coefficients on the membership variable (score): I got 3 membership-variable coefficients (for a 4-class model). What do these actually mean? And how can I work out the average score in each class? At this point I don't even know whether class 1 contains the students with the highest or the lowest scores, so any help is greatly appreciated.

    (the bottom half of the output of the lclogitml command, which shows the membership coefficients)

    Code:
    --------------+----------------------------------------------------------------
                  |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    --------------+----------------------------------------------------------------
    share1        |
            score |   .0262831   .0135368     1.94   0.052    -.0002486    .0528148
            _cons |  -1.699223   .8519358    -1.99   0.046    -3.368987   -.0294598
    --------------+----------------------------------------------------------------
    share2        |
            score |   .0289432   .0116016     2.49   0.013     .0062044     .051682
            _cons |  -1.439014   .7379209    -1.95   0.051    -2.885312    .0072847
    --------------+----------------------------------------------------------------
    share3        |
            score |   .0895226   .0134564     6.65   0.000     .0631485    .1158967
            _cons |  -5.536603   .9503466    -5.83   0.000    -7.399248   -3.673958
    -------------------------------------------------------------------------------


    Thank you
    Best,
    Pedro



  • #2
    Pedro Ramos I'm not sure if there is any good literature available related to interpreting the coefficients of Latent Class Models directly beyond quantifying the relationship between class membership and manifest variables. Once you've fitted your model and have predicted class membership, you can use something like:

    Code:
    tabstat variables_of_interest, by(class_membership)
    This gives you the observed summary statistics for the variables of interest. From there you can start to develop a "profile" of sorts to explain the characteristics of the classes. I'm not sure how well this generalizes to discrete choice experiments, but it is a relatively typical approach to interpreting/inferring some type of meaning about the classes that Muthen and others have discussed.

    Comment


    • #3
      wbuchanan Thank you for your reply.
      I have found some papers that use LCM and give some idea of how to understand my results (e.g. http://www.sciencedirect.com/science...98301511014197). At first glance, the likelihood of being a member of class 3 increases with an increasing score, and overall I have a group of "high achievers" (class 3), a group of "low achievers" (class 4), and 2 intermediate groups, all with different preference structures and responses to the DCE. That is OK and is a priori what I thought I would find.

      I still, however, have not found a way to calculate the average score (the class-membership variable) in each class.
      I tried your code, but I am probably doing something wrong:

      tabstat score, by(???)

      Thank you for your help!

      Comment


      • #4
        Hi Pedro, I think you missed the step of predicting the class membership (this is well documented in the paper). After you estimate several models with different numbers of classes and choose the "best" number (based on AIC or BIC, for example), you should predict the class membership based on the conditional probabilities. This will let you compare your variable as wbuchanan suggested, or plot the effect of the predictors on the probability of class membership (since each id has a positive probability of being in every class).
        Code:
        // predict the posterior class probabilities cp1-cp4 (help lclogit postestimation)
        lclogitpr cp, cp

        // find each respondent's highest class probability
        egen double cpmax = rowmax(cp1-cp4)

        // assign class membership based on the highest probability
        gen byte class = .
        forval c = 1/4 {
            replace class = `c' if cpmax == cp`c'
        }
        Also, you should refresh the broken link in #1, and next time please share your output in [CODE] delimiters.
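
        Putting the two suggestions together, the average score in each class (the question raised in #3) can then be tabulated. This is a minimal sketch, assuming a class variable has been generated from the posterior probabilities as above:

        Code:
        tabstat score, by(class) statistics(mean sd n)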

        Comment


        • #5
          Pedro Ramos Oded Mcdossi helped to clarify what I was trying to suggest. Aside from their effect on classification (e.g., the probability of membership in class n vs class m), the coefficients don't have any meaningful interpretation, since the classes themselves are essentially arbitrary (the latent classes are nominal in nature, so the values lack any inherent meaning unto themselves). That's why descriptive statistics are used after fitting the model, in an attempt to describe the "meaning" - or label - that could be attached to each class (e.g., "low achievers" vs "high achievers").

          Comment


          • #6
            Hi Statalist,

            I ran a latent class logit model in Stata 13 with discrete choice experiment (DCE) data, using lclogit written by Pacifico and Yoo.
            To fix ideas about my research, here is what I am working on.
            My choice experiment measures tourists' willingness to pay for attributes of an ecotourism trip (village accommodation, craft market, village tour, and price).
            110 tourists were each presented with 7 choice sets, each with three alternatives (i.e. 2 alternatives with ecotourism attributes and a status quo equivalent to their current trip).
            Tourists could choose Trip A or Trip B, or stick with their current trip (the status quo, or opt-out).
            Socio-demographic characteristics such as gender, age, years of education, nationality, and income were used as determinants of class membership.

            The challenge I have encountered is that some parameter estimates for a particular class are missing. I do not know why these parameters are missing, or what the implications are for the applicability of the results to my data. Any assistance will be much appreciated. Thank you in advance for your comments!

            Below is the command I ran:
            Code:
            *** Estimation of asymptotic standard errors and z-values of estimates from lclogit through gllamm
            lclogitml, iterate(5)
            After sending the above command to Stata, I got the result below:

            Code:
             *** Estimation of asymptotic standard errors and z-values of estimates from lclogit through gllamm
            . lclogitml, iterate(5)
            -gllamm- is initializing. This process may take a few minutes.
            
            numerical derivatives are approximate
            flat or discontinuous region encountered
            numerical derivatives are approximate
            flat or discontinuous region encountered
            numerical derivatives are approximate
            flat or discontinuous region encountered
            Iteration 0:   log likelihood = -441.06276  (not concave)
            numerical derivatives are approximate
            flat or discontinuous region encountered
            numerical derivatives are approximate
            flat or discontinuous region encountered
            numerical derivatives are approximate
            flat or discontinuous region encountered
            Iteration 1:   log likelihood = -441.06276  (not concave)
            numerical derivatives are approximate
            flat or discontinuous region encountered
            numerical derivatives are approximate
            flat or discontinuous region encountered
            numerical derivatives are approximate
            flat or discontinuous region encountered
            Iteration 2:   log likelihood = -441.06276  (not concave)
            numerical derivatives are approximate
            flat or discontinuous region encountered
            numerical derivatives are approximate
            flat or discontinuous region encountered
            numerical derivatives are approximate
            flat or discontinuous region encountered
            Iteration 3:   log likelihood = -441.06276  (not concave)
            numerical derivatives are approximate
            flat or discontinuous region encountered
            numerical derivatives are approximate
            flat or discontinuous region encountered
            numerical derivatives are approximate
            flat or discontinuous region encountered
            Iteration 4:   log likelihood = -441.06276  (not concave)
            numerical derivatives are approximate
            flat or discontinuous region encountered
            numerical derivatives are approximate
            flat or discontinuous region encountered
            numerical derivatives are approximate
            flat or discontinuous region encountered
            Iteration 5:   log likelihood = -441.06276  (not concave)
            convergence not achieved
            
            Latent class model with 3 latent classes
            -------------------------------------------------------------------------------
                   Choice |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            --------------+----------------------------------------------------------------
            choice1       |
                   Price2 |   .0659244   .0107037     6.16   0.000     .0449455    .0869032
            Accommodation |  -.7282424   .2369424    -3.07   0.002    -1.192641   -.2638437
                   Market |  -.0795953   .2947011    -0.27   0.787    -.6571989    .4980083
                     Tour |   .2114257   .2318396     0.91   0.362    -.2429715    .6658229
            --------------+----------------------------------------------------------------
            choice2       |
                   Price2 |  -2.520002   1.075836    -2.34   0.019    -4.628602   -.4114024
            Accommodation |   33.41135   17.16425     1.95   0.052    -.2299641    67.05266
                   Market |   17.96667          .        .       .            .           .
                     Tour |  -1.970766          .        .       .            .           .
            --------------+----------------------------------------------------------------
            choice3       |
                   Price2 |  -.0027144   .0100404    -0.27   0.787    -.0223932    .0169644
            Accommodation |  -.9510399   .1919325    -4.96   0.000    -1.327221    -.574859
                   Market |   -.406724   .1761886    -2.31   0.021    -.7520474   -.0614006
                     Tour |  -.4181075   .1473625    -2.84   0.005    -.7069328   -.1292822
            --------------+----------------------------------------------------------------
            share1        |
                     male |  -.0801244   .6545101    -0.12   0.903    -1.362941    1.202692
                      Age |   .0322988   .0250557     1.29   0.197    -.0168093     .081407
                 Eduyears |  -.1015891   .1269271    -0.80   0.423    -.3503617    .1471835
                 national |  -.8126188   .7034051    -1.16   0.248    -2.191267    .5660298
                   Income |   .5675667   .3120162     1.82   0.069    -.0439738    1.179107
                    _cons |  -1.547812    2.07203    -0.75   0.455    -5.608915    2.513292
            --------------+----------------------------------------------------------------
            share2        |
                     male |    .510867   .4993067     1.02   0.306    -.4677561     1.48949
                      Age |   .0517979   .0193701     2.67   0.007     .0138332    .0897626
                 Eduyears |   -.189617   .1004979    -1.89   0.059    -.3865893    .0073553
                 national |  -.8631777    .544691    -1.58   0.113    -1.930752    .2043971
                   Income |    .231292   .2207554     1.05   0.295    -.2013807    .6639647
                    _cons |   .6208119   1.523915     0.41   0.684    -2.366006     3.60763
            -------------------------------------------------------------------------------
            As you can see, the standard errors, z statistics, p-values, and 95% confidence intervals for Market and Tour in choice2 are missing.

            I look forward to your comments.

            Babatope Akinyemi

            Comment


            • #7
              This is usually an issue with identification/estimation. The "flat or discontinuous region" message is telling you that the likelihood function isn't a smooth surface, and is a way of indicating that you may be finding solutions based on a local rather than the global maximum. Were you able to successfully fit any simpler models to the data?

              Comment


              • #8
                Oded Mcdossi


                A little bit late, but thank you for your help!!
                BR,
                Pedro

                Comment


                • #9
                  wbuchanan thank you for your comment on my post. I have not yet succeeded in fixing the identification/estimation issues. Any suggestions or advice will be much appreciated.

                  Kind Regards!
                  Babatope

                  Comment


                  • #10
                    wbuchanan again, thanks for your comment. I was able to fit conditional logit and mixed logit models to the data. Please clarify what you mean by "finding solutions based on local vs global maxima".

                    Kind regards!

                    Babatope

                    Comment


                    • #11
                      Babatope:
                      William's helpful comment can probably be translated into the following (usual) recipe: start with a simpler model, add one predictor at a time, and see where Stata starts to choke.
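                      As a concrete sketch of that recipe (the dependent variable choice and the group/id variables gid and pid are hypothetical placeholders to be replaced with your own; the attribute names are taken from the output in #6):

                      Code:
                      * start small: 2 classes, no membership covariates
                      lclogit choice Price2 Accommodation Market Tour, group(gid) id(pid) nclasses(2)

                      * then add one membership predictor at a time
                      lclogit choice Price2 Accommodation Market Tour, group(gid) id(pid) nclasses(2) membership(male)
                      lclogit choice Price2 Accommodation Market Tour, group(gid) id(pid) nclasses(2) membership(male Age)

                      * only then increase nclasses(), watching where the "not concave" messages start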
                      Kind regards,
                      Carlo
                      (Stata 18.0 SE)

                      Comment


                      • #12
                        Carlo,

                        Oh thanks for interpreting William's comment to me. I will try that and keep you posted of my result.

                        Kindest regards,

                        Babatope

                        Comment


                        • #13
                          Babatope Akinyemi sorry for the delay; the next-door neighbors thought it was a good time to set the building on fire the other night, and the wife and I have been trying to figure things out since then. Maximum likelihood estimators try to maximize the likelihood function of your model. If the model is overly complex, there can be non-smooth points along the likelihood. In some cases this results in error/warning messages about the likelihood function not being concave, and can lead to the model not converging. If the likelihood function isn't smooth, it can cause the estimation algorithm to believe it has found the global maximum when really it has found something analogous to a saddle point in the middle of the likelihood function (e.g., the model converged on a bump in the likelihood function instead of the maximum of the function).
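
                          One way to check for this, sketched here with hypothetical variable names (choice, gid, pid) and assuming your version of lclogit supports the seed() option (see help lclogit), is to re-run the EM algorithm from several random starting points and compare the final log likelihoods:

                          Code:
                          forvalues s = 1/5 {
                              lclogit choice Price2 Accommodation Market Tour, group(gid) id(pid) nclasses(3) seed(`s')
                              display "seed `s': log likelihood = " e(ll)
                          }
                          * if the log likelihoods differ across seeds, the estimates depend on
                          * the starting values and the model is likely converging to local maxima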
                          Last edited by wbuchanan; 04 Jun 2016, 19:40.

                          Comment


                          • #14
                            William,
                            Thanks a lot for your detailed response. I am already working through Carlo's clarification of your earlier suggestion in #7, and the additional detailed explanation you have provided will definitely help me figure out the issues with the model and correct them. I will get back to you with my results.

                            Kindest regards,

                            Babatope

                            Comment


                            • #15
                              Carlo's suggestion is definitely a way to find problematic predictors. In the end you may find issues regardless of how parsimonious the model is, in which case you may want to reconsider the model itself and then work from the simplest to the most complex model that allows you to test your hypothesis.

                              Comment
