  • LCA: Classification diagnostics - average posterior class probabilities and odds of correct classification ratio

    Dear Stata Users,

    I ran a latent class analysis (LCA) for up to five classes with six ordinal variables, using the gsem command. Having chosen the number of classes by BIC, I would now like to run further classification diagnostics to check that my five-class solution is suitable. The literature suggests assessing the average posterior class probabilities (AvPP) as well as the odds of correct classification ratio (OCC).

    AvPP tests for in-class homogeneity and represents the average probability of an individual being assigned to a class given its response pattern. According to the literature, AvPP values should be above 0.7.
    OCC evaluates across-class separation, i.e., how well the latent classes are separated; values should be above 5.00 (Masyn, 2013).

    I have already calculated the predicted class probabilities using

    Code:
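    predict classpost*, classposteriorpr

    * Highest posterior probability across the five classes
    egen max = rowmax(classpost*)
    * Assign each observation to the class with the highest posterior probability
    generate predclass = 1 if classpost1==max & max<.
    replace predclass = 2 if classpost2==max & max<.
    replace predclass = 3 if classpost3==max & max<.
    replace predclass = 4 if classpost4==max & max<.
    replace predclass = 5 if classpost5==max & max<.
    * fre is a user-written command (ssc install fre); tabulate predclass also works
    fre predclass

    However, I do not think that this gives me AvPP.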


    Unfortunately, I couldn't find any explanation of how these two diagnostics can be performed in Stata. Has anyone already done this, or does anyone at least have an idea?


    Thank you very much in advance.





  • #2
    You have clearly read the Masyn chapter on LCAs cited in the SEM example. Just a minor semantic/conceptual point: the variable you call predclass is really the modal latent class. Conceptually, there is no single predicted latent class; we get a vector of class membership probabilities, and you decided to predict membership based on the most likely class.

    If you refer to page 570, Masyn said that

    That is, AvePPk is the mean of the Class k posterior class probabilities across all individuals whose maximum posterior class probability is for Class k.
    "Maximum posterior class probability" means modal class = k. So, you want to do something like

    Code:
    mean classpost1 if predclass == 1
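
    A minimal sketch extending this to all five classes, assuming the classpost1-classpost5 and predclass variables created in #1 (not run against the original data):

    Code:
    * AvePP_k: mean posterior probability of class k among
    * observations whose modal class is k
    forvalues k = 1/5 {
        quietly summarize classpost`k' if predclass == `k', meanonly
        display "AvePP for class `k' = " %5.3f r(mean)
    }
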
    And, still on pg 570, the odds of correct classification ratio for class k is related to AvePPk and the model-estimated probability of membership in class k (pi_k). You get the latter from estat lcprob. For class 1, it's just

    [AvePP_1 / (1 - AvePP_1)] / [pi_1 / (1 - pi_1)]

    So, this isn't pre-baked in Stata, and I haven't seen any papers refer to these quantities. This is something that's amenable to Excel.
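
    As a rough sketch of the same arithmetic done in Stata, again assuming the variable names from #1 and that the gsem results are still in memory (so e(sample) is available). The scalar names avepp, pik, and occ are made up for illustration. Note the swap: pi_k is approximated here by the mean posterior probability over the estimation sample, which should agree with the estat lcprob value for a converged model without predictors of class membership, so compare it against that output before relying on it:

    Code:
    forvalues k = 1/5 {
        * AvePP_k: mean posterior for class k among those with modal class k
        quietly summarize classpost`k' if predclass == `k' & e(sample), meanonly
        scalar avepp = r(mean)
        * pi_k: approximated by the mean posterior over the estimation sample
        * (assumption; check against estat lcprob)
        quietly summarize classpost`k' if e(sample), meanonly
        scalar pik = r(mean)
        * OCC_k = [AvePP_k / (1 - AvePP_k)] / [pi_k / (1 - pi_k)]
        scalar occ = (avepp/(1 - avepp)) / (pik/(1 - pik))
        display "Class `k': AvePP = " %5.3f avepp "  pi = " %5.3f pik "  OCC = " %8.2f occ
    }

    If a variable in the data set happens to share one of these scalar names, rename the scalar, since Stata resolves the name to the variable first.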

    NB: estat lcprob can be run without standard errors (nose option) to speed up computation. If your data set is truly enormous such that this computation is too big, it can be manually calculated from the intercepts at the head of the output table; you're just using the formula for multinomial logit regression. Just look that formula up and ignore the references to XB, because you have no predictors of the latent class membership.
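
    To illustrate the manual route, a sketch with made-up intercepts for a three-class model, assuming class 1 is the base class so that its intercept is fixed at 0. The values of g2 and g3 are hypothetical stand-ins for the class 2 and class 3 intercepts reported in the gsem output:

    Code:
    * Hypothetical intercepts; replace with the values from your own output
    scalar g2 = 0.5
    scalar g3 = -1.0
    * Multinomial logit with no predictors: pi_k = exp(g_k) / sum_j exp(g_j),
    * where exp(g_1) = exp(0) = 1 for the base class
    scalar denom = 1 + exp(g2) + exp(g3)
    display "pi_1 = " %5.3f 1/denom
    display "pi_2 = " %5.3f exp(g2)/denom
    display "pi_3 = " %5.3f exp(g3)/denom
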
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



    • #3
      Thank you so much Weiwen. This was extremely helpful and worked perfectly.
