  • Use Output from LCA with Penn State Plugin

    Dear Statalisters,

    I am fairly inexperienced in Stata and am currently trying to run some latent class analyses with the plugin from Penn State University (https://www.methodology.psu.edu/research-and-rigor/). While I can recreate the example models provided by the creators in the attached do-file, I now wonder how to use the generated output (i.e. the identified latent classes) for further analysis. First and foremost, I cannot figure out how to generate a variable that assigns all observations to their respective classes based on their estimated probabilities. I assume I have to store the generated predicted class probabilities in a matrix using "mat" and "svmat", but I do not know where to go from there. For example, in the proposed example 1, five classes are identified and variables are generated for the estimated posterior probabilities (these are named _post_prob1, _post_prob2, etc.). When I compute the matrix for post_prob and save it through svmat, I just get a variable with missing values.
    I use Stata 15, and before installing the Penn State plugin I also used Stata's gsem command to run LCA. Unfortunately, the models failed to converge, despite trying the various methods described in Stata's introduction to sem (section 12). Resorting to the plugin is fine with me, because it offers more comprehensive information criteria and runs much faster. I'm afraid I might just lack the basic understanding of Stata syntax needed to handle the plugin's output correctly.

    I am sure this is not a very complex problem, and it would be much appreciated if someone could provide some general information or coding examples on how to generate variables from doLCA's output.

  • #2
    The link given in post #1 does not make it immediately obvious where the Stata plugin is to be obtained. Can you give a link to a page on which the plugin is described and from which it can be downloaded?

    Added in edit: On further reflection, your question generically seems to be

    how to generate a variable that assigns all observations into their respective classes based on their estimated probabilities
    How would you expect an observation to be assigned to a class? To the class with the highest probability for that observation? Or something else? Is there a specific technique described by Penn State?
    Last edited by William Lisowski; 25 Nov 2021, 08:49.

    • #3
      Originally posted by Dominik Harder
      ...While I can recreate the example models provided by the creators in the attached do-file, I now wonder how to use the generated output (i.e. the identified latent classes) for further analysis. First and foremost, I cannot figure out how to generate a variable that assigns all observations to their respective classes based on their estimated probabilities...
      First, a clarification. When you run a latent class model, you don't get the latent class that each observation belongs to. You get the probability that each observation is in each of the latent classes you assumed (if you are familiar with vectors, you get a vector of class membership probabilities).

      You can assume that each observation belongs to the latent class where the membership probability is highest, i.e. you can assign them to their modal latent class, aka you can do modal class assignment. Presumably, you're trying to tabulate some characteristics by latent class membership. Now, depending on how good your indicators are (i.e. the variables you fed into the LCA), you may be more or less certain about the membership probabilities. This is quantified by the normalized entropy of the model, which is scaled from 0 to 1, where 1 is better. (I would guess that above 0.8 is considered high, and 0.6 or so is low.) The Penn State plugin should return this as the scalar r(EntropyRsqd). As a worked example, say that after a 3-class model, most observations have a membership probability vector looking something like (0.9, 0.05, 0.05), i.e. you're pretty sure which class they're in. That gives a high normalized entropy. If they all tend to look something like (0.45, 0.28, 0.27), you're a lot less certain, and the normalized entropy is low. If you somehow came out with everyone having a vector of (1/3, 1/3, 1/3), that would be a normalized entropy of 0, and it would mean that the indicators tell you absolutely nothing about which latent class each person is in. I don't think the model would even converge in that case, so I'm offering this just as an extreme example.
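
      To make the two ideas above concrete, here is a minimal sketch that does modal class assignment and computes a normalized entropy by hand. Everything in it is an assumption for illustration: the toy probabilities are made up, the variable names only follow the _post_prob1, _post_prob2, ... pattern mentioned in post #1, and the formula used, 1 - sum(-p*ln(p)) / (N*ln(K)), is the usual relative-entropy definition, which I have not checked against what the plugin reports. The names maxprob, modal_class, ent_i and norm_ent are hypothetical.

      ```stata
      * Toy data: posterior probabilities from a hypothetical 3-class model
      clear
      input _post_prob1 _post_prob2 _post_prob3
      0.90 0.05 0.05
      0.45 0.28 0.27
      0.10 0.80 0.10
      0.05 0.15 0.80
      end

      * Modal class assignment: the class with the largest posterior probability
      * (ties go to the lowest-numbered class)
      egen maxprob = rowmax(_post_prob1 _post_prob2 _post_prob3)
      generate byte modal_class = .
      forvalues k = 1/3 {
          replace modal_class = `k' if _post_prob`k' == maxprob & missing(modal_class)
      }

      * Normalized entropy: 1 - sum_i sum_k(-p*ln(p)) / (N*ln(K)), here with K = 3
      generate double ent_i = 0
      forvalues k = 1/3 {
          replace ent_i = ent_i - _post_prob`k' * ln(_post_prob`k') if _post_prob`k' > 0
      }
      quietly summarize ent_i
      scalar norm_ent = 1 - r(sum) / (r(N) * ln(3))

      list _post_prob1 _post_prob2 _post_prob3 modal_class
      display "Normalized entropy = " norm_ent
      ```

      If every row were (1/3, 1/3, 1/3), each ent_i would equal ln(3) and norm_ent would be 0, which is the extreme case described above.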

      Anyway, if you have a high entropy situation and you do modal class assignment, you're making an assumption that is technically wrong but not terribly wrong. People who are real technical experts on LCA might still object. It has been shown that modal class assignment will bias any relationships you get when you tabulate stuff by class membership. It has also, I believe, been shown that probabilistic or random assignment (meaning you do multiple random draws and then work multiple-imputation style to do your tabulations) is still biased. There's been work done to overcome this, which base Stata and, I believe, the PSU plugin don't implement. If you're interested in this, you could try searching for work by Jeroen Vermunt, but I find the math hard to understand.

      After that major caveat, let's get back to your question. I don't have the plugin installed. However, if you go through the documentation for version 1.2.1, it seems to indicate that after your model converges, the plugin should automatically add variables for each posterior class probability and the modal class assignment, called Best_Index, to your dataset. In base Stata, this is something you'd do with the predict command post-estimation. Are those variables present? If not, which version of the plugin are you using?
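
      For comparison, the manual route in base Stata might look roughly like the sketch below. It assumes Stata 15 or later and uses the gsem_lca1 example dataset that ships with the Stata documentation (fetched over the internet by webuse); the variable names cpost1, cpost2 and modal_class are hypothetical, and the 2-class model is only for illustration.

      ```stata
      * Example data from the Stata manual (needs an internet connection)
      webuse gsem_lca1, clear

      * Two-class latent class model with four binary indicators
      gsem (accident play insurance stock <- ), logit lclass(C 2)

      * Posterior class membership probabilities, one new variable per class
      predict cpost*, classposteriorpr

      * Modal class assignment: whichever class has the larger posterior probability
      generate byte modal_class = cond(cpost1 >= cpost2, 1, 2)
      tabulate modal_class
      ```

      With more than two classes, the same idea applies, but you would compare each posterior probability with the row maximum instead of using a single cond().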

      In the plugin, you could also add 20 pseudo-class draws with the plugin's seeddraws() option (you just specify a random number seed in there). If you wanted to use them, I believe you would need to use the multiple imputation commands to manually declare the variables as mi data.
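
      To show what a pseudo-class draw is conceptually, here is a hand-rolled sketch. It is not the plugin's implementation and does not use its seeddraws() option: it simply draws a class for each observation at random according to its posterior probabilities, 20 times. The toy probabilities, the seed, and the names p1-p3, u and draw1-draw20 are all made up for illustration.

      ```stata
      * Toy posterior probabilities for a hypothetical 3-class model
      clear
      input p1 p2 p3
      0.90 0.05 0.05
      0.45 0.28 0.27
      0.10 0.80 0.10
      end

      set seed 12345

      * 20 pseudo-class draws: class k is drawn with probability p_k
      forvalues d = 1/20 {
          generate double u = runiform()
          generate byte draw`d' = cond(u < p1, 1, cond(u < p1 + p2, 2, 3))
          drop u
      }

      list p1 p2 p3 draw1-draw5
      ```

      An observation with probabilities (0.9, 0.05, 0.05) will land in class 1 in most draws, while one with (0.45, 0.28, 0.27) will move between classes, which is the uncertainty that the multiple-imputation-style analysis is meant to carry forward.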
      Please use the code delimiters to show code and results - use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

      Please use the command -dataex- to show a representative sample of data; it is installed already if you have Stata 14.2 or 15.1, else you can install it by typing

      Code:
      ssc install dataex

      • #4
        Dear William, dear Weiwen,

        Thank you for your replies. As you both rightly assumed, I intend to assign every observation to its modal latent class. Sorry for my unclear statements. I am aware that this comes with possible biases and errors, and I am familiar with the concept of normalized entropy.
        I checked the models again and I'm a bit embarrassed to admit that I did not see the Best_Index variable. I just assumed it was necessary to compute the modal class assignment manually, as when using the gsem command. Anyway, this should solve my problem.
        Thanks again for your help. I can also say that Weiwen's responses in other threads in this forum have helped me quite a bit in learning to use LCA with Stata.

        William, you were right about my link. In case you are still interested in the plugin, here is the right link: https://www.latentclassanalysis.com/...-stata-plugin/

        • #5
          Thanks for the kind words. Don't worry about missing the Best_Index variable: this is a complex method, the PSU documentation is quite long, and the options are more numerous than the base Stata options. Plus, adding these variables is something that base Stata users would normally think of as a post-estimation step that you need to do manually.