  • latent class analysis and sample size requirements

    Dear Stata users, I have a question about latent class analysis (LCA). I am working with a sample of 57 American cities and a total of 83 city-level variables, which I want to use to place the cities into a set of latent classes (I am considering first using exploratory factor analysis to aggregate the 83 variables into 7 or so factors/indexes). My goal is to study how different latent classes of American cities are associated with the overall happiness of city residents. I am concerned that a sample of 57 cities might be too small to yield a stable set of latent classes. I would very much appreciate any advice on whether an LCA can be done with a sample of 57. Should I limit the number of latent classes to some maximum?

    I am aware of some of the techniques (such as the BIC) used to choose the best number of latent classes in an LCA. Is there any way to gauge whether the results of these techniques are stable and accurate?
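    For reference, the BIC is computed from the maximized log-likelihood L̂, the number of free parameters p, and the sample size N; lower is better, and Stata reports it after gsem via estat ic. The parameter count below assumes J binary indicators and K classes (an assumption for illustration; continuous indicators such as factor scores add class-specific means and variances instead). It shows how quickly p grows relative to N = 57, e.g. for a hypothetical 3-class model on 7 binary indicators:

```latex
\mathrm{BIC} = -2\ln\hat{L} + p\ln N, \qquad p = (K-1) + KJ

% Hypothetical example: J = 7, K = 3, N = 57
p = (3-1) + 3\cdot 7 = 23, \qquad p\ln N = 23\ln 57 \approx 23 \times 4.043 \approx 93.0
```

    That is 23 parameters estimated from 57 cities, fewer than three observations per parameter.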

    I plan to run the LCA with Stata 15's gsem command (its lclass() option).

    I would very much appreciate your help, thank you in advance!

    Yours sincerely,
    Jason Settels

  • #2
    There's a thread on the Mplus discussion forum here, although only registered Mplus users can post there. Basically, part of the answer depends on how well separated the classes are. If you have indicators on which the cities are really distinct, that will help. On the other hand, if only a few cities have positive responses on some of the indicators and your smallest latent class gets down to a very small number of cities, that's probably not good. Bengt Muthén said that he'd run successful LCAs with as few as 30 subjects under the right conditions, so apparently it depends.

    I don't think you necessarily have to limit the number of classes a priori. If the data get too sparse, the model should stop being well identified (i.e., gsem won't converge). Be aware that some other programs constrain logit parameters to +/-15 by default; a logit of +15 or -15 means the class has nearly 100% or nearly 0% endorsement of that indicator. Stata does not do this. If you think the constraint is justified, you can impose it yourself after viewing the initial result. In my experience, if several classes are in this situation, Stata may not converge with the default convergence criteria; you might want to read this thread.
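    To see why a logit of +/-15 amounts to near-certain endorsement or non-endorsement, apply the inverse-logit function:

```latex
\operatorname{logit}^{-1}(x) = \frac{1}{1+e^{-x}}, \qquad
\operatorname{logit}^{-1}(15) = \frac{1}{1+e^{-15}} \approx 1 - 3.06\times10^{-7}, \qquad
\operatorname{logit}^{-1}(-15) = \frac{1}{1+e^{15}} \approx 3.06\times10^{-7}
```

    Beyond that point, further increases in the logit barely change the fitted probability, so the likelihood is almost flat in that direction, which is one reason convergence can stall.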

    The Mplus discussion suggests running a Monte Carlo study; designing one is probably beyond me, so I'm not sure how much that suggestion helps you!
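    A Monte Carlo check of the kind mentioned above can be sketched in a few lines. This is a minimal illustration in plain Python, not Stata, and everything in it is an assumption: the data-generating model (two classes with shares 0.6/0.4, seven binary indicators with endorsement probabilities 0.85 vs 0.15), the hand-written EM fitter, and the helper names simulate, fit_lca, bic, best_k and monte_carlo are all hypothetical. It repeatedly simulates 57 units from a known model, picks the number of classes by BIC, and reports how often BIC recovers the true number.

```python
import math
import random


def simulate(n, class_probs, item_probs, rng):
    """Draw n units from a latent class model with binary indicators.

    class_probs[c] is P(class c); item_probs[c][t] is P(y_t = 1 | class c).
    """
    data = []
    for _ in range(n):
        c = rng.choices(range(len(class_probs)), weights=class_probs)[0]
        data.append([1 if rng.random() < p else 0 for p in item_probs[c]])
    return data


def fit_lca(data, k, rng, iters=200, tol=1e-6):
    """Fit a k-class model by EM from one random start; return the log-likelihood."""
    n, j = len(data), len(data[0])
    pi = [1.0 / k] * k
    theta = [[rng.uniform(0.2, 0.8) for _ in range(j)] for _ in range(k)]
    prev = -math.inf
    loglik = prev
    for _ in range(iters):
        # E-step: posterior class probabilities for each unit, plus the log-likelihood.
        post, loglik = [], 0.0
        for y in data:
            joint = []
            for c in range(k):
                p = pi[c]
                for t, yt in zip(theta[c], y):
                    p *= t if yt else 1.0 - t
                joint.append(p)
            total = sum(joint)
            loglik += math.log(total)
            post.append([w / total for w in joint])
        if loglik - prev < tol:
            break
        prev = loglik
        # M-step: update class shares and item probabilities.
        for c in range(k):
            nc = max(sum(row[c] for row in post), 1e-12)  # guard against an empty class
            pi[c] = nc / n
            for t in range(j):
                num = sum(row[c] * y[t] for row, y in zip(post, data))
                # Keep probabilities off 0 and 1 (about logit +/-15) so log() stays finite.
                theta[c][t] = min(max(num / nc, 3e-7), 1.0 - 3e-7)
        s = sum(pi)
        pi = [w / s for w in pi]
    return loglik


def bic(loglik, k, j, n):
    """BIC = -2 lnL + p ln N with p = (k - 1) class shares + k * j item probabilities."""
    p = (k - 1) + k * j
    return -2.0 * loglik + p * math.log(n)


def best_k(data, max_k, rng, starts=5):
    """Return the number of classes (1..max_k) with the lowest BIC, using several random starts."""
    n, j = len(data), len(data[0])
    scores = {}
    for k in range(1, max_k + 1):
        loglik = max(fit_lca(data, k, rng) for _ in range(starts))
        scores[k] = bic(loglik, k, j, n)
    return min(scores, key=scores.get)


def monte_carlo(n=57, reps=20, max_k=3, seed=1):
    """Share of simulated samples in which BIC picks the true number of classes (2)."""
    rng = random.Random(seed)
    class_probs = [0.6, 0.4]               # hypothetical class shares
    item_probs = [[0.85] * 7, [0.15] * 7]  # hypothetical endorsement probabilities
    hits = 0
    for _ in range(reps):
        data = simulate(n, class_probs, item_probs, rng)
        hits += best_k(data, max_k, rng) == len(class_probs)
    return hits / reps
```

    In practice you would plug in class shares and item probabilities close to the solution you actually estimated, and also compare the recovered class sizes and item probabilities with the true ones, not just whether K is recovered. Larger reps give a steadier estimate but run slowly in pure Python.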
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

    • #3
      Thank you very much, Weiwen Ng! This helps a lot. I will carefully consider your advice. Best, Jason
