  • Known groups in LCA

    Hi all,

    I am still quite new to this, so maybe it is just a simple problem. I am using Stata 16 on a Mac.

    I have done an LCA with 6 variables and 3 classes.
    Code:
     gsem (ancestry culture lang jobs pride civic <-), logit lclass(C 3)
    I am now trying to fit the same model while comparing the results across regions, but I keep getting this error message:
    Code:
    quietly gsem (ancestry culture lang jobs pride civic <-), over(land) logit lclass(C 3)
    option over() not allowed
    r(198);
    Where am I supposed to add the categorical variable? What am I doing wrong?

    All help is much appreciated!
    Last edited by David Isaksson; 28 Mar 2020, 09:53.

  • #2
    Subjectively, it seems to me like there are a bunch of people who are on lockdown and are turning to LCA, perhaps to relieve boredom? But this could be surveillance bias on my part.

    More seriously, define "testing the results over different regions." I assume you mean that you want to see whether the parameters vary across regions. If so, you want the group() option; gsem has no over() option. However, you will need to tell Stata which parameters are allowed to vary across groups, as defined in the manual here. If this is what you want, then I believe that, as per this post, you'd want to allow the constants to vary. By default, constants, coefficients, and loadings are all constrained to be equal across groups. I think that in this context the constants (intercepts) determine the class-specific response probabilities for each item, e.g. the probability of endorsing ancestry in each of the 3 latent classes. So, if I'm right, you'd type this:

    Code:
    gsem (ancestry culture lang jobs pride civic <-), group(land) ginvariant(coef loading) logit lclass(C 3)
    Then you should be able to inspect the parameters for each group.

    Be aware, however, that your model fits 6 parameters per latent class (one intercept for each of the 6 response variables), plus K - 1 parameters for the multinomial model describing the probability of being in each latent class (here, 3 classes minus 1 = 2 parameters), for 6 x 3 + 2 = 20 in total. If you fit this for x groups, I think that you're multiplying the number of parameters by x. This could rapidly produce identification or convergence problems.
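
    As a rough sketch of that arithmetic (the 6 items and 3 classes come from the model above; the 4 regions are an assumed number for illustration, not taken from the data), run from a do-file:

    ```stata
    * Items and classes match the model above.
    * The number of groups is hypothetical; replace it with the number of regions in land.
    local items   = 6
    local classes = 3
    local groups  = 4

    * One intercept per item per class, plus K - 1 class-membership intercepts
    local per_group = `items' * `classes' + (`classes' - 1)

    * Assumes every one of those parameters is free to differ across groups
    local total = `per_group' * `groups'

    display "Free parameters per group: " `per_group'    // 20
    display "Free parameters in total:  " `total'        // 80
    ```

    With only a modest number of observations in some regions, a count like this is worth checking before fitting the grouped model.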

    I've got no direct experience with this technique, and I haven't done any reading on it yet, so I can't offer any sound theoretically-based guidance, just my gut instincts.
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.


    • #3
      Thank you Weiwen! It seems to work, and you interpreted my question correctly. However, when I try to compare the latent class probabilities between the regions, I get this error message:
      Code:
      estat lcprob, group(land)
      option group() not allowed
      r(198);
      How is this supposed to be coded? I cannot find any information regarding this issue anywhere. I am very thankful for all help.


      • #4
        What does estat lcprob alone do?


        • #5
          It reports the latent class probabilities, i.e. how the sample is distributed over the classes. I want to compare that distribution between the regions.


          • #6
            Hmm, this is not what I experienced. I have Stata 16 on my machine, and estat lcprob on its own reported the class prevalences for each value of the grouping variable. It did the same when I asked Stata to revert to version 15 using the version command.

            Going off my old code in the other thread, we're going to create a fictitious version of the dataset used in SEM example 50. That dataset has 4 indicators (accident, play, insurance, and stock), which basically ask whether you'd favor your friend's interest over broader societal/legal interests (e.g. would you testify against your friend in an accident case?). My fictitious adaptation adds gender to the equation. In this randomly generated dataset, about 40% of the sample are men. There are 2 latent classes: particularistic (i.e. you favor your friend) and universalistic (i.e. you favor society's interest in honesty). Men are less likely to belong to the universalistic latent class (28% of them are simulated into this class, vs. 42% of women). I'm simplifying relative to my example data in the other post, where women had different response probabilities as well.

            Code:
            clear
            set seed 4142
            set obs 1000
            gen male = rbinomial(1,0.4)
            gen class = rbinomial(1,0.42) + 1 if male == 0
            replace class = rbinomial(1,0.28) + 1 if male == 1
            gen accident = .
            gen play = .
            gen insurance = .
            gen stock = .
            
            /*Latent class #1, particularistic*/
            replace accident = rbinomial(1,0.71) if class == 1
            replace play = rbinomial(1,0.33) if class == 1
            replace insurance = rbinomial(1,0.35) if class == 1
            replace stock = rbinomial(1,0.13) if class == 1
            
            /*Latent class #2, universalistic*/
            replace accident = rbinomial(1,0.92) if class == 2
            replace play = rbinomial(1,0.71) if class == 2
            replace insurance = rbinomial(1,0.93) if class == 2
            replace stock = rbinomial(1,0.82) if class == 2
            
            /*Optional if you're on version 15 of Stata*/ version 15
            quietly gsem (accident play insurance stock <-, logit), lclass(C 2) group(male) byparm ginvariant(coef loading) nonrtolerance
            estat lcprob, nose
            Latent class marginal probabilities             Number of obs     =      1,000
            
            --------------------------------------------------------------
                    male |     Margin
            -------------+------------------------------------------------
            0            |
                       C |
                      1  |    .623417
                      2  |    .376583
            -------------+------------------------------------------------
            1            |
                       C |
                      1  |    .728638
                      2  |    .271362
            --------------------------------------------------------------
            So, for each of your known groups, Stata should report the distribution of latent classes, e.g. 37.7% of women and 27.1% of men are in latent class #2, the universalistic class (if you delete the nose option, you'll see that the CI does contain the simulated probabilities of 42% for women, 28% for men).
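
            A minimal sketch of how one might check this, assuming the simulated data and the grouped gsem fit above are still in memory (estat lcmean is not used elsewhere in this thread, so treat its by-group output as something to confirm on your own installation):

            ```stata
            * Same table as above, now with standard errors and confidence intervals
            estat lcprob

            * Simulated (true) class shares within each sex, for comparison
            tabulate male class, row nofreq

            * Class-specific item response probabilities; with a grouped fit
            * these should be listed separately for each value of male
            estat lcmean, nose
            ```

            Latent class labels are arbitrary, so the estimated class 2 need not line up with the simulated class 2 in every run; compare the item response probabilities to decide which estimated class is which.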

            Please clarify if this isn't what you were looking for.


            • #7
              Thank you so much! This was exactly what I needed and it worked just fine. Problem solved.


              • #8
                You're welcome!

                Just for info for others who read this: the term "known groups" refers to the fact that the grouping variable here is known/observed (whereas the latent class is categorical but not directly observed). This page on the UCLA site discusses this method in the context of Mplus, and it does verify that I correctly deduced David's intentions (that is, to ask whether the response probabilities vary over the values of the known group).

                Another thing you could do with latent classes and an observed variable: suppose you have no strong a priori reason to think that the response probabilities vary, but you would like to know whether the observed variable is associated with membership in the latent classes. For example, you might wonder if men were more or less likely to belong to the universalistic class (sticking with the fictitious example above). In the example, we know this to be the case, because that's how I simulated the data. But in real life, we don't know this. We might use latent class regression to examine this question.

                Code:
                gsem (accident play insurance stock <-, logit) (C <- male), lclass(C 2)
                margins, predict(classpr class(1)) predict(classpr class(2)) over(male)
                Predictive margins                              Number of obs     =      1,000
                Model VCE    : OIM
                
                over         : male
                1._predict   : Predicted probability (1.C), predict(classpr class(1))
                2._predict   : Predicted probability (2.C), predict(classpr class(2))
                
                -------------------------------------------------------------------------------
                              |            Delta-method
                              |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
                --------------+----------------------------------------------------------------
                _predict#male |
                         1 0  |   .5989598   .0413331    14.49   0.000     .5179485    .6799712
                         1 1  |   .7381489   .0372952    19.79   0.000     .6650517    .8112461
                         2 0  |   .4010402   .0413331     9.70   0.000     .3200288    .4820515
                         2 1  |   .2618511   .0372952     7.02   0.000     .1887539    .3349483
                -------------------------------------------------------------------------------
                To reiterate, this model estimated that women had a 40.1% probability of being in class 2, and men had a 26.2% probability of being in class 2. The confidence intervals for these probabilities contain the simulated probabilities.
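
                For readers who want to see where those margins come from, here is a hedged sketch. It assumes the latent class regression above is the active estimation result and that class 1 is the base outcome. The coefficient names `_b[2.C:_cons]` and `_b[2.C:male]` are my assumption about how gsem labels the class-2 equation, so confirm them with the coeflegend replay first:

                ```stata
                * Replay the results with the _b[] name of every coefficient shown
                gsem, coeflegend

                * With 2 classes, Pr(C = 2) is the inverse logit of the class-2 equation.
                * The _b[] names below are assumed; adjust them to match the legend above.
                display "Pr(C = 2 | women): " invlogit(_b[2.C:_cons])
                display "Pr(C = 2 | men):   " invlogit(_b[2.C:_cons] + _b[2.C:male])
                ```

                Because male is the only predictor of class membership here, these two values should agree with the class 2 rows of the margins table. The z test on the male coefficient in the gsem output is then the test of whether class membership differs by sex.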