  • Latent class analysis with gsem: "path from latent class variable to observed variable is not allowed"

    Dear Statalisters,

    I'm using Stata 15.1.

    I have 6 indicators of a latent binary variable (C2), and one predictor (txcond).

    If I write:

    gsem (var1_2 var2_2 var3_2 var4_2 var5_2 var6_2<-, regress)(C2 <-txcond), nocapslatent lclass (C2 2)

    the command works. If I want to make explicit that the 6 variables measure latent class C2,

    gsem (var1_2 var2_2 var3_2 var4_2 var5_2 var6_2<-C2, regress)(C2 <-txcond), nocapslatent lclass (C2 2)

    then I get the following error message: "invalid path specification;
    the path from latent class variable C2 to observed variable var1_2 is not allowed"


    I've noticed that the same problem doesn't occur if I pretend my latent variable is continuous:

    gsem (var1_2 var2_2 var3_2 var4_2 var5_2 var6_2<-C2, regress)(C2 <-txcond), nocapslatent latent (C2)

    The reason why I want to make the relationship between the latent variable and its indicators explicit is that I'd like to actually build a model with 2 latent classes, each one with its own set of measures (but the same predictor), i.e. something like:

    gsem (var1_2 var2_2 var3_2 var4_2 var5_2 var6_2<-C2, regress)(var1_3 var2_3 var3_3 var4_3 var5_3 var6_3<-C3, regress)(C2 C3<-txcond), nocapslatent lclass (C2 2)lclass (C3 2)

    How can I do this? I've seen that the command "lclogit" is also available, but it seems to allow for only one latent (binary) outcome.

    Federico

  • #2
    Federico,

    The reason why I want to make the relationship between the latent variable and its indicators explicit is that I'd like to actually build a model with 2 latent classes, each one with its own set of measures (but the same predictor), i.e. something like:

    Code:
    gsem (var1_2 var2_2 var3_2 var4_2 var5_2 var6_2<-C2, regress) ///
    (var1_3 var2_3 var3_3 var4_3 var5_3 var6_3<-C3, regress) ///
    (C2 C3<-txcond), nocapslatent lclass (C2 2)lclass (C3 2)
    Actually, as described in the gsem manual on latent class syntax (bottom of page 35), here's how you'd accomplish something like that using a stock Stata dataset. (1: ... ) and (2: ... ) refer to each of the latent classes. Forgive me if this is obvious, but if you expand the number of classes, you would just add one set of parentheses for each class. Here, I'm going off SEM example 52, which builds latent profile models based on blood glucose, insulin, and steady-state plasma glucose. In this example, I'm adding relative weight as an indicator for class 2 only. As shown in the output, class 2 has a mean and error variance for relative weight, but class 1 does not. Some output omitted.

    Code:
    use http://www.stata-press.com/data/r15/gsem_lca2
    gsem (1: glucose insulin sspg <- _cons) (2: glucose insulin sspg relwgt <- _cons), lclass(C 2)
    
    Class          : 1
    
    Response       : glucose
    Family         : Gaussian
    Link           : identity
    
    Response       : insulin
    Family         : Gaussian
    Link           : identity
    
    Response       : sspg
    Family         : Gaussian
    Link           : identity
    
    --------------------------------------------------------------------------------
                   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
    glucose        |
             _cons |    41.1155   1.271729    32.33   0.000     38.62296    43.60805
    ---------------+----------------------------------------------------------------
    insulin        |
             _cons |   21.00523   .9996737    21.01   0.000     19.04591    22.96456
    ---------------+----------------------------------------------------------------
    sspg           |
             _cons |   14.95387   .6886884    21.71   0.000     13.60407    16.30368
    ---------------+----------------------------------------------------------------
     var(e.glucose)|   188.6979   22.69097                      149.0767    238.8494
     var(e.insulin)|   118.8307   13.96669                      94.38067    149.6146
        var(e.sspg)|   56.23344   6.670097                      44.56869    70.95115
    --------------------------------------------------------------------------------
    
    Class          : 2
    
    Response       : glucose
    Family         : Gaussian
    Link           : identity
    
    Response       : insulin
    Family         : Gaussian
    Link           : identity
    
    Response       : sspg
    Family         : Gaussian
    Link           : identity
    
    Response       : relwgt
    Family         : Gaussian
    Link           : identity
    
    --------------------------------------------------------------------------------
                   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
    glucose        |
             _cons |   115.4334   2.766208    41.73   0.000     110.0117    120.8551
    ---------------+----------------------------------------------------------------
    insulin        |
             _cons |   7.576112   2.147696     3.53   0.000     3.366705    11.78552
    ---------------+----------------------------------------------------------------
    sspg           |
             _cons |   34.40495   1.493472    23.04   0.000      31.4778    37.33211
    ---------------+----------------------------------------------------------------
    relwgt         |
             _cons |   .9795043   .0230807    42.44   0.000     .9342669    1.024742
    ---------------+----------------------------------------------------------------
     var(e.glucose)|   188.6979   22.69097                      149.0767    238.8494
     var(e.insulin)|   118.8307   13.96669                      94.38067    149.6146
        var(e.sspg)|   56.23344   6.670097                      44.56869    70.95115
      var(e.relwgt)|   .0136118   .0038029                      .0078724    .0235356
    --------------------------------------------------------------------------------
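    As a rough, untested sketch of the point above about adding classes: a three-class version of this model would add a third set of parentheses and change lclass(C 2) to lclass(C 3). Giving class 3 the same three indicators as class 1 is an arbitrary choice for illustration only, and a model like this may be slow to converge or may not converge at all.

    ```stata
    * Assumption: same stock dataset as above; class 3 reuses the class 1 indicators
    use http://www.stata-press.com/data/r15/gsem_lca2, clear

    * One set of parentheses per class; relwgt is an indicator for class 2 only
    gsem (1: glucose insulin sspg <- _cons)        ///
         (2: glucose insulin sspg relwgt <- _cons) ///
         (3: glucose insulin sspg <- _cons),       ///
         lclass(C 3)

    * Class-specific means and estimated class proportions
    estat lcmean
    estat lcprob
    ```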
    The syntax you specified is actually telling Stata that there are two latent multinomial variables out there, C2 and C3. I don't know what that's called, but I'm going to call it multidimensional latent class analysis. I've not yet seen this done in practice. If this is what you want to do, look at the end of page 36 of the SEM manual for the correct syntax.

    I have no idea why the syntax you described earlier in your post is invalid. As far as I know, it is what it is, and I treat it as a quirk of gsem. I have not seen any papers that use different sets of indicators for different latent classes (but I'm not a super-expert on this topic). And you seem to be using different versions of the same underlying variables in each latent class? I gently urge you to consider the substantive implications for your model. Ignore me if you've done so already.

    For the record, if you compare the means on the 3 common indicators from my model to the 2-class model fit in example 52, they're very, very similar. If you took that model, did modal class assignment (i.e. assign class membership based on most likely class) and took a mean of relative weight over those two classes, the relative weight is very, very similar. It doesn't really play into a 2-class solution. If you added relwgt as an indicator for both classes, you'd see that it doesn't separate the two classes at all (i.e. the class-specific means of relwgt in that 2-class model are very similar). Results may differ in your proposed use case.
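    The modal class assignment described above could be sketched roughly as follows. This assumes the plain 2-class specification with the three common indicators; the variable names cpost1, cpost2, and modalclass are made up for illustration, and ties are assigned to class 1 arbitrarily.

    ```stata
    * Assumption: 2-class model on the three common indicators only
    use http://www.stata-press.com/data/r15/gsem_lca2, clear
    gsem (glucose insulin sspg <- _cons), lclass(C 2)

    * Posterior probability of membership in each class (creates cpost1 and cpost2)
    predict cpost*, classposteriorpr

    * Modal assignment: each observation goes to its most likely class
    generate byte modalclass = cond(cpost1 >= cpost2, 1, 2) if !missing(cpost1, cpost2)

    * Compare relative weight across the assigned classes
    tabstat relwgt, by(modalclass) statistics(mean sd n)
    ```

    Note that modal assignment ignores the uncertainty in class membership, so the resulting means are only a descriptive check.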
    Last edited by Weiwen Ng; 26 Jul 2019, 11:08.
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



    • #3
      Thank you very much, Weiwen, for your reply.
      I've started to use the code delimiter in my posts. Are you referring to this version: https://www.stata.com/manuals13/sem.pdf of the SEM manual or to another one?



      • #4
        Originally posted by Federico Tedeschi
        Thank you very much, Weiwen, for your reply.
        I've started to use the code delimiter in my posts. Are you referring to this version: https://www.stata.com/manuals13/sem.pdf of the SEM manual or to another one?
        Actually, use this link: https://www.stata.com/manuals/sem.pdf

        Latent class analysis was first implemented in Stata 15, so the Stata 13 manual does not cover it.
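        If a do-file has to run on machines with different Stata versions, a small guard along these lines could stop early with a clear message. This is only a sketch; the message text and the choice of return code 198 are arbitrary.

        ```stata
        * c(stata_version) holds the version number of the Stata that is running
        if c(stata_version) < 15 {
            display as error "gsem's lclass() option requires Stata 15 or later"
            exit 198
        }
        ```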


        • #5
          Thank you. The proper code is then:
          Code:
          gsem (1.C2: varlist2 <-, regress) ///
               (2.C2: varlist2 <-, regress) ///
               (1.C3: varlist3 <-, regress) ///
               (2.C3: varlist3 <-, regress) ///
               (C2 C3 <- txcond),           ///
               nocapslatent lclass(C2 2) lclass(C3 2)



          • #6
            Originally posted by Federico Tedeschi
            Thank you. The proper code is then:
            Code:
            gsem (1.C2: varlist2 <-, regress) ///
                 (2.C2: varlist2 <-, regress) ///
                 (1.C3: varlist3 <-, regress) ///
                 (2.C3: varlist3 <-, regress) ///
                 (C2 C3 <- txcond),           ///
                 nocapslatent lclass(C2 2) lclass(C3 2)
            Let's clarify something.

            A latent class is an unobserved categorical variable. In LCA, we hypothesize that this categorical variable causes varying responses to the indicators specified. In my specification, the same single latent variable is causing responses to varlist 2 and varlist 3. I had assumed that varlist 3 is just a differently coded version of varlist 2.

            Your code is saying that there are two different categorical variables (you start them with 2 categories each). You say that one categorical variable causes responses to varlist 2. A different categorical variable causes responses to varlist 3. So, for example, an observation can belong to class 1 in C2 and class 2 in C3. Maybe your varlists are completely different things, and that would make more sense if they were.

            I can't advise in depth, as I have never done anything with multiple latent classes.


            • #7
              Yes, exactly: I have two different categorical variables, each having two categories.
              "You say that one categorical variable causes responses to varlist 2. A different categorical variable causes responses to varlist 3."
              Yes, but I prefer to put it in these terms: one categorical variable is measured by the set of indicators "varlist2", while the other is measured by the set of indicators "varlist3".
              "So, for example, an observation can belong to class 1 in C2 and class 2 in C3."
              For sure: 4 combinations are possible.

              Federico



              • #8
                Originally posted by Federico Tedeschi
                Yes, exactly: I have two different categorical variables, each having two categories.

                Yes, but I prefer to put it in these terms: one categorical variable is measured by the set of indicators "varlist2", while the other is measured by the set of indicators "varlist3".

                For sure: 4 combinations are possible.

                Federico
                Then I think your syntax is correct. My apologies for misreading.