  • Intercept in latent profile analysis

    Dear Stata users.
    I have a question, maybe more related to the theory so please tell me if I am off-topic.
    I was just wondering: what changes if I remove the intercept in each class-membership function in a latent profile analysis? More importantly, when would you suggest removing it (or constraining the intercepts to be equal across classes)?
    Finally, what is the Stata command that allows me to do this? Thank you

  • #2
    Andrea,

    As far as I'm concerned, theory questions are on topic. If the theory is too obscure, then nobody may be able to respond. However, I'm not clear what you're asking. To recap from SEM example 52, we're taking 3 indicators and fitting a model like below:

    glucose = a1k + e.glucose
    insulin = a2k + e.insulin
    sspg = a3k + e.sspg

    where a is the intercept, the first subscript after a indexes the 3 indicators, and the second subscript, k, indexes the latent classes.

    I don't think you can remove those intercepts. They denote the mean level of each indicator in each class.

    If you just meant to omit the _cons from the gsem command, then yes, it looks like you can, and it makes no difference:

    Code:
    use http://www.stata-press.com/data/r15/gsem_lca2
    quietly gsem (glucose insulin sspg <- _cons), lclass(C 2) lcinvariant(none)
    est store c2variant
    quietly gsem (glucose insulin sspg <- ), lclass(C 2) lcinvariant(none)
    est store c2variant_nointercept
    est table c2variant*
    
    ----------------------------------------
        Variable | c2variant    c2varian~t  
    -------------+--------------------------
    1b.C         |
           _cons |  (omitted)    (omitted)  
    -------------+--------------------------
    2.C          |
           _cons |   -.236545     -.236545  
    -------------+--------------------------
    glucose      |
               C |
              1  |  35.987969    35.987969  
              2  |     77.638       77.638  
    -------------+--------------------------
    insulin      |
               C |
              1  |  16.519601    16.519601  
              2  |  21.262161    21.262161  
    -------------+--------------------------
    sspg         |
               C |
              1  |  11.179191    11.179191  
              2  |  27.594687    27.594687  
    -------------+--------------------------
    var(e.gluc~e)|
               C |
              1  |  22.626931    22.626931  
              2  |   1263.401     1263.401  
    var(e.insu~n)|
               C |
              1  |  26.366033    26.366033  
              2  |  283.27753    283.27753  
      var(e.sspg)|
               C |
              1  |  25.260446    25.260446  
              2  |  70.493577    70.493577  
    ----------------------------------------
    Constraining the intercepts to be the same across classes makes no sense. It would be like asking Stata to fit a 2-class model while constraining the mean of each indicator to be equal across classes. That would defeat the point of fitting a latent profile model, because it would imply there's no heterogeneity in the sample.

    Or did you mean to constrain the error variances to be equal across classes? (Note, they are constrained by default unless you invoke the lcinvariant(...) option.) I like to think about latent profile analysis as taking a magic elliptical cookie cutter, and you are taking k stamps out of a (multidimensional) sheet of cookie dough. If you constrain the error variances to be equal across classes, it's like you're taking equal-sized stamps each time. If you don't constrain the error variances to be equal, your cookie cutter will re-size itself between stamps.
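
    To see the two cookie cutters side by side, one could fit the model both ways and compare. This is a minimal sketch, assuming the same example dataset as in the code above and that both models converge; the stored names c2invariant and c2variant are just labels chosen here.

    ```stata
    use http://www.stata-press.com/data/r15/gsem_lca2, clear
    * Default: error variances constrained to be equal across classes
    quietly gsem (glucose insulin sspg <- _cons), lclass(C 2)
    estimates store c2invariant
    * lcinvariant(none): error variances free to differ by class
    quietly gsem (glucose insulin sspg <- _cons), lclass(C 2) lcinvariant(none)
    estimates store c2variant
    * Side-by-side estimates, then AIC and BIC for each model
    estimates table c2invariant c2variant
    estimates stats c2invariant c2variant
    ```

    In the first model each variance should appear once, shared by both classes; in the second, each class gets its own.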

    Honestly, I'm not sure why the identity covariance structure (across classes, all errors have equal variance, all error terms have zero covariance) is the default. It seems very restrictive. In the R package flexmix, which looks like it offers a close parallel to Stata's capabilities in gsem, the only options for the covariance structure appear to be diagonal (across classes, all error variances unrestricted, all error covariances zero) and full or unstructured (all error variances and covariances distinctly estimated).

    As a side note, figure 6 in this document about flexmix shows a nice illustration of what happens when you fit a model with a diagonal versus unstructured covariance. It's harder to illustrate this in Stata because there isn't a convenient way to draw circles corresponding to the class-specific means and variances on a scatterplot.
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



    • #3
      Thank you Weiwen for your reply.
      I mean the intercept in the membership function, when I add covariates. But I think your reasoning also holds in this case. I just remember that the software Latent Gold allows for this option, and in several (but not all) papers that apply latent profile analysis, the intercept is not included among the results.



      • #4
        Originally posted by Andrea Baldin
        Thank you Weiwen for your reply.
        I mean the intercept in the membership function, when I add covariates. But I think your reasoning also holds in this case. I just remember that the software Latent Gold allows for this option, and in several (but not all) papers that apply latent profile analysis, the intercept is not included among the results.
        Ah, I see I misunderstood. You're talking about the multinomial part of the model, the one that predicts class membership. There are no predictors entered in SEM example 52, but you can obviously enter covariates as predictors of membership in a latent class.

        However, I'm still not sure what you mean by omitting the intercepts from the multinomial model. I don't think the multinomial model works at all if there are no intercepts. The intercepts control the proportion of the sample that's in each latent class. If a paper omitted presenting the intercepts, the latent class/profile model would still have estimated them behind the scenes. If you had constrained the intercepts to be equal across all classes, you'd be telling Stata to operate under the constraint that the proportions of each latent class are equal, which is not something I have ever seen anybody do.

        Per the manual, the probability of being in latent class 1 is:

        P(C = 1) = exp(gamma1) / [exp(gamma1) + exp(gamma2)]

        where gamma-c is the intercept for the c-th latent class, and gamma1 = 0 because it's the base class.

        So, you can verify for yourself from the table above that, by the formula, P(C = 1) = 1 / [1 + exp(-.236545)] = 0.558862. Or you can use the appropriate postestimation command:

        Code:
        estat lcprob
        
        Latent class marginal probabilities             Number of obs     =        145
        
        --------------------------------------------------------------
                     |            Delta-method
                     |     Margin   Std. Err.     [95% Conf. Interval]
        -------------+------------------------------------------------
                   C |
                  1  |    .558862   .0445136      .4706988    .6434637
                  2  |    .441138   .0445136      .3565363    .5293012
        --------------------------------------------------------------
        I guess this is to show that the multinomial intercepts are an essential part of the model, even if they don't make sense on their face and even if they weren't presented in a table.
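
        The same arithmetic can be checked directly in Stata. This is a minimal sketch that plugs in the intercept -.236545 from the estimates table above; the two results should agree with the estat lcprob margins up to rounding.

        ```stata
        * gamma1 = 0 for the base class; gamma2 = -.236545 is the 2.C intercept
        * P(C = 1) = exp(0) / [exp(0) + exp(gamma2)]
        display 1 / (1 + exp(-.236545))
        * P(C = 2) = exp(gamma2) / [exp(0) + exp(gamma2)]
        display exp(-.236545) / (1 + exp(-.236545))
        ```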

        If all you wanted was to export your results to Excel without the multinomial intercepts, you could use estout (available on SSC) with the drop() option:

        Code:
        estout ., drop(1b.C:* 2.C:*)
        
        -------------------------
                                .
                                b
        -------------------------
        glucose                  
        1.C              35.98797
        2.C                77.638
        -------------------------
        insulin                  
        1.C               16.5196
        2.C              21.26216
        -------------------------
        sspg                    
        1.C              11.17919
        2.C              27.59469
        -------------------------
        /                        
        var(e.gluc~C     22.62693
        var(e.gluc~C     1263.401
        var(e.insu~C     26.36603
        var(e.insu~C     283.2775
        var(e.sspg~C     25.26045
        var(e.sspg~C     70.49358
        -------------------------
        Last edited by Weiwen Ng; 06 Feb 2019, 13:50.


        • #5
          Originally posted by Andrea Baldin
          ...(or constraining the intercepts to be equal across classes?).
          ...
          I've noted that I haven't seen anybody constrain the multinomial intercepts to be equal across classes (i.e. constraining the class proportions to be equal), and I can't really think of a good reason to do this, but in general, gsem accepts constraints:

          Code:
          constraint 1 [2.C]_cons = [1.C]_cons
          quietly gsem (glucose insulin sspg <- _cons), lclass(C 2) lcinvariant(none) constraint(1) nolog
          estat lcprob
          --------------------------------------------------------------
                       |            Delta-method
                       |     Margin   Std. Err.     [95% Conf. Interval]
          -------------+------------------------------------------------
                     C |
                    1  |         .5          .             .           .
                    2  |         .5          .             .           .
          --------------------------------------------------------------
          
          estimates table
          ---------------------------
              Variable |   active    
          -------------+-------------
          1b.C         |
                 _cons |  (omitted)  
          -------------+-------------
          2.C          |
                 _cons |          0  
          -------------+-------------
          glucose      |
                     C |
                    1  |  35.918484  
                    2  |   76.94573  
          -------------+-------------
          insulin      |
                     C |
                    1  |  16.486854  
                    2  |  21.213746  
          -------------+-------------
          sspg         |
                     C |
                    1  |  11.037997  
                    2  |  27.461208  
          -------------+-------------
          var(e.gluc~e)|
                     C |
                    1  |  21.977333  
                    2  |  1265.8426  
          var(e.insu~n)|
                     C |
                    1  |  26.146252  
                    2  |  278.78742  
            var(e.sspg)|
                     C |
                    1  |  23.972947  
                    2  |  70.536606  
          ---------------------------
          Again, this is just for general info, and I can't see this going over well with reviewers.
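
          If you did want to see how much the equal-proportions constraint costs in fit, a likelihood-ratio test against the unconstrained model is one option. This is a rough sketch, assuming both models converge to their global maxima and that the usual nesting assumptions behind lrtest hold; the stored names c2free and c2equal are arbitrary labels.

          ```stata
          use http://www.stata-press.com/data/r15/gsem_lca2, clear
          * Unconstrained 2-class model
          quietly gsem (glucose insulin sspg <- _cons), lclass(C 2) lcinvariant(none)
          estimates store c2free
          * Same model with the class-2 intercept set equal to the base class
          constraint 1 [2.C]_cons = [1.C]_cons
          quietly gsem (glucose insulin sspg <- _cons), lclass(C 2) lcinvariant(none) constraints(1)
          estimates store c2equal
          * One restriction, so the test has 1 degree of freedom
          lrtest c2free c2equal
          ```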