Intercept in latent profile analysis

Andrea Baldin

Join Date: Feb 2016

Posts: 35
#1

06 Feb 2019, 10:35

Dear Stata users.
I have a question, maybe more related to the theory so please tell me if I am off-topic.
I was just wondering: what changes if I remove the intercept that appears in each class-membership function in a latent profile analysis? More importantly, when would you suggest removing it (or constraining the intercepts to be equal across classes)?
Finally, which Stata command allows me to do this? Thank you.
Tags: gsem, intercept, latent class, latent profile
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#2

06 Feb 2019, 11:29

Andrea,

As far as I'm concerned, theory questions are on topic. If the theory is too obscure, then nobody may be able to respond. However, I'm not clear what you're asking. To recap from SEM example 52, we're taking 3 indicators and fitting a model like below:

glucose = a1k + e.glucose
insulin = a2k + e.insulin
sspg = a3k + e.sspg

Where a is the intercept, the first digit after a indexes the 3 indicators, and the second digit indexes the latent classes.

I don't think you can remove those intercepts. They denote the mean level of each indicator in each class.

If you just meant to omit the _cons from the gsem command, then yes, it looks like you can, and it makes no difference:

Code:

use http://www.stata-press.com/data/r15/gsem_lca2
quietly gsem (glucose insulin sspg <- _cons), lclass(C 2) lcinvariant(none)
est store c2variant
quietly gsem (glucose insulin sspg <- ), lclass(C 2) lcinvariant(none)
est store c2variant_nointercept
est table c2variant*

----------------------------------------
    Variable | c2variant    c2varian~t
-------------+--------------------------
1b.C         |
       _cons |  (omitted)    (omitted)
-------------+--------------------------
2.C          |
       _cons |   -.236545     -.236545
-------------+--------------------------
glucose      |
           C |
          1  |  35.987969    35.987969
          2  |     77.638       77.638
-------------+--------------------------
insulin      |
           C |
          1  |  16.519601    16.519601
          2  |  21.262161    21.262161
-------------+--------------------------
sspg         |
           C |
          1  |  11.179191    11.179191
          2  |  27.594687    27.594687
-------------+--------------------------
var(e.gluc~e)|
           C |
          1  |  22.626931    22.626931
          2  |   1263.401     1263.401
var(e.insu~n)|
           C |
          1  |  26.366033    26.366033
          2  |  283.27753    283.27753
  var(e.sspg)|
           C |
          1  |  25.260446    25.260446
          2  |  70.493577    70.493577
----------------------------------------

Constraining the intercepts to be the same across classes makes no sense. It would be like asking Stata to fit a 2-class model while constraining the mean of each indicator to be equal across classes. That would defeat the point of fitting a latent profile model, because it would mean there's no heterogeneity in the sample.

Or did you mean to constrain the error variances to be equal across classes? (Note, they are constrained by default unless you invoke the lcinvariant(...) option.) I like to think about latent profile analysis as taking a magic elliptical cookie cutter, and you are taking k stamps out of a (multidimensional) sheet of cookie dough. If you constrain the error variances to be equal across classes, it's like you're taking equal-sized stamps each time. If you don't constrain the error variances to be equal, your cookie cutter will re-size itself between stamps.
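To make the cookie-cutter comparison concrete, here is a minimal sketch that fits the 2-class model both ways on the same example dataset and compares the fits. The stored names c2equal and c2free are just labels chosen for this illustration, and no particular AIC/BIC ranking is claimed here.

```stata
use http://www.stata-press.com/data/r15/gsem_lca2, clear

* Default: error variances constrained to be equal across classes
* (the "equal-sized stamps" case)
quietly gsem (glucose insulin sspg <- _cons), lclass(C 2)
estimates store c2equal

* lcinvariant(none): error variances free to differ by class
* (the cookie cutter re-sizes itself between stamps)
quietly gsem (glucose insulin sspg <- _cons), lclass(C 2) lcinvariant(none)
estimates store c2free

* Side-by-side log likelihood, AIC, and BIC for the two models
estimates stats c2equal c2free
```

Lower AIC or BIC for the unconstrained model would suggest that the equal-variance restriction is too tight for these data.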

Honestly, I'm not sure why the identity covariance structure (across classes, all errors have equal variance, all error terms have zero covariance) is default. It seems very restrictive. In the R package flexmix, which looks like it offers a close parallel to Stata's capabilities in gsem, the only options for the covariance structure appear to be diagonal (across classes, all error variances unrestricted, all error covariances zero) and full or unstructured (all error variances and covariances distinctly estimated).

As a side note, figure 6 in this document about flexmix shows a nice illustration of what happens when you fit a model with a diagonal versus unstructured covariance. It's harder to illustrate this in Stata because there isn't a convenient way to draw circles corresponding to the class-specific means and variances on a scatterplot.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Andrea Baldin

Join Date: Feb 2016

Posts: 35
#3

06 Feb 2019, 13:13

Thank you Weiwen for your reply.
I mean the intercept in the membership function, when I add covariates. But I think your reasoning also holds in this case. I just remember that the software Latent Gold allows for this option, and in several (but not all) papers that apply latent profile analysis, the intercept is not included among the results.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#4

06 Feb 2019, 13:39

Originally posted by Andrea Baldin

Thank you Weiwen for your reply.
I mean the intercept in the membership function, when I add covariates. But I think your reasoning also holds in this case. I just remember that the software Latent Gold allows for this option, and in several (but not all) papers that apply latent profile analysis, the intercept is not included among the results.

Ah, I see I misunderstood. You're talking about the multinomial part of the model, the one that predicts class membership. There are no predictors entered in SEM example 52, but you can obviously enter covariates as predictors of membership in a latent class.

However, I'm still not sure what you mean by omitting the intercepts from the multinomial model. I don't think the multinomial model works at all if there are no intercepts. The intercepts control the proportion of the sample that's in each latent class. If a paper omitted presenting the intercepts, the latent class/profile model would still have estimated them behind the scenes. If you had constrained the intercepts to be equal across all classes, you'd be telling Stata to operate under the constraint that the proportions of each latent class are equal, which is not something I have ever seen anybody do.
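For reference, a covariate enters the membership model through an extra (C <- ...) equation. With one covariate z, the log odds of class 2 versus class 1 become gamma2 + beta*z, so the intercept still sets the class split at z = 0. The sketch below is only an illustration: z is a made-up noise variable generated so the command can run, not something in the example dataset, and its estimated slope means nothing.

```stata
use http://www.stata-press.com/data/r15/gsem_lca2, clear

* ASSUMPTION: z is a hypothetical placeholder covariate (pure noise),
* created only so the syntax can run; substitute a real predictor.
set seed 12345
generate double z = rnormal()

* (C <- z) adds z to the multinomial model for class membership.
* The intercept [2.C]_cons is still estimated alongside the slope on z.
gsem (glucose insulin sspg <- _cons) (C <- z), lclass(C 2) lcinvariant(none) nolog

* Class proportions, averaged over the observed values of z
estat lcprob
```

In the output, the 2.C equation should list both the coefficient on z and _cons, which is the intercept under discussion.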

Per the manual, the probability of being in latent class 1 is:

P(C = 1) = exp(gamma1) / [exp(gamma1) + exp(gamma2)]

where gamma-c is the intercept for the c-th latent class, and gamma1 = 0 because it's the base class.

So, you can verify for yourself from the table above that, by the formula, P(C = 1) = 1 / [1 + exp(-.236545)] = 0.558862. Or you can use the appropriate postestimation command:

Code:

estat lcprob

Latent class marginal probabilities             Number of obs     =        145

--------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
           C |
          1  |    .558862   .0445136      .4706988    .6434637
          2  |    .441138   .0445136      .3565363    .5293012
--------------------------------------------------------------
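The same margins can be reproduced by hand from the multinomial intercepts. This is a small sketch that copies the estimate -.236545 from the est table output earlier in the thread; the scalar names are arbitrary.

```stata
* Multinomial intercepts, copied from the est table output in this thread
scalar gamma1 = 0            // base class, fixed at 0
scalar gamma2 = -.236545     // [2.C]_cons

* P(C = c) = exp(gamma_c) / [exp(gamma1) + exp(gamma2)]
scalar denom = exp(gamma1) + exp(gamma2)
scalar p1 = exp(gamma1) / denom
scalar p2 = exp(gamma2) / denom

display "P(C = 1) = " %9.6f p1
display "P(C = 2) = " %9.6f p2
```

The two probabilities should agree with the Margin column of estat lcprob up to rounding of the copied intercept.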

I guess this is to show that the multinomial intercepts are an essential part of the model, even if they don't make sense on their face and even if they weren't presented in a table.

If all you wanted was to export your results to Excel without the multinomial intercepts, you could use estout (available on SSC) and its drop() option:

Code:

estout ., drop(1b.C:* 2.C:*)

-------------------------
                        .
                        b
-------------------------
glucose
1.C              35.98797
2.C                77.638
-------------------------
insulin
1.C               16.5196
2.C              21.26216
-------------------------
sspg
1.C              11.17919
2.C              27.59469
-------------------------
/
var(e.gluc~C     22.62693
var(e.gluc~C     1263.401
var(e.insu~C     26.36603
var(e.insu~C     283.2775
var(e.sspg~C     25.26045
var(e.sspg~C     70.49358
-------------------------

Last edited by Weiwen Ng; 06 Feb 2019, 13:50.


Weiwen Ng

Join Date: Jun 2015
Posts: 1241

06 Feb 2019, 13:55

Originally posted by Andrea Baldin

...(or constraining the intercepts to be equal across classes?).
...

I've noted that I haven't seen anybody constrain the multinomial intercepts to be equal across classes (i.e. constraining the class proportions to be equal), and I can't really think of a good reason to do this, but in general, gsem accepts constraints:

Code:

constraint 1 [2.C]_cons = [1.C]_cons
quietly gsem (glucose insulin sspg <- _cons), lclass(C 2) lcinvariant(none) constraint(1) nolog
estat lcprob
--------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
           C |
          1  |         .5          .             .           .
          2  |         .5          .             .           .
--------------------------------------------------------------

estimates table
---------------------------
    Variable |   active    
-------------+-------------
1b.C         |
       _cons |  (omitted)  
-------------+-------------
2.C          |
       _cons |          0  
-------------+-------------
glucose      |
           C |
          1  |  35.918484  
          2  |   76.94573  
-------------+-------------
insulin      |
           C |
          1  |  16.486854  
          2  |  21.213746  
-------------+-------------
sspg         |
           C |
          1  |  11.037997  
          2  |  27.461208  
-------------+-------------
var(e.gluc~e)|
           C |
          1  |  21.977333  
          2  |  1265.8426  
var(e.insu~n)|
           C |
          1  |  26.146252  
          2  |  278.78742  
  var(e.sspg)|
           C |
          1  |  23.972947  
          2  |  70.536606  
---------------------------

Again, this is just for general info, and I can't see this going over well with reviewers.
