Comparing Finite Mixture Modeling Regression to Latent Class Regression

Red Owl

Join Date: Nov 2016

Posts: 127
#1

21 Sep 2018, 12:07

I would like to understand whether (and, if so, how) Finite Mixture Modeling regression differs from Latent Class regression with -gsem-.

In the code below, I first estimate starting values and then estimate an FMM regression (Model A) and a Latent Class regression (Model B), specifying 2 latent classes in each model. The variables score and x1-x4 are all continuous measures.

Code:

* Obtain a matrix of starting values for FMM regression (Model A below)
quietly {
    fmm 2, vce(cluster id) difficult nonrtolerance startvalues(randomid, seed(1234567)): ///
        regress score x1-x4
    matrix FMMb02 = e(b)
}

* Obtain a matrix of starting values for Latent Class regression (Model B below)
quietly {
    gsem (score <- x1-x4) (C <- _cons), lclass(C 2) vce(cluster id) ///
        lcinvariant(none) covstructure(e._OEn, unstructured) difficult nonrtolerance
    matrix LCRegb02 = e(b)
}

* Model A: FMM Regression
fmm 2, vce(cluster id) difficult from(FMMb02): regress score x1-x4

* Model B: Latent Class Regression
gsem (score <- x1-x4) (C <- _cons), lclass(C 2) vce(cluster id) ///
    lcinvariant(none) covstructure(e._OEn, unstructured) difficult from(LCRegb02)

As coded above, are these two approaches and the estimating models equivalent?

I apologize that I cannot provide the data with dataex and I understand that might make it more difficult to respond to my request, but I would appreciate any insight or advice that can be offered based on the code I have provided above.

By the way, I have not been able to get Model A to resolve to a solution even with the -nonrtolerance- option specified unless I add the -nocons- option. I decided not to pursue that issue further until I have advice about whether my general approaches are equivalent and correctly coded.

Thanks in advance for any help or insight you will offer.

Red Owl
Stata/IC 15.1 (Windows 10, 64-bit)

* Edited to clarify that the variables are all continuous measures.

Last edited by Red Owl; 21 Sep 2018, 12:10.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#2

21 Sep 2018, 13:44

Red,

In my understanding, LCA is a subset of FMM. If I am wrong about this I hope someone will correct me.

In FMM, you are telling Stata that y = XB + e differs in each of your k latent classes. You can fit FMMs to multiple ys and use the same xs, as demonstrated in example 54. Interpreting your model A, you think that maybe there are two groups of people who differ across values of the dependent variable score. You think that x1-x4 have a relationship to score, and that relationship differs by latent class.
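In symbols, here is a sketch of what I mean for a Gaussian outcome, assuming class-specific coefficients and error variances. The notation (K classes, class shares pi_k, coefficients beta_k, error variances sigma_k^2, normal density phi) is mine for illustration, not Stata's:

```latex
f(y_i \mid x_i) \;=\; \sum_{k=1}^{K} \pi_k \,
  \phi\!\left(y_i ;\; x_i'\beta_k,\; \sigma_k^2\right),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1
```

Within class k this is just y_i = x_i'beta_k + e_ik, so each class has its own regression line; for your Model A, K = 2.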

In LCA, you are telling Stata that the mean values of y1, y2, y3, etc differ across each of the k latent classes. We would normally write x1, x2, x3, etc, I suspect, but here, they mean the same thing. For example, you might think that there are two groups of people who differ across values of x1-x4. Maybe some are low in all 4 variables, maybe some are high in (for example) x1 and x3 but low on x2 and x4, and some are high on all 4 variables.
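Written out, a sketch of the latent class (or latent profile) likelihood under the usual local independence assumption, with J indicators and K classes. The symbols pi_k (class shares) and mu_jk (the class-specific mean or intercept of indicator j) are my own notation:

```latex
f(y_{i1}, \dots, y_{iJ}) \;=\; \sum_{k=1}^{K} \pi_k
  \prod_{j=1}^{J} f_j\!\left(y_{ij} ;\; \mu_{jk}\right),
\qquad \sum_{k=1}^{K} \pi_k = 1
```

There are no covariates on the right-hand side here: the classes are defined only by how the indicator means mu_jk differ from one class to the next.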

I am not certain what your model B, as written, does. The first part is an FMM where the betas for the regression part differ across latent classes (see the syntax replicated in SEM example 54):

Code:

gsem (score <- x1-x4), logit lclass(C 2) vce(cluster id)

(Note: I'm assuming score is binary, hence the -logit- option, and if it is, then -lcinvariant- and -covstructure- have no effect. Experiment if you like, but you'll see that. If score were Gaussian, then -lcinvariant- and -covstructure- definitely do have an effect.)

The second part,

Code:

gsem ... (C <- _cons), ...

is the one that throws me off. C is a multinomial variable. If you mean to write a latent class regression model where you think score has some relationship to the latent class, you would write

Code:

gsem (score <- x1-x4) (C <- score), lclass(C 2) vce(cluster id) lcinvariant(none) covstructure(e._OEn, unstructured)

Basically, that part of the model is written like you would write a multinomial logit model. I think the constant is already implied. The statement as you entered it may do nothing.
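To spell out the multinomial logit analogy, here is a sketch of the class-membership part with only a constant, assuming class 1 is the base class. The symbol gamma_k for the class intercepts is my own notation:

```latex
\Pr(C_i = k) \;=\; \frac{\exp(\gamma_k)}{\sum_{j=1}^{K} \exp(\gamma_j)},
\qquad \gamma_1 = 0
```

Putting a covariate vector z_i on the right-hand side of the C equation would replace gamma_k with gamma_k + z_i'delta_k, where delta_1 = 0 for the base class. With only a constant, each class simply has a single estimated share.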

Last edited by Weiwen Ng; 21 Sep 2018, 13:50.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

Weiwen Ng

Join Date: Jun 2015
Posts: 1241
#3

21 Sep 2018, 15:00

The code below demonstrates that the latent classes identified by a latent class analysis can differ very significantly from those identified by an FMM. Say we take the dataset in SEM example 54 and we fit a latent profile model to it. We'll assume there are 3 latent classes, treating only doctor visits (drvisits) and other health professional visits (hpvisits) as indicators.

Code:

use http://www.stata-press.com/data/r15/gsem_mixture
gsem (drvisits hpvisits <- _cons), poisson lclass(C 3) startvalues(randomid, draws(5) seed(15))
matrix b = e(b)
estimates store lpm3
estat lcmean

Latent class marginal means                     Number of obs     =      3,677

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1            |
    drvisits |   11.19169   .2334491    47.94   0.000     10.73414    11.64924
    hpvisits |   21.55226   .4950087    43.54   0.000     20.58206    22.52246
-------------+----------------------------------------------------------------
2            |
    drvisits |   13.21406    .188319    70.17   0.000     12.84496    13.58315
    hpvisits |   2.224026   .0852777    26.08   0.000     2.056884    2.391167
-------------+----------------------------------------------------------------
3            |
    drvisits |   3.079365   .0551981    55.79   0.000     2.971178    3.187551
    hpvisits |    .596466   .0245997    24.25   0.000     .5482515    .6446804
------------------------------------------------------------------------------

Then, we fit the FMM specified in the example.

Code:

quietly gsem (drvisits hpvisits <- private medicaid c.age##c.age educ actlim chronic), poisson lclass(C 3) startvalues(randomid, draws(5) seed(15))
estimates store fmm3
estat lcmean
Latent class marginal means                     Number of obs     =      3,677

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1            |
    drvisits |   10.53552   .1875668    56.17   0.000      10.1679    10.90314
    hpvisits |   5.288516   .2145267    24.65   0.000     4.868051    5.708981
-------------+----------------------------------------------------------------
2            |
    drvisits |   10.61758   .1903746    55.77   0.000     10.24445    10.99071
    hpvisits |    16.8819   .3076167    54.88   0.000     16.27898    17.48482
-------------+----------------------------------------------------------------
3            |
    drvisits |   5.665389   .0735717    77.01   0.000     5.521191    5.809587
    hpvisits |   1.230097   .0325677    37.77   0.000     1.166265    1.293928
------------------------------------------------------------------------------

Here, it's very obvious that the maximum likelihood estimates for the two models identify very different sets of latent classes.

Bonus content: Imagine we took the latent profile model, then we fit a latent class regression using predictors of the latent class that were specified in the FMM. Using the saved LPM parameters as initial values produces what appears to be an infinitely advancing iteration log. Using the random IDs as start values, I get latent classes that look pretty similar to the ones identified by the original LPM, with some differences and in a rather different order.

Code:

gsem (drvisits hpvisits <- _cons, poisson) (C <-private medicaid c.age##c.age educ actlim chronic), lclass(C 3) from(b) iterate(1000)
gsem (drvisits hpvisits <- _cons, poisson) (C <-private medicaid c.age##c.age educ actlim chronic), lclass(C 3) startvalues(randomid, draws(5) seed(15))
estat lcmean, nose
Latent class marginal means                     Number of obs     =      3,677

------------------------------------------------------------------------------
             |     Margin
-------------+----------------------------------------------------------------
1            |
    drvisits |   2.935566
    hpvisits |   .5761258
-------------+----------------------------------------------------------------
2            |
    drvisits |   11.26341
    hpvisits |   21.28599
-------------+----------------------------------------------------------------
3            |
    drvisits |   12.77952
    hpvisits |   2.115511
------------------------------------------------------------------------------


Rafal Raciborski (StataCorp)

StataCorp Employee

Join Date: Mar 2014

Posts: 83
#4

22 Sep 2018, 10:07

fmm is a wrapper for gsem. After running fmm, the underlying gsem syntax is stored in e(cmdline2):

Code:

di e(cmdline2)
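
For anyone who wants to try this, here is a minimal sketch using auto.dta. The model is only an example chosen for illustration, and the compound double quotes are there as a precaution in case the stored command contains quotes:

```stata
* Fit a two-class mixture of linear regressions with -fmm-
sysuse auto, clear
quietly fmm 2: regress price mpg weight foreign

* Show the gsem command that -fmm- constructed behind the scenes
display `"`e(cmdline2)'"'
```

The displayed command can then be compared with hand-written -gsem- syntax.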

Red Owl

Join Date: Nov 2016
Posts: 127
#5

23 Sep 2018, 10:07

Responding to Weiwen Ng:

Thank you so much for your prompt and very thoughtful response. (I apologize that I could not be equally prompt.)

Let me clarify that I was not trying to compare a latent profile analysis to a finite mixture model and also that all of my variables are continuous.

I was not clear enough in what I was asking. On reflection, the question I was trying to address is whether it is possible to estimate a finite mixture model using -gsem- syntax. I was not able to provide even a sample of the data using dataex due to human subjects privacy protections, but I recognize that not providing a data example contributed to my lack of clarity.

I have now answered my own question and can demonstrate the results using Stata's auto.dta data set. The answer is that we can conduct finite mixture modeling with -gsem- syntax rather than using Stata's direct -fmm- command and syntax.

The following is the code I developed:

Code:

* STEP 1
* Load data.
sysuse auto, clear

**************************************************************
* First conduct comparison without a latent class covariate. *
**************************************************************

* STEP 2
* Conduct FMM via fmm with 2 latent classes.
fmm 2: regress price mpg weight foreign
*
matrix b0fmm = e(b)

* STEP 3a
* Conduct FMM via gsem to obtain starting values and store them in matrix b0gsem.
quietly {
  gsem (price <- mpg weight foreign) (C <- _cons), lclass(C 2) lcinvariant(none) covstructure(e._OEn, unstructured) nonrtolerance
  *
  matrix b0gsem = e(b)
  }

* STEP 3b
* Conduct FMM via gsem using starting values from matrix b0gsem.
gsem (price <- mpg weight foreign) (C <- _cons), lclass(C 2) lcinvariant(none) covstructure(e._OEn, unstructured) from(b0gsem)

* STEP 4
* Create and display matrix comparing b0fmm (from fmm) with b0gsem (from gsem).
mat Compare1 = b0fmm', b0gsem'
mat colnames Compare1 = FMM GSEM
matlist Compare1, format(%9.3f)

************************************************************************
* Now conduct comparison specifying rep78 as a latent class covariate. *
************************************************************************

* STEP 5
* Conduct FMM via fmm with 2 latent classes and rep78 as latent class covariate.
fmm 2, lcprob(rep78): regress price mpg weight foreign
*
matrix b0fmm = e(b)

* STEP 6a
* Conduct FMM via gsem with rep78 as latent class covariate
* to obtain starting values and store them in matrix b0gsem.
quietly {
  gsem (price <- mpg weight foreign) (C <- rep78), lclass(C 2) lcinvariant(none) covstructure(e._OEn, unstructured) nonrtolerance
  *
  matrix b0gsem = e(b)
  }

* STEP 6b
* Conduct FMM via gsem with rep78 as latent class covariate,
* using starting values from matrix b0gsem.
gsem (price <- mpg weight foreign) (C <- rep78), lclass(C 2) lcinvariant(none) covstructure(e._OEn, unstructured) from(b0gsem)

* STEP 7
* Create and display matrix comparing b0fmm (from fmm) with b0gsem (from gsem).
mat Compare2 = b0fmm', b0gsem'
mat colnames Compare2 = FMM GSEM
matlist Compare2, format(%9.3f)
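
As a further check on this kind of comparison, one could also compare the maximized log likelihoods, which do not depend on how the two classes happen to be numbered. This is only a sketch, assuming auto.dta and the same two-step start-value approach as in the code above; the scalar names ll_fmm and ll_gsem and the matrix name b0 are made up for illustration:

```stata
sysuse auto, clear

* Fit with -fmm- and keep its log likelihood
quietly fmm 2: regress price mpg weight foreign
scalar ll_fmm = e(ll)

* Fit the -gsem- version: first get starting values, then refit from them
quietly gsem (price <- mpg weight foreign) (C <- _cons), lclass(C 2) ///
    lcinvariant(none) covstructure(e._OEn, unstructured) nonrtolerance
matrix b0 = e(b)
quietly gsem (price <- mpg weight foreign) (C <- _cons), lclass(C 2) ///
    lcinvariant(none) covstructure(e._OEn, unstructured) from(b0)
scalar ll_gsem = e(ll)

* If both commands fit the same model, the two values should agree closely
display "log likelihood, fmm:  " %12.4f ll_fmm
display "log likelihood, gsem: " %12.4f ll_gsem
```

If the two log likelihoods differ noticeably, that may simply mean the two runs stopped at different local maxima rather than that the models differ, so it would be worth trying other starting values before drawing a conclusion.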

Thanks, as always, for your very thoughtful and helpful advice.

Responding to Rafal Raciborski (StataCorp):

Thanks so much. I always learn a lot from your posts.

I was not aware that the -gsem- syntax was stored in e(cmdline2), and that is very good to know for the future. I don't see e(cmdline2) listed in the documentation, so I'm glad to learn about that. I do notice that the lcprob() option is not reflected in e(cmdline2). Is there a macro that stores the -gsem- syntax for that portion of the analysis? I know the syntax now, but I'm curious if that -gsem- syntax is also stored in a macro after -fmm-.

I could have saved some time if I had known about the e(cmdline2) macro left after -fmm-, but I probably learned more by working through the syntax on my own without realizing that -fmm- actually stores the -gsem- syntax in a macro.

Red Owl
Stata/IC 15.1 (Windows 10, 64-bit)

Rafal Raciborski (StataCorp)

StataCorp Employee

Join Date: Mar 2014

Posts: 83
#6

23 Sep 2018, 13:32

Originally posted by Red Owl

I do notice that the lcprob() option is not reflected in e(cmdline2).

That is a bug. We will fix it in a future update. The workaround for the time being is to use the hybrid syntax with lcprob() specified inside a component:

Code:

fmm : (regress price mpg weight foreign) (regress price mpg weight foreign, lcprob(rep78))
Red Owl

Join Date: Nov 2016

Posts: 127
#7

23 Sep 2018, 19:08

Rafal Raciborski (StataCorp)

Thanks.

Red Owl
Stata/IC 15.1 (Windows 10, 64-bit)
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#8

24 Sep 2018, 13:31

Red, thanks for clarifying! Perhaps my response will be useful to someone else. I should note, all of Stata's IRT commands are similarly wrappers for -gsem-, and the full -gsem- syntax is also stored in e(cmdline). Many IRT commands do involve a number of constraints, but I am not sure where they're stored.
