  • Latent class analysis: marginal predicted probabilities vs marginal predicted posterior probabilities and estat vs predict

    I have a latent class model that I'm broadly happy with. I want to be able to say that x% of the sample is in class 1, y% is in class 2, and so on.
    Previously I have gotten these summary statistics using:
    Code:
    estat lcprob, nose
    Following posts elsewhere on these boards about calculating entropy in these models, I ran
    Code:
     predict pr*, classposteriorpr
    sum pr1-pr4
    and noticed that these results differ somewhat from those of the earlier command; for example, 24% vs 27% in one group.

    Code:
    estat lcprob, nose classposteriorpr
    produces results quite close, but not identical to, those produced by the default estat lcprob specification, for example, 27.XYA% vs 27.XYB%.

    My reading of the manuals doesn't get me much closer to understanding what predict and estat are doing differently. I'd appreciate (1) guidance on which command I should use to generate these summary statistics, and (2) a pointer to anything I can read to make sure I understand the difference.

  • #2
    Hi Josephine,

    Take this example:

    Code:
    . sysuse auto
    
    . gsem ( price mpg <- ), lclass(C 3)
    
    Fitting class model:
    
    Iteration 0:   (class) log likelihood = -81.283734  
    Iteration 1:   (class) log likelihood = -81.283734  
    
    Fitting outcome model:
    
    Iteration 0:   (outcome) log likelihood = -865.61911  
    Iteration 1:   (outcome) log likelihood = -865.61368  
    Iteration 2:   (outcome) log likelihood = -865.61368  
    
    Refining starting values:
    
    Iteration 0:   (EM) log likelihood =  -944.0659
    Iteration 1:   (EM) log likelihood = -924.15751
    Iteration 2:   (EM) log likelihood =  -909.2925
    Iteration 3:   (EM) log likelihood = -904.22959
    Iteration 4:   (EM) log likelihood = -901.28907
    Iteration 5:   (EM) log likelihood = -899.18989
    Iteration 6:   (EM) log likelihood = -897.74984
    Iteration 7:   (EM) log likelihood = -896.82187
    Iteration 8:   (EM) log likelihood = -896.24632
    Iteration 9:   (EM) log likelihood = -895.89329
    Iteration 10:  (EM) log likelihood = -895.67485
    Iteration 11:  (EM) log likelihood = -895.53711
    Iteration 12:  (EM) log likelihood = -895.44837
    Iteration 13:  (EM) log likelihood = -895.39004
    Iteration 14:  (EM) log likelihood = -895.35094
    Iteration 15:  (EM) log likelihood = -895.32452
    Iteration 16:  (EM) log likelihood = -895.30639
    Iteration 17:  (EM) log likelihood = -895.29385
    Iteration 18:  (EM) log likelihood = -895.28508
    Iteration 19:  (EM) log likelihood = -895.27893
    Iteration 20:  (EM) log likelihood =  -895.2746
    Note: EM algorithm reached maximum iterations.
    
    Fitting full model:
    
    Iteration 0:   log likelihood = -887.49716  
    Iteration 1:   log likelihood = -887.49715  
    
    Generalized structural equation model           Number of obs     =         74
    Log likelihood = -887.49715
    
     ( 1)  [/]var(e.price)#1bn.C - [/]var(e.price)#3.C = 0
     ( 2)  [/]var(e.price)#2.C - [/]var(e.price)#3.C = 0
     ( 3)  [/]var(e.mpg)#1bn.C - [/]var(e.mpg)#3.C = 0
     ( 4)  [/]var(e.mpg)#2.C - [/]var(e.mpg)#3.C = 0
    
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    1.C          |  (base outcome)
    -------------+----------------------------------------------------------------
    2.C          |
           _cons |   1.509649   .3438539     4.39   0.000     .8357078     2.18359
    -------------+----------------------------------------------------------------
    3.C          |
           _cons |  -.1657982   .5160308    -0.32   0.748      -1.1772    .8456037
    ------------------------------------------------------------------------------
    
    Class          : 1
    
    Response       : price
    Family         : Gaussian
    Link           : identity
    
    Response       : mpg
    Family         : Gaussian
    Link           : identity
    
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    price        |
           _cons |   12191.51   454.2355    26.84   0.000     11301.22    13081.79
    -------------+----------------------------------------------------------------
    mpg          |
           _cons |   15.67828   1.247806    12.56   0.000     13.23263    18.12394
    -------------+----------------------------------------------------------------
     var(e.price)|    1726442   330010.8                       1186982     2511076
       var(e.mpg)|   12.87172   2.632581                       8.62075    19.21888
    ------------------------------------------------------------------------------
    
    Class          : 2
    
    Response       : price
    Family         : Gaussian
    Link           : identity
    
    Response       : mpg
    Family         : Gaussian
    Link           : identity
    
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    price        |
           _cons |   5189.398   205.3997    25.26   0.000     4786.822    5591.974
    -------------+----------------------------------------------------------------
    mpg          |
           _cons |   20.56298   .5967426    34.46   0.000     19.39339    21.73257
    -------------+----------------------------------------------------------------
     var(e.price)|    1726442   330010.8                       1186982     2511076
       var(e.mpg)|   12.87172   2.632581                       8.62075    19.21888
    ------------------------------------------------------------------------------
    
    Class          : 3
    
    Response       : price
    Family         : Gaussian
    Link           : identity
    
    Response       : mpg
    Family         : Gaussian
    Link           : identity
    
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    price        |
           _cons |   4264.501   432.0495     9.87   0.000       3417.7    5111.303
    -------------+----------------------------------------------------------------
    mpg          |
           _cons |   31.85174   1.693221    18.81   0.000     28.53309     35.1704
    -------------+----------------------------------------------------------------
     var(e.price)|    1726442   330010.8                       1186982     2511076
       var(e.mpg)|   12.87172   2.632581                       8.62075    19.21888
    ------------------------------------------------------------------------------
    The default behavior of -estat lcprob- is to use the -classpr- option, which produces:

    Code:
    . estat lcprob, classpr
    
    Latent class marginal probabilities             Number of obs     =         74
    
    --------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
               C |
              1  |   .1569277   .0444545      .0878773    .2645015
              2  |   .7101204   .0640861      .5709634    .8184907
              3  |   .1329519   .0525122      .0590814    .2724409
    --------------------------------------------------------------
    These values are based on the model parameters, specifically these:

    Code:
    ...
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    1.C          |  (base outcome)
    -------------+----------------------------------------------------------------
    2.C          |
           _cons |   1.509649   .3438539     4.39   0.000     .8357078     2.18359
    -------------+----------------------------------------------------------------
    3.C          |
           _cons |  -.1657982   .5160308    -0.32   0.748      -1.1772    .8456037
    ------------------------------------------------------------------------------
    ...
    You can pass these through the multinomial logit function to reproduce the probabilities above.

    Code:
    . di 1/(1+exp(1.509649)+exp(-.1657982))
    .15692775
    
    . di exp(1.509649)/(1+exp(1.509649)+exp(-.1657982))
    .71012037
    
    . di exp(-.1657982)/(1+exp(1.509649)+exp(-.1657982))
    .13295188
    These values can also be obtained as predictions:

    Code:
    . predict C3pr*, classpr
    
    . sum C3pr*
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
           C3pr1 |         74    .1569277           0   .1569277   .1569277
           C3pr2 |         74    .7101204           0   .7101204   .7101204
           C3pr3 |         74    .1329519           0   .1329519   .1329519
    As they are based on a parameter for each class, they are constants within a class.
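    For readers working outside Stata, the multinomial logit step above is just a softmax over the class intercepts, with the base class fixed at zero. A minimal Python sketch, plugging in the two estimated intercepts from the gsem output above:

    ```python
    import math

    # Class intercepts from the gsem output: class 1 is the base (0),
    # classes 2 and 3 use the estimated _cons values.
    intercepts = [0.0, 1.509649, -0.1657982]

    # Softmax: exponentiate each intercept and normalize.
    denom = sum(math.exp(b) for b in intercepts)
    class_probs = [math.exp(b) / denom for b in intercepts]

    print(class_probs)
    # Matches -estat lcprob, classpr-: .1569277, .7101204, .1329519
    ```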

    The alternative, as you saw in your model, differs:

    Code:
    . estat lcprob, classposteriorpr
    
    Latent class marginal posterior probabilities   Number of obs     =         74
    
    --------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
               C |
              1  |   .1569278   .0105166      .1373989    .1786575
              2  |   .7101205   .0301459      .6476988    .7654866
              3  |   .1329518   .0293357      .0851858     .201599
    --------------------------------------------------------------
    These, as you've posted above, are also available as predicted values:

    Code:
    . predict C3postpr*, classposteriorpr
    
    . sum C3postpr*
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
       C3postpr1 |         74    .1569278    .3581648   1.05e-15          1
       C3postpr2 |         74    .7101205    .4212197   2.63e-12   .9999955
       C3postpr3 |         74    .1329518    .3023448   1.24e-18   .9999822
    These do have variability within classes.

    Why do these have variability while the earlier ones do not? Because the posterior predicted probabilities are computed from a more complex combination of the data, the parameter estimates for each response variable in each class, and the class probabilities underlying the -classpr- computations above (see p. 571 of the user's manual for -sem-/-gsem-).

    In the end, -classpr- is based on one set of parameter estimates (as shown above), whereas -classposteriorpr- is based on two sets along with the data itself (I won't try to reproduce those by hand here).
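    As a rough illustration of that interweaving (a toy two-class Gaussian mixture with made-up numbers, not the model fitted above), the posterior probability for an observation in class k is the class prior times the class-specific density at that observation's data value, normalized across classes:

    ```python
    import math

    def normal_pdf(x, mu, sigma):
        """Density of N(mu, sigma^2) at x."""
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    def posterior(x, priors, mus, sigmas):
        """Posterior class probabilities for one observation x:
        P(class k | x) = prior_k * f(x | class k) / sum_j prior_j * f(x | class j)."""
        joint = [p * normal_pdf(x, m, s) for p, m, s in zip(priors, mus, sigmas)]
        total = sum(joint)
        return [j / total for j in joint]

    # Hypothetical mixture: the priors and within-class means/SDs are made up.
    priors = [0.3, 0.7]
    mus = [0.0, 5.0]
    sigmas = [1.0, 1.0]

    # The posterior depends on the observation's data value, so it varies
    # across observations, unlike the marginal class probabilities above.
    print(posterior(0.0, priors, mus, sigmas))  # heavily favors class 1
    print(posterior(5.0, priors, mus, sigmas))  # heavily favors class 2
    ```

    This is why the posterior predictions have within-class variability: each observation's data enters the calculation.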


    To your question: I think the more common practice is to use the class posterior predicted probabilities, and I believe that is what -estat lcmean- does:

    Code:
    . estat lcmean
    
    Latent class marginal means                     Number of obs     =         74
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    1            |
           price |   12191.51   454.2355    26.84   0.000     11301.22    13081.79
             mpg |   15.67828   1.247806    12.56   0.000     13.23263    18.12394
    -------------+----------------------------------------------------------------
    2            |
           price |   5189.398   205.3997    25.26   0.000     4786.822    5591.974
             mpg |   20.56298   .5967426    34.46   0.000     19.39339    21.73257
    -------------+----------------------------------------------------------------
    3            |
           price |   4264.501   432.0495     9.87   0.000       3417.7    5111.303
             mpg |   31.85174   1.693221    18.81   0.000     28.53309     35.1704
    ------------------------------------------------------------------------------
    
    . sum price mpg [aw=C3postpr1]
    
        Variable |     Obs      Weight        Mean   Std. Dev.       Min        Max
    -------------+-----------------------------------------------------------------
           price |      74  11.6126554    12191.51   1916.005       3291      15906
             mpg |      74  11.6126554    15.67828   3.229073         12         41
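    The weighted-summarize trick works because an [aw]-weighted mean is simply sum(w_i * x_i) / sum(w_i), with the posterior class probabilities serving as the weights. A small Python sketch with made-up data values and posterior probabilities:

    ```python
    # Hypothetical data values and posterior probabilities for one class
    # (not the auto data above).
    x = [3291.0, 5000.0, 12000.0, 15906.0]
    w = [0.01, 0.05, 0.95, 0.99]  # posterior probability of class membership

    # Posterior-weighted mean: sum(w_i * x_i) / sum(w_i), which is what
    # -summarize ... [aw=...]- reports as the Mean.
    weighted_mean = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
    print(weighted_mean)  # approximately 13714.925 with these made-up numbers
    ```

    Observations with high posterior probability of class membership dominate the class mean, which is how -estat lcmean- reflects the soft class assignments.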
    Joseph Nicholas Luchman, Ph.D., PStatĀ® (American Statistical Association)
    ----
    Research Fellow
    Fors Marsh

    ----
    Version 18.0 MP



    • #3
      Thanks for that detailed explanation. Just to check I understand: the -classpr- option should produce identical estimates regardless of whether I use -estat lcprob- or -predict-, whereas -classposteriorpr- estimates may differ between -estat lcprob- and -predict-, because the posterior predicted probabilities can vary within classes? I ask because your classposteriorpr outputs are very similar whether generated through -estat lcprob- or -predict-, whereas mine differ a bit.
      Last edited by Josephine George; 16 Mar 2021, 09:37.



      • #4
        ...the -classpr- option should produce identical estimates regardless of whether I use -estat lcprob- or -predict-...
        Agreed - should be the same both ways.

        ...whereas -classposteriorpr- estimates may differ between -estat lcprob- and -predict-, because the posterior predicted probabilities can vary within classes?
        The results of -estat lcprob, classposteriorpr- and the means from -predict varlist, classposteriorpr- followed by -summarize varlist- should be identical.

        The only way I can think of for -predict varlist, classposteriorpr- followed by -summarize varlist- to differ is if the sample over which the predictions are summarized is not identical to the estimation sample (i.e., it is a subset of the estimation sample, or it includes observations outside the estimation sample).
        Joseph Nicholas Luchman, Ph.D., PStatĀ® (American Statistical Association)
        ----
        Research Fellow
        Fors Marsh

        ----
        Version 18.0 MP
