  • Latent class analysis: marginal predicted probabilities vs marginal predicted posterior probabilities and estat vs predict

    I have a latent class model that I'm broadly happy with. I want to be able to say that x% of the sample is in class 1, y% is in class 2, and so on.
    Previously I have gotten these summary statistics using:
    Code:
    estat lcprob, nose
    Following posts elsewhere on these boards about calculating entropy in these models, I ran
    Code:
     predict pr*, classposteriorpr
    sum pr1-pr4
    and noticed that these results differ somewhat from those of the earlier command; for example, 24% vs 27% in one group.

    Code:
    estat lcprob, nose classposteriorpr
    produces results quite close, but not identical to, those produced by the default estat lcprob specification, for example, 27.XYA% vs 27.XYB%.

    My reading of the manuals doesn't get me much closer to understanding what predict and estat are doing differently. I'd appreciate (1) guidance on which command I should use to generate these summary statistics, and (2) a pointer to anything I can read to make sure I understand the difference.

  • #2
    Hi Josephine,

    Take this example:

    Code:
    . sysuse auto
    
    . gsem ( price mpg <- ), lclass(C 3)
    
    Fitting class model:
    
    Iteration 0:   (class) log likelihood = -81.283734  
    Iteration 1:   (class) log likelihood = -81.283734  
    
    Fitting outcome model:
    
    Iteration 0:   (outcome) log likelihood = -865.61911  
    Iteration 1:   (outcome) log likelihood = -865.61368  
    Iteration 2:   (outcome) log likelihood = -865.61368  
    
    Refining starting values:
    
    Iteration 0:   (EM) log likelihood =  -944.0659
    Iteration 1:   (EM) log likelihood = -924.15751
    Iteration 2:   (EM) log likelihood =  -909.2925
    Iteration 3:   (EM) log likelihood = -904.22959
    Iteration 4:   (EM) log likelihood = -901.28907
    Iteration 5:   (EM) log likelihood = -899.18989
    Iteration 6:   (EM) log likelihood = -897.74984
    Iteration 7:   (EM) log likelihood = -896.82187
    Iteration 8:   (EM) log likelihood = -896.24632
    Iteration 9:   (EM) log likelihood = -895.89329
    Iteration 10:  (EM) log likelihood = -895.67485
    Iteration 11:  (EM) log likelihood = -895.53711
    Iteration 12:  (EM) log likelihood = -895.44837
    Iteration 13:  (EM) log likelihood = -895.39004
    Iteration 14:  (EM) log likelihood = -895.35094
    Iteration 15:  (EM) log likelihood = -895.32452
    Iteration 16:  (EM) log likelihood = -895.30639
    Iteration 17:  (EM) log likelihood = -895.29385
    Iteration 18:  (EM) log likelihood = -895.28508
    Iteration 19:  (EM) log likelihood = -895.27893
    Iteration 20:  (EM) log likelihood =  -895.2746
    Note: EM algorithm reached maximum iterations.
    
    Fitting full model:
    
    Iteration 0:   log likelihood = -887.49716  
    Iteration 1:   log likelihood = -887.49715  
    
    Generalized structural equation model           Number of obs     =         74
    Log likelihood = -887.49715
    
     ( 1)  [/]var(e.price)#1bn.C - [/]var(e.price)#3.C = 0
     ( 2)  [/]var(e.price)#2.C - [/]var(e.price)#3.C = 0
     ( 3)  [/]var(e.mpg)#1bn.C - [/]var(e.mpg)#3.C = 0
     ( 4)  [/]var(e.mpg)#2.C - [/]var(e.mpg)#3.C = 0
    
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    1.C          |  (base outcome)
    -------------+----------------------------------------------------------------
    2.C          |
           _cons |   1.509649   .3438539     4.39   0.000     .8357078     2.18359
    -------------+----------------------------------------------------------------
    3.C          |
           _cons |  -.1657982   .5160308    -0.32   0.748      -1.1772    .8456037
    ------------------------------------------------------------------------------
    
    Class          : 1
    
    Response       : price
    Family         : Gaussian
    Link           : identity
    
    Response       : mpg
    Family         : Gaussian
    Link           : identity
    
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    price        |
           _cons |   12191.51   454.2355    26.84   0.000     11301.22    13081.79
    -------------+----------------------------------------------------------------
    mpg          |
           _cons |   15.67828   1.247806    12.56   0.000     13.23263    18.12394
    -------------+----------------------------------------------------------------
     var(e.price)|    1726442   330010.8                       1186982     2511076
       var(e.mpg)|   12.87172   2.632581                       8.62075    19.21888
    ------------------------------------------------------------------------------
    
    Class          : 2
    
    Response       : price
    Family         : Gaussian
    Link           : identity
    
    Response       : mpg
    Family         : Gaussian
    Link           : identity
    
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    price        |
           _cons |   5189.398   205.3997    25.26   0.000     4786.822    5591.974
    -------------+----------------------------------------------------------------
    mpg          |
           _cons |   20.56298   .5967426    34.46   0.000     19.39339    21.73257
    -------------+----------------------------------------------------------------
     var(e.price)|    1726442   330010.8                       1186982     2511076
       var(e.mpg)|   12.87172   2.632581                       8.62075    19.21888
    ------------------------------------------------------------------------------
    
    Class          : 3
    
    Response       : price
    Family         : Gaussian
    Link           : identity
    
    Response       : mpg
    Family         : Gaussian
    Link           : identity
    
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    price        |
           _cons |   4264.501   432.0495     9.87   0.000       3417.7    5111.303
    -------------+----------------------------------------------------------------
    mpg          |
           _cons |   31.85174   1.693221    18.81   0.000     28.53309     35.1704
    -------------+----------------------------------------------------------------
     var(e.price)|    1726442   330010.8                       1186982     2511076
       var(e.mpg)|   12.87172   2.632581                       8.62075    19.21888
    ------------------------------------------------------------------------------
    The default behavior of -estat lcprob- is to use the -classpr- option, which produces:

    Code:
    . estat lcprob, classpr
    
    Latent class marginal probabilities             Number of obs     =         74
    
    --------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
               C |
              1  |   .1569277   .0444545      .0878773    .2645015
              2  |   .7101204   .0640861      .5709634    .8184907
              3  |   .1329519   .0525122      .0590814    .2724409
    --------------------------------------------------------------
    These values are based on the model parameters, specifically these:

    Code:
    ...
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    1.C          |  (base outcome)
    -------------+----------------------------------------------------------------
    2.C          |
           _cons |   1.509649   .3438539     4.39   0.000     .8357078     2.18359
    -------------+----------------------------------------------------------------
    3.C          |
           _cons |  -.1657982   .5160308    -0.32   0.748      -1.1772    .8456037
    ------------------------------------------------------------------------------
    ...
    You can pass these through the multinomial logit function to reproduce the probabilities above.

    Code:
    . di 1/(1+exp(1.509649)+exp(-.1657982))
    .15692775
    
    . di exp(1.509649)/(1+exp(1.509649)+exp(-.1657982))
    .71012037
    
    . di exp(-.1657982)/(1+exp(1.509649)+exp(-.1657982))
    .13295188
    These values can also be obtained as predictions:

    Code:
    . predict C3pr*, classpr
    
    . sum C3pr*
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
           C3pr1 |         74    .1569277           0   .1569277   .1569277
           C3pr2 |         74    .7101204           0   .7101204   .7101204
           C3pr3 |         74    .1329519           0   .1329519   .1329519
    As they are based on a parameter for each class, they are constants within a class.
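    For readers working outside Stata, the multinomial logit step above is just a softmax over the class intercepts, with the base class fixed at zero. A minimal Python sketch, plugging in the two estimated intercepts from the gsem output above:

    ```python
    import math

    # Class intercepts from the gsem output: class 1 is the base (0),
    # classes 2 and 3 use the estimated _cons values.
    intercepts = [0.0, 1.509649, -0.1657982]

    # Softmax: exponentiate each intercept and normalize.
    denom = sum(math.exp(b) for b in intercepts)
    class_probs = [math.exp(b) / denom for b in intercepts]

    print(class_probs)
    # Matches -estat lcprob, classpr-: .1569277, .7101204, .1329519
    ```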

    The alternative, as you saw in your model, differs:

    Code:
    . estat lcprob, classposteriorpr
    
    Latent class marginal posterior probabilities   Number of obs     =         74
    
    --------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
               C |
              1  |   .1569278   .0105166      .1373989    .1786575
              2  |   .7101205   .0301459      .6476988    .7654866
              3  |   .1329518   .0293357      .0851858     .201599
    --------------------------------------------------------------
    These, as you've posted above, are also available as predicted values:

    Code:
    . predict C3postpr*, classposteriorpr
    
    . sum C3postpr*
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
       C3postpr1 |         74    .1569278    .3581648   1.05e-15          1
       C3postpr2 |         74    .7101205    .4212197   2.63e-12   .9999955
       C3postpr3 |         74    .1329518    .3023448   1.24e-18   .9999822
    These do have variability within classes.

    Why do these have variability while the earlier ones do not? Because the posterior predicted probabilities are computed from a more complex combination of the data, the parameter estimates for each response variable in each class, and the class probabilities underlying the -classpr- computations above (see p. 571 of the user's manual for -sem-/-gsem-).

    In the end, -classpr- is based on one set of parameter estimates (as shown above), whereas -classposteriorpr- is based on two sets along with the data itself (I won't try to reproduce those by hand here).
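    As a rough illustration of that interweaving (a toy two-class Gaussian mixture with made-up numbers, not the model fitted above), the posterior probability for an observation in class k is the class prior times the class-specific density at that observation's data value, normalized across classes:

    ```python
    import math

    def normal_pdf(x, mu, sigma):
        """Density of N(mu, sigma^2) at x."""
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    def posterior(x, priors, mus, sigmas):
        """Posterior class probabilities for one observation x:
        P(class k | x) = prior_k * f(x | class k) / sum_j prior_j * f(x | class j)."""
        joint = [p * normal_pdf(x, m, s) for p, m, s in zip(priors, mus, sigmas)]
        total = sum(joint)
        return [j / total for j in joint]

    # Hypothetical mixture: the priors and within-class means/SDs are made up.
    priors = [0.3, 0.7]
    mus = [0.0, 5.0]
    sigmas = [1.0, 1.0]

    # The posterior depends on the observation's data value, so it varies
    # across observations, unlike the marginal class probabilities above.
    print(posterior(0.0, priors, mus, sigmas))  # heavily favors class 1
    print(posterior(5.0, priors, mus, sigmas))  # heavily favors class 2
    ```

    This is why the posterior predictions have within-class variability: each observation's data enters the calculation.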


    To your question: I think the more common practice is to use the class posterior predicted probabilities, and I believe that is what -estat lcmean- does:

    Code:
    . estat lcmean
    
    Latent class marginal means                     Number of obs     =         74
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    1            |
           price |   12191.51   454.2355    26.84   0.000     11301.22    13081.79
             mpg |   15.67828   1.247806    12.56   0.000     13.23263    18.12394
    -------------+----------------------------------------------------------------
    2            |
           price |   5189.398   205.3997    25.26   0.000     4786.822    5591.974
             mpg |   20.56298   .5967426    34.46   0.000     19.39339    21.73257
    -------------+----------------------------------------------------------------
    3            |
           price |   4264.501   432.0495     9.87   0.000       3417.7    5111.303
             mpg |   31.85174   1.693221    18.81   0.000     28.53309     35.1704
    ------------------------------------------------------------------------------
    
    . sum price mpg [aw=C3postpr1]
    
        Variable |     Obs      Weight        Mean   Std. Dev.       Min        Max
    -------------+-----------------------------------------------------------------
           price |      74  11.6126554    12191.51   1916.005       3291      15906
             mpg |      74  11.6126554    15.67828   3.229073         12         41
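    The weighted-summarize trick works because an [aw]-weighted mean is simply sum(w_i * x_i) / sum(w_i), with the posterior class probabilities serving as the weights. A small Python sketch with made-up data values and posterior probabilities:

    ```python
    # Hypothetical data values and posterior probabilities for one class
    # (not the auto data above).
    x = [3291.0, 5000.0, 12000.0, 15906.0]
    w = [0.01, 0.05, 0.95, 0.99]  # posterior probability of class membership

    # Posterior-weighted mean: sum(w_i * x_i) / sum(w_i), which is what
    # -summarize ... [aw=...]- reports as the Mean.
    weighted_mean = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
    print(weighted_mean)  # approximately 13714.925 with these made-up numbers
    ```

    Observations with high posterior probability of class membership dominate the class mean, which is how -estat lcmean- reflects the soft class assignments.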
    Joseph Nicholas Luchman, Ph.D., PStatĀ® (American Statistical Association)
    ----
    Research Fellow
    Fors Marsh

    ----
    Version 18.0 MP



    • #3
      Thanks for that detailed explanation. Just to check I understand: the -classpr- option should produce identical estimates regardless of whether I use -estat lcprob- or -predict-, whereas -classposteriorpr- estimates may differ between -estat lcprob- and -predict-, because the posterior predicted probabilities can vary within classes? I ask because your classposteriorpr outputs are very similar whether generated through -estat lcprob- or -predict-, whereas mine differ a bit.
      Last edited by Josephine George; 16 Mar 2021, 09:37.



      • #4
        ...the -classpr- option should produce identical estimates regardless of whether I use -estat lcprob- or -predict-...
        Agreed - should be the same both ways.

        ...whereas -classposteriorpr- estimates may differ between -estat lcprob- and -predict-, because the posterior predicted probabilities can vary within classes?
        The results of -estat lcprob, classposteriorpr- and the means from -predict varlist, classposteriorpr- followed by -summarize varlist- should be identical.

        The only way I can think of for -predict varlist, classposteriorpr- followed by -summarize varlist- to differ is if the sample over which the predictions are summarized is not identical to the estimation sample (i.e., it is a subset of the estimation sample, or it includes observations outside the estimation sample).
        Joseph Nicholas Luchman, Ph.D., PStatĀ® (American Statistical Association)
        ----
        Research Fellow
        Fors Marsh

        ----
        Version 18.0 MP
