  • Interpreting coefficients in latent profile analysis with a mixture of continuous and dummy variables

    Dear Stata users,
    I am running a latent profile analysis using a mixture of continuous and dummy variables. Looking at the SEM examples (50 and 52), it seems that for latent class analysis (only categorical variables), the coefficients are interpreted as coefficients of a multinomial logistic regression (so not very informative). For latent profile analysis, the coefficients are interpreted as the estimated mean value for a given latent class.
    It is not clear to me why, when using both types of variables, the estimated coefficients for the dummy variables can't be interpreted as the proportion within a given class. Can anyone provide some references for this case of a mixture of continuous and dummy variables? What are the underlying equations in this case: a mixture of linear and logistic regressions?
    Thank you for your attention

  • #2
    Hello, Andrea,

    I hope all is well. I hope you get this reply, despite its tardiness!

    "I am running a latent profile analysis using a mixture of continuous and dummy variables. Looking at the SEM examples (50 and 52), it seems that for latent class analysis (only categorical variables), the coefficients are interpreted as coefficients of a multinomial logistic regression (so not very informative). For latent profile analysis, the coefficients are interpreted as the estimated mean value for a given latent class."

    Which coefficients are you talking about? Latent class analysis combines two stages of modeling. First, there's a multinomial logistic model for the probability of being in each class. Then, there's a logistic (or whatever) model for each indicator, conditional on class membership. When you say multinomial logistic regression, I would normally assume you referred to the first stage model (unless you have indicators that you are treating as un-ordered categorical). However, the rest of the sentence makes me think that you were referring to the model for the response probabilities to each indicator. Those coefficients are on the log odds scale. You can extract the probability using the inverse logit function. I shall demonstrate, omitting unnecessary output.

    Code:
    use https://www.stata-press.com/data/r16/gsem_lca1
    gsem (accident play insurance stock <- ), logit lclass(C 2)
    
    /*output for class 2:*/
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    accident     |
           _cons |   4.983017   3.745987     1.33   0.183    -2.358982    12.32502
    -------------+----------------------------------------------------------------
    play         |
           _cons |   2.747366   1.165853     2.36   0.018     .4623372    5.032395
    -------------+----------------------------------------------------------------
    insurance    |
           _cons |   2.534582   .9644841     2.63   0.009     .6442279    4.424936
    -------------+----------------------------------------------------------------
    stock        |
           _cons |   1.203416   .5361735     2.24   0.025     .1525356    2.254297
    ------------------------------------------------------------------------------
    estat lcmean, nose
    /*Class 2*/
    2            |
        accident |   .9931933
            play |   .9397644
       insurance |   .9265309
           stock |    .769132
    --------------------------------------------------------------
    
    di invlogit(4.983017)
    .99319329
    Feel free to repeat this for each coefficient in class 2 (or class 1, for that matter). It will line up with the reported probability on the marginal means table. So, why even have estat lcmean? That's the only way to get a standard error and confidence interval for each probability or mean. However, maybe that's not critical to your analysis.
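
    As a minimal sketch of that check, the remaining class 2 intercepts can be passed through invlogit() in the same way. The values are copied from the coefficient table above; nothing else is assumed.

    ```stata
    * Class 2 intercepts (log-odds scale), copied from the gsem output above
    display invlogit(2.747366)   // play
    display invlogit(2.534582)   // insurance
    display invlogit(1.203416)   // stock
    ```

    The results should match the play, insurance, and stock rows of the estat lcmean table (.9397644, .9265309, and .769132).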

    Anyway, this goes to show that the coefficients for the dummy variables can indeed be interpreted as proportions, after you transform them with the inverse logit. You are right that for continuous indicators, the coefficient is just the class-specific mean, on the indicator's original scale. (Strictly, that applies to Gaussian indicators; recall that you could have declared the indicators Poisson, negative binomial, or any other family supported by gsem, in which case the interpretation of the coefficients is governed by the family and link of the indicator.)
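
    For the mixed case asked about, a rough sketch might look like the following. The variable names cont1, cont2, bin1, and bin2 are hypothetical stand-ins for your own continuous and 0/1 indicators, and two classes is an arbitrary choice. Each continuous indicator gets a class-specific linear (Gaussian) intercept, and each dummy gets a class-specific logit intercept.

    ```stata
    * Hypothetical variable names: cont1, cont2 are continuous; bin1, bin2 are 0/1
    * With no family option, gsem uses its default, Gaussian with identity link
    gsem (cont1 cont2 <- _cons) (bin1 bin2 <- _cons, logit), lclass(C 2)

    * Class-specific means (continuous) and probabilities (dummies), with CIs
    estat lcmean
    ```

    In the gsem table, the bin1 and bin2 intercepts are log odds and the cont1 and cont2 intercepts are means; estat lcmean puts the dummies on the probability scale.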

    For that matter, you can take the coefficients from the multinomial model and hand-calculate the probability of being in each class. Recall that in multinomial logistic regression, P(C = 2) = exp(gamma_2) / [exp(gamma_1) + exp(gamma_2)], where gamma_1 and gamma_2 are the class intercepts, and where gamma_1 is fixed at 0 for the base class, so that exp(gamma_1) always equals 1. This is outlined in SEM example 50. Go ahead, try it.
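
    A sketch of that hand calculation, assuming a two-class model with the base-class intercept fixed at 0 so that exp(gamma_1) = 1. The value assigned to g2 is a made-up placeholder, not taken from any output in this thread; substitute the 2.C intercept from your own gsem results.

    ```stata
    * g2 is a hypothetical placeholder for the estimated 2.C intercept
    scalar g2 = 0.25

    * P(C = 2) = exp(g2) / (exp(0) + exp(g2))
    display exp(g2) / (1 + exp(g2))

    * With only two classes, this is the same as the inverse logit of g2
    display invlogit(g2)

    * After gsem, estat lcprob reports the same class probabilities with CIs
    estat lcprob
    ```

    Note that estat lcprob needs the gsem results in memory, so run it straight after fitting the model.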
    Last edited by Weiwen Ng; 03 Apr 2020, 17:09.
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

    • #3
      Hi Weiwen,
      thank you for your reply.
      I was indeed referring to the model for the response probabilities to each indicator. Now I have what I need. Thank you
