
  • Latent class analysis: -gsem- & pseudo R-squared

    Dear users,

    This may be a dumb question, but I am trying to familiarize myself with latent class cluster analysis. Most published papers that use a latent class approach (regardless of the software chosen) report a pseudo-R2 alongside the log-likelihood value and BIC.

    However, I do not see any R-squared in the output when -gsem- is used to conduct LCA. Is it possible to get that?

    Thanks in advance,
    Best,
    Lena

  • #2
    Code:
    estat lcgof
    McFadden's Pseudo-R2 would just be 1 minus the ratio of log-likelihoods.
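    Written out as a sketch, where \(\ln L_M\) and \(\ln L_0\) are labels introduced here for the log-likelihoods of the fitted model and of the comparison (null) model:

    \[ R^2_{\text{McFadden}} = 1 - \frac{\ln L_M}{\ln L_0} \]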


    • #3
      Thank you very much Andrew.
      Does this mean that when the observed variables are NOT all categorical, in which case Stata does not report the likelihood-ratio tests, we cannot calculate the Pseudo-R2?

      Best,
      Lena


      • #4
        The continuous variables do not contribute to the log-likelihood, so just run the command excluding these to obtain the fit statistics.

        Edit: To be more specific, in logit, for example, if you include a continuous dependent variable, it does not vary. Can you show your command?
        Last edited by Andrew Musau; 08 Dec 2019, 10:58.


        • #5
          Dear Andrew, thanks again for your response.
          But I cannot exclude continuous variables, because all of the variables are continuous. My data looks like this:

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input double age float(gender educ inc x1 x2 x3 x4 x5 x6 x7)
          49 0 1 0  3 -2  2 -2 -1 1  2
          33 1 0 0 -3  3  0  4  1 0 -4
          45 1 1 1 -2  2  1  4  1 0 -4
          19 1 0 1 -2  2  1  4  0 1 -4
          33 1 1 0  2  2 -3  3 -1 3 -2
          29 0 1 0  2  0  0 -1 -2 0 -4
          54 1 0 1 -1  3  2  4  1 1 -2
          61 1 0 0  1 -1  3  1  0 3  0
          47 1 0 0  2  1 -1  3 -2 4 -4
          32 1 0 1 -3  0  1  4 -1 3 -2
          end


          And I am specifying the model as:

          Code:
          local x "x1 x2 x3 x4 x5 x6 x7"
          gsem (`x' <- ) (C <- age gender educ inc), lclass(C 3)
          estat lcgof
          Therefore, if I want to exclude the continuous variables, there will be no endogenous variables at all.
          I am not sure whether I can specify another distribution family for these observed variables. They range from -4 to 4, and they represent the number of times an attribute is picked as best minus the number of times it is picked as worst.

          Is there any other way to get an R-squared, and/or to specify the model?


          • #6
            This is OK, you can include continuous variables here. I thought your model was logit. So in this case, Stata cannot compute the likelihood ratio. If I get time, I will check if there is a workaround.




            • #7
              For linear models, while it is possible to calculate McFadden's Pseudo R2, it does not make much sense to do so, because you already have the R2 statistic = Model Sum of Squares/Total Sum of Squares. Recall that McFadden's Pseudo R2 is a substitute for R2 in nonlinear models such as logit and probit. Here is an example of how to calculate the Pseudo R2 in a linear model.

              Code:
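              sysuse auto, clear
              * Full model: store its log-likelihood
              qui glm mpg displacement weight gear_ratio
              local ll1 = e(ll)
              * Null (intercept-only) model: store its log-likelihood
              qui glm mpg
              local ll0 = e(ll)
              di "McFaddens Pseudo R2 is `= 1-(`ll1'/`ll0')'"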
              Result:

              Code:
              . di "McFaddens Pseudo R2 is `= 1-(`ll1'/`ll0')'"
              McFaddens Pseudo R2 is .1674128332010786
              So in the case of latent class models in gsem, the issue is how to define the comparison model. If you define this as log-likelihood= 0, then the Pseudo R2 is not defined. I do not see an easy way of doing this, so I would just stick to the AIC and BIC. As I said, even if we were able to calculate the statistic, it is not useful for linear models.


              • #8
                Dear Andrew,
                Thank you so much for your response.

                I was wondering whether treating these variables as continuous is correct in this specification. My Xs range from -4 to 4 and take only integer values. They are basically count data, but effects coded: the number of times an attribute is chosen as best minus the number of times it is chosen as worst. Is it correct to treat them as continuous?

                The reason for asking is that, in a similar paper which used similar best and worst ranking of the attribute, the author is reporting R2 (without mentioning which software was used). So I was wondering whether the problem is the continuity assumption that I am making here.

                I really appreciate your help,
                Best,
                Lena


                • #9
                  I was wondering whether treating these variables as continuous is correct in this specification. My Xs range from -4 to 4
                  Yes, a linear model works here, since the Gaussian family has support \((-\infty, \infty)\) and therefore admits negative values. Because you have differences in counts rather than counts, you cannot use count models such as Poisson: there is no such thing as a negative count.

                  The reason for asking is that, in a similar paper which used similar best and worst ranking of the attribute, the author is reporting R2
                  I cannot tell what the author did or what model he/she used. Do you have a link to the paper? The method of analysis should have been discussed in the paper.
                  Last edited by Andrew Musau; 09 Dec 2019, 07:35.


                  • #10
                    The method of analysis is not discussed in great detail, because the latent class analysis is provided in the appendix as an alternative way of analyzing such data.
                    Here is the paper, and here is the appendix.

                    Thank you for the time you take,
                    Best,
                    Lena


                    • #11
                      Just wanted to point out: in the appendix, under the table "A5: Latent Class Cluster Analysis Based on Effects-Coded Count Estimates", an R-squared is reported.


                      • #12
                        Thanks for posting the link. I believe what the authors call R2 is entropy R2, which is an indication of the quality of classification. Here is the code to implement it in R.
                        https://gist.github.com/daob/c2b6d83...3cebfdc2c267b3

                        This is a supplementary statistic and I would not worry if I have not reported it. From the estimated model's point of view, AIC/BIC are more important in assessing fit.
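
                        For reference, a rough Stata sketch of an entropy-based R2 follows. It assumes one common definition, which may differ in detail from the linked R code: \(R^2_{\text{entropy}} = 1 - \bar{H}_{\text{post}} / H_{\text{prior}}\), where \(\bar{H}_{\text{post}}\) is the average over observations of \(-\sum_k p_{ik} \ln p_{ik}\) and \(H_{\text{prior}} = -\sum_k \pi_k \ln \pi_k\). It also assumes the 3-class -gsem- model from #5 is the active estimation result. The names cpost*, h_post, h_prior, and r2_entropy are hypothetical, and the class proportions \(\pi_k\) are approximated by the sample means of the posterior probabilities.

                        ```stata
                        * Sketch only: assumes the 3-class gsem model has just been fit
                        * cpost1-cpost3 hold the posterior class probabilities
                        predict cpost*, classposteriorpr

                        * Per-observation posterior entropy: -sum_k p_ik * ln(p_ik)
                        generate double h_post = 0
                        forvalues k = 1/3 {
                            replace h_post = h_post - cpost`k' * ln(cpost`k') if cpost`k' > 0
                        }

                        * Prior entropy, with class proportions approximated
                        * by the mean posterior probabilities (an assumption)
                        scalar h_prior = 0
                        forvalues k = 1/3 {
                            quietly summarize cpost`k', meanonly
                            scalar h_prior = h_prior - r(mean) * ln(r(mean))
                        }

                        * Entropy R2 = 1 - mean posterior entropy / prior entropy
                        quietly summarize h_post, meanonly
                        scalar r2_entropy = 1 - r(mean) / h_prior
                        display "Entropy R2 = " r2_entropy
                        ```

                        Values near 1 would indicate that observations are assigned to classes with little uncertainty; values near 0 would indicate that the posteriors are hardly sharper than the overall class proportions.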


                        • #13
                          Thanks a lot Andrew,
                          Best,
                          Lena
                          Last edited by Lena Garnik; 09 Dec 2019, 11:36.


                          • #14
                            Hello,
                            In a latent class cluster analysis (aka latent profile analysis, mixture of normals?), where the classes are nominal categorical and the indicators are continuous and assumed conditionally normally distributed, R^2 for each item is reported as a measure of the variance in that item accounted for by the latent class/profile variable. It is akin to the item R^2 reported in a confirmatory factor analysis. It tells you about measurement quality. AIC/BIC are relative model selection criteria: you could have a set of models, all with very poor measurement, and AIC/BIC would still say one is the best of the set. Thus item R^2 gives a different view of the quality of the model and estimates.
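                            One way to write this down, as an illustration rather than the formula of any particular package: for item \(j\) with class proportions \(\pi_k\), class-specific means \(\mu_{jk}\), class-specific variances \(\sigma^2_{jk}\), and overall mean \(\bar{\mu}_j = \sum_k \pi_k \mu_{jk}\) (all symbols introduced here), the law of total variance splits the item variance into a between-class and a within-class part, so that

                            \[ R^2_j = \frac{\sum_k \pi_k (\mu_{jk} - \bar{\mu}_j)^2}{\sum_k \pi_k (\mu_{jk} - \bar{\mu}_j)^2 + \sum_k \pi_k \sigma^2_{jk}} \]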
                            Hope this helps.
                            Brian
