  • Is it possible to do multilevel latent class analysis with Stata 15/IC?

    Dear Statalist,

    I'd like to test whether level 1 latent class membership varies across level 2 units. For example, I'd like to test whether the probability that an individual is classified into a risky-behaviors class varies across regions. I have four behaviors (behavior1 behavior2 behavior3 behavior4) and 3 regions (region1 region2 region3).

    Is this possible with Stata 15/IC? If so, could you recommend materials or syntax that I can refer to?

    Thanks,

  • #2
    I believe not. I have tried to get the gsem command to fit one of these myself.

    If I understand correctly, in a multilevel latent class model, you allow a random effect to enter the multinomial equation. It is akin to a latent class regression with known covariates in specification. I've fit one of the latter and the model works fine.

    In your case, to get a multilevel latent class model where region is treated as a random intercept, the syntax should look like:

    Code:
    * lclass(C 2) is illustrative; set the number of classes you need
    gsem (behavior1-behavior4 <- _cons, logit) (C <- M1[region]), lclass(C 2)
    
    invalid path specification;
    paths between latent variable M1[mds_id] and latent class 1.C are not allowed
    You would get the error message shown above. I know because I tried this with my own code, which is why the message names my grouping variable (mds_id) rather than region.

    However, you said you have 3 regions. Isn't that a small enough number of regions to treat with fixed effects? Remember, in a latent class regression, you assume that you have some variables that are associated with which latent class the person belongs to, but they don't enter the measurement part of the model. For your purposes, if you really have only 3 regions, I believe that a fixed effects specification will do the job for you. Say you recoded a variable -region- that took on values of 1, 2, and 3 depending on which region it was:

    Code:
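    * Build one categorical region variable from the three indicators
    egen region = group(region1 region2 region3)
    * lclass(C 2) is illustrative; choose the number of classes that suits your data
    gsem (behavior1-behavior4 <- _cons, logit) (C <- i.region), lclass(C 2)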
    Quite simple.
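
    If you want a formal test of whether class membership differs by region, one option is a likelihood-ratio test against the model without region. This is only a sketch: it assumes the recoded -region- variable described above, and the 2-class choice and the stored-estimate names (noregion, withregion) are placeholders.

    ```stata
    * Sketch only: lclass(C 2) and the names noregion/withregion are illustrative
    * Model without region in the class-membership equation
    gsem (behavior1-behavior4 <- _cons, logit), lclass(C 2)
    estimates store noregion

    * Model with region as a fixed-effect predictor of class membership
    gsem (behavior1-behavior4 <- _cons, logit) (C <- i.region), lclass(C 2)
    estimates store withregion

    * Likelihood-ratio test of the region terms
    lrtest noregion withregion
    ```

    Both models need to have converged properly for the comparison to mean anything, so check each fit before trusting the test.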

    As to your question about Stata version, IC only limits the number of variables and observations. You should be good to go.
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

    • #3
      Moon,

      With only three regions, I don't believe a multilevel analysis is feasible.

      I would suggest instead that you consider running a latent class regression as in the following example. (Note you will need to install combomarginsplot ado from SSC.)
      Code:
      *Install/update combomarginsplot ado.
      ssc install combomarginsplot, replace
      
      * Load data.
      use http://www.stata-press.com/data/r15/gsem_lca2, clear
      
      * Run latent class regression using patient's relative weight to predict probability of latent class membership.
      gsem (glucose insulin sspg <- _cons) (C <- relwgt), lclass(C 3) lcinvariant(none) covstructure(e._OEn, unstructured)
      
      * Estimate margins and create marginsplots and combine into single graph.
      margins, at(relwgt=(.7(.1)1.2)) predict(classpr class(1)) saving(marg1, replace)
      marginsplot, scheme(s1color) name(marg1, replace)
      *
      margins, at(relwgt=(.7(.1)1.2)) predict(classpr class(2)) saving(marg2, replace)
      marginsplot, scheme(s1color) name(marg2, replace)
      *
      margins, at(relwgt=(.7(.1)1.2)) predict(classpr class(3)) saving(marg3, replace)
      marginsplot, scheme(s1color) name(marg3, replace)
      *
      combomarginsplot marg1 marg2 marg3, noci plotdim(_filenumber) ///
          labels("Latent Class 1" "Latent Class 2" "Latent Class 3") ///
          file1opts(mcolor(blue) lcolor(blue)) ///
          file2opts(mcolor(green) lcolor(green)) ///
          file3opts(mcolor(brown) lcolor(brown)) ///
          ylabels(0(.1).9, format(%3.2f) angle(hor) labsize(vsmall)) ///
          ytitle("Class Membership Probability") ///
          xlabels(, format(%3.2f) labsize(vsmall)) ///
          xtitle("Patient Relative Weight") ///
          legend(rows(1) span) scheme(s1color) name(lrcombinedmargs, replace)
      Hope that helps.

      Red Owl
      Stata/IC 15.1, Windows 10 (64-bit)

      Edit: I sent this response before Weiwen Ng, but it was incorrectly blocked as spam because I was using a VPN and so it was delayed in being posted. My suggestion is essentially the same as Weiwen's.
      Last edited by Red Owl; 10 Jan 2018, 12:58.

      • #4
        Thank you so much, Weiwen and Red Owl. Very helpful!

        • #5
          Not a problem. Red Owl's post actually came up in a discussion we had earlier! Do note that in his example, the predictor of class membership is continuous; this was based on Stata's latent profile analysis example. If you should choose to use margins after fitting the model, you would obviously need to account for that in the syntax, e.g.

          Code:
          margins region, predict(classpr class(1)) saving(marg1, replace)
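
          To get the predicted membership probability for every class rather than just class 1, you could loop over the classes. A rough sketch, assuming a 2-class model fit with i.region as the predictor of class membership (the number of classes is a placeholder):

          ```stata
          * Sketch only: assumes the previous gsem fit used lclass(C 2) and (C <- i.region)
          forvalues k = 1/2 {
              * Predicted probability of membership in the k-th class, by region
              margins region, predict(classpr class(`k'))
          }
          ```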

          • #6
            Thank you!

            • #7
              Originally posted by Weiwen Ng

              Code:
              margins region, predict(classpr class(1)) saving(marg1, replace)
              Hello! Digging up this thread. To estimate marginal effects, why did you recommend classpr rather than classposteriorpr? Since the covariate of interest was entered into the latent class model as a predictor, wouldn't you need to use

              Code:
              margins region, predict(classpost class(1)) saving(marg1, replace)
              in order to incorporate the effect as it was fully estimated?

              • #8
                Melvin,

                classposteriorpr is definitely what you would type for the postestimation command -predict- to predict the posterior probabilities. However, it appears to not be appropriate for margins. The gsem postestimation entry says that classposteriorpr is not allowed with -margins-. Try it and see.
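
                For what it's worth, classpr gives the class probabilities implied by the covariates alone, while classposteriorpr also conditions on each observation's observed indicators. If you want the posterior probabilities, you can get them with -predict- and summarize them by hand. A minimal sketch, assuming a 3-class model has just been fit and a -region- variable exists; the names cpost*, maxpost, and modalclass are made up for illustration:

                ```stata
                * Sketch only: assumes a gsem model with lclass(C 3) is the active estimation result
                * Posterior class-membership probabilities, one variable per class
                predict cpost*, classposteriorpr

                * Modal class: the class with the highest posterior probability
                egen double maxpost = rowmax(cpost*)
                generate byte modalclass = .
                forvalues k = 1/3 {
                    replace modalclass = `k' if cpost`k' == maxpost & !missing(maxpost) & missing(modalclass)
                }

                * Average posterior probabilities within each region
                tabstat cpost*, by(region) statistics(mean)
                ```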

                • #9
                  Very clear, thank you!

                  • #10
                    I am posting in this thread because I have a similar issue.
                    I am trying to conduct a latent profile analysis that accounts for the fact that I have multiple measurements for the same individual. Specifically, I have fourteen clinical measures, such as systolic blood pressure, total cholesterol, weight, and glycated haemoglobin. For the same subjects (N=30), each clinical measure was taken approximately six times over one year. As covariates I have time of measurement (morning or afternoon), sex, age, and medication (categorical). I would like to profile the subjects by assigning them to a certain number of classes and then check how these classes are associated with three different outcomes.

                    I cannot post a subset of my dataset because that would violate a data sharing agreement, but this is the code I used:

                    Code:
                    gsem (var1-var14 <- _cons) (C <- i.id), lclass(C 4) iter(2000)
                    and

                    Code:
                    gsem (var1-var14 <- _cons) (C <- i.id sex age i.medication), lclass(C 4) iter(2000)

                    I also tried 2, 3, and 5 classes, but goodness-of-fit measures seem to prefer the model with 4 classes. From my understanding, I cannot fit this model using a multilevel approach, so I thought to include "id" as a predictor. My problem is that when I predict the classes after running gsem, in some cases the same individual is assigned to different classes (even to three different classes), which doesn't seem right at all. I also looked at the "group" option for sem, but it doesn't seem to be what I am looking for.

                    As my solution to the problem is not perfect, I would have expected some individuals to be assigned to multiple classes, but certainly not with this frequency. I also thought of reshaping the data to wide format, but I am skeptical about doing that because it would create a lot of missing data: although the majority of subjects have 6 measurements, some have only 2-3 and others have more than 6, up to a maximum of 8. Stata would therefore create eight versions of each variable for each individual, many of them missing.

                    I am using Stata 15.

                    • #11
                      Your question is sufficiently distinct from the original one that I'd consider posting it separately. That said, my thoughts are as follows.

                      It sounds like you have data in long format, i.e. you have subjects with up to 8 measures throughout the year. If you had subjects who were measured only once, or you had the averages of each indicator, this would be like creating clinical profiles for your subjects and seeing which subject fell into which profile.

                      However, you set the data up in long format, and subjects' weight, SBP, etc. can change during the year. I'm not sure why you are surprised that observations are being assigned to different classes: with long-format data, the unit of analysis is not a person but a person-measure (or a person-month, or whatever we want to call it). What you have set up resembles something called latent transition analysis. I'm not sure that the -gsem- command can properly estimate one (I have never done one myself), and I'm pretty sure it won't estimate transition probabilities between classes. Nonetheless, that is the closest description of what your current setup produces.

                      If you want to assign people to a stable clinical profile over the entire year, then I would probably take the average of all their measurements. That said, in LCA people are not assigned to a single class by default; they are assigned probabilistically to each of the classes. We can use modal class assignment if we want, but the most proper approach is to keep the probabilistic assignment.
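
                      Averaging and then fitting could look roughly like this. It is a sketch under assumptions: the data are in long format with the variable names from your post (id, var1-var14), the 2-class choice is arbitrary, and cpost* is a made-up stub. With only 30 subjects and 14 indicators, even a small model may be hard to estimate.

                      ```stata
                      * Sketch only: collapses long-format data to one row per subject,
                      * replacing each indicator with its within-subject mean
                      collapse (mean) var1-var14, by(id)

                      * Latent profile model on the subject-level averages; 2 classes is a placeholder
                      gsem (var1-var14 <- _cons), lclass(C 2)

                      * Keep the probabilistic assignment: one posterior probability per class
                      predict cpost*, classposteriorpr
                      list id cpost* in 1/5
                      ```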

                      Last, you invoked the option -iter(2000)-. You may already know this, but I'm saying for the benefit of a reader who might not: if the model has not achieved convergence by 2,000 iterations, then estimation will terminate and Stata will spit out the solution, and this solution will be invalid. Also, in my experience, LCA models tend to either converge within many fewer iterations, or else it will be clear that they won't converge by 100 or 200 iterations. If this comment is cryptic, please look through my post history.
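
                      One way to guard against reading a non-converged solution is to check e(converged) after estimation. A small sketch, using the variable names from your post and an arbitrary iteration cap; -capture noisily- is there so that a do-file keeps running even if the command exits with an error:

                      ```stata
                      * Sketch only: iterate(200) is an arbitrary cap, not a recommendation
                      capture noisily gsem (var1-var14 <- _cons), lclass(C 4) iterate(200)

                      * gsem stores e(converged) = 1 if the model converged and 0 otherwise
                      if _rc != 0 | e(converged) != 1 {
                          display as error "Model did not converge; do not interpret these estimates."
                      }
                      ```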

                      If you have any further questions, I'd encourage you to start a new thread, and I will respond there. That makes it easier for interested parties to find information.