  • Latent class analysis using gsem - Cross validation

    Hi,
    I'm trying to learn LCA/LPA with the gsem command in Stata by working through Masyn (2013), which is cited in SEM example 52, and replicating the steps in her empirical examples.

    In her article, it is recommended to cross validate the optimal number of classes in large samples.
    In particular:
    • Divide the sample into two subsamples, A and B.
    • Obtain the optimal number of classes (say, K classes) in one of the subsamples, say A, using a long procedure explained in the text.
    • Estimate model (1): a K-class model in subsample B, fixing all parameters to the estimates obtained from the K-class model in subsample A.
    • Estimate model (2): an unrestricted K-class model in subsample B.
    • Test model (1) against model (2).
    My question is: is using the @ sign on each coefficient and equation separately the only way to estimate the restricted model (1)? That would be time-consuming with a large number of indicators. Or is there another way to do it? Moreover, for LPA one would also need to fix the estimated variances and covariances. In particular, how can one restrict the entire e(b) matrix to specific numbers?

    Thanks in advance,
    Emma


    Reference:
    Masyn, K. E. (2013). Chapter 25: Latent class analysis and finite mixture modeling. The Oxford Handbook of Quantitative Methods, 551.


  • #2
    For interested readers, Mplus hosts a copy of Masyn's chapter on its website. Also, regarding Emma's reference to e(b): the idea is to fit the model in your calibration sample, copy the parameter estimates from e(b) into a matrix of your own (you have to name it something other than e(b)), then tell Stata to estimate a model in the validation data starting from that matrix.

    Initially, I thought you could trick gsem into taking the starting matrix as the final parameter estimates through the noestimate option. However, if you just add noestimate, Stata will still run the EM algorithm for 20 steps, and your parameter estimates will differ from your original starting matrix. You can further trick Stata by setting the number of EM steps to 0, e.g.

    Code:
    gsem (glucose insulin sspg <- _cons) if validation == 1, lclass(C 3) lcinvariant(none) covstructure(e._OEn, unstructured) from(b) noestimate emopts(iterate(0))
    However, if you do this, Stata doesn't return a log-likelihood for the model. Thus, it is impossible to conduct a likelihood ratio test.

    It does seem like if you are familiar with Stata's matrix commands and/or Mata, you could write a routine to read the e(b) matrix, and write each entry out as a constraint. For example, you need to automate something like the following:

    Code:
    constraint 1 1b.C:o._cons = b[1,1]
    constraint 2 1b.C:_cons = b[1,2]
    ...
    Then, you fit the model with all the constraints specified. However, this is beyond me right now. Does anyone know how to take a parameter matrix and convert it to valid constraints? Or are we able to specify gsem constraints as a matrix?
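
    As a minimal sketch of how to see the names such constraints would have to use: replay the fitted model with the coeflegend option, or pull the full column names from the copied matrix. This assumes the calibration model has just been fit and that b is your own copy of e(b). It only lists names and does not define any constraints.

    ```stata
    * Replay the last gsem fit, showing the _b[...] name of each parameter
    gsem, coeflegend

    * List the full column names (equation:coefficient) of the copied matrix
    matrix b = e(b)
    local names : colfullnames b
    foreach nm of local names {
        display "`nm'"
    }
    ```

    The position of each name in that list is the column index you would use in b[1,#].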
    Last edited by Weiwen Ng; 29 Apr 2020, 08:33.
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



    • #3
      Emma,

      This solution can be partially automated.

      Code:
      use https://www.stata-press.com/data/r16/gsem_lca2
      gen validation = _n > 72
      gsem (glucose insulin sspg <- _cons) if validation == 0, lclass(C 3) lcinvariant(none) covstructure(e._OEn, unstructured)
      est store initial
      mat b = e(b)
      
      constraint define 1 [2.C]: _cons = b[1,2]
      constraint define 2 [3.C]: _cons = b[1,3]
      constraint define 3 [glucose]: 1.C = b[1,4]
      constraint define 4 [glucose]: 2.C = b[1,5]
      constraint define 5 [glucose]: 3.C = b[1,6]
      /*Define the other constraints manually*/
      gsem (glucose insulin sspg <- _cons) if validation == 1, lclass(C 3) lcinvariant(none) covstructure(e._OEn, unstructured) constraints(1/30)
      est store validation1
      gsem (glucose insulin sspg <- _cons) if validation == 1, lclass(C 3) lcinvariant(none) covstructure(e._OEn, unstructured)
      est store validation2
      lrtest validation1 validation2
      A fully automated solution would involve taking that matrix b and automatically generating the constraints from its equation names and column names. This points to the built-in command svmat, or Nick Cox's extension svmat2. These commands would save the matrix as a new dataset, hopefully using the matrix equation and column names as variable names. Unfortunately, neither command runs successfully for me. I get an error message that is hard to interpret, but I suspect that the equation names are not valid variable names.
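
      A possible sketch of a fully automated route that avoids svmat: loop over the column names of b with the colfullnames macro function and define one constraint per column. Several things here are assumptions rather than anything confirmed in this thread: that base or omitted columns can be recognised by ":o." in their name (as in 1b.C:o._cons), that parameters labelled "/:name" in e(b) can be referenced as _b[/name], and that b[1,j] is accepted on the right-hand side just as in the manual constraints. It has not been verified on this dataset, so inspect the output of constraint dir before fitting.

      ```stata
      * Assumes b = e(b) from the calibration fit is still in memory
      * Note: this clears any constraints already defined in the session
      constraint drop _all
      local names : colfullnames b
      local j = 0
      local k = 0
      foreach nm of local names {
          local ++j
          * Skip base/omitted columns such as 1b.C:o._cons (assumed naming)
          if strpos("`nm'", ":o.") > 0 {
              continue
          }
          * e(b) labels free parameters "/:name"; rewrite to the _b[/name] form
          if substr("`nm'", 1, 2) == "/:" {
              local nm = "/" + substr("`nm'", 3, .)
          }
          local ++k
          constraint define `k' _b[`nm'] = b[1,`j']
      }
      constraint dir
      display "constraints defined: `k'"
      ```

      If the constraints look right, the restricted model in the validation sample would then be fit with constraints(1/`k'), run in the same do-file so that the local k is still defined.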

      The partially automated solution above does need you to type in quite a few constraints (there will be 30 above; you omit the constraint for the intercept for the first latent class), but at least you don't need to copy-paste the values of the coefficients and put a bunch of @s in the gsem command. The latter would lose some precision, and it would definitely be prone to copy-paste errors and typos.


      • #4
        Thank you very much for your response, Weiwen.
