  • SEM with Multiply Imputed Data

    Greetings,

    INTRO: I am working with a longitudinal dataset from an RCT that evaluated a program. There were 6 separate data collection periods over 18 months. Attrition was substantial, so I multiply imputed the data using Stata. The study uses a social science framework, so the dataset contains observed variables (imputed) and latent scale variables (passive). I imputed the observed variables and calculated the passive variables from the imputed ones.

    QUESTION 1: I am struggling with how to set up the format of the imputed dataset. The original dataset is in wide format (i.e., each row contains all the information from all six longitudinal surveys for one individual). Right now, however, the imputed dataset is in long format: each imputed copy of the longitudinal dataset is appended below the previous one, and a variable identifies which cases belong to which imputation. Is this the best format for a multiply imputed longitudinal dataset?

    QUESTION 2: I need to run some path models on the imputed data (e.g. latent growth curve models, latent class analyses, etc.). As far as I know, the mi estimate command is not going to work with the SEM builder. So how do I let Stata know that I am working with an imputed dataset? Do I just run the SEM models with different groups based on the imputation variable? Do I run the SEM models on each dataset individually and then manually calculate the pooled estimates? Any thoughts are appreciated.

    Thanks in advance,
    Sam
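
    For readers wondering what "manually calculate the pooled estimates" would involve, here is a minimal sketch of Rubin's rules in Stata. It is only an illustration under stated assumptions: the data are simulated, the variable names (x, y) are hypothetical, and a plain regression stands in for the SEM. The same arithmetic would apply to any estimate and standard error saved from each imputed dataset.

```stata
* Toy data (hypothetical): y has roughly 25% missing values, x is complete.
clear
set seed 2024
set obs 200
generate x = rnormal()
generate y = 1 + 0.5*x + rnormal()
replace y = . if runiform() < 0.25

* Impute y five times; flong stacks the completed datasets, indexed by _mi_m.
mi set flong
mi register imputed y
mi impute regress y x, add(5) rseed(2024)

* Fit the same model in each completed dataset; save the estimate and its variance.
local M = 5
forvalues m = 1/`M' {
    quietly regress y x if _mi_m == `m'
    scalar q`m' = _b[x]
    scalar u`m' = _se[x]^2
}

* Rubin's rules: pooled estimate (qbar), within-imputation variance (W),
* between-imputation variance (B), and total variance (T).
scalar qbar = 0
scalar W = 0
forvalues m = 1/`M' {
    scalar qbar = qbar + q`m'/`M'
    scalar W    = W + u`m'/`M'
}
scalar B = 0
forvalues m = 1/`M' {
    scalar B = B + (q`m' - qbar)^2/(`M' - 1)
}
scalar T = W + (1 + 1/`M')*B
display "pooled b = " scalar(qbar) "   pooled se = " sqrt(scalar(T))

* Cross-check against Stata's own pooling for a supported command.
mi estimate: regress y x
```

    The hand-pooled coefficient and standard error should agree with the mi estimate output for x. The degrees of freedom used for tests and confidence intervals need additional formulas that are not shown here.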

  • #2
    I hate to throw cold water on your fire, but I don't think you're going to be able to do this in Stata, for a couple of reasons:

    1. Unless it has changed recently, -mi estimate- does not support the -sem- command.
    2. Even if I'm wrong about #1, Stata's -sem- does not support discrete latent variables, so your latent class models are a non-starter.

    While I am not fond of MPlus, it can do all of this. It will do SEM with MI, and it definitely handles discrete latent variables. In addition, you may be able to avoid the overhead of doing MI by using one of MPlus' full information estimators. (Stata has -method(mlmv)- which is full information but relies on multivariate normality. MPlus has a full information estimator which is also robust to non-normality.)

    Probably the most unpleasant aspect of working in MPlus is, in my opinion, its dreadful data management and clunky command language. You can get pretty much the best of both worlds by doing your data management in Stata and then running the SEM analyses in MPlus under the user-written Stata command -runmplus-, which you can get from SSC. Although -runmplus- was written for MPlus version 6 and, as far as I know, has not been modified specifically for use with MPlus version 7, I have been using it with MPlus version 7 and have not encountered any problems.

    Hope this helps.



    • #3
      Greetings Dr. Schechter (and other readers),

      I very much appreciate your response (cold water included). I would rather know that I need to switch programs "early on" in the process than try to come up with a lengthy work-around that will likely prove ineffective. I have no problem working with alternative programs. I have access to MPlus and R; so, I will look into running the SEM analyses in one of those two programs.

      Do you or anyone else have thoughts on how to best structure, i.e. "mi set" the dataset (e.g. wide format versus flong) or was that question too vague/unclear?

      Thanks again.
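
      For anyone with the same question about mi styles, here is a small sketch of checking and switching the style of a dataset that was imputed in Stata. The data are simulated and the names (id, x, y1-y3) are hypothetical; it assumes one row per person with the waves side by side, as described above. Since mi convert can move between styles at any time, the choice of style is mostly a matter of convenience and memory rather than a decision that locks you in.

```stata
* Toy wide-format panel (hypothetical names): one row per person, y1-y3 are waves.
clear
set seed 12345
set obs 50
generate id = _n
generate x  = rnormal()
generate y1 = x + rnormal()
generate y2 = y1 + rnormal()
generate y3 = y2 + rnormal()
replace y3 = . if runiform() < 0.3

* Declare the mi style, register variables, and impute five times.
mi set wide
mi register imputed y3
mi register regular id x y1 y2
mi impute regress y3 x y1 y2, add(5) rseed(12345)

* Report the current style, then switch to flong, where the original data
* (m = 0) and each imputation (m = 1..5) are stacked and indexed by _mi_m.
mi query
mi convert flong, clear
mi describe
```

      Data imputed outside Stata and stacked with an imputation-number variable can be declared with mi import flong, provided the original unimputed data are included as m = 0.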



      • #4
        While using Stata 14 last year, I became aware that the SEM command didn't natively support multiple imputation. While one can invoke the -cmdok- option for many commands (e.g. the user-written oaxaca command), I think I tried this with SEM and it didn't work on Stata 14. I'm pretty sure I read some of Clyde Schechter's answers on this topic. I could have run SEM using the -method(mlmv)- option, which uses full-information maximum likelihood to estimate the coefficients, but my understanding is that this would assume multivariate normality plus missing at random conditional on the variables in the model, and I wasn't completely certain this was a tenable method for that particular project for technical reasons.
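
        As a rough illustration of what the mlmv option changes, the sketch below fits the same small path model twice on Stata's shipped auto dataset, in which rep78 is missing for 5 of the 74 cars. The model itself is arbitrary and chosen only because the dataset is available to everyone.

```stata
sysuse auto, clear

* Default ML uses complete cases only, so cars with missing rep78 are dropped.
quietly sem (mpg <- turn rep78)
scalar n_listwise = e(N)

* method(mlmv) keeps every car, assuming joint normality and missing at random.
sem (mpg <- turn rep78), method(mlmv)
scalar n_mlmv = e(N)

display "listwise N = " scalar(n_listwise) "   mlmv N = " scalar(n_mlmv)
```

        Comparing the two sample sizes shows how many observations listwise deletion would discard. Whether the normality and MAR assumptions are tenable is a separate judgment for each project.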

        My department just upgraded us to Stata 15. Out of pure curiosity, I decided to see what happened if I tried SEM with -mi estimate, cmdok-. It worked. The same is true of the IRT models; I think I tried to mi estimate an IRT model in Stata 14 out of curiosity, and that also failed.

        It could be that under Stata 14, the SEM and IRT commands simply didn't post the e(b) and e(V) matrices, so the cmdok option could not work. Perhaps this has since changed.
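
        A minimal sketch of the kind of call described above, assuming Stata 15 or later. The data are simulated and the variable names (x1, x2, y) are hypothetical; the one-equation model is only a placeholder for a real SEM.

```stata
* Toy data (hypothetical): y has roughly 20% missing values.
clear
set seed 99
set obs 300
generate x1 = rnormal()
generate x2 = rnormal()
generate y  = 0.4*x1 - 0.3*x2 + rnormal()
replace y = . if runiform() < 0.2

mi set flong
mi register imputed y
mi impute regress y x1 x2, add(5) rseed(99)

* cmdok asks mi estimate to proceed with a command it does not officially support.
mi estimate, cmdok: sem (y <- x1 x2)

* Inspect what was pooled: the coefficient vector combined across imputations.
matrix list e(b_mi)
```

        Because cmdok bypasses Stata's check of supported commands, getting output does not by itself show that pooling is statistically valid for a given model.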

        However, users trying to pull this off should consider whether the estimators have an asymptotically normal distribution, and whether the variance-covariance matrix can be consistently estimated. You may be wading into an area of statistical theory where some fundamental issues haven't been resolved. If you know what the first sentence of this paragraph means, then you know you should tread cautiously. Personally, I don't have a good understanding of how and why the first sentence might get violated.
        Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

        When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



        • #5
          Hello everyone,

          I don't know whether this helps after all this time, but with version 14.2, gsem seems to work with imputed data.

          In this case, I performed multinomial (polytomous) logistic regression with a random factor, as an extension of mlogit (which handles only fixed factors).


          Code:
          mi estimate, dots vartable cmdok: gsem (i.attack <- smokes age bmi M[grp_rnd]), mlogit
          matrix k=r(table)
          Dusan

