  • GCM using GSEM for ZINB data

    Dear all! I have count data (number of positive malaria cases) for different administrative units. I would like to model malaria cases as a function of different climatic, connectivity, and LULCC covariates measured over several years (2008-2019). From my reading, and on advice I received, I am considering growth curve models fitted with SEM, since I would also like to examine interactions between climate, connectivity, and LULCC. I examined the distribution of my data and found overdispersion and an excess of zeros: around 76% of the observations are zero. So I think I should use GSEM. I have found information on using sem for longitudinal data and on using gsem for zero-inflated data, but I have not found anything on using GSEM for zero-inflated longitudinal data. If you have any idea where I should look, I would be grateful.

    I use this code for gsem with zero-inflated data, but I do not know how to include time:

    gsem(2: casos2008<- ,family(pointmass 0)) (1: casos2008<- longitud_vias longitud_rios pd_ca2008, family (nbinomial))

    And I used this one for longitudinal data but I do not know how to specify my data distribution

    gsem (casos2008 <- I@1 S@0 _cons@0) (casos2009 <- I@1 S@1 _cons@0) (casos2010 <- I@1 S@2 _cons@0), var(e.casos2008@var e.casos2009@var e.casos2010@var) means(I S) family(nbinomial constant) link(log)


    Thank you very much for the help. Attached is my data
    Last edited by Andrea Araujo; 13 Oct 2021, 09:33.

  • #2
    On your previous post, I commented that you may be better off just using a longitudinal Poisson or negative binomial model for this problem. The theoretical rationale given there stands. To briefly restate: if you think that some of the units can be structural zeroes, one heuristic interpretation is that they're invulnerable to malaria infection. I don't see how that can be possible, so I see that as a strike against the model. Several commenters here, e.g. Joao Santos Silva, also tend to steer people away from being set on zero-inflated models. Basically, a Poisson distribution with a low mean can produce a lot of 0s.
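
    As a rough illustration of that last point (a sketch, not something fitted to the attached data): under a plain Poisson model the share of zeros is exp(-mean), so Stata's built-in probability functions can show what mean would produce a given share of zeros. The means 0.25 and 1 and the overdispersion value alpha = 2 below are illustrative assumptions only.

    ```stata
    * Share of zeros implied by a Poisson distribution with mean 0.25 (illustrative value)
    display exp(-0.25)
    * The same quantity from the built-in Poisson probability mass function
    display poissonp(0.25, 0)
    * Poisson mean that would imply 76% zeros, the share reported in post #1
    display -ln(0.76)
    * Share of zeros under a negative binomial with mean 1 and alpha = 2 (illustrative values),
    * using n = 1/alpha and p = 1/(1 + alpha*mean)
    display nbinomialp(1/2, 0, 1/(1 + 2*1))
    ```

    A single Poisson mean of about 0.27 would already imply 76% zeros, and with overdispersion a negative binomial can reach the same share of zeros at a larger mean.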

    One of my other hesitations was that the syntax for fitting a latent growth curve model in general is a bit tricky, and you essentially have something like a two part model here, so the syntax is even trickier. There's another complication I forgot, however. A zero inflated model involves a categorical latent variable - that is, are you in the zero inflated class or not (i.e. the one you said had family equal to pointmass at 0). The problem is that Stata 16 and earlier can't fit models with both categorical and continuous latent variables. I'm not sure that Stata 17 can do this, either. For a latent growth curve or random effects model, you have a continuous latent variable as well - the intercept is a continuous latent variable. Thus, I'm not even sure that you can pull this off in Stata.

    It appears that people have conceived of longitudinal zero-inflated models, and this presentation shows one being fit in Mplus. Alternatively, I bet someone has cooked up an R package that could do this. If you are dead set on this type of model, then I think you would have to switch software (unless someone can confirm that Stata 17 allows a zero-inflated model with random effects).

    If Stata 17 does allow categorical and continuous latent variables to be estimated simultaneously, then you can adapt your first syntax to this SEM example of a random intercept and slope model. The complication is that you need to specify a second set of random effects affecting the point mass function (i.e. the probability of being in the structural zero class).
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



    • #3
      Thank you very much, WeiWen. You clarified my ideas. Now let's say I use just a negative binomial distribution. Is it possible to use gsem for longitudinal data with this type of distribution in Stata? If yes, which command is the right one?

      Thank you again.



      • #4
        Originally posted by Andrea Araujo
        Thank you very much, WeiWen. You clarified my ideas. Now let's say I use just a negative binomial distribution. Is it possible to use gsem for longitudinal data with this type of distribution in Stata? If yes, which command is the right one?

        Thank you again.
        Are xtnbreg or menbreg not sufficient for your purposes? The former lets you fit fixed-effects or random-intercept models. The latter fits random-effects models only, but allows flexible specification of the random effects and, if applicable, their covariance structure. I think that latent growth curve models require the data to be in wide format (but I'm not certain of this), whereas the other two commands require long format. If you're fitting a latent growth curve model in wide format, you can adapt the syntax of SEM example 18, but use gsem (not sem) and specify the correct family in the options. There's also another example of that syntax, by Chuck Huber of StataCorp, that uses a Poisson distribution.
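
        For what it's worth, the long-format route might look roughly like the sketch below. It rests on several assumptions: unit_id and time are hypothetical names, casos and pd_ca are assumed to exist as one wide variable per year (casos2008 ... casos2019, pd_ca2008 ... pd_ca2019), and the covariate list is only illustrative. It has not been run on the attached data. The last command uses gsem's multilevel notation, which takes long-format data rather than the wide layout of a classic latent growth curve specification.

        ```stata
        * Assumption: wide data, one row per administrative unit identified by unit_id
        reshape long casos pd_ca, i(unit_id) j(year)
        generate time = year - 2008

        * Negative binomial growth model: random intercept and random slope on time
        menbreg casos c.time pd_ca longitud_vias longitud_rios || unit_id: time, covariance(unstructured)

        * Random-effects alternative with xtnbreg
        xtset unit_id year
        xtnbreg casos c.time pd_ca longitud_vias longitud_rios, re

        * Random intercept and random slope written for gsem in long format
        gsem (casos <- c.time pd_ca longitud_vias longitud_rios M1[unit_id] c.time#M2[unit_id], nbreg)
        ```

        In this setup the fixed coefficient on time plays the role of the mean slope, and the unit-level variances describe how intercepts and slopes vary across units. Interactions among covariates can be added with factor-variable notation, for example c.time#c.pd_ca.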
