
  • SEM in stata

    Hi all! I am new to Stata. Although I have worked with statistics, I am not an expert. I have data on the number of malaria cases in different administrative units over 10 years, together with climatic, connectivity and land use covariates. I would like to see the influence of the different covariates on malaria prevalence through time, and also how these factors (climate, connectivity and land use) interact with each other. I would like some guidance about the methods and how to proceed in Stata. My malaria data follow a Poisson distribution with a lot of zeros, so I think I should fit a zero-inflated Poisson model. To model these influences and interactions I thought about SEM and latent growth curve models, since I have several years of data. Do you think this is the correct approach? And how should I proceed with this in Stata?

    Thank you very much.

  • #2
    It seems like there are two separate questions here. I can only answer in general terms, but here goes.

    If you have a theoretical sense that there are two processes at work, one of which always produces a zero count of malaria cases and the other of which produces a standard Poisson or negative binomial count of malaria cases, then you should use a zero inflated model. The classic example of zero inflation involves a (maybe fictitious) survey of people at a national park. One of the items measured is how many fish they caught.

    The thing is, some people fish and some do not. If you fish, you might still catch zero fish. If you didn't set out to fish in the first place, then naturally you won't catch any. If you have a sense that a situation like that applies, that's good grounds for a zero inflated model. If not, I don't think it's necessary. Although I don't work in infectious diseases, I am not sure I would buy that some administrative units might be immune to malaria in some years, which would be the heuristic interpretation of a ZIP model in this case (but if you can explain why that might happen, then that's fine). Zero inflated models are also trickier to interpret. Also, a DV with a lot of 0s can simply come from a standard Poisson process with a low rate. If you disagree with me, go seek out Joao Santos Silva, who also typically seems to counsel against fitting zero inflated count models unless there's a good theoretical reason to.
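
To make the point about zeros concrete, here is a minimal sketch in Python (not Stata) comparing the share of zeros expected from a plain Poisson process with a low rate against a zero inflated Poisson. The rates and the structural zero probability are made-up illustrative values, not estimates from any malaria data.

```python
import math

def poisson_zero_prob(rate):
    """P(Y = 0) for a plain Poisson count with mean `rate`."""
    return math.exp(-rate)

def zip_zero_prob(rate, structural_zero_prob):
    """P(Y = 0) for a zero inflated Poisson.

    A zero arises either from the always-zero process (the non-fishers)
    or from the Poisson process happening to produce a zero.
    """
    return structural_zero_prob + (1.0 - structural_zero_prob) * math.exp(-rate)

# Hypothetical values chosen only for illustration.
low_rate = 0.2
print(f"Plain Poisson, rate {low_rate}: {poisson_zero_prob(low_rate):.1%} zeros")
print(f"ZIP, rate 2.0, 30% structural zeros: {zip_zero_prob(2.0, 0.3):.1%} zeros")
```

With a mean of 0.2 cases per unit-year, a plain Poisson already gives roughly 82% zeros, so a large share of zeros is not by itself evidence of a second, always-zero process.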

    I've never been familiar with latent growth curve models. I was under the impression that they are equivalent to (generalized) hierarchical linear models. It turns out that this is not always true, but in practice the two can produce very similar estimates in some cases. In general, I find the syntax for mixed models simpler than the SEM syntax (and I do fit a number of generalized SEMs in my work). If you are dead set on estimating a zero inflated model with additional random effects, then I think both the syntax and the estimation may get a lot trickier. It seems like you'd need to specify an additional random intercept and slope for the structural zero part of the model, as well as how those random effects covary with the random effects for the count part of the model. That's not standard syntax for gsem - and do note, you'd be using a generalized SEM, not a traditional one, if you went this route. There's no mixed effects version of the ZIP model, either.
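
For intuition about what a random intercept and slope mean in a count growth model, the sketch below (Python, with made-up numbers) computes the expected count for a unit in a given year when each unit has its own deviation from the average intercept and slope on the log scale. It only illustrates the model structure under those assumptions; it does not fit anything and is not Stata syntax.

```python
import math

def expected_count(year, fixed_intercept, fixed_slope,
                   unit_intercept_dev=0.0, unit_slope_dev=0.0):
    """Expected count under a log-link growth model.

    log(mean) = (fixed intercept + unit deviation)
              + (fixed slope + unit deviation) * year
    """
    log_mean = ((fixed_intercept + unit_intercept_dev)
                + (fixed_slope + unit_slope_dev) * year)
    return math.exp(log_mean)

# Hypothetical fixed effects and unit-level deviations (intercept, slope).
FIXED_INTERCEPT = 1.0
FIXED_SLOPE = -0.1
units = {"unit_a": (0.5, 0.05), "unit_b": (-0.5, -0.05)}

for name, (u0, u1) in units.items():
    trajectory = [expected_count(t, FIXED_INTERCEPT, FIXED_SLOPE, u0, u1)
                  for t in range(10)]
    print(name, [round(m, 2) for m in trajectory])
```

A mixed model estimates the fixed intercept and slope plus the variances and covariance of the unit-level deviations, rather than the individual deviations shown here.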
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
