How to fit a zero-inflated Poisson model in GSEM?

Pengpeng Ye

Join Date: May 2014

Posts: 6
#1

13 Dec 2018, 19:44

Hi friends,

BACKGROUND:
I have checked all the examples shown in the SEM/GSEM documentation, but I could not find one that shows how to fit a zero-inflated Poisson model in GSEM. I have also studied a similar example in the Mplus 8 User's Guide (the Chapter 7.25 example). Unfortunately, I was unable to fit the zero-inflated model in GSEM based on the information in the Mplus User's Guide.

QUESTION:
Has anyone used GSEM to fit a zero-inflated model before? If so, could you share any suggestions, experience, or a diagram of a ZIP model in GSEM?

Thank you so much in advance.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#2

13 Dec 2018, 21:33

You may already be aware of the zip command, which fits zero-inflated Poisson models. That said, if you want to use gsem, the syntax may take some getting used to, but see slides 56 onward of this recent presentation by Stata's own Rafal Raciborski.

https://www.stata.com/meeting/poland...Raciborski.pdf

Note that you will need Stata 15 for this to work in gsem.

Actual code:

Code:

webuse fish
zip count persons livebait, inflate(child camper)
est store zip
gsem (1: count <- , family(pointmass 0)) ///
     (2: count <- persons livebait, family(poisson)) ///
     (C <- child camper), lclass(C 2) lcinvariant(none)
est store gsem
est table zip gsem

----------------------------------------
    Variable |    zip          gsem
-------------+--------------------------
count        |
     persons |  .80688527
    livebait |  1.7572894
             |
 C#c.persons |
          2  |               .80688527
             |
C#c.livebait |
          2  |               1.7572894
             |
         2.C |              -2.1784716
       _cons | -2.1784716
-------------+--------------------------
inflate      |
       child |  1.6025705
      camper | -1.0156983
       _cons | -.49228716
-------------+--------------------------
1b.C         |
       child |               (omitted)
      camper |               (omitted)
       _cons |               (omitted)
-------------+--------------------------
2.C          |
       child |              -1.6025705
      camper |               1.0156983
       _cons |               .49228716
----------------------------------------

You may already be aware of this, but /// denotes a line continuation: the command carries on to the next line. You will need to copy my block of code into a do-file and execute the entire block of gsem code for it to run properly.

Basically, (1: ...) and (2: ...) denote the two latent classes, and you are specifying a different model for each: a point mass at zero for class 1 and a Poisson regression for class 2. The multinomial latent class variable was named C in this code (as opposed to Class in Rafal's slides). The block of code (C <- child camper) basically tells Stata to run a multinomial regression of C on child and camper. (It appears you don't have to specify that the regression is multinomial in this one case; in every other case in gsem, you need to tell Stata what sort of family and link to use, or it will assume Gaussian and identity.)
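
For reference, the model implied by this gsem specification can be written out as below. This is the standard zero-inflated Poisson mixture, with class 1 as the point mass at zero and class 2 as the Poisson class. The symbols \gamma, \beta, and \mu_i are notation introduced here for illustration; they do not appear in the slides or in the Stata output.

```latex
\begin{align*}
\Pr(C_i = 2) &= \frac{\exp(\gamma_0 + \gamma_1\,\text{child}_i + \gamma_2\,\text{camper}_i)}
                     {1 + \exp(\gamma_0 + \gamma_1\,\text{child}_i + \gamma_2\,\text{camper}_i)},
  \qquad \Pr(C_i = 1) = 1 - \Pr(C_i = 2) \\
\log \mu_i &= \beta_0 + \beta_1\,\text{persons}_i + \beta_2\,\text{livebait}_i \\
\Pr(\text{count}_i = 0) &= \Pr(C_i = 1) + \Pr(C_i = 2)\, e^{-\mu_i} \\
\Pr(\text{count}_i = k) &= \Pr(C_i = 2)\, \frac{e^{-\mu_i}\,\mu_i^{k}}{k!}, \qquad k = 1, 2, \dots
\end{align*}
```

Because class 1 is the base class, the \gamma coefficients describe the log odds of being in class 2 (the Poisson class) rather than the log odds of an excess zero.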

Last edited by Weiwen Ng; 13 Dec 2018, 21:38.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#3

14 Dec 2018, 11:10

In the block of results I showed, the coefficient estimates for the Poisson part are identical between zip and gsem. The gsem coefficient estimates for the latent class equation are -1 times the inflate coefficients from zip, because gsem treats class 1 as the base class. In my syntax, class 1 was defined as the excess-zero class, so the coefficients represent the log odds of being in the Poisson class. In zip, that part of the output describes the log odds of being in the excess-zero class. You could code it this way to get coefficients identical to zip, if this is really important to you:

Code:

gsem (2: count <- , family(pointmass 0)) ///
     (1: count <- persons livebait, family(poisson)) ///
     (C <- child camper), lclass(C 2) lcinvariant(none)
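
The sign flip can be checked by hand using the two intercepts reported in #2. This is a minimal sketch that assumes child = 0 and camper = 0, so only the constant matters; invlogit() is Stata's built-in inverse-logit function.

```stata
* zip: the inflate _cons is the log odds of the excess-zero class,
* so this displays Pr(excess zero), about 0.38
display invlogit(-.49228716)

* gsem with class 1 = point mass: the 2.C _cons is the log odds of the
* Poisson class, so Pr(excess zero) is one minus its inverse logit
display 1 - invlogit(.49228716)
```

Both lines display the same probability, because invlogit(-x) = 1 - invlogit(x). The two parameterizations describe the same fitted model.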

Also note, after gsem, you can use estat lcprob to give you the estimated probability of being in each latent class. In zip, you would have to use

Code:

margins, predict(pr)
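
Run in sequence, the two routes to the class probabilities look roughly like this. It is a sketch that assumes the fish data and the two models from #2 are used unchanged. The final predict line is not from the posts above: classposteriorpr is the gsem postestimation option for per-observation posterior class probabilities, and the variable stub cpost is a hypothetical name.

```stata
webuse fish, clear

* zip: average predicted probability of a degenerate (excess) zero
zip count persons livebait, inflate(child camper)
margins, predict(pr)

* gsem: marginal probability of each latent class
* (class 1 is the point mass at zero in this specification)
gsem (1: count <- , family(pointmass 0))              ///
     (2: count <- persons livebait, family(poisson))  ///
     (C <- child camper), lclass(C 2) lcinvariant(none)
estat lcprob

* Assumption: posterior class probabilities for each observation,
* stored in the new variables cpost1 and cpost2
predict cpost*, classposteriorpr
```

If both commands converge to the same solution, the class 1 probability from estat lcprob should be comparable to the margins result after zip.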

Pengpeng Ye

Join Date: May 2014

Posts: 6
#4

21 Dec 2018, 07:52

Thank you so much. Your reply is very useful. @Weiwen Ng
Marco Greco

Join Date: Sep 2015

Posts: 45
#5

08 Feb 2021, 07:42

Thanks for this useful thread. I am also trying to estimate a ZIP model through gsem, since I am using a multilevel model to control for firm sector, such as the following:

Code:

gsem ($xlist1 $xlist2 M[sector]-> depvar, logit ) (M[sector]-> $xlist1, logit), vce(robust)

Here depvar is a dummy variable with 86% zeros, and sector is a categorical variable identifying 4 sectors.

If I attempt to merge my code with the code above, with $xlist3 a list of variables for the inflation equation, I come up with:

Code:

gsem (1: depvar <- , family(pointmass 0)) ///
     (2: depvar <- $xlist1 $xlist2 M[sector], family(poisson)) ///
     (C <- $xlist3) (M[sector]-> $xlist1, logit), lclass(C 2) lcinvariant(none) vce(robust)

I obtain the following error:
latent variable M not found; 'M[sector]' specifies a latent variable at level '[sector]'. For 'M[sector]' to be a valid latent variable specification, 'M' must appear in the latent() option.

If I add the option latent(M), I am told that:
option lclass() is not allowed with models specified with continuous latent variables

Finally, if I opt for

Code:

gsem (1: depvar <- , family(pointmass 0)) ///
     (2: depvar <- $xlist1 $xlist2 M[sector], family(poisson)) ///
     (C <- $xlist3) (M[sector]-> $xlist1, logit), lclass(C 2) lclass(M 4) lcinvariant(none) vce(robust)

I am told that:
the path from latent class variable M to observed variable to the depvar is not allowed

Do you have any suggestions for refining my code? Thank you very much!
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#6

08 Feb 2021, 08:39

Stata can't estimate models with both continuous and categorical latent variables. This means that you can't fit a finite mixture model with random effects in Stata 16. I hope future revisions of the software take care of this issue, but right now, it is what it is.

Marco Greco

Join Date: Sep 2015

Posts: 45
#7

09 Feb 2021, 02:04

Thanks for your reply. Actually, I don't need M to be a continuous latent variable. How could I fix this?
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#8

09 Feb 2021, 06:22

Random effects are inherently a continuous latent variable. Remember, they’re assumed to be normally distributed with some variance that the model estimates, just like the latent trait in a SEM measurement model or an IRT model. The only fix to the problem above is to delete the random effect. In other contexts, I've used the cluster-robust VCE, i.e. the option vce(cluster sector), when fitting a model where there's some clustering but no multilevel version of the model has been defined. That might be an acceptable alternative, but I haven't tried it in gsem.
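
A sketch of what that alternative might look like, using the names from #5 (depvar, sector, and the $xlist globals). As noted, it is untested, and it assumes gsem accepts vce(cluster sector) together with lclass().

```stata
* Sketch only: the random effect M[sector] and its equation are dropped;
* clustering on sector is handled through the VCE instead.
gsem (1: depvar <- , family(pointmass 0))             ///
     (2: depvar <- $xlist1 $xlist2, family(poisson))  ///
     (C <- $xlist3),                                  ///
     lclass(C 2) lcinvariant(none) vce(cluster sector)
```

With only 4 sectors, the cluster-robust standard errors would rest on very few clusters, so they should be interpreted with caution.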

Last edited by Weiwen Ng; 09 Feb 2021, 06:40.

Marco Greco

Join Date: Sep 2015

Posts: 45
#9

09 Feb 2021, 07:13

Thank you very much. I appreciate the time you dedicated to answering my questions so quickly and thoroughly.