  • How to fit a zero-inflated Poisson model in GSEM?

    Hi friends,

    BACKGROUND:
    I have checked all the examples shown in the SEM/GSEM manual, but I could not find any example of how to fit a zero-inflated Poisson model in GSEM. I have also studied a similar example in the Mplus 8 User's Guide (example 7.25). Unfortunately, I failed to fit the zero-inflated model in GSEM based on the information in the Mplus User's Guide.

    QUESTION:
    Has anyone used GSEM to fit a zero-inflated model before? Could you share any suggestions, experience, or a path diagram of a ZIP model in GSEM?

    Thank you so much in advance.


  • #2
    You may already be aware of the zip command, which fits zero-inflated Poisson models. That said, if you want to use gsem, the syntax may take some getting used to, but see slides 56 onward of this recent presentation by Stata's own Rafal Raciborski.

    https://www.stata.com/meeting/poland...Raciborski.pdf

    Note that you will need Stata 15 or later for this to work in gsem.

    Actual code:

    Code:
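    webuse fish, clear
    * Official zero-inflated Poisson: count model plus an inflate() equation
    zip count persons livebait, inflate(child camper)
    est store zip
    * Same model as a 2-class mixture: class 1 = point mass at 0 (excess zeros),
    * class 2 = Poisson; (C <- child camper) models class membership
    gsem (1: count <- , family(pointmass 0)) ///
    (2: count <- persons livebait, family(poisson)) ///
    (C <- child camper), lclass(C 2) lcinvariant(none)
    est store gsem
    est table zip gsem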
    
    ----------------------------------------
        Variable |    zip          gsem    
    -------------+--------------------------
    count        |
         persons |  .80688527              
        livebait |  1.7572894              
                 |
     C#c.persons |
              2  |               .80688527  
                 |
    C#c.livebait |
              2  |               1.7572894  
                 |
             2.C |              -2.1784716  
           _cons | -2.1784716              
    -------------+--------------------------
    inflate      |
           child |  1.6025705              
          camper | -1.0156983              
           _cons | -.49228716              
    -------------+--------------------------
    1b.C         |
           child |               (omitted)  
          camper |               (omitted)  
           _cons |               (omitted)  
    -------------+--------------------------
    2.C          |
           child |              -1.6025705  
          camper |               1.0156983  
           _cons |               .49228716  
    ----------------------------------------
    You may already be aware of this, but /// tells Stata that the command continues on the next line, and it only works in a do-file. You will need to copy my block of code into a do-file and execute the entire gsem command at once for it to run properly.

    Basically, (1: ...) and (2: ...) denote the two latent classes, and you are specifying a different model for each: class 1 is a point mass at zero (the excess zeros), and class 2 is a Poisson regression on persons and livebait. The categorical latent variable was named C in this code (as opposed to Class in Rafal's slides). The block of code (C <- child camper) tells Stata to run a multinomial logistic regression of C on child and camper, that is, class membership is predicted by those two variables. (You don't have to specify that this regression is multinomial in this one case, because gsem uses multinomial logit for a latent class variable automatically. In every other case in gsem, you need to tell Stata which family and link to use, or it will assume Gaussian and identity.)
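    To see why this two-class gsem setup is the same model as zip, here is a small Python sketch (not Stata) of the probability each formulation assigns to a count y. Class 1 is a point mass at zero with probability pi, and class 2 is Poisson with mean mu. In the real gsem model, log(mu) depends on persons and livebait and the class probability depends on child and camper; here both are fixed at hypothetical numbers, and the function names are illustrative only.

```python
import math


def poisson_pmf(y: int, mu: float) -> float:
    """Poisson probability Pr(Y = y) for mean mu, computed on the log scale."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


def zip_pmf(y: int, pi: float, mu: float) -> float:
    """Textbook zero-inflated Poisson: an extra zero with probability pi."""
    if y == 0:
        return pi + (1.0 - pi) * math.exp(-mu)
    return (1.0 - pi) * poisson_pmf(y, mu)


def two_class_pmf(y: int, pi: float, mu: float) -> float:
    """The same model written as a two-class mixture, mirroring the gsem code:
    class 1 is a point mass at zero, class 2 is Poisson(mu), Pr(class 1) = pi."""
    pointmass_zero = 1.0 if y == 0 else 0.0
    return pi * pointmass_zero + (1.0 - pi) * poisson_pmf(y, mu)


# Hypothetical values, chosen only for illustration
pi, mu = 0.38, 2.5
for y in range(10):
    print(y, round(zip_pmf(y, pi, mu), 4), round(two_class_pmf(y, pi, mu), 4))
```

    Every positive count can only come from the Poisson class, while a zero can come from either class, which is exactly the ZIP structure.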
    Last edited by Weiwen Ng; 13 Dec 2018, 21:38.
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



    • #3
      In the block of results I showed, the coefficient estimates for the Poisson class are identical in both models. The gsem coefficients for the class-membership equation are -1 times zip's inflate coefficients, because gsem treats class 1 as the base class. In my syntax, class 1 was defined as the excess-zero class, so the gsem coefficients represent the log odds of being in the Poisson class. In zip, that part of the output describes the log odds of being in the excess-zero class. If getting coefficients identical to zip really matters to you, you could code it this way:

      Code:
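      * Label the point-mass (excess-zero) class 2 and the Poisson class 1, so the
      * 2.C equation gives the log odds of an excess zero, as zip's inflate() does
      gsem (2: count <- , family(pointmass 0)) ///
      (1: count <- persons livebait, family(poisson)) ///
      (C <- child camper), lclass(C 2) lcinvariant(none)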
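      A quick numeric check of the sign flip, as a Python sketch rather than Stata code: it plugs the class-equation coefficients printed in #2 into the inverse logit and compares the implied probability of an excess zero from each model. The covariate profiles (0 to 3 children, camper or not) are hypothetical inputs chosen for illustration, and invlogit() here is a hand-written stand-in for Stata's function of the same name.

```python
import math


def invlogit(x: float) -> float:
    """Inverse logit, the same transformation as Stata's invlogit()."""
    return 1.0 / (1.0 + math.exp(-x))


# Class-membership coefficients copied from the est table output in #2
zip_inflate = {"child": 1.6025705, "camper": -1.0156983, "_cons": -0.49228716}
gsem_2C = {"child": -1.6025705, "camper": 1.0156983, "_cons": 0.49228716}


def xb(b: dict, child: int, camper: int) -> float:
    """Linear predictor for one covariate profile."""
    return b["_cons"] + b["child"] * child + b["camper"] * camper


def pr_excess_zero_zip(child: int, camper: int) -> float:
    # zip's inflate equation is the log odds of an excess zero
    return invlogit(xb(zip_inflate, child, camper))


def pr_excess_zero_gsem(child: int, camper: int) -> float:
    # gsem's 2.C equation is the log odds of the Poisson class (class 2)
    # against base class 1 (point mass at zero), so take the complement
    return 1.0 - invlogit(xb(gsem_2C, child, camper))


for child in range(4):
    for camper in (0, 1):
        p_zip = pr_excess_zero_zip(child, camper)
        p_gsem = pr_excess_zero_gsem(child, camper)
        print(child, camper, round(p_zip, 4), round(p_gsem, 4))
```

      Because 1 - invlogit(-x) = invlogit(x), the two columns agree for every profile; only the labeling of the base class differs.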
      Also note that after gsem, you can use estat lcprob to get the estimated probability of being in each latent class. After zip, you would have to use

      Code:
      margins, predict(pr)
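      A rough Python sketch of what these marginal probabilities are, under my understanding that both commands average per-observation predicted probabilities over the estimation sample (after zip, predict's pr option is the probability of a degenerate, i.e. excess, zero). The five (child, camper) observations are made up for illustration; only the coefficients come from the zip output in #2.

```python
import math


def invlogit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Inflate-equation coefficients copied from the zip output in #2
B_CHILD, B_CAMPER, B_CONS = 1.6025705, -1.0156983, -0.49228716

# HYPOTHETICAL (child, camper) values for five observations, illustration only
sample = [(0, 1), (1, 0), (0, 0), (2, 1), (1, 1)]

# Per-observation probability of an excess (degenerate) zero
pr = [invlogit(B_CONS + B_CHILD * child + B_CAMPER * camper) for child, camper in sample]

# Averaging over the sample gives a marginal probability: the kind of number
# margins, predict(pr) reports after zip and estat lcprob reports per class
pr_excess_zero_class = sum(pr) / len(pr)
pr_poisson_class = 1.0 - pr_excess_zero_class
print(round(pr_excess_zero_class, 4), round(pr_poisson_class, 4))
```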


      • #4
        Thank you so much. Your reply is very useful. @Weiwen Ng



        • #5
          Thanks for this useful thread. I am also trying to fit a ZIP model with gsem, because I am using a multilevel model to control for firm sector, as follows:

          Code:
          gsem ($xlist1 $xlist2 M[sector]-> depvar, logit ) (M[sector]-> $xlist1, logit), vce(robust)
          where depvar is a dummy variable with 86% zeros, and sector is a variable identifying 4 sectors.

          If I try to combine my code with the approach above, using $xlist3 as the list of variables for the inflation equation, I get:

          Code:
          gsem (1: depvar <- , family(pointmass 0)) ///
          (2: depvar <- $xlist1 $xlist2 M[sector], family(poisson)) ///
          (C <- $xlist3) (M[sector]-> $xlist1, logit), lclass(C 2) lcinvariant(none) vce(robust)
          I obtain the following error:
          latent variable M not found; 'M[sector]' specifies a latent variable at level '[sector]'. For 'M[sector]' to be a valid latent variable specification, 'M' must appear in the latent() option.
          If I add the option latent(M), I am told instead that
          option lclass() is not allowed with models specified with continuous latent variables

          Finally, if I instead declare M as a second latent class variable:
          Code:
          gsem (1: depvar <- , family(pointmass 0)) ///
          (2: depvar <- $xlist1 $xlist2 M[sector], family(poisson)) ///
          (C <- $xlist3) (M[sector]-> $xlist1, logit), lclass(C 2) lclass(M 4) lcinvariant(none) vce(robust)
          I am told that
          the path from latent class variable M to observed variable to the depvar is not allowed

          Do you have any suggestions for refining my code? Thank you very much!



          • #6
            Replying to Marco Greco (#5):
            Stata can't estimate models with both continuous and categorical latent variables. This means that you can't fit a finite mixture model with random effects in Stata 16. I hope future revisions of the software take care of this issue, but right now, it is what it is.


            • #7
              Thanks for your reply. Actually, I do not need M to be a continuous latent variable. How could I fix this?



              • #8
                Random effects are inherently continuous latent variables. Remember, they're assumed to be normally distributed with some variance that the model estimates, just like the latent trait in an SEM measurement model or an IRT model. The only fix for the problem above is to drop the random effect. In other contexts, I have used the cluster-robust VCE, vce(cluster sector), when fitting a model where the data are clustered but no multilevel version of the model has been defined. That might be an acceptable alternative, but I haven't tried it in gsem.
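                To illustrate why a random effect counts as a continuous latent variable, here is a Python sketch (numpy assumed) contrasting the two cases. With a normally distributed random intercept u, the likelihood has to integrate over every possible value of u, done here with plain Gauss-Hermite quadrature; with a latent class variable it is just a finite sum over classes. All parameter values and function names are hypothetical, and gsem's default integration method is an adaptive variant rather than this textbook version.

```python
import math

import numpy as np


def poisson_pmf(y: int, mu: float) -> float:
    """Poisson probability computed on the log scale to avoid overflow."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


def pmf_random_intercept(y: int, beta0: float, sigma: float, n_nodes: int = 30) -> float:
    """Pr(Y = y) when log(mu) = beta0 + u with u ~ N(0, sigma^2).

    u is continuous, so it is integrated out with Gauss-Hermite quadrature:
    E[g(u)] ~= sum(w * g(sqrt(2) * sigma * x)) / sqrt(pi).
    """
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    values = [poisson_pmf(y, math.exp(beta0 + math.sqrt(2.0) * sigma * xi)) for xi in x]
    return float(np.dot(w, values) / math.sqrt(math.pi))


def pmf_latent_class(y: int, class_probs: list, class_means: list) -> float:
    """Pr(Y = y) for a categorical latent variable: a finite sum over classes."""
    return sum(p * poisson_pmf(y, mu) for p, mu in zip(class_probs, class_means))


# Hypothetical parameter values for illustration
print(round(pmf_random_intercept(0, beta0=0.5, sigma=0.7), 4))
print(round(pmf_latent_class(0, [0.4, 0.6], [1.0, 3.0]), 4))
```

                A model with both kinds of latent variable would need this integral inside every class of the sum, which is the combination gsem refuses when lclass() is specified together with a random effect.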
                Last edited by Weiwen Ng; 09 Feb 2021, 06:40.


                • #9
                  Thank you very much. I appreciate the time you took to answer my questions so quickly and thoroughly.
