  • how to deal with zero entries in log-linear model

    I have panel data with weekly sales as the dependent variable and a set of covariates as predictors. In some weeks there are no sales, so the dependent variable is zero in those weeks. I found that the distribution is not normal and hence wanted to use a log-linear model. My question is how to treat the zeros in the dependent variable, since the log of zero is undefined. Should I leave them as they are, or is there an approach to address this issue? Since I'm using Stata, Stata will automatically drop these entries. Thanks.

  • #2
    Use a generalized linear model. You're modeling the expectation of Y conditional on X. You don't need to transform the data; you handle it through the link function and/or the distribution family. For example:

    Code:
    glm y x, family(gaussian) link(log)
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



    • #3
      Dear Frederick.

      I agree that you should not log your data and should instead use a GLM. However, I suggest you use the Poisson family rather than the Gaussian; this can also be done by using Stata's commands for Poisson regression. The main reason to prefer the Poisson option is that it is very robust and likely to partially deal with the natural heteroskedasticity of the data.

      Best regards,

      Joao
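
      As a minimal sketch of this suggestion, reusing the placeholder variables y and x from #2 (they are not real variable names from the data), either command below should fit the same Poisson model with a log link and robust standard errors:

      Code:
      poisson y x, vce(robust)
      glm y x, family(poisson) link(log) vce(robust)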



      • #4
        Thanks Joao and Weiwen. I've tried to run the following and got the results below. How would I interpret the results? Would it be based on a log-linear model or a linear-linear model interpretation of the dependent variable?

        glm weekly_spent after_member [pweight=_pscore], family(poisson) link(log) robust

        ----------------------------------------------------------------------------
        (1)
        wspent
        ----------------------------------------------------------------------------
        main
        after_member 0.2117***
        (0.0163)


        The main variable of interest is the interaction between after store opening and membership (i.e. after_member). After the store opened and after I became a member, does my weekly spending increase by the coefficient size (i.e. $0.2117), or by the coefficient size × 100% = 21.17%? Thanks



        • #5
          Actually, one thing that can help is to add the eform option, which exponentiates the coefficients. Your model is saying that when after_member is 1 (I assume it's a 0/1 indicator), average weekly spending is multiplied by exp(0.2117) ≈ 1.236, i.e. it is about 23.6% higher than when after_member is 0.
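
          To make that concrete, here is a sketch that reruns the model from #4 with eform added and then converts the coefficient by hand. The only inputs are the command and the 0.2117 estimate quoted in this thread:

          Code:
          glm weekly_spent after_member [pweight=_pscore], family(poisson) link(log) robust eform
          * multiplicative effect on expected spending, about 1.236
          display exp(0.2117)
          * the same effect as a percent change, about 23.6
          display 100*(exp(0.2117) - 1)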

          If you wanted to explore an interaction, here's some alternative coding:

          Code:
          glm weekly_spent i.after##i.member [pweight=_pscore], family(poisson) link(log) robust
          I'm not that familiar with economics terminology, but I think this corresponds to the log-linear interpretation (but I defer to whoever knows the terminology better).

          Joao's post is very interesting. I did not know about this, but I have been hearing him and maybe some others on Stata say this, so I will go search posts when I have time.

          For what it's worth, I occasionally have modeled healthcare spending. Usually, people have defaulted to a gamma distribution and a log link, and my main methods professor has done so as well. He is an econometrician. However, I think I remember him saying that the situation was not that clear cut, and that the best family might be somewhere between gamma and Poisson.
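
          For comparison, the two families could be fitted side by side as sketched below. The names spending and x are placeholders, not variables from this thread, and which family suits a given outcome is an empirical question:

          Code:
          glm spending x, family(gamma) link(log) vce(robust)
          glm spending x, family(poisson) link(log) vce(robust)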
          Last edited by Weiwen Ng; 06 Feb 2017, 15:18.


          • #6
            Dear Frederick,

            With Poisson regression, the interpretation is exactly as in a model with the logged dependent variable. So, if a regressor is logged, its coefficient is an elasticity; otherwise it is a semi-elasticity.

            Best wishes,

            Joao
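
            A sketch of the two cases, assuming a hypothetical positive regressor called price that does not appear in this thread (weights are omitted for brevity):

            Code:
            generate ln_price = ln(price)
            poisson weekly_spent ln_price after_member, vce(robust)
            * _b[ln_price] is an elasticity: a 1% change in price goes with roughly a _b[ln_price]% change in expected spending
            * _b[after_member] is a semi-elasticity; for a 0/1 regressor the exact percent change is:
            display 100*(exp(_b[after_member]) - 1)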



            • #7
              Hello Weiwen: Regarding your comment about gamma family assumptions with healthcare spending outcomes: In footnote #6 of a now rather old paper (Journal of Health Economics 20 (2001), pp. 461–494) Will Manning and I argued that the gamma assumption was a "natural" baseline assumption in the case of outcomes like healthcare expenditures when one is appealing to a GLM log-link.
              Best wishes,
              John Mullahy



              • #8
                Thanks all! I have another question related to the zero-inflated Poisson regression model. I'm trying this out too. In Stata, you need to specify the inflate option. I was wondering whether, in my case, I could use the dependent variable in the inflate option? That is:

                Code:
                zip weekly_spent after_member [pweight=_pscore], inflate(weekly_spent) robust
                Thanks.

                Best,
                Fred



                • #9
                  Originally posted by John Mullahy
                  Hello Weiwen: Regarding your comment about gamma family assumptions with healthcare spending outcomes: In footnote #6 of a now rather old paper (Journal of Health Economics 20 (2001), pp. 461–494) Will Manning and I argued that the gamma assumption was a "natural" baseline assumption in the case of outcomes like healthcare expenditures when one is appealing to a GLM log-link.
                  Best wishes,
                  John Mullahy
                  John, thanks very much, I will go read that!

                  Originally posted by frederick lim
                  Thanks all! I have another question related to the zero-inflated Poisson regression model. I'm trying this out too. In Stata, you need to specify the inflate option. I was wondering whether, in my case, I could use the dependent variable in the inflate option? That is:

                  Code:
                  zip weekly_spent after_member [pweight=_pscore], inflate(weekly_spent) robust
                  Thanks.

                  Best,
                  Fred
                  Fred, I don't believe you can use the DV in the inflate option. Why would you want to, anyway, and what meaning would it have? You can, and probably should, simply put the independent variables from the right-hand side of the regular Poisson equation into the inflate option. FYI, you can use the vuong option to request the Vuong test of ZIP vs regular Poisson (a rejection favors the ZIP model).
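
                  One way to write that, as a sketch only: the inflate() equation below reuses after_member, the single regressor from #8, and any other covariates would be added the same way:

                  Code:
                  zip weekly_spent after_member [pweight=_pscore], inflate(after_member) robust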

                  Also, linking a prior thread where someone asked about a ZIP model, and someone else was not that fond of them and posted some good commentary:

                  http://www.statalist.org/forums/foru...flated-poisson
                  Last edited by Weiwen Ng; 06 Feb 2017, 21:17.