  • Zero-inflated data

    Hi,

    I am estimating a fixed-effects model on panel data. The outcome variable takes non-negative integer values and would normally be log-transformed. I also include three explanatory variables and two control variables. Around half of the values of the dependent variable are zeros (real zeros, not unreported values), and the rest are normally distributed. I estimate the model with two specifications: an asinh() transformation and PPML.

    The regressions look as follows in Stata:
    For asinh:
    xtreg asinh(depvar) explvars controls i.year,fe vce(cluster )
    For PPML:
    xtpoisson depvar explvars controls i.year,fe vce(robust)

    1) Is the PPML specification correct like that?
    2) Can I speak here of zero-inflated data?
    3) Some of the results still differ across the specifications. Is that due to different interpretation units? This leads me to question 4:
    4) How do I interpret them? For example, if the coefficient is 0.86, I thought of:
    Asinh: If the explanatory variable changes by 1 unit, the dependent variable changes by 86 percentage points.
    PPML: If the explanatory variable changes by 1 unit, the dependent variable changes by (e^(0.86)-1)*100 percent.
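
    For reference, the PPML expression in 4) can be evaluated directly in Stata. This is only the arithmetic for the example coefficient of 0.86, not output from any model in this thread:

    ```stata
    * Percent change in the conditional mean for a one-unit change in the
    * explanatory variable, using the example coefficient 0.86
    display (exp(0.86) - 1) * 100
    ```

    The result is roughly 136 percent, noticeably larger than the 86 one would read off the raw coefficient.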

    I would be really happy if somebody could give me some input!


  • #2
    Dear Ronny Meijer,

    1) Yes, that looks correct.
    2) I would not; inflated with respect to what?
    3) That is natural, the asinh results are difficult to interpret.
    4) The interpretation for PPML is correct (assuming the variable is not in logs); for the asinh I would not know how to interpret it.
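
    One reason the asinh coefficient resists a log-style reading: asinh(y) = ln(y + sqrt(y^2 + 1)), which is close to ln(2y) only when y is large and equals 0 at y = 0. A minimal numeric sketch in Stata follows. The values 1 and 1000 are arbitrary illustrations, not taken from the data discussed here:

    ```stata
    * asinh() is defined at zero, where ln() is not
    display asinh(0)
    * For a small count, asinh(y) is visibly different from ln(2*y)
    display asinh(1) - ln(2 * 1)
    * For a large count, the two nearly coincide
    display asinh(1000) - ln(2 * 1000)
    ```

    With many zeros and small counts in the outcome, the approximation that would justify a percent interpretation may therefore not hold.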

    The upshot is that I would forget about the transformation and use PPML.

    Best wishes,

    Joao



    • #3
      Thank you for your quick response. I continued with the work, but now a new question has popped up.

      1) Why is the asinh so difficult to interpret? I thought it could be interpreted like a log-transformed variable?

      2) When I use the above-mentioned PPML estimation (xtpoisson depvar explvars controls i.year,fe vce(robust)), all zero outcomes are dropped. I thought PPML was designed to handle that problem?

      Thank you for helping me.



      • #4
        Dear Ronny Meijer,

        1) The asinh transformation is not the same as the log transformation, so the interpretation must be different.

        2) It is not that all zeros are dropped but that units for which the outcome is always zero are dropped; this is done because these observations contain no information on the parameters of interest and therefore can be ignored. This is very different from dropping zeros because you cannot log them.
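
        The distinction in 2) can be checked on simulated data. The sketch below is only an illustration: the variable names (id, year, x, always_zero), the number of units, and the parameter values are all made up, and 20 units are forced to have a zero outcome in every period.

        ```stata
        clear
        set seed 12345
        set obs 100
        generate id = _n
        * Hypothetical setup: the first 20 units never have a positive outcome
        generate byte always_zero = id <= 20
        expand 5
        bysort id: generate year = _n
        generate x = rnormal()
        generate depvar = cond(always_zero, 0, rpoisson(exp(1 + 0.5 * x)))
        xtset id year

        * Count the units whose outcome is zero in every period
        bysort id: egen total_depvar = total(depvar)
        egen byte unit_tag = tag(id)
        count if unit_tag & total_depvar == 0

        * Fixed-effects Poisson: only the always-zero units should be dropped
        xtpoisson depvar x i.year, fe vce(robust)

        * Zeros from units with at least one positive outcome stay in the sample
        count if e(sample) & depvar == 0
        ```

        Running the first count on real data before estimation shows how many units to expect in the "dropped because of all zero outcomes" note.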

        Best wishes,

        Joao



        • #5
          Thank you so much, @Joao Santos Silva! You are helping me a lot. Could you help me once more? There is serial correlation in my explanatory variables. I use the vce(robust) option in my regression because there is no cluster option available in Stata. Are the resulting standard errors cluster-robust, or do I have to account for this problem in some other way?



          • #6
            Dear Ronny Meijer,

            Which version of Stata are you using?

            Best wishes,

            Joao



            • #7
              I am using Stata/IC16.1



              • #8
                The option vce(cluster ...) should be available, but I believe that the option robust gives you the same results when you specify the option fe. As an alternative, please use the user-contributed xtpqml command.

                Best wishes,

                Joao



                • #9
                  1. When I estimate the following model: "xtpoisson depvar explvars controls i.year,fe ", I am not able to cluster by id.
                  2. When I use xtpqml, I can neither include vce(cluster id) nor year fixed effects.

                  So you think that the first regression already solves a serial correlation problem in the explanatory variable if I use vce(robust)?

                  Thank you so much and best wishes!



                  • #10
                    The manual states "vce(robust) invokes a cluster-robust estimate of the VCE in which the ID variable specifies the clusters"

                    So you should be fine.
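
                    A rough way to see this on simulated data, where all variable names and parameter values are made up for illustration: the fixed-effects Poisson estimator gives the same slope estimates as a pooled Poisson regression with a dummy for each unit, so the vce(robust) standard errors from xtpoisson can be set side by side with explicitly clustered ones. The two sets of standard errors need not match exactly, since small-sample adjustments may differ.

                    ```stata
                    clear
                    set seed 2021
                    set obs 200
                    generate id = _n
                    expand 6
                    bysort id: generate year = _n
                    generate x = rnormal()
                    generate depvar = rpoisson(exp(1 + 0.5 * x))
                    xtset id year

                    * Fixed-effects Poisson with vce(robust)
                    xtpoisson depvar x i.year, fe vce(robust)
                    estimates store fe_robust

                    * Pooled Poisson with unit dummies, clustered on id explicitly
                    poisson depvar x i.year i.id, vce(cluster id)
                    estimates store dummies

                    * Compare the coefficient on x and its standard error
                    estimates table fe_robust dummies, keep(x) se
                    ```

                    The unit-dummy regression is practical only when the number of units is moderate; it serves here purely as a cross-check.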

                    Joao
