
  • How to Deal with FEs in Tobit Models in Long Panel Data with Many Zeros?

    Hi,

    I'd like to model users' tipping behavior on a digital platform, where the decision variable equals zero if a user decides not to tip and a positive amount when a tip is given. Tips are a corner solution response because they are restricted to be non-negative, leading to many observations with a tip of 0. This is a zero-inflated, nonnegative continuous dependent variable, and I need to use fixed effects to account for unobserved heterogeneity. I intend to use within-individual variation from an unbalanced long panel (40M observations where 10K is not a positive tip amount).

    Based on the previous literature, the Tobit model sounds like a reasonable choice but I am not sure how to deal with FEs in a Tobit model. Would you please help me with this?

    Many thanks in advance,

    Best,
    Mahsa


  • #2
    Dear Mahsa Paridar,

    As you say, you have corner-solutions data, not censored data, so I would use Poisson regression rather than a Tobit. In Poisson regression, you can use fixed effects without any problem, as famously shown by Jeff Wooldridge. You also say that you have zero-inflation, but that is unlikely to be a useful concept in this case. Zero-inflation is defined with respect to some benchmark distribution, but you will not want to specify a distribution here.
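
    The reason no distribution needs to be specified can be sketched with the Poisson quasi-likelihood first-order condition. The notation below (row vector x_i, outcome y_i, N observations) is assumed for illustration and does not come from the thread; the only substantive assumption is that the conditional mean is exponential.

```latex
% Assumed mean specification; y_i >= 0 may be continuous, with many zeros
E(y_i \mid x_i) = \exp(x_i \beta)

% First-order condition solved by the Poisson (quasi-)MLE
\sum_{i=1}^{N} x_i' \left[ y_i - \exp(x_i \hat{\beta}) \right] = 0

% The score has mean zero at the true beta whenever the mean is correct,
% whatever the distribution of y_i given x_i
E\left\{ x_i' \left[ y_i - \exp(x_i \beta) \right] \right\} = 0
```

    Because only the mean enters this condition, inference should rely on robust (clustered, in a panel) standard errors rather than on the Poisson variance.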

    Best wishes,

    Joao



    • #3
      Naturally, I agree with Joao. If you want to use Tobit as a comparison, you probably want to implement a version of correlated random effects. Actually including unit-specific dummies may work okay if you do have large T, but computationally it will be very difficult. In the linear case, the CRE approach is the same as fixed effects. My 2019 Journal of Econometrics paper suggests ways of using CRE with unbalanced panels.
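
      The linear-case equivalence can be checked numerically with a minimal sketch. It assumes a Mundlak-style CRE regression (pooled OLS of y on a constant, x, and each unit's time average of x) and uses a small made-up unbalanced panel; the data, the variable names, and the helper functions `solve` and `ols` are all hypothetical.

```python
def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical unbalanced panel: unit id -> list of (y, x) observations.
panel = {
    "a": [(1.0, 0.5), (2.5, 1.5), (2.0, 1.0)],
    "b": [(4.0, 2.0), (6.5, 3.5)],
    "c": [(0.0, 0.2), (1.0, 0.9), (0.5, 0.4), (2.0, 1.6)],
}

rows, ys = [], []
num = den = 0.0
for obs in panel.values():
    # Unit-specific averages over the periods actually observed.
    x_bar = sum(x for _, x in obs) / len(obs)
    y_bar = sum(y for y, _ in obs) / len(obs)
    for y, x in obs:
        rows.append([1.0, x, x_bar])          # CRE design: constant, x, unit mean of x
        ys.append(y)
        num += (x - x_bar) * (y - y_bar)      # within (FE) numerator
        den += (x - x_bar) ** 2               # within (FE) denominator

beta_cre = ols(rows, ys)[1]   # coefficient on x in the CRE regression
beta_fe = num / den           # fixed effects (within) estimator
print(beta_cre, beta_fe)
```

      The two coefficients agree up to rounding error. In a Tobit the same unit averages would be added as regressors, but there the equivalence with fixed effects no longer holds.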

      For effects on the mean, it's hard to beat the Poisson FE estimator as it uses no extra assumptions other than the mean is exponential and the explanatory variables are strictly exogenous.
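
      A minimal sketch of how the Poisson FE estimator removes the unit effects, assuming a single regressor and the mean E(y_it | x_i, c_i) = c_i exp(beta * x_it). It maximizes the conditional (multinomial) quasi-log-likelihood, in which c_i cancels. The function name `fe_poisson_beta` and the data are hypothetical, and the outcomes are set exactly equal to their conditional means (no noise) so that the estimate can be compared with the true value.

```python
import math


def fe_poisson_beta(panels, tol=1e-10, max_iter=100):
    """Scalar Poisson FE estimate; panels is a list of units, each a list of (y, x)."""

    def loglik(beta):
        # Conditional quasi-log-likelihood: sum_i sum_t y_it * log p_it(beta),
        # with p_it = exp(beta * x_it) / sum_s exp(beta * x_is).
        total = 0.0
        for unit in panels:
            log_denom = math.log(sum(math.exp(beta * x) for _, x in unit))
            total += sum(y * (beta * x - log_denom) for y, x in unit)
        return total

    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for unit in panels:
            n_i = sum(y for y, _ in unit)
            if n_i == 0:
                continue  # units with all-zero outcomes carry no information
            w = [math.exp(beta * x) for _, x in unit]
            s = sum(w)
            p = [wi / s for wi in w]
            x_bar = sum(pi * x for pi, (_, x) in zip(p, unit))
            x_var = sum(pi * (x - x_bar) ** 2 for pi, (_, x) in zip(p, unit))
            score += sum(y * (x - x_bar) for y, x in unit)
            info += n_i * x_var
        if info == 0.0:
            break
        step = score / info  # Newton step on a concave objective
        current = loglik(beta)
        while loglik(beta + step) < current and abs(step) > tol:
            step /= 2.0      # step-halving safeguard
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical data: very different unit effects c_i, outcomes equal to their means.
true_beta = 0.5
x_by_unit = [[0.0, 1.0, 2.0, 3.0], [-1.0, 0.5, 1.5, 2.0], [0.2, 0.4, 1.8, 2.5]]
effects = [0.01, 5.0, 0.3]
panels = [
    [(c * math.exp(true_beta * x), x) for x in xs]
    for c, xs in zip(effects, x_by_unit)
]
panels.append([(0.0, x) for x in [0.0, 1.0, 2.0]])  # a unit that never tips

beta_hat = fe_poisson_beta(panels)
print(beta_hat)
```

      The unit effects never appear in the objective, and the all-zero unit drops out, which is why the estimator needs no assumption about how c_i relates to the regressors.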



      • #4
        Originally posted by Joao Santos Silva
        Dear Mahsa Paridar,

        As you say, you have corner-solutions data, not censored data, so I would use Poisson regression rather than a Tobit. In Poisson regression, you can use fixed effects without any problem, as famously shown by Jeff Wooldridge. You also say that you have zero-inflation, but that is unlikely to be a useful concept in this case. Zero-inflation is defined with respect to some benchmark distribution, but you will not want to specify a distribution here.

        Best wishes,

        Joao
        Thank you so much, Joao, for the response. In my case, out of 40M observations, more than 39M are 0 and only 10K are positive tip amounts (I am sorry that I explained it wrong in my initial post). That's why I called it zero-inflated and why I need FEs to deal with the large number of 0s and other unobserved heterogeneity. Would you please let me know what I should call it if it is not zero inflation, and what the proper way to deal with it is? Additionally, would Poisson regression with FEs still work if my data are skewed and have a huge lump at 0? I also considered a zero-inflated negative binomial model, but I am not sure whether it goes well with FEs.

        I appreciate your response.

        Best,
        Mahsa



        • #5
          Originally posted by Jeff Wooldridge
          Naturally, I agree with Joao. If you want to use Tobit as a comparison, you probably want to implement a version of correlated random effects. Actually including unit-specific dummies may work okay if you do have large T, but computationally it will be very difficult. In the linear case, the CRE approach is the same as fixed effects. My 2019 Journal of Econometrics paper suggests ways of using CRE with unbalanced panels.

          For effects on the mean, it's hard to beat the Poisson FE estimator as it uses no extra assumptions other than the mean is exponential and the explanatory variables are strictly exogenous.
          Thank you so much, Jeff, for the response. To explain my context, I have the tipping data of 2000 users over 2 years (730 days). For each active day of a user on the platform, I assess all content published within the preceding four days and observe the user's tipping decisions. The dataset includes approximately 40M observations, of which only 10K represent positive tip amounts (39M observations are 0); the tip amount ranges from 0 to $20 (e.g., $2.75). I intend to control for unobserved heterogeneity at both the day and individual levels through fixed effects (FEs). To give a sense of how big T is, on average each user has 120 active days on the platform. Below are the summary statistics of the tips:
          [Image: 1.png (summary statistics of the tips)]



          Do you still recommend going with Poisson regression with FEs when I have this huge lump at 0? Do zero-inflated Poisson or zero-inflated negative binomial regressions also work with FEs? Would you please let me know which one you think suits my case best?

          If I go with the Tobit model, do you consider this T to be large enough to include all the individual and day FEs (2000 users and 730 days)?

          Would you please help me out with this?

          Many thanks in advance,

          Best,
          Mahsa
          Last edited by Mahsa Paridar; 05 Nov 2023, 15:34.



          • #6
            I think it depends on what your main purpose is. Are you trying to predict tipping behavior of the individuals, or are you trying to determine the factors that affect tipping? Maybe you have some sort of randomized intervention? Or user and driver characteristics?



            • #7
              Originally posted by Mahsa Paridar

              Thank you so much, Joao, for the response. In my case, out of 40M observations, more than 39M are 0 and only 10K are positive tip amounts (I am sorry that I explained it wrong in my initial post). That's why I called it zero-inflated and why I need FEs to deal with the large number of 0s and other unobserved heterogeneity. Would you please let me know what I should call it if it is not zero inflation, and what the proper way to deal with it is? Additionally, would Poisson regression with FEs still work if my data are skewed and have a huge lump at 0? I also considered a zero-inflated negative binomial model, but I am not sure whether it goes well with FEs.

              I appreciate your response.

              Best,
              Mahsa
              Stay away from the NB, but Poisson with fixed effects may work well with your data. I would say that you have a lot of zeros, but not necessarily zero-inflation. Having said that, you may have a sub-population that never tips, so you may want to take that into account but, as Jeff said, it all depends on what you want to do.

              Best wishes,

              Joao



              • #8
                Originally posted by Jeff Wooldridge
                I think it depends on what your main purpose is. Are you trying to predict tipping behavior of the individuals, or are you trying to determine the factors that affect tipping? Maybe you have some sort of randomized intervention? Or user and driver characteristics?
                Thanks, Jeff, for asking. I'd like to specify a model to fully understand users' tipping decisions and estimate the effects of the different factors on tipping, controlling for unobserved heterogeneity as much as possible. Currently, I don't have a randomized intervention. I observe the whole population in the community and I am focusing on 2100 users who tipped at least once (I don't have never-tippers). I have historical data on users' country and tenure on the platform, users' past tips received, average tips in the community (norm), and prior tips left on the post where the user is giving tips. I also measure content quality as the number of impressions, such as likes and comments, received prior to the user's tipping decision. There is no evidence of forward-looking behavior.

                I am looking for widely acknowledged demand models that deal with a nonnegative DV with a lot of zeros and can handle FEs.

                Do you recommend going with correlated random effects and using a pooled Tobit with clustered standard errors? Or would you recommend other models, e.g., Poisson regression with FEs?

                Many thanks in advance,
                Mahsa
                Last edited by Mahsa Paridar; 06 Nov 2023, 10:11.



                • #9
                  Originally posted by Joao Santos Silva

                  Stay away from the NB, but Poisson with fixed effects may work well with your data. I would say that you have a lot of zeros, but not necessarily zero-inflation. Having said that, you may have a sub-population that never tips, so you may want to take that into account but, as Jeff said, it all depends on what you want to do.

                  Best wishes,

                  Joao
                  Thank you so much, Joao, for the response. It was very helpful.

                  Best,
                  Mahsa



                  • #10
                    Yes, I'd still go with fixed effects Poisson. Putting fixed effects into any nonlinear model does not yield good statistical properties (although better with large T) and you are likely to run into computational problems. As I said, you can implement a correlated random effects version of Tobit and compare the estimated elasticities and semi-elasticities with those from Poisson regression. You appear to mostly be interested in the partial effects on the mean, and FEP is well suited for that.
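
                    For the comparison of elasticities and semi-elasticities, the exponential mean makes the Poisson side simple. The notation below (unit effect c_i, regressor x_{itj}, an underlying variable z_{itj} entered in logs) is assumed for illustration.

```latex
% Assumed exponential mean with multiplicative unit effect
E(y_{it} \mid x_{it}, c_i) = c_i \exp(x_{it} \beta)

% Semi-elasticity: beta_j itself; 100 * beta_j is the approximate percent change
\frac{\partial \log E(y_{it} \mid x_{it}, c_i)}{\partial x_{itj}} = \beta_j

% Exact percent change in the mean for a one-unit increase in x_{itj}
100 \left[ \exp(\beta_j) - 1 \right]

% Elasticity when the regressor enters in logs, x_{itj} = \log z_{itj}
\frac{\partial \log E(y_{it} \mid x_{it}, c_i)}{\partial \log z_{itj}} = \beta_j
```

                    None of these quantities depends on c_i, so the FEP coefficients can be read directly. For the CRE Tobit the corresponding quantities have to be computed from the estimated partial effects on the mean, for example averaged across the sample.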



                    • #11
                      Originally posted by Jeff Wooldridge
                      Yes, I'd still go with fixed effects Poisson. Putting fixed effects into any nonlinear model does not yield good statistical properties (although better with large T) and you are likely to run into computational problems. As I said, you can implement a correlated random effects version of Tobit and compare the estimated elasticities and semi-elasticities with those from Poisson regression. You appear to mostly be interested in the partial effects on the mean, and FEP is well suited for that.
                      Thank you so much, Jeff, for all the helpful responses and insight. I really appreciate it. I have one last question: do you think Tobit would work properly with learning models? Ideally, I believe users update their beliefs about society's tipping norm through the different signals they receive, and it is the updated norm belief that affects users' tipping decisions. If I go with Tobit and use correlated random effects, do you think it would be feasible to incorporate learning into the Tobit model as well?

                      Thank you so much for your time and all the help and insights.

                      Best,
                      Mahsa
                      Last edited by Mahsa Paridar; 09 Nov 2023, 15:31.

