  • Tobit Difference-in-Differences model

    Hello!

    I am conducting a Difference-in-Differences test with firm and month fixed effects:

    Code:
    areg DepVar treatedxduring Controls i.month , absorb(permno) vce(cluster permno)
    However, I am concerned that there is selection bias in my model, because my dependent variable is not observed regularly throughout the sample. To clarify, the dependent variable is the amount of shares traded (if a trade occurs). The simple DiD model above is biased: small trades skew the results, while the observations with no trades are not taken into account at all. I was thinking of using a Tobit regression, but I am not familiar with the coding. Furthermore, I haven't found anyone using the Tobit model in a DiD setting, and I am not sure whether it would work there, since the Tobit model is non-linear.

    I would appreciate any help in structuring as well as interpreting the model.

    Best regards,
    Fanetti


  • #2
    Dear all,

    I would very much appreciate any comments.

    As mentioned above, I am estimating the effect of a natural experiment on a group of traders. I am using panel data with monthly observations. Treated represents the treated sample and During is the period in which the treatment effect is active. The problem for which I am reaching out for advice is that the individuals don't trade that often: around 30% of the time. Trade is a dummy variable equal to 1 if a trade occurs in a given month. Amount is the amount of shares traded (as a percentage of all shares; a very small continuous number, less than 1%) if Trade = 1.

    So far, I have done a logit analysis with fixed effects:
    Code:
    clogit Trade i.treated##i.during Controls i.month, group(permno) vce(cluster permno)  // (1)
    And a linear regression with fixed effects on the amount of trade:
    Code:
    areg Amount i.treated##i.during Controls i.month, absorb(permno) vce(cluster permno)  // (2)
    In the second regression I tried using both the censored (treat Amount as 0 if there isn't any trade happening) and truncated (only if Amount > 0) samples. Both have their advantages and disadvantages.

    I believe that in this type of problem the model that I am using (especially (2)) is biased. I thought about the Tobit model for such a case, where the dependent variable is limited. However, I am not sure whether it would work in a Difference-in-Differences setting, and I haven't found anything on the web. I could also consider alternative models, preferably not extremely complicated.
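
    For reference, a pooled Tobit version of regression (2) could be sketched as below. This is only an illustration under assumptions: Amount0 is a hypothetical variable equal to Amount when Trade = 1 and 0 otherwise, the zero is treated as a lower censoring limit, and firm fixed effects are left out because adding firm dummies to a Tobit invites incidental-parameters bias. The coefficient on 1.treated#1.during would then refer to the latent variable rather than to the observed amount, so it is not directly comparable to the areg estimate.

    ```stata
    * Hypothetical censored outcome: zero in months without a trade
    generate double Amount0 = cond(Trade == 1, Amount, 0)

    * Pooled Tobit with a lower limit at zero; standard errors clustered by firm
    tobit Amount0 i.treated##i.during Controls i.month, ll(0) vce(cluster permno)
    ```

    Whether the zeros can sensibly be read as censored values is a separate question from the syntax.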

    Thank you!
    Cheers



    • #3
      Any advice please?
      It's a bit urgent.

      I get negative coefficients for treated*during in both regressions (1) and (2). However, the p-values are slightly above the 10% threshold. I am looking for a model that could combine the two and measure the overall impact of the treatment. I was reading about the Tobit and the truncated regression. However, I haven't found relevant sources in the context of DiD with a limited dependent variable.



      • #4
        I have had only a few occasions to ever use -tobit-, so I do not pretend to have a deep understanding of it. As nobody else has responded to your question, I'll give you my opinion. Take it for what it's worth and use at your own risk.

        I can't think of any reason why the DID approach would care whether you are using -regress-, -tobit-, -xtreg-, etc. I think any regression command can work with DID. The real question is whether a Tobit regression is an appropriate model for your data. And as I understand it from your description, the answer would be no. Tobit regression is for censored outcomes. A censored outcome, in your case, would mean that the amount of shares traded could, in reality, be negative but, for some reason, in your data negative values of shares traded are recorded as zero instead. That would be censored data, and would be suitable for -tobit-. But as I understand it, this is not the case here.

        My reading of your description is that the observations where the number of shares traded is recorded as 0 are real zeroes. It seems that in your analysis, however, you want to distinguish "0 shares traded" from "no trading." So this sounds like perhaps a Heckman selection model? Or maybe a zero-inflated Poisson, or something like that. I'm not really sure, as I don't really understand the theoretical/conceptual reason for distinguishing "no trading" from "0 shares traded." But I'm pretty sure that this is not censored outcomes, and -tobit- would not be a valid representation of your data generating process.

        Finally, I should add that it is not science to go shopping for models that give you desired "significant" results. That's called p-hacking, and some people would consider it fraud if you present results from such an adventure as anything other than a wide-ranging exploratory analysis that generates, but does not actually test, hypotheses.



        • #5
          I hesitate to disagree with Clyde, but I thought that tobit was appropriate for cases where you can't have more or less than some limit - can't buy less than one beer, etc. It assumes some continuous variable (e.g., desire to buy beer), but you just can't get beyond a limit. So, I would see tobit as appropriate when you have purchases of number of shares with no possibility of negative values (i.e., sales of shares). This is different than cases where you don't observe an action (and sometimes such no observed actions are erroneously coded as zeros).



          • #6
            First, nobody should ever hesitate to disagree with me.

            Second, we don't actually disagree about the statistical principle and the purpose of -tobit-:

            I thought that tobit was appropriate for cases where you can't have more or less than some limit - can't buy less than one beer, etc. It assumes some continuous variable (e.g., desire to buy beer), but you just can't get beyond a limit.
            There is no disagreement between us on this. Perhaps the failure to trade on a given day does represent a latent, but unachievable, desire to trade a negative amount of shares. In that case, the observed value of 0 is truly a censored observation and -tobit- is a reasonable way to go.

            Where we disagree is that I have the impression from the original post (and another thread from which this thread is an offshoot) that that isn't what's happening. I have the impression, rather, that some traders are just "out of the action" on some days and that this state has nothing to do with how many shares they "would have liked to have traded" had they been engaged in trading that day. If this is what's going on, then the zeroes are not censored versions of negative values and -tobit- is the wrong approach. (In fact, if this is what's happening, the best solution may be to treat those observations as "not in universe" for the study and simply -drop- them.)

            I think Fanetti Mazakura will have to apply his understanding of how his data were generated to figure this out. I think it boils down to clarifying what the distinction between "didn't trade" and "0 shares traded" represents.



            • #7
              Guys, thank you very much for your response!

              I see Clyde's point on the Tobit model and I acknowledge it.
              Originally posted by Clyde Schechter
              I have the impression, rather, that some traders are just "out of the action" on some days and that this state has nothing to do with how many shares they "would have liked to have traded" had they been engaged in trading that day.
              That is correct. Some traders just decide not to trade, perhaps as a result of the treatment or due to whatever reasons.

              Please, don't get me wrong, I am not trying to find a model that gives me significant results. I totally agree that it's not fair to present such results. I am just trying to figure out which model would most accurately test my hypothesis, that is, a model that would estimate the impact of the treatment on the two decision variables - to trade or not to trade, and how much.

              I also see the limitations of my model described in #1: the Trade dummy doesn't really tell much, and the Amount variable is limited, so selection bias is present. For instance, if people decide to trade but trade a small amount of shares, the mean is skewed to the left. I could go forward with it and analyse the Trade dummy and the Amount of trade separately, estimating the effect of the treatment on the likelihood of trading and on the amount traded. However, if both have the same coefficient sign for Treated*During, but with p-values slightly above the 10% threshold, I would reject the hypothesis. Nevertheless, the combined effect might be large enough to yield meaningful results.
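
              One way to combine the two decisions into a single quantity is a two-part calculation: expected amount = Pr(Trade = 1) x E[Amount | Trade = 1]. The sketch below is an illustration under assumptions, not a tested specification: it drops the firm fixed effects for simplicity, the variable names p_trade, amt_if_trade and exp_amount are hypothetical, and standard errors for any contrast built from exp_amount would have to come from something like a bootstrap that re-runs both parts.

              ```stata
              * Part 1: probability of trading in a given month
              logit Trade i.treated##i.during Controls i.month, vce(cluster permno)
              predict double p_trade, pr

              * Part 2: amount traded, estimated on months with a trade only
              regress Amount i.treated##i.during Controls i.month if Trade == 1, vce(cluster permno)
              predict double amt_if_trade, xb

              * Combined expected amount: Pr(Trade = 1) * E[Amount | Trade = 1]
              generate double exp_amount = p_trade * amt_if_trade

              * Mean predicted amount in each treated-by-during cell
              tabulate treated during, summarize(exp_amount) nostandard nofreq
              ```

              Comparing the change in the cell means for the treated group with the change for the control group would give a rough DiD-type contrast on the combined outcome.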





              • #8
                I hope you see my concerns. Do you think I should stick with the model I have been using so far? That is, simply analysing the two decision variables separately and interpreting the results of the two regressions:
                Code:
                clogit Trade i.treated##i.during Controls i.month, group(permno) vce(cluster permno)  // (1)
                Code:
                areg Amount i.treated##i.during Controls i.month, absorb(permno) vce(cluster permno)  // (2)
                Or, given my concerns stated above, should I consider using a more complicated model?

                The Heckman two-step does sound like a solid choice. I am not really familiar with how the model works, though.

                In my data, the explanatory variables in the 1st stage are essentially the same as in the 2nd stage. I tested out this command:

                Code:
                 heckman Amount treated##during Controls, select(treated##during Controls) twostep
                However, I am not entirely sure how to interpret the results. It seems that every single variable is significant.

                Furthermore, would I be able to include fixed effects? From my understanding, I could estimate the model manually by computing the inverse Mills ratio, lambda = phi/PHI, from the 1st stage and including it as an additional regressor in the 2nd stage. I really appreciate your time and help!
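
                The manual two-step described above might look like the sketch below. It assumes a first-stage probit on Trade, and the variable names zg and lambda are hypothetical. Two caveats apply: the second-stage standard errors ignore the fact that lambda is itself estimated, and with the same regressors in both stages the correction is identified only through the nonlinearity of lambda.

                ```stata
                * Stage 1: probit for the decision to trade
                probit Trade i.treated##i.during Controls i.month, vce(cluster permno)
                predict double zg, xb

                * Inverse Mills ratio: lambda = phi(zg) / PHI(zg)
                generate double lambda = normalden(zg) / normal(zg)

                * Stage 2: amount equation on trading months, with firm fixed effects
                areg Amount i.treated##i.during Controls lambda i.month if Trade == 1, absorb(permno) vce(cluster permno)
                ```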



                • #9
                  I'm sorry I can't advise you here. These are commands that I know little about. I hope that somebody else who is more knowledgeable in this domain will speak up.



                  • #10
                    Dear Fanetti, A discussion can be found here (https://stats.stackexchange.com/ques...-specification). However, no theoretical justification is offered.

                    Ho-Chuan (River) Huang
                    Stata 19.0, MP(4)
