  • Non-negative count variable with many zeros in staggered Difference-in-Difference design

    Hi all,

    First of all, I would like to apologize if my question sounds rather ignorant. I am neither a programmer nor a statistician, thus this question might seem trivial for some of you.

    I would like to implement a staggered Difference-in-Difference design for a non-negative count variable with many zeros. For the traditional TWFE DiD estimator, there is the ppmlhdfe package which readily takes care of it. What are my options for a staggered DiD adoption?

    My preferred estimator is Callaway & Sant’Anna’s csdid, but I am also aware of the others (eventstudyinteract, did_multiplegt_dyn, jwdid). Of these commands, only Wooldridge’s jwdid seems to integrate a Poisson estimation method directly. However, I would like to use "notyettreated" as my comparison group while testing for pre-trends. I have read that regression-type methods do not easily allow that, yet only jwdid excludes the pre-treatment estimates from the output when the comparison group is specified this way (for reasons I frankly do not understand).


    Are non-negative count outcomes with many zeros a large issue for these estimators, similar to the TWFE estimator? I found this technical paper (see link) which claims just that. My options for applying staggered DiD designs to count variables seem to be very limited.
    Can somebody give me some perspective?

    Thanks a lot!!

  • #2
    Sorry, the link does not seem to work so here in text: https://papers.ssrn.com/sol3/papers....act_id=4859576



    • #3
      JWDID will be your best bet indeed.
      But you are right, you cannot test for pre-trends with the not-yet-treated group, because doing that correctly requires more than a single regression (the comparison group needs to be changed for each pre-period).
      The others will also work for your data, but just as in the OLS vs Poisson discussion, the point remains that OLS assumes linearity, which may not work when your data is non-linear and calls for a non-linear model estimator.
      F



      • #4
        Hi Fernando, thanks for your input.

        What about Callaway & Sant'Anna's estimator, which is not a linear one?



        • #5
          Hi Stefan. Callaway and Sant'Anna assumes that conditional parallel trends holds in levels, just like all other estimators except for the Poisson, logit, and fractional logit estimators that I discuss in my work. The doubly robust CS augmented AIPW estimator has some resiliency to misspecified functional form, but not to violation of the levels PT assumption. At a minimum, you can try jwdid with a linear mean, jwdid with an exponential mean (with the assumption being that PT holds in the log of the mean, not the change in the level), and csdid. You can always start by using the never treated units and then add "never" to jwdid to test for pre-trends.
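
          [Editor's note] The difference between the two parallel trends (PT) assumptions mentioned above can be written out as a rough sketch. The notation is an assumption for illustration and is not taken from the thread: Y_t(0) is the untreated potential outcome in period t, D = 1 marks a treated cohort first treated in period g, D = 0 marks the comparison group, and covariates are omitted for brevity.

          ```latex
          % PT in levels (linear mean; the assumption behind csdid and linear jwdid):
          % the expected CHANGE in the untreated outcome is the same in both groups.
          \mathbb{E}[Y_t(0) - Y_{g-1}(0) \mid D = 1]
            = \mathbb{E}[Y_t(0) - Y_{g-1}(0) \mid D = 0]

          % PT in the log of the mean (exponential mean; jwdid with method(poisson)):
          % the expected PROPORTIONAL change in the untreated outcome is the same.
          \log \mathbb{E}[Y_t(0) \mid D = 1] - \log \mathbb{E}[Y_{g-1}(0) \mid D = 1]
            = \log \mathbb{E}[Y_t(0) \mid D = 0] - \log \mathbb{E}[Y_{g-1}(0) \mid D = 0]
          ```

          With a count outcome that is often zero and has very different baseline levels across municipalities, the two conditions generally cannot both hold, which is why comparing the linear and exponential specifications is informative.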



          • #6
            Hi Jeff, thanks a lot for your input too.
            "You can always start by using the never treated units and then add "never" to jwdid to test for pre-trends."
            I assume you mean I can start with the not-yet-treated units and then add the never-treated ones? The thing is, I do not have never-treated units, but I get your point. Thanks a lot, I will follow your advice!



            • #7
              Even if you have no never-treated units, you can restrict your sample accordingly so that the "never" group is the cohort treated last.
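
              [Editor's note] A minimal sketch of that sample restriction, under stated assumptions: the variable names y, i, t and g are placeholders matching the generic ones used later in this thread, g holds each unit's first treated period, every unit is eventually treated, and jwdid follows the csdid convention of coding never-treated units as gvar = 0. Please check the jwdid help file before relying on it.

              ```stata
              * Assumed placeholders: y = outcome, i = unit id, t = period,
              * g = first treated period (all units eventually treated).

              * Find the first treated period of the last-treated cohort
              summarize g, meanonly
              local glast = r(max)

              * Keep only periods in which the last cohort is still untreated
              keep if t < `glast'

              * Recode the last-treated cohort as "never treated"
              * (assumption: never-treated units are coded gvar = 0)
              generate g_never = cond(g == `glast', 0, g)

              * Poisson jwdid with the recoded cohort as comparison group;
              * "never" adds pre-treatment terms so pre-trends can be inspected
              jwdid y, ivar(i) tvar(t) gvar(g_never) method(poisson) never
              estat event
              ```

              The cost of this approach is that all periods from the last cohort's treatment date onward are dropped, so long-run effects for early cohorts are no longer estimable.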



              • #8
                Stefan: Yes, that's what I meant -- sorry.



                • #9
                  Hi, thanks again for your help!

                  I have a follow-up question: I am struggling to implement my model with the jwdid+poisson estimator.
                  Whereas csdid uses pre-treatment covariates to match treated with control units, jwdid relies on time-variant covariates. I have fine-grained data, so even adding just one covariate makes the command run forever. What is your practical advice? I am aware of the different options (xasis, exovar, xtvar, etc.), but strangely, these options do not speed up the process much, and it is quite important to control for some unit characteristics.

                  Could you give me a rundown of the different options (for example, what to consider when choosing between xtvar, xgvar, and xattvar) and of potential ways to make it compute quicker?

                  A bit about my setting:
                  I have monthly data on municipalities (~5,600) over 10 years and want to examine the impact of a treatment on municipality-level raw count data with many zeros. I have already aggregated the data to 3-month periods to reduce the sample size. I could aggregate it further, but short-term changes around treatment adoption might be crucial.

                  Again, many thanks for your support!



                  • #10
                    With so many time periods and such a large N, I’d try using Poisson regression with effects restricted to be constant by exposure time. You can see if the estimates are much different from the unrestricted model estimated by jwdid after estat event. If they’re close without covariates then I’d probably use the restricted model. I have a Stata do file for that somewhere.
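
                    [Editor's note] One way to read "effects restricted to be constant by exposure time" is sketched below. This is an interpretation, not the do file mentioned above, and the notation is assumed for illustration: c_i is a unit effect, \theta_t a period effect, g_i the first treated period of unit i, and \tau_{gs} the effect for cohort g in calendar period s.

                    ```latex
                    % Unrestricted exponential-mean model: a separate effect for every
                    % cohort g and every post-treatment period s >= g.
                    \mathbb{E}[y_{it} \mid g_i, c_i]
                      = c_i \exp\Big( \theta_t
                        + \sum_{g} \sum_{s \ge g} \tau_{gs}\, 1[g_i = g]\, 1[t = s] \Big)

                    % Restricted model: effects depend only on exposure time e = s - g,
                    % i.e. \tau_{gs} = \delta_{s-g} for all cohorts g.
                    \mathbb{E}[y_{it} \mid g_i, c_i]
                      = c_i \exp\Big( \theta_t
                        + \sum_{e \ge 0} \delta_{e}\, 1[t - g_i = e] \Big)
                    ```

                    The restricted model has one parameter per exposure period rather than one per cohort-period pair, which is what makes it much cheaper to estimate with many periods and cohorts.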



                    • #11
                      Alright, that sounds good, thank you Jeff!

                      So just to be sure, my command flow would be something like:

                      Code:
                      ...
                      
                      ** Unrestricted model, no covariates
                      jwdid y, ivar(i) tvar(t) gvar(g) method(poisson)
                      estat event
                      
                      ** vs restricted model, no covariates
                      jwdid y, ivar(i) tvar(t) gvar(g) method(poisson) hettype(event)
                      estat event
                      
                      ** if similar, add time-invariant covariates
                      jwdid y x1 x2 x3, ivar(i) tvar(t) gvar(g) method(poisson) hettype(event)
                      estat event
                      Is this correct?

                      I am also unsure whether I should use 'method(poisson)' or 'method(ppmlhdfe)'. They give very different results in my estimations. Can both be used with time-invariant (pre-treatment) covariates?

                      Thanks again.
