  • Diff-in-diff with multiple treatment groups


    I am working with a set of 62 million transactions from 8 grocery stores belonging to the same chain and spread across the country. The time span is July 2015 to July 2018. New, competing stores have opened near four of the stores in my dataset, each on a different date.

    The idea is to conduct a difference-in-differences analysis. All the examples I can find online involve only one treatment group; however, I have four.

    So far I have worked out two approaches to the analysis. The first is to code a dummy for each treated store (store1, store2, store3, store4) and four time dummies (T2_store1, T2_store2, T2_store3, T2_store4) that indicate when each store has had a competitor open nearby. Then I get the following model:

    xtreg ln_sales store1##T2_store1 store2##T2_store2 store3##T2_store3 store4##T2_store4

    I would interpret the coefficients on the interaction terms as the percentage change in sales after treatment. Does this make sense?

    The second approach is to code only two dummies: The first one is treatment, indicating if the transaction observed is from one of the four treatment groups. The second one is T2, taking the value one if the transaction is observed after the date of a nearby opening for the respective store.

    The model would look like this:

    xtreg ln_sales treatment##T2

    and I would interpret the interaction term as I did the four interaction terms in the first model, except that we are now looking at the general effect of having a competing store open nearby.

    I’m very confused by the fact that my treatment groups are treated on different dates, and I am not sure whether the regressions above make sense. One very simple solution would be to split the data set into four subsamples, one for each treated grocery store, and then run the regressions separately, just like all the other diff-in-diff examples I’ve seen online. I would, however, prefer to use one of the first two models. Can I do so?

    I also feel like I should exploit the fact that I have panel data to work with. I would be grateful for any input on how this can be done.
    Last edited by Milla Hanzon; 11 Sep 2018, 10:59.

  • #2
    The data you have are not suitable for a classical difference in differences (DID) analysis. You can, instead, use generalized DID. See for a full explanation of the method and how it works.

    As applied to your situation, you would need a variable, call it under_competition, that takes on the value 1 in those observations where the observation's store is one of those that became subject to competition and the date is after that store became subject to competition; 0 in all other observations. It is analogous to an interaction term between treatment group and pre-post intervention variables; but in your design it is not possible to define a single pre-post intervention variable.
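
    A minimal sketch of how such a variable might be built, assuming a numeric store identifier called store, a Stata daily date variable called date, and treated stores numbered 1 through 4. The opening dates below are placeholders, not real values, and counting the opening day itself as "after" is also an assumption:

    ```stata
    * Hypothetical opening dates: replace with the actual date for each treated store
    gen opening_date = .
    replace opening_date = td(01mar2016) if store == 1
    replace opening_date = td(15sep2016) if store == 2
    replace opening_date = td(01feb2017) if store == 3
    replace opening_date = td(10oct2017) if store == 4
    format opening_date %td

    * 1 for a treated store on or after its own opening date; 0 otherwise,
    * which includes every observation from the never-treated stores
    gen byte under_competition = !missing(opening_date) & date >= opening_date
    ```

    Change >= to > if the opening day should count as pre-competition.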

    Then you use regression with both store and time fixed effects. Let's say, for the sake of illustration, that your outcome variable is called sales and that your observations are aggregated up to the daily level, so you have a variable called date, and sales represents the total sales for that observation's store on that observation's date. Then your code would be something like:

    xtset store date
    xtreg sales i.under_competition i.date, fe
    You don't actually show any example data, but from your description I'm inferring that your data are not actually aggregated up to the daily level and that you have individual transactions instead. That data set will be very large and very noisy, so some aggregation to a coarser level seems warranted. I used date for illustration, but it might make more sense to aggregate to the week or month level, or something like that.
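
    As a rough sketch of the monthly version, assuming one row per transaction with variables store, date (a Stata daily date), amount (a hypothetical name for the transaction value), and under_competition already defined on each transaction:

    ```stata
    * Monthly date built from the daily transaction date
    gen month = mofd(date)
    format month %tm

    * One row per store-month: total sales, and a flag that is 1 if any
    * transaction in that month falls under competition (an assumption about
    * how to treat the month in which a competitor opens)
    collapse (sum) sales = amount (max) under_competition, by(store month)

    * Store fixed effects via -fe-, month fixed effects via i.month
    xtset store month
    xtreg sales i.under_competition i.month, fe
    ```

    Using (max) counts the opening month as fully treated. If that is not appropriate, one option is to drop that month for the affected store instead.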