Difference-in-differences with multiple treated groups

Ashley Joy

Join Date: Jan 2023

Posts: 1
#1

16 Jan 2023, 11:34

Hi all,

I want to use the difference-in-differences method to estimate the effect of a treatment at multiple distances from the treatment location. I have a house price dataset covering 2000 to 2020 and a piece of public infrastructure built in 2010. The effect of this infrastructure on house prices is likely to decrease as the distance to it increases. To capture this, I created four dummies for walking distance: 0-500m, 500-1000m, 1000-1500m, and 1500-2000m.

My question is:

Is using multiple distance dummies and multiple interactions in a DID model possible? For example:

where f(j) is location and g(t) time-fixed effects.

Or do I need to divide the data into subsamples for corresponding distances and run the regression separately for each distance group? For example, for the distance 0-500m,

Any advice would be appreciated. Thank you!

Last edited by Ashley Joy; 16 Jan 2023, 11:36. Reason: difference-in-differences, hedonic price, treatment effect
Tags: difference-in-differences, hedonic, policy analysis, regression, treatment
Clyde Schechter

Join Date: Apr 2014

Posts: 30084
#2

16 Jan 2023, 12:25

I would use the first approach. It will make it very easy to compare the treatment effects at different distances (rather than comparing the treatment effect at a specific distance range to whatever happens outside that range). In fact, if your belief that the effect is a decreasing function of distance is correct, then a model looking like, say, ln(Pitj) = b0 + b1*Dist500_1000m + b2*Treatment + b3*Treatment*Dist500_1000m + error terms will be incoherent: the non-Dist500_1000m group will be a mixture of locations closer than 500m and locations farther than 1000m, so the comparison has no clear interpretation.
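One way the pooled (first) approach might be written is sketched below. This is an illustration under stated assumptions, not the original poster's exact equation: it reuses ln(Pitj), f(j) and g(t) from the thread, assumes Treatment_t is an indicator for the period after the 2010 construction, and introduces D^k_i (the indicator for house i lying in distance band k) and the coefficients gamma_k and delta_k as hypothetical notation. Houses beyond 2000m serve as the reference category.

```latex
\ln P_{itj} = \beta_0
  + \sum_{k=1}^{4} \gamma_k \, D^{k}_{i}
  + \sum_{k=1}^{4} \delta_k \, \bigl( D^{k}_{i} \times \mathit{Treatment}_t \bigr)
  + f(j) + g(t) + \varepsilon_{itj},
\qquad k \in \{\text{0--500m},\ \text{500--1000m},\ \text{1000--1500m},\ \text{1500--2000m}\}
```

In this sketch the main effect of Treatment_t is omitted because the time fixed effects g(t) would absorb it, and each delta_k is the difference-in-differences estimate for band k relative to the > 2000m group. If the effect really does decline with distance, one would expect delta_1 to be the largest in magnitude and the later delta_k to shrink toward zero.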

The only reservation I have about the first approach is that the distance ranges you have chosen, 0-500, 500-1000, 1000-1500, 1500-2000, with > 2000 as the reference category, seem rather arbitrary, based more on round numbers than on anything else. While I am normally one to argue against making categorical variables out of inherently continuous ones, here I think it is probably a good idea, because a simple functional relationship (linear or otherwise) of effect with distance strikes me as unlikely. So the use of categories makes sense. But I would make some attempt, based on an understanding of what the public infrastructure was and how it might affect housing prices and the propensity of current homeowners to move, to choose other cutoffs, if possible, that reflect those considerations. I might also use more than five categories here.
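To make it easy to experiment with different cutoffs, the band assignment can be kept separate from the cutoff values. The following is a minimal Python sketch, not code from the thread: the function names, the labels, the choice to close each band on the right (so exactly 500m falls in 0-500m), and the rule that years from 2010 onward count as post-treatment are all assumptions made for illustration.

```python
from bisect import bisect_left

# Cutoffs in metres, taken from the bands discussed in the thread.
# Swap in other values to try cutoffs motivated by the infrastructure itself.
CUTOFFS = [500, 1000, 1500, 2000]
LABELS = ["0-500m", "500-1000m", "1000-1500m", "1500-2000m", ">2000m"]


def distance_band(distance_m, cutoffs=CUTOFFS, labels=LABELS):
    """Return the label of the band containing distance_m.

    Bands are closed on the right (an assumption): 500 -> "0-500m".
    Anything above the last cutoff falls in the reference category.
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return labels[bisect_left(cutoffs, distance_m)]


def interaction_dummies(distance_m, year, treatment_year=2010):
    """Band-by-post interaction dummies for one observation.

    Assumes years >= treatment_year are post-treatment. The reference
    band (>2000m) gets no dummy, so its observations are all zeros.
    """
    post = 1 if year >= treatment_year else 0
    band = distance_band(distance_m)
    return {label: int(label == band) * post for label in LABELS[:-1]}
```

For example, `interaction_dummies(750, 2015)` switches on only the 500-1000m interaction, while the same house observed in 2005 has every interaction equal to zero.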

That said, you might also want to see if some continuous model of distance works (almost) equally well. I don't know if this public infrastructure is of the type that would tend to raise housing prices (parks, a convenient transit station) or one that would tend to lower them (a prison). But for one that might raise them, using ln(distance) would provide rapidly diminishing returns; alternatively, using exp(-distance) would taper smoothly from some high effect close by down to zero far away.
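The two continuous transformations mentioned here behave quite differently near zero and far away, which a small Python sketch can make concrete. The function names and the `scale_m` parameter are hypothetical: the post writes exp(-distance) without specifying units, and the default of 1000 simply corresponds to measuring distance in kilometres.

```python
import math


def log_distance(distance_m):
    """ln(distance): changes quickly near the site, slowly far away.

    Undefined at zero, so distances must be strictly positive.
    """
    if distance_m <= 0:
        raise ValueError("ln(distance) needs a strictly positive distance")
    return math.log(distance_m)


def exp_decay(distance_m, scale_m=1000.0):
    """exp(-distance / scale_m): equals 1 at the site, tapers to 0 far away.

    scale_m is an assumed rescaling, not something given in the thread.
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return math.exp(-distance_m / scale_m)
```

Either transformed variable would then be interacted with the treatment indicator in place of the band dummies. With exp_decay the interaction coefficient reads as the effect right at the site, whereas with ln(distance) it reads as the change in effect per unit change in log distance.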