Asking this as a generic question. Say my data is repeated cross-sectional, some treatment happened in 2007 (e.g. a university teaching reform), I want to examine its effect on some outcome (e.g. the wages of its graduates), and I have data going all the way back to 1980. If I write my DiD up as:
Code:
regress outcome i.treatment i.clustervariable i.timevariable, cluster(clustervariable)
...then including all years back to 1980 would increase my n, but intuitively that would not be correct. So what is the rule of thumb here, or what are some considerations on how many pre-treatment periods to include? Only those where the parallel trends assumption seems to hold? Or back until some exogenous shock relevant to the outcome hit the treated/control units?
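To make the parallel-trends criterion above concrete, here is a rough sketch of how I imagine one could check it and then see how sensitive the estimate is to the window. It assumes a hypothetical 0/1 variable treated that marks units which ever receive the treatment (it is not in my model above), that timevariable holds the calendar year, and that 2000 is a purely illustrative cutoff. I am not claiming this is the right procedure.

```stata
* Sketch only. `treated' is a hypothetical 0/1 indicator for ever-treated units;
* timevariable is assumed to hold the calendar year.

* 1. Pre-trend check on pre-2007 data only: if the groups moved in parallel
*    before the reform, the treated-by-year interactions should be jointly
*    close to zero. The main effect of treated is collinear with
*    i.clustervariable, so Stata omits it.
regress outcome i.timevariable##i.treated i.clustervariable if timevariable < 2007, vce(cluster clustervariable)
testparm i.timevariable#i.treated

* 2. Sensitivity check: re-estimate the original model on a shorter window.
*    The 2000 cutoff is arbitrary and chosen only for illustration.
regress outcome i.treatment i.clustervariable i.timevariable if timevariable >= 2000, vce(cluster clustervariable)
```

If the joint test rejects for the full pre-period but not for a shorter one, or if the DiD coefficient moves a lot as the window changes, that would suggest the early years are doing more harm than good. With few clusters the joint test may have little power, so I would treat it as a diagnostic rather than proof.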