  • Difference-in-Differences analysis where outcome variables are computed over rolling windows

    I have cross-sectional data on firms between 2001 and 2017. The intervention occurred sometime during 2012. Each firm appears in the sample only once, so the firms before the intervention differ from those after the intervention. I have a treatment dummy and a time dummy and my interest is obviously in the interaction term. I also have covariates.

    I have several outcome variables, each computed over a 4-year rolling window from t-4 to t-1 and then used at time t. For example, the outcome variable for a firm observed in 2005 is computed from data for 2001 to 2004, and the outcome for a firm observed in 2006 is computed from data for 2002 to 2005. Effectively, then, the pre-intervention sample used in the difference-in-differences analysis consists of firms observed between 2005 and 2011 (although, theoretically, I am using data from 2001 onward to construct the outcome variable). I do the same for the post-intervention period: to keep the same 4-year rolling window, the outcome variable for a firm observed in 2016 is computed from data for 2012 to 2015 (I am aware that the intervention occurs sometime in 2012, so I have to ensure that I am capturing the period after the intervention). As a result, my post-intervention sample consists of firms observed in 2016 and 2017. I have also tried shortening the window, even to as little as one year (t-1), which increases the sample before and after the intervention, with qualitatively similar results.
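For concreteness, the window construction described above can be sketched in Python/pandas (illustrative only: the data are simulated, all variable names are assumptions, and unlike the actual setting each simulated firm is observed in every year rather than once):

```python
import numpy as np
import pandas as pd

# Simulated firm-year data for the underlying metric (purely illustrative)
rng = np.random.default_rng(0)
n_firms, years = 50, np.arange(2001, 2018)
raw = pd.DataFrame({
    "firm": np.repeat(np.arange(n_firms), len(years)),
    "year": np.tile(years, n_firms),
    "metric": rng.normal(size=n_firms * len(years)),
}).sort_values(["firm", "year"])

# Outcome used at year t = mean of the metric over t-4 .. t-1
raw["outcome"] = (raw.groupby("firm")["metric"]
                     .transform(lambda s: s.rolling(4).mean().shift(1)))

# Pre-intervention sample (2005-2011): windows use only pre-2012 data.
# Post-intervention sample (2016-2017): windows use only 2012+ data.
sample = raw[raw["year"].between(2005, 2011)
             | raw["year"].between(2016, 2017)].copy()
sample["post"] = (sample["year"] >= 2016).astype(int)
```

In the actual analysis each firm would contribute a single row of `sample`, but the window logic is the same.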

    Assuming that I have made myself reasonably clear above, are there any issues with the above construct? Is anyone aware of papers that do something similar? Are there other things I can do, given that the outcome variables are computed based on prior data?

  • #2
    This raises a number of issues. I can point out the problems, but the solutions to these problems lie in areas that are outside my expertise, so I can't give you real guidance. Hopefully somebody else will give you a better answer than I can. But here's a start.

    1. Because you are using a rolling window, there will be serial correlation in the outcomes. So OLS will give you incorrect standard errors. I believe the solutions lie with either generalized least squares or Newey-West standard errors, but I don't know much about these.

    2. You only have two years' worth of data for the post-intervention period. That's pretty scanty--it doesn't give you a reliable estimate of any trend. Moreover, the 2016 observations include the 2012 outcome variable, which is partly pre-intervention. So for the 2016 observations the data are somewhat contaminated with pre-intervention results. It's even worse than that if the effects of the intervention are slightly lagged in reality. So you might actually only have one year's worth of clean post-intervention data. Both the paucity of post-intervention observations and the possible contamination of the first year will reduce your chances of fully detecting the effect.

    3. The logic of the difference-in-differences approach to estimating causal effect is that the only thing that is differentially changing the outcome at the moment of intervention is the intervention itself. But in your case, your "moment of intervention" extends to a four-year period. The assertion that nothing else happens differentially to the two groups in that time period is less credible a priori. You will need some kind of ancillary information to back up that claim. The usual robustness tests won't overcome this problem.

    Just how serious these problems are would depend on the specifics of the outcome variable, the nature of the intervention and the probable time course of its effects, and other substantive factors.
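On point 1, here is a bare-bones sketch of what a Newey-West (HAC) correction does, written in Python/NumPy purely for illustration (in Stata, the `newey` command provides this; the simulated data and the lag choice below are assumptions):

```python
import numpy as np

def ols_newey_west(X, y, maxlags):
    """OLS coefficients with Newey-West (HAC) standard errors.

    Bare-bones sketch: Bartlett-kernel weights, no small-sample correction.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    e = y - X @ beta
    Xe = X * e[:, None]
    S = Xe.T @ Xe                       # lag-0 (White) term
    for lag in range(1, maxlags + 1):
        w = 1.0 - lag / (maxlags + 1)   # Bartlett weight
        G = Xe[lag:].T @ Xe[:-lag]
        S += w * (G + G.T)
    V = XtX_inv @ S @ XtX_inv           # sandwich covariance
    return beta, np.sqrt(np.diag(V))

# Simulated example with AR(1) errors, i.e. serially correlated outcomes
rng = np.random.default_rng(1)
n = 500
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + rng.normal()
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 2.0 + 1.5 * x + u
beta, se = ols_newey_west(X, y, maxlags=4)
```

The point estimates are just OLS; only the standard errors change, which is why the correction addresses inference rather than bias.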



    • #3
      Thanks, Clyde, for your comments.

      The outcome variables could be viewed as reputation measures, hence captured over some period prior to time t but used at time t. I am considering using only data from t-1 to compute them, as that may help resolve many of the issues you have raised. For example, I would then have four years in the post-intervention period (2014-2017).

      If you have any other thoughts, I would be happy to receive them. I am using Stata 15.1.



      • #4
        Maybe I'm wrong, but I don't see this as a difference-in-differences design, since you don't have the same firms in both periods. I also find it problematic to use a rolling average of lagged values as the outcome for an intervention at time t: the intervention at t cannot have influenced values from t-4, or even t-1, yet those values make up the outcome.

        If you're collecting data, then you'd be better off collecting pre and post data on the same treated and untreated firms.



        • #5
          @Phil Bromley: it doesn't matter that the firms are not the same. Wooldridge, in his introductory econometrics textbook, gives the example of the effect of building a new incinerator on housing prices: the houses on the market before and after were not the same.



          • #6
            I assume that "computed over a 4-year rolling window" means that the average is used. In that case, only a fourth of the window can contribute to the treatment effect in the first treated year. In the second year, half of the composite variable consists of treated components, and so on. The respective untreated tail causes a downward bias in the treatment effect. My suggestion is to model this bias via the treatment dummy: in the first treated period it should be 0.25, because only a fourth of the composite variable is treated, followed by 0.5 in the second treated year, and so on until the window lies completely in the treated range. It is not perfect, but it at least adjusts the magnitude of the marginal effect relative to fully treated windows.
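That weighting scheme can be written down directly. A minimal sketch in Python (the function name is hypothetical; the fixed 2012 intervention year and 4-year window come from the thread):

```python
def treated_fraction(t, intervention_year=2012, window=4):
    """Share of the outcome window [t-window, t-1] that falls in the treated period."""
    treated_years = sum(1 for y in range(t - window, t) if y >= intervention_year)
    return treated_years / window

# The outcome used in 2013 covers 2009-2012, so one of four window years is
# treated; 2014 covers 2010-2013 (two of four); by 2016 the whole window
# (2012-2015) is post-intervention.
```

This fractional value would replace the 0/1 post-period indicator in the interaction term, so that partially treated windows are not scaled as if they were fully treated.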
