  • Significant pre-trends despite exact matching

    Hi Statalisters,

    I have encountered an issue whereby I discover significant pre-trends in a dynamic diff-in-diff/event study set-up despite having exact matches among pairs in the pre-period. Granted, I suspect that this is more of a stats, rather than programming, question, but I thought I'd still give it a shot here (on the off-chance that it might not be or that someone can help regardless).

    Consider the following simple setup (data example at the bottom): I observe units for up to five periods before they are treated and then once more after they have received treatment. Units are matched in pairs such that each treated unit has an untreated partner. The outcome variable is whether or not a person ends up in a salaried job. To ensure comparability, the treated units and their matched controls have the exact same history of salaried employment before the treatment at t-1, t-2, and t-3 (i.e. there is no variation in the salaried variable within the matched pair before treatment occurs). I set up a relatively simple event-study regression with matched-pair and time (here: wave) fixed effects and time-to-treatment dummies for the treated units (dropping the t-1 dummy as the baseline).

    In Stata, I estimate the following regression (note: reghdfe is a user-written command), in which matched_group identifies the matched pair, wave identifies the time period, salaried is a binary indicator for salaried employment, the pre*event variables are the pre-treatment dummies, and post0event2 is the single post-treatment dummy.

    Code:
            reghdfe salaried  pre2event pre3event pre4event pre5event post0event2 , a(matched_group wave) vce(cluster matched_group)
    Based on my understanding, this should result in point estimates for pre2event and pre3event that are (exactly?!) 0. I don't understand why they are not (both are insignificant in this example, but when I run this regression on the full sample, they tend to be significant). Am I misremembering my core stats or is there something wrong with my implementation? Any help would be highly appreciated!

    Please find a reproducible data example below.



    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(matched_group ind_id) byte(wave pre2event pre3event pre4event pre5event waves_til_treat post0event2) float salaried
     35     35 11 1 0 0 0 -2 0 0
     35 223381 11 0 0 0 0 -2 0 0
     35     35 12 0 0 0 0 -1 0 0
     35 223381 12 0 0 0 0 -1 0 0
     35     35 14 0 0 0 0  0 1 0
     35 223381 14 0 0 0 0  0 0 0
     49     49 13 0 0 0 1 -5 0 0
     49  49178 13 0 0 0 0 -5 0 0
     49     49 14 0 0 1 0 -4 0 0
     49  49178 14 0 0 0 0 -4 0 0
     49     49 15 0 1 0 0 -3 0 0
     49  49178 15 0 0 0 0 -3 0 0
     49     49 16 1 0 0 0 -2 0 0
     49  49178 16 0 0 0 0 -2 0 0
     49     49 17 0 0 0 0 -1 0 0
     49  49178 17 0 0 0 0 -1 0 0
     49     49 19 0 0 0 0  0 1 0
     49  49178 19 0 0 0 0  0 0 0
     87     87 12 0 0 1 0 -4 0 0
     87  31031 12 0 0 0 0 -4 0 0
     87     87 13 0 1 0 0 -3 0 0
     87  31031 13 0 0 0 0 -3 0 0
     87     87 14 1 0 0 0 -2 0 0
     87  31031 14 0 0 0 0 -2 0 0
     87     87 15 0 0 0 0 -1 0 0
     87  31031 15 0 0 0 0 -1 0 0
     87     87 21 0 0 0 0  0 1 0
     87  31031 21 0 0 0 0  0 0 0
    141    141 11 1 0 0 0 -2 0 0
    141 348518 11 0 0 0 0 -2 0 0
    141    141 12 0 0 0 0 -1 0 0
    141 348518 12 0 0 0 0 -1 0 0
    141    141 20 0 0 0 0  0 1 0
    141 348518 20 0 0 0 0  0 0 .
    175    175 11 1 0 0 0 -2 0 0
    175  33145 11 0 0 0 0 -2 0 0
    175    175 12 0 0 0 0 -1 0 0
    175  33145 12 0 0 0 0 -1 0 0
    175    175 14 0 0 0 0  0 1 0
    175  33145 14 0 0 0 0  0 0 1
    183    183 15 0 0 0 1 -5 0 1
    183 122812 15 0 0 0 0 -5 0 1
    183    183 16 0 0 1 0 -4 0 1
    183 122812 16 0 0 0 0 -4 0 1
    183    183 17 0 1 0 0 -3 0 1
    183 122812 17 0 0 0 0 -3 0 1
    183    183 18 1 0 0 0 -2 0 1
    183 122812 18 0 0 0 0 -2 0 1
    183    183 19 0 0 0 0 -1 0 1
    183 122812 19 0 0 0 0 -1 0 1
    183    183 25 0 0 0 0  0 1 1
    183 122812 25 0 0 0 0  0 0 1
    185    185 11 0 1 0 0 -3 0 0
    185  26147 11 0 0 0 0 -3 0 0
    185    185 12 1 0 0 0 -2 0 0
    185  26147 12 0 0 0 0 -2 0 0
    185    185 13 0 0 0 0 -1 0 0
    185  26147 13 0 0 0 0 -1 0 0
    185    185 15 0 0 0 0  0 1 0
    185  26147 15 0 0 0 0  0 0 1
    188    188 13 0 0 0 1 -5 0 0
    188  71239 13 0 0 0 0 -5 0 0
    188    188 15 0 1 0 0 -3 0 0
    188  71239 15 0 0 0 0 -3 0 0
    188    188 16 1 0 0 0 -2 0 0
    188  71239 16 0 0 0 0 -2 0 0
    188    188 17 0 0 0 0 -1 0 0
    188  71239 17 0 0 0 0 -1 0 0
    188    188 22 0 0 0 0  0 1 0
    188  71239 22 0 0 0 0  0 0 0
    207    207 11 0 0 0 1 -5 0 0
    207 369660 11 0 0 0 0 -5 0 0
    207    207 13 0 1 0 0 -3 0 0
    207 369660 13 0 0 0 0 -3 0 0
    207    207 14 1 0 0 0 -2 0 0
    207 369660 14 0 0 0 0 -2 0 0
    207    207 15 0 0 0 0 -1 0 0
    207 369660 15 0 0 0 0 -1 0 0
    207    207 20 0 0 0 0  0 1 .
    207 369660 20 0 0 0 0  0 0 0
    211    211 12 0 0 0 1 -5 0 0
    211 108777 12 0 0 0 0 -5 0 0
    211    211 13 0 0 1 0 -4 0 0
    211 108777 13 0 0 0 0 -4 0 0
    211    211 14 0 1 0 0 -3 0 0
    211 108777 14 0 0 0 0 -3 0 0
    211    211 15 1 0 0 0 -2 0 0
    211 108777 15 0 0 0 0 -2 0 0
    211    211 16 0 0 0 0 -1 0 0
    211 108777 16 0 0 0 0 -1 0 0
    211    211 18 0 0 0 0  0 1 0
    211 108777 18 0 0 0 0  0 0 0
    242    242 14 0 0 0 1 -5 0 1
    242 239596 14 0 0 0 0 -5 0 1
    242    242 15 0 0 1 0 -4 0 1
    242 239596 15 0 0 0 0 -4 0 1
    242    242 16 0 1 0 0 -3 0 1
    242 239596 16 0 0 0 0 -3 0 1
    242    242 17 1 0 0 0 -2 0 1
    242 239596 17 0 0 0 0 -2 0 1
    242    242 18 0 0 0 0 -1 0 1
    242 239596 18 0 0 0 0 -1 0 1
    242    242 20 0 0 0 0  0 1 0
    242 239596 20 0 0 0 0  0 0 .
    end

  • #2
    Your data are not as you think they are.

    Run
    Code:
    sort matched_group wave ind_id
    browse matched_group wave ind_id pre2event
    and you will see that the pre2event variable is, in fact, not always the same for the two members of a pair in a given wave.

    The same is also true of the other pre*event variables.

    Something went wrong with your matching.


    • #3
      Hi Clyde Schechter, thanks for taking a look! To clarify, the variation in the pre*event variables is intentional. These are the dummies indicating the unit that is/will be treated. In a staggered adoption setup, pre2event thus takes the value 1 for the treated units two waves before treatment occurs (so during that wave each pair should contain one unit for which pre2event is 0 and one for which it is 1).

      However, there is no within-pair variation in the outcome variable, salaried, before treatment occurs (at least during t-1 through t-3). That is, the underlying data have (forced) equal values of the outcome variable pre-treatment. Yet the pre-trend indicators (pre2event, pre3event) seem to suggest that (to-be) treated and never-treated units differ in salaried status before treatment.
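
      A quick way to confirm this in the example data is sketched below. It assumes the -dataex- example from #1 is in memory; sal_min, sal_max and pair_gap are names made up for this check.

```stata
* Assumes the -dataex- example from #1 is in memory.
* Smallest and largest value of salaried within each pair-wave.
bysort matched_group wave: egen sal_min = min(salaried)
bysort matched_group wave: egen sal_max = max(salaried)

* pair_gap = 1 if the two members of a pair differ on salaried in a
* pre-treatment wave; left missing for post-treatment rows.
generate byte pair_gap = (sal_max != sal_min) if waves_til_treat < 0

* If the matching worked as described, only pair_gap = 0 should appear.
tabulate waves_til_treat pair_gap
```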


      • #4
        OK, I misunderstood your data. But with your clearer explanation in #3, I think you are misunderstanding your regression. The coefficient of the variable pre2event represents the expected difference in salaried between observations with pre2event = 1 and those with pre2event = 0. While it is true that among observations with pre2event = 1 the matched pairs agree on salaried, the observations with pre2event = 0 include those that are post-treatment, where the pairs are no longer constrained to agree on salaried. Nor are they constrained to agree with their pre2event = 1 value of salaried. So what this non-zero coefficient, a small negative, is telling you is that overall there is a slight tendency for salaried to change from 1 to 0 as we pass from 2 waves before treatment to post-treatment. Whether this is an actual treatment effect or a secular trend is not revealed by the regression alone, of course. But the thing to remember is that the coefficients of pre2event...pre5event are telling you about the net direction of change in salaried over time. The matching may affect that, but there is no reason to think the matching will force it to zero.
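
        To see where the pairs stop agreeing in the example data, something along these lines should work. It is only a sketch: it assumes the -dataex- example from #1 is in memory, and the post_* variable names are made up here.

```stata
* Assumes the -dataex- example from #1 is in memory.
* Post-treatment rows only: compare salaried across the two pair members.
bysort matched_group wave: egen post_min = min(salaried) if waves_til_treat == 0
bysort matched_group wave: egen post_max = max(salaried) if waves_til_treat == 0
bysort matched_group wave: egen post_n   = count(salaried) if waves_til_treat == 0

* post_gap = 1 if the members differ; left missing when either outcome is missing.
generate byte post_gap = (post_max != post_min) if post_n == 2

list matched_group ind_id wave salaried post_gap if waves_til_treat == 0, sepby(matched_group)
```

        Any pair with post_gap = 1 contributes within-pair variation in salaried that sits in the comparison group for the pre*event dummies.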

        If you take your -reghdfe- and add an -if waves_til_treat < 0- clause to it, you will get zero coefficients for these variables (well, with very small rounding errors on the order of 10^-17).
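
        Spelled out, that is the command from #1 with the qualifier added. This is a sketch that assumes the example data from #1 are in memory and that the user-written -reghdfe- is installed.

```stata
* Same model as in #1, restricted to pre-treatment observations.
* post0event2 is zero for every row in this subsample, so it should be
* omitted as collinear; it is kept here only to mirror the original command.
reghdfe salaried pre2event pre3event pre4event pre5event post0event2 ///
    if waves_til_treat < 0, a(matched_group wave) vce(cluster matched_group)
```

        In the example data salaried does not vary within a pair before treatment, so the matched_group fixed effect absorbs it entirely and nothing is left for the event dummies to explain.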


        • #5
          Thank you Clyde, that makes a lot of sense! I appreciate the help.
