  • Question about how -xthdidregress- calculates the pre-period ATET?

    I'd like to understand how the -xthdidregress ra- command calculates the ATET. I'm using Stata/MP 18.0. I'm able to manually calculate the ATET correctly for each period after treatment, but my estimates are incorrect in the periods prior to treatment.

    Setup

    I have panel data on four states from time period t=15 to 40 (data copied below). One state, West Virginia (state_code=50), is treated beginning in period t=22. The other states are never treated. Using -xthdidregress-, I calculate the ATET in each period:

    Code:
    xtset state_code t 
    
    xthdidregress ra (yvar) (treatment_var), group(state_code)
    
    estat atetplot
    which gives the following estimates:
    Graph.png
    My understanding was that these coefficients represent the difference in yvar in each period between the treated state and the never treated states. Since there is no coefficient for t=15, I assume that we are normalizing it to 0. In order to make sure I understand correctly, I try to recreate these coefficients from scratch.

    Attempt 1

    Code:
    * Trying to manually calculate the ATET
    preserve
        gcollapse yvar, by(_did_cohort t)
        reshape wide yvar, i(t) j(_did_cohort)
        gen difference = yvar22 - yvar0
        gen center = difference if t==15
        replace center = center[1]
        gen atet = difference - center
        twoway connected atet t, xline(21) ///
            xlabel(10(10)40) yline(0) ylabel(-15(5)5)
    restore
    which gives the following estimates:
    Graph2.png
    The pattern seems about right, but the estimates are clearly different.

    Attempt 2

    If I instead recenter around t=21, the period immediately before treatment, I get the coefficients for t>=22 exactly right. But the earlier coefficients are still wrong.
    Code:
    preserve
        gcollapse yvar, by(_did_cohort t)
        reshape wide yvar, i(t) j(_did_cohort)
        gen difference = yvar22 - yvar0
        gen center = difference if t==21
        replace center = center[7] // bit of a hack to recenter
        gen atet = difference - center
        twoway connected atet t, xline(21) ///
            xlabel(10(10)40) yline(0) ylabel(-15(5)5)
    restore
    Graph3.png

    Attempt 3

    Finally, here is an equivalent version of attempt 2 that doesn't use -gcollapse-.

    Code:
    * Equivalent approach without collapsing
    reg yvar i.t if _did_cohort==0
    predict mhat, xb
    
    gen atet = yvar - mhat if _did_cohort==22
    gen center = atet if t==21
    bysort _did_cohort: egen center_max = max(center)
    replace atet = atet - center_max
    twoway connected atet t if _did_cohort==22, xline(21) ///
        xlabel(10(10)40) yline(0) ylabel(-15(5)5)
    In summary, I can calculate the ATET correctly for the post-treatment periods, but I'm clearly missing something as far as the pre-treatment periods are concerned. I'd be surprised if there were actually two separate procedures to calculate the pre- and post-period ATETs, so I'm guessing that my intuition about what the command is doing is incorrect. I'd appreciate any advice.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(yvar date t) long state_code float treatment_var
     15.48513 202 15  6 0
     15.72585 203 16  6 0
    15.925879 204 17  6 0
    16.269817 205 18  6 0
     16.45096 206 19  6 0
    15.775162 207 20  6 0
    16.349216 208 21  6 0
    16.380201 209 22  6 0
    16.644821 210 23  6 0
    16.963497 211 24  6 0
    15.247427 212 25  6 0
    14.532273 213 26  6 0
    15.245345 214 27  6 0
    15.190864 215 28  6 0
    14.373366 216 29  6 0
    14.372738 217 30  6 0
    14.459294 218 31  6 0
    14.425225 219 32  6 0
     13.76663 220 33  6 0
     13.70272 221 34  6 0
    13.949927 222 35  6 0
    14.058516 223 36  6 0
    13.268163 224 37  6 0
     12.94921 225 38  6 0
    12.758443 226 39  6 0
     12.18474 227 40  6 0
    32.836536 202 15  8 0
     33.09925 203 16  8 0
    33.805603 204 17  8 0
    35.038982 205 18  8 0
     35.24295 206 19  8 0
    35.345726 207 20  8 0
     31.71864 208 21  8 0
    31.059637 209 22  8 0
    30.057047 210 23  8 0
     30.07797 211 24  8 0
    27.192354 212 25  8 0
     28.09544 213 26  8 0
    28.268705 214 27  8 0
    27.712704 215 28  8 0
    27.565866 216 29  8 0
     27.00981 217 30  8 0
     28.08877 218 31  8 0
      27.2892 219 32  8 0
    26.119316 220 33  8 0
    26.578644 221 34  8 0
     26.56049 222 35  8 0
     27.08266 223 36  8 0
    26.081926 224 37  8 0
       25.714 225 38  8 0
    25.593884 226 39  8 0
    22.621666 227 40  8 0
    13.697414 202 15  9 0
    13.828605 203 16  9 0
     13.30723 204 17  9 0
    13.292938 205 18  9 0
    12.665986 206 19  9 0
    12.207777 207 20  9 0
    12.806787 208 21  9 0
    11.970038 209 22  9 0
    12.354688 210 23  9 0
    12.440425 211 24  9 0
    11.497087 212 25  9 0
    11.378276 213 26  9 0
    12.242064 214 27  9 0
    11.282948 215 28  9 0
    10.674518 216 29  9 0
    11.858283 217 30  9 0
     10.82375 218 31  9 0
    11.259423 219 32  9 0
     10.58102 220 33  9 0
    10.740476 221 34  9 0
    10.472117 222 35  9 0
    10.894774 223 36  9 0
    10.301176 224 37  9 0
    10.201314 225 38  9 0
    10.150612 226 39  9 0
     8.140618 227 40  9 0
    26.615303 202 15 50 0
     26.99366 203 16 50 0
    27.320904 204 17 50 0
    27.346266 205 18 50 0
     27.98667 206 19 50 0
    28.089935 207 20 50 0
     28.10553 208 21 50 0
      27.9229 209 22 50 1
    27.838293 210 23 50 1
    28.183153 211 24 50 1
    26.817026 212 25 50 1
    27.129515 213 26 50 1
     27.15126 214 27 50 1
     26.80359 215 28 50 1
    26.056345 216 29 50 1
     26.63486 217 30 50 1
     27.07756 218 31 50 1
     25.72048 219 32 50 1
    24.775146 220 33 50 1
    24.457024 221 34 50 1
     23.38469 222 35 50 1
    22.576775 223 36 50 1
    end
    format %tq date
    label values state_code state_code
    label def state_code 6 "Colorado", modify
    label def state_code 8 "Delaware", modify
    label def state_code 9 "District of Columbia", modify
    label def state_code 50 "West Virginia", modify
    Last edited by David Beheshti; 11 Jul 2023, 19:13.

  • #2
    Hi David,

    To compute ATET(g,t) for the pre-treatment period, you must set the benchmark time and the benchmark group. The default benchmark group is the never-treated group, regardless of t. The benchmark time, however, depends on whether t falls before or after the start of treatment. If t is a post-treatment period (t >= g), the benchmark time is g-1. If t is a pre-treatment period (t < g), the benchmark time is t-1.
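
    For the case without covariates, the rule above can be sketched as follows. The notation is illustrative rather than taken from the Stata manual: G = g marks the cohort first treated in period g, C = 1 marks the never-treated group, and y_t is the outcome in period t.

    ```latex
    \[
    \mathrm{ATET}(g,t) =
    \begin{cases}
    E[\,y_t - y_{g-1} \mid G = g\,] - E[\,y_t - y_{g-1} \mid C = 1\,], & t \ge g \quad (\text{benchmark time } g-1),\\[4pt]
    E[\,y_t - y_{t-1} \mid G = g\,] - E[\,y_t - y_{t-1} \mid C = 1\,], & t < g \quad (\text{benchmark time } t-1).
    \end{cases}
    \]
    ```

    In the data above, g = 22, so the post-treatment estimates are all measured against t = 21, while each pre-treatment estimate is measured against the period just before it.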

    If you have covariates in the outcome model, the covariates are assumed to be time-invariant. Thus, to compute the function m(x, t) = E[y_{t} - y_{t-1}|C=1, x], you only need to use the t-1 period of x's.

    In the CS(2021) paper, the no-anticipation assumption and the conditional parallel trends assumption imply that the pre-treatment ATETs are zero.



    • #3
      Hi Di,

      Thanks for your reply. I wasn't aware that the benchmark time is fixed in the post-period, while it varies in the pre-period. The following code reproduces the ATETs estimated by -xthdidregress-:
      Code:
      preserve
          // Post-treatment ATETs (g=22)
          gcollapse yvar, by(_did_cohort t)
          reshape wide yvar, i(t) j(_did_cohort)
          gen difference = yvar22 - yvar0
          gen center = difference if t==21
          replace center = center[7] // bit of a hack to recenter
          gen atet = difference - center if t>=22
          
          // Pre-treatment ATETs
          forvalues i=15(1)21{
              drop center
              gen center = difference if t==`i'
              replace center = center[`i'-14]
              replace atet = difference - center if t==`i'+1
          }
          twoway connected atet t, xline(21) ///
              xlabel(10(10)40) yline(0) ylabel(-10(5)5)
      restore
      final_graph.png
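
      The row-index arithmetic in the loop can probably be avoided with Stata's time-series operators. This is only a sketch under the same assumptions as the code above (the `_did_cohort` variable left behind by -xthdidregress-, and a single treated cohort with g=22); it uses the built-in -collapse- in place of -gcollapse-:

      ```stata
      preserve
          collapse (mean) yvar, by(_did_cohort t)
          reshape wide yvar, i(t) j(_did_cohort)
          gen difference = yvar22 - yvar0
          tsset t                                  // one row per period, so D. is defined
          summarize difference if t==21, meanonly  // difference in the base period g-1
          gen atet = difference - r(mean) if t>=22 // post-treatment: base period g-1 = 21
          replace atet = D.difference if t<22      // pre-treatment: base period t-1
          list t atet, noobs
      restore
      ```

      Here `D.difference` is the change in the treated-minus-control gap from t-1 to t, which is what the loop computes one period at a time.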

      The following comment may just further reveal my ignorance, but this strikes me as odd. The post-period ATETs are identical to what we'd get from a traditional TWFE event study where we normalize the period immediately before treatment to 0. I.e., the coefficients on "time_to_treatment" from the following regression:
      Code:
      * Generating a "time until treatment" variable
      bysort state_code: egen temp = min(t) if treatment_var==1
      bysort state_code: egen first_t_treated = max(temp)
      
      gen time_to_treatment = t - first_t_treated
      replace time_to_treatment = time_to_treatment+100 // Stata doesn't like negative
              // factor variables. 100 is now the first treatment period
      replace time_to_treatment = 99 if missing(time_to_treatment) // so untreated
              // states don't get dropped
      
      xtreg yvar b99.time_to_treatment i.t, fe cluster(state_code)
      preserve
          parmest, norestore
      
          gen time = regexs(1) if regexm(parm, "([0-9]?[0-9][0-9])b?\.time_to_treatment")
          destring time, replace
      
          keep if time!=.
          replace time = time - 100
      
          twoway     (connect estimate time), ///
                  ytitle(" ") xline(0) yline(0) xtitle("Time until treatment") ///
                  ylabel(-10(5)5)
      restore
      graph4.png

      But the pre-period ATETs are equal to the first difference of these same coefficients! So, if I'm understanding correctly, all of the post-period coefficients from -xthdidregress- can be interpreted as changes in the difference in yvar between treated and control states, relative to the period immediately preceding treatment. But all the pre-period coefficients are interpreted as changes relative to the previous period. Doesn't this mechanically "flatten" the pre-trends that applied economists are concerned about when doing this type of analysis? What's the justification for the change in comparison period? I'm sure this is discussed in the original CS2021 paper, although I confess it's beyond my reading comprehension.
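
      To spell out that relationship, here is a short sketch in illustrative notation (the symbols D, L and S are not from the paper or the manual). D_t is the treated-minus-control gap in mean yvar in period t, L_t is the "long" difference against t = 21, and S_t is the "short" difference against t-1:

      ```latex
      \[
      D_t = \bar{y}_{22,t} - \bar{y}_{0,t}, \qquad
      L_t = D_t - D_{21}, \qquad
      S_t = D_t - D_{t-1} = L_t - L_{t-1},
      \]
      \[
      L_t = -\sum_{s=t+1}^{21} S_s \quad \text{for } t < 21 .
      \]
      ```

      So the two sets of pre-period estimates carry the same information, but a steady pre-trend shows up as a growing L_t and as a roughly constant S_t.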
      Last edited by David Beheshti; 12 Jul 2023, 12:01. Reason: Edited to correct a mistake in the interpretation of the TWFE coefficients.



      • #4
        Hi David,
        It will flatten it, but it also gives you more information for understanding at what point violations of the PTA occur.
        Now, with csdid (my version of Callaway and Sant'Anna), or the new csdid2, you can request either the short pre-treatment differences or the long differences (the long2 option in csdid) for pre-treatment analysis.

        As a matter of fact, in csdid2 the long differences are now the default, to avoid confusion. And I believe Pedro was going to do the same with the -did- package in R.
        HTH



        • #5
          Hi Fernando,

          Thanks for your reply, as well as your work in creating -csdid-.

          Thanks for pointing out the "long2" option in -csdid-, that is definitely more in line with what I was expecting both -csdid- and -xthdidregress- to do. I'm glad to hear that it will be the default in -csdid2-, as I expect that will help avoid a lot of confusion in the future. I have a few follow-up questions/comments for you (or anyone else):
          1. Is there a similar option for -xthdidregress-? From my reading of the help file I think the answer is no, but I hope I'm wrong.
          2. I don't follow this point:
            "it will flatten it, but also gives you more information for understanding at what point violations of pta occur."
          In what sense does this help understand violations of the parallel trends assumption? From what I can see it masks violations, unless the reader is very careful. Consider the following basic setup where we have four states, one of which is ever treated:
          Code:
          clear
          
          set obs 100
          gen counter = _n
          gen id = 1
          replace id = 2 if counter > 25
          replace id = 3 if counter > 50
          replace id = 4 if counter > 75
          
          bysort id: gen time = _n
          
          xtset id time
          
          gen outcome = 10 + id*2 + .5*rnormal() // outcome is a constant + FE + noise
          
          replace outcome = outcome + time if id==4 // linear time trend for id=4
          
          // Assume only unit 4 is ever treated, starting in period time=15
          gen treatment_var = 0
          replace treatment_var = 1 if id==4 & time>=15
          
          twoway (line outcome time if id==1, lcolor(gray)) ///
                 (line outcome time if id==2, lcolor(gray)) ///
                 (line outcome time if id==3, lcolor(gray)) ///
                 (line outcome time if id==4, lcolor(red)), ///
                      legend(on order(1 "Never Treated" 2  "Never Treated" ///
                          3 "Never Treated" 4 "Treated")) ///
                      xline(15)
          graph5.png
          By construction, there is no real effect of the treatment, just a continuation of the pre-trend. But running the default -csdid- (or -xthdidregress-):
          Code:
          gen cohort = 0
          replace cohort = 15 if id==4
          
          csdid outcome, i(id) time(time) gvar(cohort) //long2 // without "long2", this
              // is equivalent to -xthdidregress-. With "long2", this is equivalent to TWFE
          estat event
          csdid_plot
          gives the following event study plot:
          graph6.png
          Now that I realize what the pre-treatment coefficients are measuring, I understand that the consistently positive coefficients indicate an upward trend (although I get no intuitive sense of the magnitude). But every diff-in-diff paper I've seen since around 2016 has shown an "event study" in which the pre-period coefficients have the same basic interpretation as the post-period coefficients. That's what I, and I assume a lot of labor/health/education economists, will naturally assume that this figure is showing--and will think "pre-trends look pretty good!" Anecdotally, I refereed a paper just last month that showed a similar figure and made this argument. But whether their claim was true or not hinges critically on whether they used the "long2" option. In contrast, including the "long2" option gives the following:
          graph7.png
          Which makes the underlying data generating process immediately clear. I'm struggling to think of a realistic example where the first graph is clearer than the second. Anyways, I appreciate everyone's helpful responses thus far--and apologize if it seems like I'm ranting--I'm just worried that I'm not the only one who is going to misunderstand the default output. Hopefully this thread will help someone else avoid my mistake.
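
          For reference, the expected values in this simulation can be worked out by hand. This is a sketch that ignores the noise term and uses illustrative notation: D_t is the gap between id=4 and the average of ids 1 to 3, S_t is the short difference against t-1, and L_t is the long difference against period 14 (g-1, with g = 15).

          ```latex
          \[
          E[\mathit{outcome}_{id,t}] = 10 + 2\,\mathit{id} + t \cdot 1\{\mathit{id} = 4\}
          \;\Rightarrow\;
          D_t = (18 + t) - \tfrac{1}{3}(12 + 14 + 16) = 4 + t,
          \]
          \[
          S_t = D_t - D_{t-1} = 1, \qquad L_t = D_t - D_{14} = t - 14 .
          \]
          ```

          Under these assumptions every short pre-period difference is about 1, whereas the long differences trace out the line t - 14 on both sides of the treatment date.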
          Last edited by David Beheshti; 12 Jul 2023, 20:32.



          • #6
            You are absolutely right.
            On this I have to attribute the original decisions on pre-treatment periods to Pedro. That is why in my first run of csdid I added the long option.
            However, the example you give is also interesting.
            Your event plot shows that the ATT(g,t)s were significant at each pre-treatment point, thus showing the PTA violation.

            Nevertheless, you are right that many people who have asked me about the command had the same confusion.

            If you search Brantly Callaway's page, they do have an explanation of the "varying base" and "universal base" estimates.



            • #7
              Thanks Fernando. I'll continue to use -csdid- with the "long2" option. Hopefully, Stata will add a similar feature to -xthdidregress- in the future.



              • #8
                Just wanted to flag that there is now an option for universal base estimates using xthdidregress. Just specify the option:
                Code:
                basetime(common)
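
                Applied to the command from the first post, a minimal sketch would look like this (assuming a Stata 18 installation recent enough to include the option, and the variable names from the -dataex- example):

                ```stata
                xtset state_code t
                xthdidregress ra (yvar) (treatment_var), group(state_code) basetime(common)
                estat atetplot
                ```

                With a common base time, the pre-treatment estimates should be measured against the same period as the post-treatment ones, which is the comparison the "long2" option in -csdid- was used for earlier in this thread.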
