Synthetic difference-in-differences event study style plots with permutation-based confidence intervals

Noah Spencer

Join Date: Jan 2019
Posts: 134

Synthetic difference-in-differences event study style plots with permutation-based confidence intervals

11 Jul 2023, 13:06

Hi,
Section 4.4 of Clarke et al (2023) shows how to generate event study style plots with the synthetic difference-in-differences procedure developed by Arkhangelsky et al (2021). Specifically, Clarke et al provide the Stata code for generating this type of plot with confidence intervals generated following a block bootstrapping procedure. However, they note that this type of plot could also be generated following a placebo permutation procedure. I would like to know how to generate such a plot.

I've tried to write my own code to do this based on the author's existing code for the event study plot with bootstraping + the .ado code for generating placebo/permutation-based standard errors. Could someone review my code and check that it is properly generating placebo-based confidence intervals?

Code:

/*=======================================================================
Prepare data
========================================================================*/
 
webuse set www.damianclarke.net/stata/
webuse quota_example.dta, clear
egen m=min(year) if quota==1, by(country) //indicator for the year of adoption
egen mm=mean(m), by(country)
keep if mm==2002 | mm==. //keep only one time of adoption
drop if lngdp==. //keep if covariates observed



/*=======================================================================
Create vector of values from Equation (8)
========================================================================*/

qui sdid womparl country year quota, vce(noinference) graph g2_opt(ylab(-5(5)20) ytitle("Women in Parliament") scheme(sj)) graph_export(groups, .pdf) covariates(lngdp, projected) 

matrix lambda = e(lambda)[1..12,1] //save lambda weight
matrix yco = e(series)[1..12,2] //control baseline
matrix ytr = e(series)[1..12,3] //treated baseline
matrix aux = lambda'*(ytr - yco) //calculate the pre-treatment mean
scalar meanpre_o = aux[1,1]
matrix difference = e(difference)[1..26,1..2] // Store Ytr-Yco
svmat difference
ren (difference1 difference2) (time d)
replace d = d - meanpre_o // Calculate vector in (8)


/*=======================================================================
Do permutation procedure to generate confidence intervals
========================================================================*/

* Generate locals related to treatment status
tempvar treated ty tyear n
qui gen `ty' = year if quota == 1 
qui bys country: egen `treated' = max(quota)
qui by  country: egen `tyear'   = min(`ty')
    
* Generate local for number of control units
local co = 106
    
* Set replication counter and number of reps    
local b = 1
local B = 100

* Loop over all permutation replications
while `b'<=`B' {
    
    * Preserve data
    preserve
    
    * Drop treated units
    qui drop if `tyear' !=.
    
    * Generate numeric order by random uniform variable
    tempvar r rand id
    sort year country
    qui gen `r' = runiform() in 1/`co'
    bysort country: egen `rand' = sum(`r')
    egen `id' = group(`rand')

    * Assign placebo treatment 
    local i=1
        forvalues y=1/`co' {
            qui replace `tyear' = tryears[`i',1] if `id' == `y'
            
            local ++i
        }
     qui replace quota = 1 if year >= `tyear'
    
    
    * Run SDiD
    qui: sdid womparl country year quota, vce(noinference) graph
        
    * Collect bootstrapped versions of the quantities you stored above to calculate the second term in Equation 8
    matrix lambda_b = e(lambda)[1..12,1]
    matrix yco_b = e(series)[1..12,2]
    matrix ytr_b = e(series)[1..12,3]
    matrix aux_b = lambda_b'*(ytr_b - yco_b)
    matrix meanpre_b = J(26,1,aux_b[1,1])
        
    * Collect this replication's Equation 8 values (and store in a matrix)
    matrix d`b'=e(difference)[1..26,2] - meanpre_b
        
    * Update loop counter
    local ++b
    
    * Restore dataset
    restore
}
    

*** Calculate standard deviation of each estimate from (8) based on the bootstrap resamples
    * Keep original estimate of (8)
    preserve
    keep time d
    keep if time!=.
    
    * Import vector of resampled estimates
    forval b=1/`B' {
        svmat d`b'
    }
    
    * Calculate standard deviation of estimates across each time period
    egen rsd = rowsd(d11 - d`B'1)      

    
* Generate confidence intervals
gen LCI = d + invnormal(0.025)*rsd 
gen UCI = d + invnormal(0.975)*rsd 


/*=======================================================================
Generate plot
========================================================================*/

tw rarea UCI LCI time, color(gray%40) || scatter d time, color(blue) m(d) xtitle("") ytitle("Women in Parliament") xlab(1990(1)2015, angle(45)) legend(order(2 "Point Estimate" 1 "95% CI") pos(12) col(2)) xline(2002, lc(black) lp(solid)) yline(0, lc(red) lp(shortdash)) scheme(sj)

restore

Tags: None

Noah Spencer

Join Date: Jan 2019
Posts: 134

12 Jul 2023, 14:53

I've made some edits to the code to try to get closer to the procedure described in Clarke et al (2023), but I'm still not sure if I'm doing this correctly:

Code:

/*=======================================================================
Store Equation (8) data series
========================================================================*/

* Prepare data
webuse set www.damianclarke.net/stata/
webuse quota_example.dta, clear
egen m=min(year) if quota==1, by(country) //indicator for the year of adoption
egen mm=mean(m), by(country)
keep if mm==2002 | mm==. //keep only one time of adoption
drop if lngdp==. //keep if covariates observed

* Run SDiD
sdid womparl country year quota, vce(noinference) graph

* Calculate second term in Equation 8
matrix lambda = e(lambda)[1..12,1] // time weights
matrix yco = e(series)[1..12,2] // pre-treatment values for treated unit
matrix ytr = e(series)[1..12,3] // average pre-treatment values for control units
matrix aux = lambda'*(ytr - yco) // calculate the pre-treatment baseline difference
global meanpre_o = aux[1,1]        // store this value
di $meanpre_o

* Store pre-made output on first term of Equation 8
matrix difference = e(difference)[1..26,1..2]  // store Ytr - Yco
svmat difference
keep difference1 difference2
ren (difference1 difference2) (year trt_ctrl_diff)

* Calculate the vector in equation 8
gen eqn8 = trt_ctrl_diff - meanpre_o   
drop trt_ctrl_diff
drop if year == .    

* Save 
save "eqn8_values", replace




/*=======================================================================
Do permutation procedure to generate confidence intervals
========================================================================*/

* Prepare data
webuse set www.damianclarke.net/stata/
webuse quota_example.dta, clear
egen m=min(year) if quota==1, by(country) //indicator for the year of adoption
egen mm=mean(m), by(country)
keep if mm==2002 | mm==. //keep only one time of adoption
drop if lngdp==. //keep if covariates observed


* Drop treated units
drop if country == "Djibouti" | country == "Morocco"
    
* Reset state codes so they're all in order
encode country, gen(country_code)
    

* Set permutation counter and number of reps    
local p = 1
local P = 100


* Loop over all permutation replications
while `p'<=`P' {
    
    * Randomly pick placebo state
    local placebo_unit = runiformint(1, 106)

    * Assign placebo treatments 
    drop quota
    gen quota = (year >= 2002 & country_code == `placebo_unit')
    
    * Run SDiD
    qui: sdid womparl country year quota, vce(noinference) graph
        
    * Collect the quantities you stored above to calculate the second term in Equation 8
     matrix lambda_p = e(lambda)[1..12,1]
     matrix yco_p = e(series)[1..12,2]
     matrix ytr_p = e(series)[1..12,3]
     matrix aux_p = lambda_p'*(ytr_p - yco_p)
     matrix meanpre_p = J(26,1,aux_p[1,1])
        
    * Collect Equation 8 values (and store in a matrix)
     matrix d`p'=e(difference)[1..26,2] - meanpre_p
        
    * Update loop counter
        local ++p
    

}
    


    
* Save all placebo estimates of Equation 8 values
sort country year
forval p=1/`P' {
    svmat d`p'
}
    
* Calculate standard deviation of estimates across each time period
egen rsd = rowsd(d11 - d`P'1)      
keep year rsd
drop if rsd == .

* Merge in true Equation 8 values
merge 1:1 year using "eqn8_values", nogen
    
* Generate confidence intervals
gen LCI = eqn8 + invnormal(0.025)*rsd 
gen UCI = eqn8 + invnormal(0.975)*rsd 


/*=======================================================================
Generate plot
========================================================================*/


*generate plot
tw rarea UCI LCI year, color(gray%40) || scatter eqn8 year, color(blue) m(d) xtitle("") ytitle("") legend(order(2 "Point Estimate" 1 "95% CI") pos(12) col(2)) xline(2002, lc(black) lp(solid)) yline(0, lc(red) lp(shortdash)) scheme(sj)

Comment

Damian Clarke

Join Date: Nov 2021
Posts: 11

20 Jul 2023, 10:23

Hi Noah,
This is a nice idea, and great to see you've applied the code to permutation inference! I think there are just two small things that I would change. I've made some edits to the code below and think that this should do what you're after. The changes are (a) in your code towards the top, meanpre_o is defined as a global, but it needs to be a scalar as it is used below in the code as a scalar (or the code below should be adapted to refer to the global), and (b) when you do the permutations, we want to permute the same treatment structure, which in this case is to have two treated units. In your code you've set up permutations assuming a single treated unit, so below I've changed this for the two countries. Please see below for the version with these edits.

Code:

/*=======================================================================
   Store Equation (8) data series
========================================================================*/

* Prepare data
webuse set www.damianclarke.net/stata/
webuse quota_example.dta, clear
egen m=min(year) if quota==1, by(country) //indicator for the year of adoption
egen mm=mean(m), by(country)
keep if mm==2002 | mm==. //keep only one time of adoption
drop if lngdp==. //keep if covariates observed

* Run SDiD
sdid womparl country year quota, vce(noinference) graph

* Calculate second term in Equation 8
matrix lambda = e(lambda)[1..12,1] // time weights
matrix yco = e(series)[1..12,2] // pre-treatment values for treated unit
matrix ytr = e(series)[1..12,3] // average pre-treatment values for control units
matrix aux = lambda'*(ytr - yco) // calculate the pre-treatment baseline difference
scalar meanpre_o = aux[1,1]        // store this value
di meanpre_o

* Store pre-made output on first term of Equation 8
matrix difference = e(difference)[1..26,1..2]  // store Ytr - Yco
svmat difference
keep difference1 difference2
ren (difference1 difference2) (year trt_ctrl_diff)

* Calculate the vector in equation 8
gen eqn8 = trt_ctrl_diff - meanpre_o
drop trt_ctrl_diff
drop if year == .

* Save
save "eqn8_values", replace




/*=======================================================================
    Do permutation procedure to generate confidence intervals
========================================================================*/

* Prepare data
webuse set www.damianclarke.net/stata/
webuse quota_example.dta, clear
egen m=min(year) if quota==1, by(country) //indicator for the year of adoption
egen mm=mean(m), by(country)
keep if mm==2002 | mm==. //keep only one time of adoption
drop if lngdp==. //keep if covariates observed


* Drop treated units
drop if country == "Djibouti" | country == "Morocco"

* Reset state codes so they're all in order
encode country, gen(country_code)


* Set permutation counter and number of reps
local p = 1
local P = 100


* Loop over all permutation replications
while `p'<=`P' {
    
    drop quota
    // Need to choose 2 placebo states.  Let's just take max and min of rand number
    // This could probably be done in less lines, but the below should work fine
    gen c = rnormal()
    bys country: egen chooser = min(c)
    qui sum chooser
    gen treated = chooser==r(min) | chooser==r(max)
    
    gen quota = (year >= 2002 & treated==1)

    * Run SDiD
    qui: sdid womparl country year quota, vce(noinference) graph

    * Collect the quantities you stored above to calculate the second term in Equation 8
    matrix lambda_p = e(lambda)[1..12,1]
    matrix yco_p = e(series)[1..12,2]
    matrix ytr_p = e(series)[1..12,3]
    matrix aux_p = lambda_p'*(ytr_p - yco_p)
    matrix meanpre_p = J(26,1,aux_p[1,1])

    * Collect Equation 8 values (and store in a matrix)
    matrix d`p'=e(difference)[1..26,2] - meanpre_p

    * Update loop counter
    local ++p
    drop c chooser treated
}

* Save all placebo estimates of Equation 8 values
sort country year
forval p=1/`P' {
    svmat d`p'
}

* Calculate standard deviation of estimates across each time period
egen rsd = rowsd(d11 - d`P'1)
keep year rsd
drop if rsd == .

* Merge in true Equation 8 values
merge 1:1 year using "eqn8_values", nogen

* Generate confidence intervals
gen LCI = eqn8 + invnormal(0.025)*rsd
gen UCI = eqn8 + invnormal(0.975)*rsd


/*=======================================================================
    Generate plot
========================================================================*/


*generate plot
tw rarea UCI LCI year, color(gray%40) || scatter eqn8 year, color(blue) m(d) xtitle("") ytitle("") legend(order(2 "Point Estimate" 1 "95% CI") pos(12) col(2)) xline(2002, lc(black) lp(solid)) yline(0, lc(red) lp(shortdash)) scheme(sj)
graph export sdidplacebo.pdf, replace

Output is as follows:

Click image for larger version

Name: Screenshot from 2023-07-20 17-22-55.png
Views: 1
Size: 67.8 KB
ID: 1721276

Best wishes,
Damian

Comment

Noah Spencer

Join Date: Jan 2019

Posts: 134
#4

20 Jul 2023, 11:20

Thank you very much, Damian!
Comment
Pierangela Peruzzo

Join Date: Mar 2024

Posts: 4
#5

20 Mar 2024, 02:28

Hi,
I tried to reproduce both this event study (placebo) as well as the one in the session 4.4 in Clark et al., 2023. However I am encountering some issues in looping over all permutation replications. The error I get is: "<= invalid name". I can't understand how to solve it. Can someone help me please?
Comment
Noah Spencer

Join Date: Jan 2019

Posts: 134
#6

12 Apr 2024, 08:02

Pierangela Peruzzo, are you copying-and-pasting Damian's code exactly? It still works for me.

If you are copy-and-pasting exactly, can you go line-by-line to see where the error message is coming up?
Comment
Mathieu Simoens

Join Date: Jul 2024

Posts: 4
#7

12 Sep 2025, 01:43

Dear all,

Thanks a lot for these contributions. I have tried implementing both the bootstrap (from the Clarke et al. paper) and the placebo (from this forum) codes for the event study plots and it works perfectly!

I have three questions related to these results:
Do you have any guidance on why the confidence intervals from the placebo event study (shown here) are so much wider than the confidence intervals of the bootstrap event study (in the paper)? And will this generally be the case or is this related to the specific dataset / the number of draws /...?

The simple average of the post-treatment coefficients in the event study plot is exactly the same as the estimated ATT if you use the sdid command. Is there also a way of getting to the standard error and/or confidence intervals from the sdid command by taking some kind of average from the standard errors and/or confidence intervals of the event study plot?

In the placebo inference, what is the added value of increasing the number of draws above the number of availble units? In the end, these additional draws will always be draws which were already drawn before, no? In contrast to the bootstrap where every draw will indeed (likely) be different from a previous one.

Thanks a lot in advance!

Mathieu
Comment

Announcement

Synthetic difference-in-differences event study style plots with permutation-based confidence intervals

Comment

Comment

Comment

Comment

Comment

Comment