Synthetic difference in differences with multiple treated units

Anja Jean-Mairet

Join Date: Apr 2021

Posts: 18
#1

28 May 2024, 07:40

Hi,

I have worked with SDID before but always with just one treated unit and usually in groups instead of unique IDs.
I have a dataset with firm-level data from 2002 to 2020. Each firm has its own bvdid. 94 of these IDs are treated units; the rest are untreated. I assumed this code would work:

synth RD (impact_2005) year bvdid, trunit(1) trperiod(2005)

impact_2005: 1 if the firm is treated and the year is 2005 or later, 0 otherwise
bvdid: shows the firm IDs
year: gives all years from 2002-2020

With help synth I see the following:
trunit(#) the unit number of the unit affected by the intervention as given in the panel id
variable specified in tsset panelvar; see tsset. Notice that only a single unit number can be
specified. If the intervention of interest affected several units the user may chose to
combine these units first and then treat them as a single unit affected by the intervention.

I am now desperately trying to combine the treated units in a way that the command still works. I was thinking of simply assigning the same ID to all treated values but then I run into the problem of my panel data no longer being uniquely identified by bvdid and year.

Has anyone worked with such data before and knows of a better workaround/solution? (I do also have variables such as country, iso_codes I could possibly use for grouping)

Highly appreciated!
George Ford

Join Date: Aug 2014

Posts: 3152
#2

28 May 2024, 08:11

I think there's a command floating around that permits multiple treated units. Can't recall the name of it.

If you want the mean of the treated units, create ID2 and assign the same ID to treated units and keep the original ID on the control units. Then collapse the data (mean/sum whatever you want) on ID2 and time.
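
A minimal sketch of that collapse, using the variable names from #1. The name id2 and the pooled code 0 are placeholders, and it assumes treatment_2005 equals 1 in every year for an ever-treated firm. The synth call assumes the synth package from SSC is installed, and the pre-period values of RD are used as predictors purely for illustration:

```stata
* Sketch only: id2 and the pooled code 0 are hypothetical choices.
assert bvdid > 0 & !missing(bvdid)                   // make sure 0 is free to use as the pooled ID
gen long id2 = cond(treatment_2005 == 1, 0, bvdid)   // all treated firms share one ID
collapse (mean) RD, by(id2 year)                     // treated firms averaged; each control is its own group
tsset id2 year                                       // panel is again uniquely identified
synth RD RD(2002) RD(2003) RD(2004), trunit(0) trperiod(2005)
```

Swap (mean) for (sum) if a total is the more natural aggregate for the treated group.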
Comment

Anja Jean-Mairet

Join Date: Apr 2021
Posts: 18
#3

28 May 2024, 08:32

Hi George,

Thank you for your reply.
How exactly would collapsing be useful here? Don't I lose a lot of information that way?

Here is an example of what my data look like (the bvdid stays the same across years: 365 2003, 365 2004, etc.):

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long bvdid int year double RD float(treatment_2005 post_2005 impact_2005) long(country iso_country)
 365 2002 523229398.664204 1 0 0  7  7
1365 2002                0 1 0 0 26 25
1214 2002                0 1 0 0 21 19
 119 2002        1.911e+09 1 0 0 11  6
   6 2002         46000000 1 0 0  1  1
  18 2002                0 1 0 0  1  1
  57 2002                0 1 0 0  2  2
 502 2002         91600000 1 0 0  9 10
 723 2002                0 1 0 0 10 11
 885 2002  2463728915.0984 1 0 0 32 12
1416 2002                0 1 0 0 31 28
 505 2002          1100000 1 0 0  9 10
 262 2002       1.1353e+09 1 0 0 11  6
 200 2002                0 1 0 0 11  6
 259 2002         20900000 1 0 0 11  6
1423 2002 52081122.3137318 1 0 0 31 28
1109 2002                0 1 0 0  4 14
1413 2002                0 1 0 0 31 28
 410 2002          1540000 1 0 0 30  9
1112 2002  34515514.267202 1 0 0 14 15
 946 2002 3638064.11883703 1 0 0 32 12
1154 2002                0 1 0 0 18 18
 416 2002          8936000 1 0 0 30  9
1208 2002                0 1 0 0 18 18
 302 2002                0 1 0 0 11  6
1211 2002                0 1 0 0 18 18
 798 2002                0 1 0 0 32 12
 391 2002                0 1 0 0  7  7
  55 2002                0 1 0 0  2  2
1527 2002                0 1 0 0 29 29
1358 2002                0 1 0 0 26 25
1172 2002                0 1 0 0 18 18
1366 2002                0 1 0 0 26 25
1114 2002                0 1 0 0 14 15
 265 2002         25500000 1 0 0 11  6
  15 2002         22813000 1 0 0  1  1
1204 2002                0 1 0 0 18 18
 706 2002                0 1 0 0 10 11
1215 2002                0 1 0 0 21 19
 125 2002                0 1 0 0 11  6
 766 2002                0 1 0 0 10 11
 223 2002                0 1 0 0 11  6
 412 2002                0 1 0 0 30  9
 182 2002        2.590e+08 1 0 0 11  6
1367 2002                0 1 0 0 26 25
 801 2002 72641552.2510149 1 0 0 32 12
 310 2002        1.577e+08 1 0 0 11  6
 430 2002                0 1 0 0 30  9
1212 2002                0 1 0 0 18 18
 249 2002        391069000 1 0 0 11  6
end

Comment

George Ford

Join Date: Aug 2014

Posts: 3152
#4

28 May 2024, 09:02

All you are doing with collapse is reducing the treated units to one unit. Control count is unchanged.

If the intervention of interest affected several units the user may chose to
combine these units first and then treat them as a single unit affected by the intervention.

If you have 5 treated and 10 controls, the collapsed data would have 1 treated and 10 controls.
Comment
George Ford

Join Date: Aug 2014

Posts: 3152
#5

28 May 2024, 09:09

If you take a simple mean, you need to think about the scale of the variables among the treated. I would think you might not want to take an average if one treated unit has a mean of 1 and another a mean of 100. You could weight the collapse by the mean outcome in the last pre-treatment year. I'd do it both ways (simple mean, weighted mean).

Also see
https://docs.iza.org/dp8944.pdf
https://papers.ssrn.com/sol3/papers....act_id=2584200

I think gsynth allows you to do multiple treated units.
Comment
George Ford

Join Date: Aug 2014

Posts: 3152
#6

28 May 2024, 09:10

Not gsynth, but fect.

https://yiqingxu.org/packages/fect/stata/fect_md.html
Comment
George Ford

Join Date: Aug 2014

Posts: 3152
#7

28 May 2024, 10:00

Actually, I think gsynth is R code. I don't think fect does synthetic counterfactual. Sorry for the confusion.
Comment
Anja Jean-Mairet

Join Date: Apr 2021

Posts: 18
#8

01 Jun 2024, 09:09

Hi George,

I think I have found the command that allows for multiple treated units: allsynth

However when I run that code I always end up with an error message:

Code:

allsynth RD, transform(RD, normalize) bcorrect(merge) keep(allsynth_MA, replace) stacked(trunits(treatment_2005) trperiods(post_2005), clear figure(classic bcorrect))
post_2005 observes 1 as the treatment period for treated unit (bvdid == 5), but 1 is not found in timevar year
r(198);

My post_2005 is a dummy variable (1 is after the intervention and 0 is before the intervention). Should this be a different variable, then? And am I using the specifications correctly for my dataset?

Thank you for your help so far!
Comment
George Ford

Join Date: Aug 2014

Posts: 3152
#9

01 Jun 2024, 10:05

I think the trperiods are years, not dummies.
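
A rough sketch of what such a variable could look like, going only by the error text quoted in #8 (the value has to be a period that actually occurs in year). The name tr_year is made up, and leaving it missing for control firms is an assumption, not something taken from the allsynth help file:

```stata
* Sketch only: tr_year is a hypothetical name.
gen int tr_year = 2005 if treatment_2005 == 1        // one treatment year per treated firm, missing for controls
bysort bvdid (year): assert tr_year == tr_year[1]    // check it is constant within each firm
```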
Comment
Anja Jean-Mairet

Join Date: Apr 2021

Posts: 18
#10

01 Jun 2024, 11:21

I see, but then what should that variable look like? Should it basically be the same as year but with a zero before the intervention period? So up until 2005 it would be zero, and afterwards it would show the respective year (2005-2020)?
Comment
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2168
#11

01 Jun 2024, 11:42

I believe the command you want is sdid.
Comment
George Ford

Join Date: Aug 2014

Posts: 3152
#12

01 Jun 2024, 14:26

Yeah. I don't see that allsynth does multiple units.
Comment
Anja Jean-Mairet

Join Date: Apr 2021

Posts: 18
#13

02 Jun 2024, 03:41

I have to be honest I am a bit confused by now. I have tried all these commands multiple times but I always run into complications.

Code:

xtset bvdid year
sdid RD bvdid treatment_2005 year_post, vce(placebo)

Keeps telling me that I have repeated time values in my panel. However running

Code:

duplicates report bvdid year
duplicates list bvdid year

Shows me that I do not have any duplicates in my panel. Additionally the observations per year are the exact same for each year.

Running the allsynth command shows me a similar error:

Code:

allsynth RD, transform(RD, normalize) bcorrect(merge) keep(allsynth_MA, replace) stacked(trunits(treatment_2005) trperiods(year_post), clear figure(classic bcorrect))
Multiple treatment periods (year_post) are observed for treated unit (bvdid == 5).
Remove (bvdid == 5) from your treated units, or restrict year_post to observe a single treatment period for (bvdid == 5) and take note of the implications for interpreting the results

If I remove bvdid==5, the error appears for the next ID, bvdid==6, so the command is not happy with my data structure. However, this is simply what panel data looks like: I have 19 years of data, with one observation per bvdid per year.

I believe the paper "Examination of the Synthetic Control Method for Evaluating Health Policies with Multiple Treated Units" (https://onlinelibrary.wiley.com/doi/....1002/hec.3258) has done what I am trying to recreate. Page 1519 shows how they extend the SCM to multiple treated units by aggregating the units, without leaving insufficient power to detect whether there was a statistically significant treatment effect. However, I am struggling to recreate the code.

In case you are familiar with that paper, that would also be of great help.

In general, thank you very much for your inputs!
Comment
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#14

02 Jun 2024, 04:11

Show me what happens when you try to xtset your data.

Here's what I'm not understanding: SDID is quite simple in the sense that it only needs 4 variables from the user: the outcome, the treatment indicator, the numerical/categorical ID, and the time.

You seem... to have "treatment_2005 year_post" as your variables. Why? Is treatment_2005 a 0 1 variable where it's equal to 1 if a unit is in the set of 94 treated units AND time is greater than or equal to the treatment start date, else 0? Is "year_post" a scalar for year, or is it a dummy variable? I'm not at my computer (well, I'm not getting up to turn it on 😂), but you should be using the "year" variable, right?
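
For reference, a minimal sketch of a call along those lines, assuming sdid from SSC is installed and that its argument order is outcome, unit, time, treatment, as I recall it from the help file. treat_post is a made-up name and should match the existing impact_2005 if that variable is treated x post:

```stata
* Sketch only: treat_post is a hypothetical name.
* 1 for the 94 treated firms from 2005 onward, 0 otherwise.
gen byte treat_post = (treatment_2005 == 1 & year >= 2005)
sdid RD bvdid year treat_post, vce(placebo)
```

Under that reading, the call in #13 has treatment_2005 sitting in the time position, which may be what triggers the "repeated time values" message. The treatment start is picked up from where treat_post switches from 0 to 1, so no separate treatment-year argument is passed.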
Comment
Anja Jean-Mairet

Join Date: Apr 2021

Posts: 18
#15

02 Jun 2024, 04:30

When I xtset my data and then run the code this shows up:

Code:

xtset bvdid year

Panel variable: bvdid (strongly balanced)
 Time variable: year, 2002 to 2020
         Delta: 1 unit

sdid RD bvdid treatment_2005 year, vce(placebo)
repeated time values within panel

I ran it with 3 different time variables and each time there is the same error code.

year: simply shows 2002-2020
post_2005: dummy variable (1 if 2005 or after, 0 if before 2005)
year_post: 0 if before 2005, 2005-2020 if 2005 or after

My treatment_2005 variable is 1 if the bvdid has ever been treated and 0 if never treated. My impact_2005 variable is the treatment x post interaction, which I am not using in this version of the command. If I am using year as my time variable, how does Stata know when my intervention takes place?
Comment
