Synth2 - Why does dropping a unit not in the synthetic control change the results?

Noah Spencer

Join Date: Jan 2019

Posts: 125
#1


21 Apr 2023, 08:49

I am using the user-written -synth2- command to conduct a synthetic control analysis.

I noticed that when I drop units that are not a part of the synthetic control, my treatment effects change slightly. I don't understand why this would be the case. Is it an error in the command or is there a reason for this?

See below for an example of this issue arising:

Code:

* Load data from Abadie, Diamond, and Hainmueller (2010)
use "https://github.com/scunning1975/mixtape/blob/master/smoking.dta?raw=true", clear

* Declare panel
xtset state year

* Replicate results in Abadie, Diamond, and Hainmueller (2010)
synth2 cigsale lnincome age15to24 retprice beer cigsale(1988) cigsale(1980) cigsale(1975), trunit(3) trperiod(1989) xperiod(1980(1)1988) nested allopt nofig
*** Treatment effect is -19.0018

* Drop a unit that is not in the synthetic control from analysis and try to replicate
drop if state == 1
synth2 cigsale lnincome age15to24 retprice beer cigsale(1988) cigsale(1980) cigsale(1975), trunit(3) trperiod(1989) xperiod(1980(1)1988) nested allopt nofig
*** Treatment effect is -19.0779

Note: This issue also seems to occur with the original -synth- command. If you try the above code with -synth- instead of -synth2-, you can see that the optimal unit weights are slightly different when you drop state 1.

Last edited by Noah Spencer; 21 Apr 2023, 08:52.
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#2

21 Apr 2023, 09:01

The states in your donor pool affect the optimization procedure slightly. If you remove certain donors, the set of units the optimization can pick from changes, so the solution it returns can change too. Why?

Under the hood, the unit weights come from a constrained least squares problem (the weights are nonnegative and sum to one), solved in this case by quadratic programming. With the nested option, the predictor weights are also chosen by a numerical search. And, as we would expect in any regression, changing the set of regressors (here, the donor units) can change the results that we get. This is why selecting a donor pool is extremely important in applications of SCM.
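For reference, here is a sketch of the problem in the usual SCM notation. The symbols are assumptions of this sketch, not something taken from the command's output: X_1 and X_0 hold the predictors for the treated unit and the donors, Z_1 and Z_0 hold the pre-treatment outcomes, W is the vector of donor weights, and V is a diagonal matrix of predictor weights. The details of how -synth- and -synth2- implement this may differ.

```latex
\begin{aligned}
W^{*}(V) &= \arg\min_{W}\; (X_1 - X_0 W)'\, V \,(X_1 - X_0 W)
\quad \text{s.t.}\quad w_j \ge 0,\;\; \sum_{j} w_j = 1, \\
V^{*} &= \arg\min_{V}\; \bigl(Z_1 - Z_0 W^{*}(V)\bigr)' \bigl(Z_1 - Z_0 W^{*}(V)\bigr).
\end{aligned}
```

Dropping a donor removes a column from X_0 and Z_0. For a fixed V, removing a column that had zero weight leaves the inner minimizer unchanged when it is unique. So a small shift like the one in #1 plausibly comes from the outer numerical search over V ending at a slightly different point.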

In any case, the results do not meaningfully change here (-19.0018 versus -19.0779). So, this is a good aspect of classic SCM (even though I have problems with it).
Noah Spencer

Join Date: Jan 2019

Posts: 125
#3

21 Apr 2023, 09:13

Thanks for this helpful response Jared Greathouse!

Jared Greathouse

Join Date: Sep 2021
Posts: 2170

21 Apr 2023, 09:26

To further illustrate, suppose we rank the donors by the standard deviation of the pre-treatment difference in cigsale between the treated unit and each donor, and keep the 30 states with the smallest values. Say, that is, we discard the 8 states which look least like California. As we can confirm, this discards states like Kentucky, whose cigarette sales are too high relative to the other donors, and which thus may not be good comparison states. You'll also notice that North Carolina is gone, as are Alabama and Arkansas.
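In symbols, the score that the code below computes for each donor j can be written as follows. The notation is an assumption of this sketch: Y_{1t} is cigsale for California in year t, Y_{jt} is cigsale for donor j, and T_0 is the number of pre-1989 years in the data.

```latex
\begin{aligned}
d_{jt} &= Y_{1t} - Y_{jt}, \qquad
\bar{d}_j = \frac{1}{T_0}\sum_{t<1989} d_{jt}, \\
\text{score}_j &= \sqrt{\frac{1}{T_0}\sum_{t<1989} \bigl(d_{jt} - \bar{d}_j\bigr)^{2}} .
\end{aligned}
```

A low score means the donor's series runs roughly parallel to California's before treatment, even if the levels differ, because the mean gap is subtracted out.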

Code:

cls
clear *

* Load data from Abadie, Diamond, and Hainmueller (2010)
use "https://github.com/scunning1975/mixtape/blob/master/smoking.dta?raw=true", clear

* Placeholder for each donor's similarity score
g score = .

* Declare panel
xtset state year

* Look up California's numeric code via the value label on the panel variable
local lbl: value label `r(panelvar)'
loc unit = "California":`lbl'

* Ids of all donors (every state except the treated unit)
qui levelsof state if state != `unit', loc(ids)

* Note: greshape comes from the user-written gtools package
qui foreach x of loc ids {

    cap frame drop minframe

    * Pre-treatment cigsale for the treated unit and the current donor only
    frame put state year cigsale if (state == `unit' | state == `x') & year < 1989, into(minframe)

    frame minframe {
        greshape wide cigsale, j(state) i(year)

        * Put the treated unit's column right after year
        order cigsale`unit', a(year)

        ds // Gets a list of all variables
        loc t: word 2 of `r(varlist)'
        loc treated_unit: disp "`t'" // T for treated
        loc a: word 3 of `r(varlist)'
        loc donor_unit: disp "`a'" // First donor unit...

        * Standard deviation of the treated-minus-donor gap
        g diff = `treated_unit' - `donor_unit'
        egen mean = mean(`treated_unit' - `donor_unit')
        egen summed = total((diff - mean)^2)

        qui su year
        g division = summed/`=r(N)'
        g score = division^.5

        su score, mean
        loc score = r(mean)

        * Write the score back to the main data for this donor
        frame default: replace score = `score' if state == `x'
    }
}

* Rank donors by score within year (the treated unit has a missing rank)
bys year: egen rank = rank(score)
xtset

* Keep the 30 closest donors plus the treated unit
keep if inrange(rank, 1, 30) | rank == .

cls

synth cigsale lnincome age15to24 retprice beer cigsale(1988) cigsale(1980) cigsale(1975), trunit(3) trperiod(1989) xperiod(1980(1)1988) nested allopt

Very similar weights are chosen, and the estimated effect is roughly the same, too. The issue with this method is that there's no stopping rule: while the top 30 states do give a good fit and effect, how do we know that 30 is the optimal number? Why not 31, why not 32? Donor selection in SCM is a pretty important issue, and it's one my coworkers and I are delving into at the moment.
