Picking the most representative geographic areas from a set

Dimitriy V. Masterov

20 Jul 2018, 19:33

I am trying to design an experiment in which there will be very few treated geographic units, perhaps only one. For the purpose of this question, you can think of these as US states. I have historical panel data on both the outcomes and the covariates. In the non-experimental setting, synthetic control methods have been used to build "Frankenstein" control groups for causal inference. My situation is different because I can choose the treated unit. To choose it, I have tried to find the most representative state. For each state, I compute the squared Mahalanobis distance between that state and the remaining states using the pre-treatment outcomes and some covariates. I then pick the state with the smallest distance.
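
For reference, the selection criterion described above can be written as a formula. This is a sketch that assumes `mahascore2` with `pop1()`, `pop2()`, `compute_invcovarmat` and `union` compares the target state's vector with the mean vector of the remaining states, using a covariance matrix estimated on the union of the two groups (all states). The symbols below are introduced here for illustration only.

```latex
D^2_s = (\mathbf{x}_s - \bar{\mathbf{x}}_{-s})^\top S^{-1} (\mathbf{x}_s - \bar{\mathbf{x}}_{-s}),
\qquad
\bar{\mathbf{x}}_{-s} = \frac{1}{n-1} \sum_{j \neq s} \mathbf{x}_j,
\qquad
\hat{s} = \arg\min_{s} D^2_s
```

Here \(\mathbf{x}_s\) is the \(p\)-vector of pre-treatment values for state \(s\) (one entry per variable-year column created by `reshape wide`), \(n\) is the number of states, and \(S\) is the \(p \times p\) sample covariance matrix of the \(\mathbf{x}_j\) computed over all \(n\) states.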

Here is how I approached this using the smoking dataset that comes bundled with the user-written synth command:

Code:

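sysuse smoking, clear
drop if missing(beer)     // keep only the years in which beer is observed
drop age15to24            // not used in the distance
xtset state year
// one row per state, one column per variable-year
reshape wide cigsale lnincome beer retprice, i(state) j(year)

gen ms = .                // squared Mahalanobis distance for each state
levelsof state, local(states)
foreach s of local states {
    gen mod_s = cond(state != `s',1,0)          // 1 for every state except the target
    gen target_state = cond(state == `s',1,0)   // 1 for the target state only
    mahascore2 cigsale* lnincome* beer* retprice*, pop1(target_state) pop2(mod_s) compute_invcovarmat union
    replace ms = r(mahascore_sq) if state==`s'
    capture drop mod_s target_state
}
sum ms, detail
gsort ms                  // smallest distance (most representative) first
list state ms in 1/10, clean noobs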
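
One property of the squared Mahalanobis distance is worth keeping in mind with this layout. It follows from the math, not from anything specific to `mahascore2`. After `reshape wide`, every year of every variable is its own column, so the number of columns \(p\) can exceed the number of states \(n\). Write \(\mathbf{x}_j\) for state \(j\)'s \(p\)-vector and \(\bar{\mathbf{x}}\) for the mean over all states. The covariance matrix \(S\) that the distance inverts is then estimated from only \(n\) observations:

```latex
S = \frac{1}{n-1} \sum_{j=1}^{n} (\mathbf{x}_j - \bar{\mathbf{x}})(\mathbf{x}_j - \bar{\mathbf{x}})^\top
\quad\Longrightarrow\quad
\operatorname{rank}(S) \le \min(p,\; n-1)
```

When \(p \ge n\), \(S\) is singular and \(S^{-1}\) does not exist. The distance is then defined only through a generalized inverse or after reducing the number of columns. It may be worth checking how `mahascore2` with `compute_invcovarmat` behaves in that case.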

Does this seem like a reasonable approach? Is there something better?
Tags: experimental design, Mahalanobis distance