  • Picking the most representative geographic areas from a set

    I am trying to design an experiment in which there will be very few treated geographic units, perhaps only one. For the purpose of this question, you can think of these units as US states. I have historical panel data on both the outcomes and the covariates. In non-experimental settings, synthetic control methods have been used to build "Frankenstein" control group(s), weighted combinations of untreated units, for causal inference. My situation is different because I get to choose the treated unit. To do this, I have tried to find the most representative state: for each state, I compute the squared Mahalanobis distance between that state and the remaining states using the pre-period outcomes and some covariates, and then pick the state with the smallest distance.
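
    For reference, the quantity being minimized can be written as below. This is a sketch under assumptions the post does not state: x_s stacks state s's pre-period outcomes and covariates, \bar{x}_{-s} is the mean of that vector over the remaining states, and S is the covariance matrix used for the comparison. Exactly which matrix mahascore2 builds (for example under the union option) is assumed here, not confirmed.

    $$ D_s^2 = (x_s - \bar{x}_{-s})^\top S^{-1} (x_s - \bar{x}_{-s}), \qquad s^{*} = \arg\min_{s} D_s^2 $$

    The formula requires S to be invertible. A sample covariance matrix estimated from n units has rank at most n - 1, so it is singular whenever the number of stacked variables is n or more.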

    Here is how I have approached this using the smoking dataset that comes bundled with the user-written synth command:

    Code:
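    sysuse smoking, clear                 // example panel bundled with -synth-
    drop if missing(beer)                 // keep only the years in which beer is observed
    drop age15to24                        // excluded from the distance calculation
    xtset state year
    * one row per state; each variable-year becomes its own column (cigsale1984, ...)
    reshape wide cigsale lnincome beer retprice, i(state) j(year)
    
    gen ms = .                            // squared Mahalanobis distance for each state
    levelsof state, local(states)
    
    foreach s of local states {
        capture drop mod_s target_state          // clear leftovers from an interrupted pass
        gen byte mod_s        = (state != `s')   // comparison group: all other states
        gen byte target_state = (state == `s')   // candidate treated state
        mahascore2 cigsale* lnincome* beer* retprice*, pop1(target_state) pop2(mod_s) compute_invcovarmat union
        replace ms = r(mahascore_sq) if state == `s'
        drop mod_s target_state
    }
    
    sum ms, detail
    gsort ms                              // most representative (smallest distance) first
    list state ms in 1/10, clean noobs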

    Does this seem like a reasonable way to choose the treated state? Is there a better approach?