  • Synth2 - Why does dropping a unit not in the synthetic control change the results?

    I am using the user-written -synth2- command to conduct a synthetic control analysis.

    I noticed that when I drop units that are not a part of the synthetic control, my treatment effects change slightly. I don't understand why this would be the case. Is it an error in the command or is there a reason for this?

    See below for an example of this issue arising:

    Code:
    * Load data from Abadie, Diamond, and Hainmueller (2010)
    use "https://github.com/scunning1975/mixtape/blob/master/smoking.dta?raw=true", clear
    
    * Declare panel
    xtset state year
    
    * Replicate results in Abadie, Diamond, and Hainmueller (2010)
    synth2 cigsale lnincome age15to24 retprice beer cigsale(1988) cigsale(1980) cigsale(1975), trunit(3) trperiod(1989) xperiod(1980(1)1988) nested allopt nofig
    *** Treatment effect is -19.0018
    
    * Drop a unit that is not in the synthetic control from analysis and try to replicate
    drop if state == 1
    synth2 cigsale lnincome age15to24 retprice beer cigsale(1988) cigsale(1980) cigsale(1975), trunit(3) trperiod(1989) xperiod(1980(1)1988) nested allopt nofig
    *** Treatment effect is -19.0779
    Note: This issue also seems to occur with the original -synth- command. If you try the above code with -synth- instead of -synth2-, you can see that the optimal unit weights are slightly different when you drop state 1.
    Last edited by Noah Spencer; 21 Apr 2023, 08:52.

  • #2
    The states in your donor pool affect the optimization procedure slightly. If you remove certain donors, the set of donors the convex optimization can pick from changes, and so can its solution. Why?

    Under the hood, you are running a constrained least-squares regression (the donor weights must be nonnegative and sum to one), solved in this case by quadratic programming. And, as we would expect in any regression, changing the regressors, which here are the donor units, can change the results we get. This is why selecting a donor pool is extremely important in applications of SCM.
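
    A sketch of the nested problem in the notation commonly used for SCM (the symbols are introduced here for illustration and do not appear elsewhere in this thread): X_1 and X_0 hold the predictors for the treated unit and the donors, Z_1 and Z_0 hold the pre-treatment outcomes, and each donor is one column of X_0 and Z_0.

    ```latex
    \begin{aligned}
    W^{*}(V) &= \arg\min_{W}\; (X_1 - X_0 W)^{\top} V \,(X_1 - X_0 W)
      \quad \text{s.t. } w_j \ge 0,\ \sum_{j} w_j = 1, \\
    V^{*} &= \arg\min_{V}\; \bigl(Z_1 - Z_0 W^{*}(V)\bigr)^{\top} \bigl(Z_1 - Z_0 W^{*}(V)\bigr).
    \end{aligned}
    ```

    Dropping a donor removes a column from X_0 and Z_0. Because V is found by a numerical search over this nested problem, that change may move the search to slightly different V and W, which would be consistent with the small differences reported above even for a donor whose weight was zero.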

    Anyway, the results do not change meaningfully here (-19.0018 versus -19.0779). So, this is a good aspect of classic SCM (even though I have problems with it).

    • #3
      Thanks for this helpful response Jared Greathouse!

      • #4
        To further illustrate, suppose we keep the 30 donor states whose pre-treatment cigarette sales track California's most closely, measured by the standard deviation of the gap between California and each donor. That is, we discard the 8 states that look least like California. As we can confirm, this discards states like Kentucky, which smokes too much relative to the other donors and thus may not be a good comparison state. You'll also notice that North Carolina is gone, as are Alabama and Arkansas.
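
        As a sketch, the score that the code below computes for each donor j can be written as follows. The symbols are introduced here for illustration: Y_{1t} is California's cigarette sales in year t, Y_{jt} is donor j's, and T_0 is the number of pre-1989 years.

        ```latex
        d_{jt} = Y_{1t} - Y_{jt}, \qquad
        \bar{d}_j = \frac{1}{T_0} \sum_{t < 1989} d_{jt}, \qquad
        s_j = \sqrt{\frac{1}{T_0} \sum_{t < 1989} \bigl(d_{jt} - \bar{d}_j\bigr)^{2}}
        ```

        The 30 donors with the smallest s_j are kept. Note that s_j rewards donors whose gap to California is stable over time, so a donor with a constant gap scores zero regardless of how large that gap is.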
        Code:
        cls
        clear *

        * Load data from Abadie, Diamond, and Hainmueller (2010)
        use "https://github.com/scunning1975/mixtape/blob/master/smoking.dta?raw=true", clear

        * Placeholder for each donor's similarity score
        g score = .

        * Declare panel
        xtset state year

        * Look up California's numeric code from the value label on the panel variable
        local lbl: value label `r(panelvar)'
        loc unit = "California":`lbl'

        * List every donor state (all states except California)
        qui levelsof state if state != `unit', loc(ids)

        * Note: -greshape- comes from the user-written gtools package
        qui foreach x of loc ids {
            cap frame drop minframe

            * Pre-treatment cigarette sales for California and the current donor only
            frame put state year cigsale if (state == `unit' | state == `x') & year < 1989, into(minframe)

            frame minframe {
                greshape wide cigsale, j(state) i(year)

                loc treated_unit cigsale`unit' // T for treated
                loc donor_unit cigsale`x'      // Current donor unit

                * Standard deviation of the treated-minus-donor gap over the pre-period
                g diff = `treated_unit' - `donor_unit'
                egen mean = mean(diff)
                egen summed = total((diff - mean)^2)
                qui su year
                g division = summed/`=r(N)'
                g score = division^.5

                su score, mean
                loc score = r(mean)

                * Write the score back to the donor's rows in the main frame
                frame default: replace score = `score' if state == `x'
            }
        }

        * Rank donors by score within each year (California's score and rank stay missing)
        bys year: egen rank = rank(score)
        xtset

        * Keep the 30 closest donors plus California
        keep if inrange(rank, 1, 30) | rank == .

        cls

        synth cigsale lnincome age15to24 retprice beer cigsale(1988) cigsale(1980) cigsale(1975), trunit(3) trperiod(1989) xperiod(1980(1)1988) nested allopt
        Very similar weights are chosen, and the estimated effect is roughly the same size, too. The issue with this method is that there's no stopping rule: while the top 30 states do give a good fit and effect, how do we know that 30 is the optimal number? Why not 31 or 32? Donor selection in SCM is a pretty important issue, and it's one my coworkers and I are looking into at the moment.
