Hey everyone. Suppose we want to implement an algorithm that selects the "optimal" set of controls for a given treated unit in a policy analysis. Take the dataset below (I'm working with Stata 17).
In our case, hongkong is the treated unit and the other columns are the control group. The first step of the algorithm proceeds as follows:
We begin by looping over the control units, estimating a DID regression (outcome only) of the treated unit on each one. For each control we calculate the predicted counterfactual over the pre-intervention period and the R-squared statistic, and we save the R-squared to a separate frame (rsquare). The first unit selected is the one that maximizes the R-squared statistic. We then move this unit to the front of the control group and proceed.
Now, using the new control group (which for now consists of just canada), we take the average of canada and each of the other control units individually, estimate difference-in-differences with each of these averages, and then find the unit that maximizes the R-squared statistic. In this case, it's New Zealand.
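To pin down what each pass computes, here is how I read the code in this post. The notation is mine and only for illustration: y_t is the treated outcome, x_{jt} is control j, U is the set already selected (empty in step 1), i is the candidate, and T_0 is the number of pre-intervention periods (44 here, since the code uses time < 45).

```latex
\bar{x}^{(i)}_t = \frac{1}{|U| + 1} \sum_{j \in U \cup \{i\}} x_{jt},
\qquad
\hat{\alpha}_i = \frac{1}{T_0} \sum_{t=1}^{T_0} \left( y_t - \bar{x}^{(i)}_t \right),
\qquad
\hat{y}^{(i)}_t = \hat{\alpha}_i + \bar{x}^{(i)}_t,
\qquad
R^2_i = \operatorname{corr}\!\left( y_t, \hat{y}^{(i)}_t \right)^2 \quad (t = 1, \dots, T_0).
```

The candidate with the largest R^2_i is added to U, and the pass repeats on the remaining units.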
Here's where my question lies: we must now repeat step 2 for the remaining 3 controls (australia austria denmark). That is, we must see which of these three, when added to canada and new zealand, maximizes the R-squared, and add it to the macro/set U. Then we must see which of the remaining 2 maximizes the R-squared, and finally we simply add the last control unit to U, so that every unit ends up in the set "U".
How might I do this? Or, at the very least, what might a good starting point be? My initial reaction is to use a while loop that runs until some condition is fulfilled. Maybe, at the end of the loop for step 2, I should check how many words there are in `newdonors'. Once there are 0 words in `newdonors', there are no more control units and the while loop can end. Is that a reasonable starting point?
Code:
clear *
* Example generated by -dataex-. For more info, type help dataex
clear
input float(hongkong australia newzealand austria canada denmark)
.062 .04048913 .04724391 -.01308351 .01006395 -.01229182
.059 .03785692 .03875869 -.007580798 .02126387 -.003092842
.058 .02250948 .08991753 .000542671 .018919427 -.007764421
.062 .02874655 .06975085 .001180751 .02531683 -.004048589
.079 .03399039 .06019911 .02551085 .04356715 .0310944
.068 .03791937 .06255518 .019941313 .05022538 .06428
.046 .05228941 .04292477 .017087875 .06512183 .04595546
.052 .031070895 .04760897 .023035197 .06733068 .05516641
.037 .008696091 .022149924 .025292696 .0509212 .04805718
.029 .006773674 .023302693 .021849955 .03152506 .011953605
.012 .00302829 .03045487 .018319173 .018179957 .02080968
.015 .010981606 .03069211 .01345693 .015165864 .008303516
.025 .03818205 .04089288 .015387368 .007820651 .010101924
.036 .03452006 .03584749 .017335817 .011510062 .03085883
.047 .03667319 .03766511 .013595447 .02166007 .04011321
.059 .03898745 .025593406 .004195063 .02470353 .02569367
.058 .025748033 .005592703 -.001534697 .035775363 .029461836
.072 .05186257 .034571752 -.002021203 .03868772 .0398559
.061 .05928891 .03353216 .0161481 .03672911 .01764238
.014 .06342068 .018211482 .017984418 .0380428 .03139775
-.032 .06243018 .018747104 .03047369 .033733726 .033021096
-.061 .03949186 -.017352613 .032317627 .028866187 -.010453694
-.081 .04318146 -.019380176 .02719671 .0189473 .02242801
-.065 .035846256 .02692859 .02283163 .02248398 .01477213
-.029 .0409 .03889057 .02242066 .03785257 .003396192
.005 .03857499 .05541473 .028941363 .04818999 .029012986
.039 .02692009 .06664848 .03839999 .063958384 .010058184
.083 .03297373 .04101127 .03883952 .06480881 .026324544
.107 .03959653 .06054202 .036271237 .06725128 .03857407
.075 .04687658 .034375284 .030074896 .07295554 .030016804
.076 .024391215 .0207593 .016178379 .065640524 .03717389
.063 .005838784 .03254855 .011539886 .05321779 .036737546
.027 .000732203 .015890202 .005441154 .04056079 .01424755
.015 -.002025588 .05496934 -.005665725 .007522337 .007846387
-.001 .028212607 .036681067 -.004335946 -.017121905 .01250594
-.017 .03998247 .04987184 -.000616657 -.014710952 -.00049339
-.01 .03867529 .05044868 .003416942 -.011813972 -.00188065
.005 .04083484 .011498967 .010456725 .01313668 .01491368
.028 .032215476 .04828115 .01278433 .0296833 .002654192
.048 .03258446 .015707554 .010514015 .03784149 -.001912172
.041 .02694338 .02053765 .007017912 .03301423 .005125017
-.009 .02464757 .0400604 .008199689 .015919374 -.01868332
.038 .03993078 .04220926 .005746135 .025793217 -.005911494
.047 .05529072 .05431711 .008809593 .02063149 .017165432
.077 .05989317 .066612795 .013646583 .02710546 .02884403
.12 .05748491 .06818556 .017229607 .04896003 .03726172
.066 .04655641 .05111898 .02444402 .05043372 .036128163
.079 .030096613 .04093498 .02429197 .04781632 .03432884
.062 .03160237 .01501046 .02370696 .03985261 .020981267
.071 .04588263 .02387351 .0257305 .03162305 .05209525
.081 .04553443 .011341385 .026474627 .03574734 .04374102
.069 .05498263 .008914476 .032616325 .0503335 .02875244
.09 .04806678 .018647296 .03832044 .04947587 .04931609
.062 .02698179 -.009260945 .035103742 .04119911 .03880127
.064 .032730877 .011500795 .03722008 .031677015 .04183601
.066 .03857545 .036755715 .03898238 .0200051 .02980916
.055 .0580129 .03946248 .036197655 .03071206 .033133514
.062 .05951871 .05829326 .03257025 .03982709 -.007168933
.068 .05664859 .05114701 .03155845 .03474158 .013516975
.069 .04582468 .04590492 .01909501 .03812844 .02379412
.073 .027523303 .031214973 .017430725 .02921731 -.005199719
end
cls
* Creates the r-squared frame indexed to each individual unit
mkf rsquare
* We only need one row
frame rsquare: set obs 1
* Our time variable
g time = _n, b(hongkong)
* Gets the list of variable names
qui ds
** Our time column
loc temp: word 1 of `r(varlist)'
loc time: disp "`temp'"
** Our treated unit
loc t: word 2 of `r(varlist)'
loc treated_unit: disp "`t'"
loc a: word 3 of `r(varlist)'
* Our first control unit
loc donor_one: disp "`a'"
local nwords : word count `r(varlist)'
loc b: word `nwords' of `r(varlist)'
* Our last control unit
loc last_donor: disp "`b'"
*** Step 1: Initial Selection Loop
/* We begin by looping over our controls in regression. */
qui foreach i of var `donor_one'-`last_donor' {
    cap drop cfp
    constraint define 1 `i' = 1
    qui cnsreg `treated_unit' `i' if `time' < 45, constraint(1)
    // Calculating our rsquared statistic for the i-th model
    qui predict cfp if e(sample)
    qui corr `treated_unit' cfp if e(sample)
    frame rsquare: g `i' = r(rho)^2
    cap drop cfp
}
** Step 1b: now we select the unit with the highest r-squared statistic
frame rsquare {
    qui ds
    loc donors `r(varlist)'
    qui egen max_value = rowmax(*)
    qui gen max_var = ""
    * Loop through each column and update max_var
    qui foreach var of varlist `donors' {
        replace max_var = "`var'" if `var' == max_value
    }
    loc colmax : di max_var[1]
    loc U: di "`colmax'"
    di "First selected unit is `U'"
    // In this case it's canada
}
frame drop rsquare
cls
Code:
// Step 2: DID Step
*******
*******
order `U', a(`treated_unit')
mkf rsquare
frame rsquare: set obs 1
loc newdonors : list donors - U
di "`newdonors'"
local nwords : word count `newdonors'
loc temp: word 1 of `newdonors' // First remaining donor
loc donor_one: disp "`temp'"
loc last_donor: word `nwords' of `newdonors'
di "The last donor in the set is `last_donor'"
/* In this step, we loop through the REMAINING donors.
In this case, australia, newzealand, austria and denmark. */
foreach i of var `donor_one'-`last_donor' {
    // These must be created each time
    cap drop cfp
    cap drop ym
    * We take the average of controls, using the U selected group and the new donor `i'
    egen ym = rowmean(`U' `i')
    constraint define 1 ym = 1
    qui cnsreg `treated_unit' ym if `time' < 45, constraint(1)
    qui predict cfp if e(sample)
    qui corr `treated_unit' cfp if e(sample)
    qui frame rsquare: g `i' = r(rho)^2
    cap drop cfp
    * Compare the names as strings; unquoted, Stata compares the two variables' first observations
    if "`i'" == "`last_donor'" {
        frame rsquare {
            cap drop max_value
            cap drop max_var
            qui ds
            loc donors `r(varlist)'
            egen max_value = rowmax(*)
            gen max_var = ""
            * Loop through each column and update max_var
            qui foreach var of varlist `donors' {
                replace max_var = "`var'" if `var' == max_value
            }
            loc colmax : di max_var[1]
            di "Our next optimal donor is: `colmax'"
            local selectedunit `U' `colmax'
            loc newdonors : list donors - selectedunit
            di "Here are our new controls: `newdonors'"
        }
    }
}
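To make the while-loop idea concrete, here is a minimal sketch of what I have in mind. It is not a tested solution. It assumes the -dataex- data above are in memory in time order and that the pre-intervention period is time < 45, as in the code above. The local names pool, best, bestr2 and pre are made up for illustration. Instead of the rsquare frame, it tracks the running maximum in a local. Because U starts empty, the first pass takes the row mean of a single control, which I believe makes it equivalent to step 1.

```stata
* Sketch only: assumes the -dataex- data above are loaded, in time order
cap drop time
gen time = _n
local treated_unit hongkong
* Controls not yet selected (hypothetical local name)
local pool australia newzealand austria canada denmark
* Selected units, in order of selection; starts empty
local U
* Pre-intervention period is time < `pre'
local pre 45

while `: word count `pool'' > 0 {
    local best
    local bestr2 = -1
    foreach i of local pool {
        cap drop ym
        cap drop cfp
        * Average of the already-selected units plus candidate `i'
        egen ym = rowmean(`U' `i')
        constraint define 1 ym = 1
        qui cnsreg `treated_unit' ym if time < `pre', constraint(1)
        qui predict cfp if e(sample)
        qui corr `treated_unit' cfp if e(sample)
        * Keep the candidate with the largest squared correlation so far
        if !missing(r(rho)) & r(rho)^2 > `bestr2' {
            local bestr2 = r(rho)^2
            local best `i'
        }
    }
    * Guard against an endless loop if no candidate produced a valid R-squared
    if "`best'" == "" continue, break
    * Move the winner from the pool into U
    local U `U' `best'
    local pool : list pool - best
    di "Selected `best' (R-squared = " %6.4f `bestr2' "); remaining: `pool'"
}
cap drop ym
cap drop cfp
di "Selection order: `U'"
```

The stopping rule is the one described above: the loop ends once the word count of the remaining-controls macro reaches 0. If this matches the two steps I coded by hand, the first two units selected should be canada and newzealand.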
