Dear all,
I have a question about how best to compare treatment groups with possible control groups.
Status quo:
I have a panel dataset of investment funds from 35 different countries. It contains 2,800 funds, for which I have monthly data since their launch. Each fund is matched to a counterpart that is almost identical in terms of relative performance and tracking difference. Within each of these fund pairs, each fund is then labeled according to its relative cost level (low, equal, expensive). My variable of interest is the net fund flow, which I expect to be higher after the treatment (a law introduced in 2007) for low-cost funds and lower for high-cost funds.
My general approach is a difference-in-differences setup, to control for other global effects such as the global financial crisis.
Here is a sample of my dataset:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(id date) double absolutff int pair_no byte(costs country) int pair_since
31 19327 -9.239015700989455 15 1 1 17175
25 19144 186.85838569562338 905 1 1 17353
57 18596 -5.496761722961907 28 1 1 17350
61 17682 .06686852085497819 30 1 1 17346
63 17805 -.16932426191381467 31 1 1 17346
61 18046 1.582140805531317 30 1 1 17346
61 18443 2.8458500413119765 30 1 1 17346
1 19509 -.11380961815754276 1 1 1 18770
1 19297 1.039429086571019 1 1 1 18770
9 18473 1.0064870991515136 5 1 1 17322
43 18746 125.9467074758661 21 1 1 18400
3 17955 -1369.9539938037483 2 1 1 14703
9 19478 1.6442268547417598 5 1 1 17322
25 19023 12.392920762605286 905 1 1 17353
61 18991 4.001175138684253 30 1 1 17346
1 19600 1.1098453150021186 1 1 1 18770
43 18837 66.86733528832815 21 1 1 18400
25 19236 11.289322551892155 905 1 1 17353
3 16252 110.86366342183373 2 1 1 14703
31 18808 54.61094808673886 15 1 1 17175
57 19445 9.853336436322266 28 1 1 17350
65 18352 -.004607725977749055 32 1 1 17350
1 19904 1.1246802915969099 1 1 1 18770
25 18046 -199.0324547350574 905 1 1 17353
55 18746 68.34967832515457 27 1 1 17343
51 18535 25.69649703638811 25 1 1 17343
24 19264 47.337987857405096 12 1 1 18165
31 17197 . 15 1 1 17175
61 18231 -.09163433605707638 30 1 1 17346
15 19570 -3.5150794364147373 8 1 1 17339
18 19236 33.55259905528828 9 1 1 17339
43 18596 5.045689752387972 21 1 1 18400
65 18778 4.374711156891422 32 1 1 17350
49 19509 3.0397776249698154 24 1 1 17343
31 19600 -15.394984354814596 15 1 1 17175
61 19082 -.1735562131553614 30 1 1 17346
57 19358 -.14879663995753845 28 1 1 17350
49 18109 44.5902421508824 24 1 1 17343
31 17409 .00016059699336778976 15 1 1 17175
25 18961 -21.20220181570619 905 1 1 17353
15 19478 -3.305031308009177 8 1 1 17339
31 19509 62.30445986369057 15 1 1 17175
55 17652 -.23154089861505156 27 1 1 17343
49 17652 48.6263461243002 24 1 1 17343
20 19996 51.39883592988346 10 1 1 18760
55 18261 21.478548481939953 27 1 1 17343
3 17164 -315.4292466016568 2 1 1 14703
63 18319 -3.8384396126380125 31 1 1 17346
47 19843 . 23 1 1 19135
24 19052 -46.484487331075115 12 1 1 18165
59 18017 6.447018961867215 29 1 1 17343
55 18291 .19629014500452513 27 1 1 17343
51 17409 -.0045817941029486775 25 1 1 17343
18 19052 58.01049379529945 9 1 1 17339
49 17378 . 24 1 1 17343
24 19236 58.12006042575092 12 1 1 18165
65 17744 .0012631302871231043 32 1 1 17350
3 18078 -407.2088331905388 2 1 1 14703
59 18931 8.370869002767364 29 1 1 17343
31 18170 14.620260073860663 15 1 1 17175
65 17987 -.008088753366756407 32 1 1 17350
23 19723 5.340970770649221 12 1 1 18165
3 18261 121.3147833386547 2 1 1 14703
51 18382 -6.747851027524092 25 1 1 17343
3 15886 99.3797657030093 2 1 1 14703
5 18658 4.579470763805517 3 1 1 17318
53 18413 6.480326349914506 26 1 1 17350
35 19445 12.851672766711715 17 1 1 18004
35 18931 -26.531320308799366 17 1 1 18004
65 18535 -4.546527780115142 32 1 1 17350
25 19662 202.49404516705386 905 1 1 17353
63 18352 .6014626868254211 31 1 1 17346
3 16555 74.41402490798873 2 1 1 14703
3 17409 524.2630771994791 2 1 1 14703
65 18291 -5.8174220300813175 32 1 1 17350
53 19537 -12.241589094764848 26 1 1 17350
43 18473 18.723550559313765 21 1 1 18400
31 18382 -51.67163543709387 15 1 1 17175
49 18870 -4.492430298452746 24 1 1 17343
5 18505 1.7915204015726118 3 1 1 17318
63 17864 .06776539857245822 31 1 1 17346
9 17927 20.407215148969726 5 1 1 17322
61 18319 -10.455990778935671 30 1 1 17346
3 16952 -5.514699409579407 2 1 1 14703
23 19935 10.430967079593529 12 1 1 18165
51 19509 2.420117353967939 25 1 1 17343
3 17255 -135.98127113448754 2 1 1 14703
15 17652 -5.851729639516634e-06 8 1 1 17339
55 19236 .277199056267051 27 1 1 17343
63 17836 .0015359481089411986 31 1 1 17346
15 19662 28.14391024008188 8 1 1 17339
35 18078 -.000058075902785503786 17 1 1 18004
5 17713 .023399700631649978 3 1 1 17318
65 17896 4.445647495783408 32 1 1 17350
57 19113 -7.046332941747657 28 1 1 17350
43 18778 170.7652200524583 21 1 1 18400
15 18808 -.0003142356430885229 8 1 1 17339
65 18991 .06398520772777516 32 1 1 17350
51 18078 -2.084300174636425 25 1 1 17343
51 19417 2.2194822442202593 25 1 1 17343
end
format %tdnn/dd/CCYY date
format %tdnn/dd/CCYY pair_since
label values costs costs
label def costs 1 "Low", modify
label values country countrylab
label def countrylab 1 "Denmark", modify
Questions:
1) Since the 2007 law affects only 12 of the 35 countries, I would like to find the best control group for each treated country. However, I am not sure how best to compare two countries and check for common pre-treatment trends. So far I have compared them graphically, but since I have more than 200 different treatment/control combinations and additionally have to distinguish by cost level, I was wondering whether there is a simpler and faster way to do this. Any help would be highly appreciated.
This is what I used so far:
Code:
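bysort country costs date: egen meanabsolutff = mean(absolutff)
* tag one observation per country, cost level and date so each mean is plotted once
egen tag = tag(country costs date)
twoway (line meanabsolutff date if country == 1 & tag, sort) (line meanabsolutff date if country == 2 & tag, sort), by(costs)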
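One possible way to replace the visual check with a formal test that also scales to 200+ combinations is sketched below. It is a minimal sketch under stated assumptions, not a tested solution: the cutoff of 1 January 2007, the country lists treatedlist/controllist and the results file pretrend_tests are hypothetical. For each treated country, candidate control country and cost level, it keeps only pre-treatment months and regresses absolutff on a treated-country dummy interacted with a linear monthly trend; the p-value of the interaction term tests whether the two countries' pre-treatment trends differ. Standard errors are clustered by fund (id), which is also an assumption (clustering by pair_no would be another option).

```stata
* Sketch (assumptions marked): formal pre-trend test for many
* treated/control/cost-level combinations. Run as a do-file.
* ASSUMPTION: the law takes effect on 1 Jan 2007 (hypothetical cutoff).
local cutoff = mdy(1, 1, 2007)
* HYPOTHETICAL country codes: replace with the treated countries and
* the candidate control countries from your data.
local treatedlist 1 3 5
local controllist 2 4 6

* Monthly time index built from the daily %td date
capture drop mtime
generate mtime = mofd(date)

* Results file (hypothetical name): one row per combination
tempname results
postfile `results' treated control costs p_trend using pretrend_tests, replace

levelsof costs, local(costlevels)
foreach t of local treatedlist {
    foreach k of local controllist {
        * 1 = fund in treated country, 0 = fund in control country,
        * missing = fund in any other country (excluded below)
        capture drop treat
        generate byte treat = (country == `t') if inlist(country, `t', `k')
        foreach c of local costlevels {
            * Pre-period only; 1.treat#c.mtime is the difference in slopes
            capture regress absolutff i.treat##c.mtime ///
                if date < `cutoff' & costs == `c' & !missing(treat), vce(cluster id)
            if _rc == 0 {
                capture test 1.treat#c.mtime
                if _rc == 0 post `results' (`t') (`k') (`c') (r(p))
            }
        }
    }
}
postclose `results'

* Inspect the p-values without losing the fund data in memory
preserve
use pretrend_tests, clear
sort treated costs p_trend
list, sepby(treated costs) noobs
restore
```

Sorting by p_trend gives a shortlist of candidate controls per treated country and cost level, which can then be checked graphically. Keep in mind that with this many tests some small p-values will occur by chance, and that a difference in linear trends is only one aspect of parallel pre-treatment trends.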
I also checked absolutff for normality with the Shapiro-Wilk test so that I could compare the values with a t-test, but since the Shapiro-Wilk test rejects the null hypothesis of normality, I can't use this method.
Code:
by country costs, sort : swilk absolutff
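Since the Shapiro-Wilk test rejects normality, a rank-based test is one alternative to the t-test. The sketch below uses hypothetical country codes and the same assumed cutoff of 1 January 2007. It first collapses the data to one pre-treatment average per fund, because repeated monthly observations of the same fund are not independent, and then runs Stata's ranksum (Wilcoxon rank-sum / Mann-Whitney test) separately for each cost level. Note that it compares the distributions of pre-treatment levels, not trends, so it complements rather than replaces a pre-trend check.

```stata
* Sketch (assumptions marked): rank-based comparison of pre-treatment
* net flows for one treated/control country pair, by cost level.
* HYPOTHETICAL country codes and ASSUMED cutoff of 1 Jan 2007.
local treated = 1
local control = 2
local cutoff = mdy(1, 1, 2007)

preserve
keep if inlist(country, `treated', `control') & date < `cutoff'

* One row per fund: its average pre-treatment net flow, because the
* monthly observations of a single fund are not independent
collapse (mean) absolutff, by(id country costs)

* Wilcoxon rank-sum (Mann-Whitney) test; it does not assume normality
levelsof costs, local(costlevels)
foreach c of local costlevels {
    display as text _n "Cost level `c'"
    * capture noisily: keep going if a cost level lacks one of the countries
    capture noisily ranksum absolutff if costs == `c', by(country)
}
restore
```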
Best regards
Nils
I am using Stata 13.0 MP.