Hey there,
I've got a similar problem to my last post (http://www.statalist.org/forums/foru...eral-variables) but I think the question differs enough to create a new thread.
We have a wide structured data of mutual funds and we want to correlate pairwise the returns of the funds with the same category of one fundfamily (group of investors) with the funds of the other category of the same group of investors. We simplified our data so we only have two categories left.
Here's a small sample of our data:
the labeling of the variables is the following: ret_id_idf_cat, while
the code itselve does not bring any errors but it doesn't compute the intended result either as it correlates as well the returns of the funds within one category. A sample of the computed result is:
The only numbers we want are in row "mret_v~88_2" and "mret_v~_82_2" for all the columns except for those in the columns "mret_v~88_2" and "mret_v~_82_2" because those belong to the same category. Therefore the data we want is :
from there we want to build the mean value over the correlations.
Sorry for the load of information, but I think the problem is quite difficult, even though there might be a simple answer.
Regards,
Florian
I've got a similar problem to my last post (http://www.statalist.org/forums/foru...eral-variables) but I think the question differs enough to create a new thread.
We have a wide structured data of mutual funds and we want to correlate pairwise the returns of the funds with the same category of one fundfamily (group of investors) with the funds of the other category of the same group of investors. We simplified our data so we only have two categories left.
Here's a small sample of our data:
date | ret_1_1_1 | ret_1_2_1 | ret_1_3_2 | ret_2_4_1 | ret_2_5_2 | ret_2_6_2 | ret_3_7_1 | ret_3_8_1 | ret_3_9_2 | |||||
2001 | .06654 |
|
|
|
|
|
.02514 | .04545984 | .00515651 | |||||
2002 | .0501 | -.065123 | .15121 | -.1321651 | ... | ... | ... | ... | ... | |||||
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |||||
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |||||
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |||||
2005 | ... | ... | ... | ... | ... | ... | ... | ... |
the labeling of the variables is the following: ret_id_idf_cat, while
- id is the ID of the fund familiy
- idf is the ID of the fund
- cat is the category
- ex. ret_72_517_2 is the return of fund 517 which belongs to fund family 72 and is in the category 2
Code:
set obs = `num_id' //num_id was defined as the number of distinct fund families gen id = _n gen dummy =1 summarize id scalar idmin = r(min) scalar idmax = r(max) sort id forvalues i= `=idmin'/`=idmax' { pwcorr mret_vw`i'_*_1 mret_vw`i'_*_2 mata: C = st_matrix("r(C)") mata: st_numscalar("avg_cor", mean(abs(select(vech(C), vech(C) :< 1)))) display avg_cor replace dummy = avg_cor if id == `i' }
mr~395_1 | m~_396_1 | mr~397_1 | mr~398_1 | mr~402_1 | mr~403_1 | m~_404_1 | mr~_88_2 | mre~48_2 | |
mret_v~395_1 | 10.000 | ||||||||
mret_~_396_1 | 0.9140 | 10.000 | |||||||
mret_v~397_1 | 0.9579 | 0.7647 | 10.000 | ||||||
mret_v~398_1 | 0.9990 | 0.9197 | 0.9552 | 10.000 | |||||
mret_v~402_1 | 0.9993 | 0.9127 | 0.9603 | 0.9994 | 10.000 | ||||
mret_v~403_1 | 0.9285 | 0.9789 | 0.8400 | 0.9294 | 0.9468 | 10.000 | |||
mret_v~404_1 | 0.9439 | 0.7782 | 0.9748 | 0.9387 | 0.9519 | 0.7635 | 10.000 | ||
mret_v~_88_2 | 0.9844 | 0.9738 | 0.9208 | 0.9883 | 0.9861 | 0.9862 | 0.8884 | 10.000 | |
mret_v~_82_2 | 0.8322 | 0.9216 | 0.6650 | 0.8258 | 0.8308 | 0.9149 | 0.7028 | 0.8323 | 10.000 |
The only numbers we want are in row "mret_v~88_2" and "mret_v~_82_2" for all the columns except for those in the columns "mret_v~88_2" and "mret_v~_82_2" because those belong to the same category. Therefore the data we want is :
0.9844 | 0.9738 | 0.9208 | 0.9883 | 0.9861 | 0.9862 | 0.8884 |
0.8322 | 0.9216 | 0.6650 | 0.8258 | 0.8308 | 0.9149 | 0.7028 |
Sorry for the load of information, but I think the problem is quite difficult, even though there might be a simple answer.
Regards,
Florian
Comment