Pairwise Correlation of values of one variable within the a group, with values of several other variables of the same group

Florian Krenn

Join Date: Mar 2017
Posts: 8

Pairwise Correlation of values of one variable within the a group, with values of several other variables of the same group

23 Mar 2017, 04:16

Hey there,

I've got a similar problem to my last post (http://www.statalist.org/forums/foru...eral-variables) but I think the question differs enough to create a new thread.

We have a wide structured data of mutual funds and we want to correlate pairwise the returns of the funds with the same category of one fundfamily (group of investors) with the funds of the other category of the same group of investors. We simplified our data so we only have two categories left.

Here's a small sample of our data:

date

ret_1_1_1

ret_1_2_1

ret_1_3_2

ret_2_4_1

ret_2_5_2

ret_2_6_2

ret_3_7_1

ret_3_8_1

ret_3_9_2

2001

.06654

-.32771295

-.05421696

-.06160638

-.13911781

.01840753

.02514

.04545984

.00515651

2002

.0501

-.065123

.15121

-.1321651

...

2005

...

the labeling of the variables is the following: ret_id_idf_cat, while

id is the ID of the fund familiy
idf is the ID of the fund
cat is the category
ex. ret_72_517_2 is the return of fund 517 which belongs to fund family 72 and is in the category 2

We tried our correlation with the following code:

Code:

set obs = `num_id' 
//num_id was defined as the number of distinct fund families 

gen id = _n

gen dummy =1

summarize id
scalar idmin = r(min)
scalar idmax = r(max)
sort id



forvalues i= `=idmin'/`=idmax' {
    
        pwcorr mret_vw`i'_*_1  mret_vw`i'_*_2
        
        mata: C = st_matrix("r(C)")
        mata: st_numscalar("avg_cor", mean(abs(select(vech(C), vech(C) :< 1))))
        
        display avg_cor
        replace dummy = avg_cor if id == `i'
        
        }

the code itselve does not bring any errors but it doesn't compute the intended result either as it correlates as well the returns of the funds within one category. A sample of the computed result is:

	mr~395_1	m~_396_1	mr~397_1	mr~398_1	mr~402_1	mr~403_1	m~_404_1	mr~_88_2	mre~48_2
mret_v~395_1	10.000
mret_~_396_1	0.9140	10.000
mret_v~397_1	0.9579	0.7647	10.000
mret_v~398_1	0.9990	0.9197	0.9552	10.000
mret_v~402_1	0.9993	0.9127	0.9603	0.9994	10.000
mret_v~403_1	0.9285	0.9789	0.8400	0.9294	0.9468	10.000
mret_v~404_1	0.9439	0.7782	0.9748	0.9387	0.9519	0.7635	10.000
mret_v~_88_2	0.9844	0.9738	0.9208	0.9883	0.9861	0.9862	0.8884	10.000
mret_v~_82_2	0.8322	0.9216	0.6650	0.8258	0.8308	0.9149	0.7028	0.8323	10.000

The only numbers we want are in row "mret_v~88_2" and "mret_v~_82_2" for all the columns except for those in the columns "mret_v~88_2" and "mret_v~_82_2" because those belong to the same category. Therefore the data we want is :

0.9844	0.9738	0.9208	0.9883	0.9861	0.9862	0.8884
0.8322	0.9216	0.6650	0.8258	0.8308	0.9149	0.7028

from there we want to build the mean value over the correlations.

Sorry for the load of information, but I think the problem is quite difficult, even though there might be a simple answer.

Regards,
Florian

Tags: None

Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

24 Mar 2017, 10:27

You didn't get a quick answer. This often means you'd do better following the FAQ on asking questions - provide Stata code in code delimiters, Stata output, and sample data using dataex. Also, try to simplify the question and data to the simplest that will demonstrate your problem.

I have not tried to understand your program, but may have a suggestion. If you don't want correlations for some conditions, you can set up an if statement. It could be an if statement that characterized the correlations you don't want. This if statement would lead to a continue statement (which would move to the next iteration of the loop. Alternatively, you could have an if statement that characterized the conditions you do want with all the calculations you want. This may not work if you're using the wild cards to generate a lot of correlations at once.

A third alternative would be to simply dump all the correlations you don't want. This wastes processing time, but that is usually not a big problem.
Comment

Announcement

Pairwise Correlation of values of one variable within the a group, with values of several other variables of the same group

Comment