Pairwise correlation over several variables

Florian Krenn

Join Date: Mar 2017

Posts: 8
#1

Pairwise correlation over several variables

20 Mar 2017, 06:29

Hi,

I need to correlate returns of funds within and between fund families. The data set is wide structured (return_family1_fund1, return_family1_fund2,...).
How can I compute the pairwise correlation of every fund within a fund family and then between fund families?

Florian
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30121
#2

20 Mar 2017, 10:02

So, first you need to reshape your data. It's a bit complicated because of the way your variables are named. What you need to do is go half-long and half-wide (wide on fund, long on family). Given the way your variables are named,it's actually easier to go double-long in one step and then go back to half-long/half-wide

Code:

// GO DOUBLE LONG WITH ONE RESHAPE gen long obs_no = _n reshape long return_, i(obs_no) j(ff) string split ff, gen(fff) parse("_") rename fff1 family rename fff2 fund destring family, ignore("family") replace destring fund, ignore("fund") replace drop ff // NOW GO HALF LONG/HALF WIDE ON FUND reshape wide return_, i(obs_no family) j(fund) // NOW DO PAIRWISE CORRELATIONS WITHIN FAMILIES by family, sort: pwcorr return_*

This is somewhat roundabout, but it is actually simpler than juggling the variable names so as to do a single reshape to half long half wide.

At this point I can't go any farther. Apparently you have in mind some sort of aggregate return for a family that is calculated in some way from its constituent fund returns. But you don't say what that calculation is. So, not knowing how to calculate a family return, I can't calculate correlations among them. I will venture the guess that whatever that calculation is, it will best be done by returning to double-long layout first. You can do that by simply running -reshape long- after the code shown above.
Comment

Florian Krenn

Join Date: Mar 2017
Posts: 8

20 Mar 2017, 12:00

Thank you very much for the quick answer Clyde, the calculation works.

The thing is, that we have a very large dataset, so it computes all the data in the console for about half an hour and I have no chance of reading any information out of it or building the average for the correlations of the funds with the same objective, what was the actual goal of this calculation.

We actually converted our inital data to a wide structure, because we didn't see any chance to compute a pairwise correlation otherwise.

This is how the inital data looks like:

investor_name	fund_id	date	fund_classification	return
A Investments	1	2001	A	0,005
A Investments	1	2002	A	0,003
A Investments	2	2001	A	-0,0025
B Investments	3	2001	A	0,004
B Investments	3	2002	A	0,002
B Investments	3	2003	A	-0,005
C Investments	4	2001	B	0,002
C Investments	4	2002	B	0,003
C Investments	5	2001	A	0,009
C Investments	5	2002	A	-0,03

So, maybe it is clearer this way. Firstly, we want to compute average correlation over all families for the funds within a fund family, and then the average correlation over all families for the funds between fund families. The fund objective is actually negligible, because we narrowed those down and can calculate for every objective manually.

I hope I could clarify the problem a little bit.

Thanks a lot in advance for any help.

Regards,
Florian

Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30121

20 Mar 2017, 15:34

I'm afraid this version is even less clear to me than the original. I don't work in finance and I don't know the jargon. The example data you show in #3 doesn't indicate which funds are in which family, unless one of those other variable names is a synonym for family. And I don't even get the concept of "the average correlation over all families for the funds between fund families." I don't know what that means.

That said, returning to #1 and #2, it is not that hard to convert the code in #2 to calculate the average correlation among funds in each family:

Code:

capture program drop avg_corr
program define avg_corr, rclass
    syntax varlist [if] [in]
    marksample touse
    pwcorr `varlist' if `touse'
    tempname M  J X
    matrix `M' = r(C)
    local r = rowsof(`M')
    matrix `J' = J(`r', `r', 1)
    forvalues i = 1/`r' {
        forvalues j = `i'/`r' {
            matrix `J'[`i', `j'] = 0
        }
    }
    matrix `X' = (`M'*`J')
    return scalar average_r = trace(`X')/(`r'*(`r'-1)/2)
    exit
end


// GO DOUBLE LONG WITH ONE RESHAPE
gen long obs_no = _n
reshape long return_, i(obs_no) j(ff) string
split ff, gen(fff) parse("_")
rename fff1 family
rename fff2 fund
destring family, ignore("family") replace
destring fund, ignore("fund") replace
drop ff

// NOW GO HALF LONG/HALF WIDE ON FUND
reshape wide return_, i(obs_no family) j(fund)

gen family_avg_corr = .
levelsof family, local(families)
foreach fam of local families {
    avg_corr return_* if family == `f'
    replace family_avg_corr = r(average_r) if family == `f'
}

This code assumes that you do not want to include the correlation of each fund with itself (which is always 1) in the average. The average correlation among the funds in a family will appear as the value of the new variable family_avg_corr in each observation for that family.

Comment

Florian Krenn

Join Date: Mar 2017
Posts: 8

23 Mar 2017, 03:28

Sorry for the unclear descriptions and thank you again for the answer.

Unfortunately the code didn't work the way we intendet it to.

But with a little bit of work we figured out a way that gave us our wanted results. The code is the following:

The data is wide shaped as described in #1

Code:

gen id = _n

gen dummy = .


summarize id
scalar idmin = r(min)
scalar idmax = r(max)

forvalues i= `=idmin'/`=num_id' {
    
    forvalues n = 1/1 {
        
        pwcorr *vw`i'_*
        
        mata: C = st_matrix("r(C)")
        mata: st_numscalar("avg_cor", mean(abs(select(vech(C), vech(C) :< 1))))
        
        display avg_cor
        replace dummy = avg_cor if id == `i'
        }
        }
    
mean(dummy)

qui summarize dummy
scalar avg_cor_M = r(mean)

Thank you very much for your support Clyde.

Florian

Announcement