  • Pairwise correlation of values of one variable within a group with values of several other variables of the same group

    Hey there,

    I've got a problem similar to the one in my last post (http://www.statalist.org/forums/foru...eral-variables), but I think the question differs enough to warrant a new thread.

    We have wide-format data on mutual funds. Within each fund family (group of investors), we want to compute pairwise correlations between the returns of the funds in one category and the returns of the funds in the other category. We have simplified our data so that only two categories are left.

    Here's a small sample of our data:

    date  ret_1_1_1   ret_1_2_1   ret_1_3_2   ret_2_4_1   ret_2_5_2   ret_2_6_2   ret_3_7_1   ret_3_8_1   ret_3_9_2
    2001  .06654      -.32771295  -.05421696  -.06160638  -.13911781  .01840753   .02514      .04545984   .00515651
    2002  .0501       -.065123    .15121      -.1321651   ...         ...         ...         ...         ...
    ...   ...         ...         ...         ...         ...         ...         ...         ...         ...
    ...   ...         ...         ...         ...         ...         ...         ...         ...         ...
    ...   ...         ...         ...         ...         ...         ...         ...         ...         ...
    2005  ...         ...         ...         ...         ...         ...         ...         ...         ...

    The variables are named ret_id_idf_cat, where
    • id is the ID of the fund family
    • idf is the ID of the fund
    • cat is the category
    • e.g. ret_72_517_2 is the return of fund 517, which belongs to fund family 72 and is in category 2
    We tried to compute the correlations with the following code:

    Code:
    set obs = `num_id'
    // num_id was defined as the number of distinct fund families

    gen id = _n
    gen dummy = 1

    summarize id
    scalar idmin = r(min)
    scalar idmax = r(max)
    sort id

    forvalues i = `=idmin'/`=idmax' {
        // full correlation matrix over both categories of family i
        pwcorr mret_vw`i'_*_1 mret_vw`i'_*_2

        // mean absolute value of the lower-triangle entries below 1
        // (this drops the diagonal but keeps within-category pairs)
        mata: C = st_matrix("r(C)")
        mata: st_numscalar("avg_cor", mean(abs(select(vech(C), vech(C) :< 1))))

        display avg_cor
        replace dummy = avg_cor if id == `i'
    }
    The code itself does not produce any errors, but it does not compute the intended result either, because it also correlates the returns of funds within the same category. A sample of the computed result:
                  mr~395_1  m~_396_1  mr~397_1  mr~398_1  mr~402_1  mr~403_1  m~_404_1  mr~_88_2  mre~48_2
    mret_v~395_1  1.0000
    mret_~_396_1  0.9140    1.0000
    mret_v~397_1  0.9579    0.7647    1.0000
    mret_v~398_1  0.9990    0.9197    0.9552    1.0000
    mret_v~402_1  0.9993    0.9127    0.9603    0.9994    1.0000
    mret_v~403_1  0.9285    0.9789    0.8400    0.9294    0.9468    1.0000
    mret_v~404_1  0.9439    0.7782    0.9748    0.9387    0.9519    0.7635    1.0000
    mret_v~_88_2  0.9844    0.9738    0.9208    0.9883    0.9861    0.9862    0.8884    1.0000
    mret_v~_82_2  0.8322    0.9216    0.6650    0.8258    0.8308    0.9149    0.7028    0.8323    1.0000


    The only numbers we want are in the rows "mret_v~_88_2" and "mret_v~_82_2", in every column except the last two. The last two columns hold the same two category 2 funds, so they give within-category correlations. The data we want are therefore:
    0.9844 0.9738 0.9208 0.9883 0.9861 0.9862 0.8884
    0.8322 0.9216 0.6650 0.8258 0.8308 0.9149 0.7028
    From these we want to compute the mean of the correlations.


    Sorry for the load of information, but I think the problem is quite difficult, even though there might be a simple answer.

    Regards,
    Florian

  • #2
    You didn't get a quick answer. This often means you'd do better following the FAQ on asking questions - provide Stata code in code delimiters, Stata output, and sample data using dataex. Also, try to simplify the question and data to the simplest that will demonstrate your problem.

    I have not tried to understand your program, but I may have a suggestion. If you don't want correlations for some conditions, you can set up an if statement. It could be an if statement that characterizes the correlations you don't want and leads to a continue statement (which moves on to the next iteration of the loop). Alternatively, you could have an if statement that characterizes the conditions you do want and contains all the calculations you want. This may not work if you're using wildcards to generate a lot of correlations at once.
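
    A minimal sketch of this loop-based idea, assuming the mret_vw<family>_<fund>_<category> variable names and the idmin/idmax scalars from the first post. The names cat1, cat2, sumabs and npairs are illustrative only. Rather than testing each pair with an if, the nested loops visit only cross-category pairs. The continue statement skips families that lack one of the categories, and pairs for which corr fails or returns a missing value. The absolute value is taken because the original code does so.

    Code:
    forvalues i = `=idmin'/`=idmax' {
        // expand the wildcards; skip families that lack a category
        capture unab cat1 : mret_vw`i'_*_1
        if (_rc) continue
        capture unab cat2 : mret_vw`i'_*_2
        if (_rc) continue

        scalar sumabs = 0
        scalar npairs = 0
        foreach a of local cat1 {
            foreach b of local cat2 {
                // one category 1 fund against one category 2 fund
                capture corr `a' `b'
                if (_rc) continue
                if (missing(r(rho))) continue
                scalar sumabs = sumabs + abs(r(rho))
                scalar npairs = npairs + 1
            }
        }
        // mean absolute cross-category correlation for this family
        if (npairs > 0) display "family `i': " sumabs/npairs
    }

    Calling corr on two variables uses the observations where both are nonmissing, which should match the pairwise sample that pwcorr uses for the same pair.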

    A third alternative would be to simply dump all the correlations you don't want. This wastes processing time, but that is usually not a big problem.
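
    A hedged sketch of this "compute everything, keep what you need" route, assuming the mret_vw<family>_<fund>_<category> variable names and the idmin/idmax scalars from the first post, and that pwcorr leaves the full matrix in r(C) as the original loop relies on. The names cat1, cat2, n1, n2, C and X are illustrative only. Because the category 1 funds are listed first, the wanted correlations form the block made of the last n2 rows and the first n1 columns.

    Code:
    forvalues i = `=idmin'/`=idmax' {
        // expand the wildcards so the funds in each category can be counted
        capture unab cat1 : mret_vw`i'_*_1
        if (_rc) continue
        capture unab cat2 : mret_vw`i'_*_2
        if (_rc) continue
        local n1 : word count `cat1'
        local n2 : word count `cat2'
        local first = `n1' + 1
        local last  = `n1' + `n2'

        // full matrix: category 1 funds first, then category 2 funds
        quietly pwcorr `cat1' `cat2'
        matrix C = r(C)

        // keep rows of category 2 funds and columns of category 1 funds
        matrix X = C[`first'..`last', 1..`n1']

        // mean absolute cross-category correlation for this family
        mata: st_numscalar("avg_cor", mean(abs(vec(st_matrix("X")))))
        display "family `i': " avg_cor
    }

    In the sample output from the first post, X would be the 2 x 7 block holding the two rows of wanted numbers.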
