Generate correlation as variable

paulvonhippel

Join Date: Apr 2014

Posts: 502
#1

Generate correlation as variable

31 Jan 2015, 02:59

I need to generate a variable representing the correlation between variables x and y, for different countries and years. I want to store that variable in my dataset. There was a thread on this in 2008, where it was suggested that I this would work ssc install egenmore egenmore corrxy = corr(x,y) , by(country year) For me, though, that doesn't work. The ssc installation appears to run OK, but the next command generates an error "unrecognized command: egenmore" If instead I type egen corrxy = corr(x,y) , by(country year) I get a different error: "invalid syntax". If I type help egenmore I get a help page for the egen command, which contains no description of a corr() function.
Tags: None
paulvonhippel

Join Date: Apr 2014

Posts: 502
#2

31 Jan 2015, 03:08

Never mind. The command does work as
egen corrxy = corr(x y) , by(country year)
My mistake, which I inherited from the original thread, was inserting a comma between x and y.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35694
#3

31 Jan 2015, 03:14

The help for egenmore does include documentation of the corr() function and explains that the syntax is to use egen as a command. egenmore is no more than a paclage name. Your syntax error is to include a comma. The syntax is documented as

corr(varname1 varname2) [ , covariance spearman taua taub by(byvarlist) ]

The problem is that any comma between variable names is echoed to the command being called, in this case correlate.
Comment
Christina Enders

Join Date: Jul 2019

Posts: 6
#4

26 Jul 2019, 11:08

Hi,

I have another question regarding the "egen corrxy = corr(x y)" command. I have a pairwise correlation matrix with 74 different variables and I want to store every possible combination of correlations between all 74 variables as a single variable so that I get a dataset with (74 * 74 - 74) / 2 = 2701 variables.

Now my question is, do I really have to use the "egen corrxy = corr(x y)" for 2701 times or is there a smarter solution to my problem?

Thanks a lot in advance!

Best regards,
Christina
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35694

26 Jul 2019, 12:23

No; you don't. Let's illustrate with a token script.

Code:

sysuse auto, clear 

local varlist headroom trunk length displacement weight 

local nvars : word count `varlist' 

local N = `nvars' * (`nvars' - 1) / 2 

if `N' > _N set obs `N' 

gen x = "" 
gen y = "" 
gen r = . 
local k = 1 
tokenize "`varlist'" 

forval i = 1/`nvars' { 
    local J = `i' + 1 
    forval j = `J'/`nvars' { 
        quietly {
            corr ``i'' ``j'' 
            replace x = "``i''" in `k' 
            replace y = "``j''" in `k' 
            replace r = r(rho) in `k' 
        }
        local ++k 
    }
}

list x y r if !missing(x) 


     +----------------------------------------+
     |            x              y          r |
     |----------------------------------------|
  1. |     headroom          trunk   .6620111 |
  2. |     headroom         length   .5162955 |
  3. |     headroom   displacement   .4744915 |
  4. |     headroom         weight   .4834558 |
  5. |        trunk         length   .7265956 |
     |----------------------------------------|
  6. |        trunk   displacement   .6086351 |
  7. |        trunk         weight   .6722057 |
  8. |       length   displacement     .83514 |
  9. |       length         weight   .9460086 |
 10. | displacement         weight   .8948958 |
     +----------------------------------------+

Comment

Christina Enders

Join Date: Jul 2019

Posts: 6
#6

26 Jul 2019, 18:08

Hi Nick,

first of all, thanks a lot for your code, that really brings me much closer to what I want. Now I have two subsequent questions.

1) Is there a possibility to tell Stata to just use all variables of the dataset in code line "local varlist headroom trunk length displacement weight" or do I need to list every single variable in that line (wouldn't be that bad but maybe a bit prone to errors)? I tried it with var1 - var74 but somehow that didn't work.

2) Is there a possibility to reshape the resulting table in that way, that the variable name ist just the combination of the names of x and y and the first row consists of the r values? So that it looks like this:

headroom_trunk headroom_length headroom_displacement
1 0.6620111 0.5162955 0.4744915

and so on.
I tried several solutions with the commands -reshape- , -xpose- and -sxpose- but somehow I don't get what I want here.

Thank you so much!

Best regards,
Christina
Comment
Christina Enders

Join Date: Jul 2019

Posts: 6
#7

26 Jul 2019, 18:20

Sorry for the bad illustration of the example in 2). Actually it should look like this:
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35694
#8

27 Jul 2019, 08:16

In #4 you asked for (a) 1 new variable (b) 2701 new variables. That was contradictory but (a) made more sense to me.

What you now ask for is still puzzling as a request as it is of no obvious use for any subsequent analysis. If you explain what you are going to do with it someone might be minded to suggest some code.

This will no doubt seem harsh or unhelpful but my charity on Statalist does not go beyond questions that seem to me interesting or useful.
Comment

Christina Enders

Join Date: Jul 2019
Posts: 6

29 Jul 2019, 09:59

Hi Nick,

ok, sorry for that confusion. You're right, maybe it is very confusing if one doesn't know the overall context of the question. So I try to shortly explain that. I want to replicate the coinsurance/diversification measure by Hann, Ogneva and Ozbas (2013): "Corporate Diversification and the Cost of Capital" (Journal of Finance, Vol. 68 (p. 1969-1970)), Link: https://onlinelibrary.wiley.com/doi/10.1111/jofi.12067 .

I got idiosyncratic industry cash flows per 2-digit SIC Code industry (as residuals from a regression, that is why the variable names are res_SIC) for 27 years. These are illustrated in this data example with idiosyncratic cash flows (residuals) for 8 industries (in my complete dataset I have 74 industries):

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int year float(res_10 res_12 res_13 res_14 res_15 res_16 res_17 res_20)
1990  .033254102   -.04792337   .05700066   -.09138067    .02794227 -.00026059974  .0039374474 .00008333124
1991  -.01919841   -.04014556 -.008848097   -.04181518  -.008211095    .032521505    .08611627    .02692385
1992   .02821145  -.006532365  -.04394998 -.0080966065  -.022054717     .03218632     .0575919   -.03121326
1993 .0038291214   -.02539229 -.013553325   -.14529903  -.007691163   -.007535936    .07557896    .02676738
1994   .07251374  -.032627285   .01114261   -.04820225    .05871338   -.018943405  -.007243626   .010144584
1995   .04367205   -.04133857   .03054319    .02854056 -.0015061008  -.0045481613   -.01230913 -.0005895875
1996  .003749058    .06145516   .04461166    .07190183   .035624944    -.01982356   .009847625   -.03091387
1997  -.07819264   .001658237   .01981391    .07108616   -.00518429    -.02092511 -.0014665262  -.003491783
1998  -.04154119     .0699422 -.010625497   -.02252452    .05107028    .021999307   .005278377  .0042764693
1999  .011160394 -.0038090604  .065049924   -.02648608    .06857794   .0017683457  .0019574217  -.013661997
2000   .02061287    .02761672   .04835914   -.06898598   .003824244     .01472899   -.03069713  .0008440256
2001  .005321359 -.0017550978  .067692034    .04415399   .017593222  -.0039787963   -.04837864      .005605
2002   .04406443   .029889535  .023144836     .0215849   .036073036   .0021540255  -.029622594 -.0027596336
2003  -.03246744   -.02577336   .03097895   .022476805   .035536606   -.031978518 -.0015668035  -.007414372
2004 .0033156725     .0757577   .05927413   .018581428    .05933197   -.031759653  -.015944071   .007566282
2005 -.031742893  -.005693389   .07974613   .009071746    .11699577   -.009109915    .02401482   .013660272
2006  -.03639871  -.005954036   .05664376    .07321639   -.05554803    .003499886   .005934403  -.002626764
2007   .03727423    .14695072   .09391564    .07616765   -.11068815    -.01322465  -.013029888  -.003010965
2008  -.10562408    .04227998 -.010594637    .10868854   -.28537837     .02062974   .011361933   -.02128578
2009  .025218315    .08033428  -.04314794    .04224764  -.013182837     .01676502  -.009288081   -.00630211
2010   .01658875    .06149259  -.02837122   -.03007945 .00005215796     .02538311   -.04897374  -.008060934
2011   .03678665    .08610797   .01109371   .004797227   .001603199   -.015716467   -.05856934  -.014376557
2012   .04163102   .002470567  -.04690592    .09402638   .020188507   .0083638625   -.04846392 -.0015014635
2013  -.06729205 -.0013210756  -.01913452  -.018016137   .017436218   -.027384736   -.07219469   .020294074
2014  -.04172251   -.13514738 -.013596268   .005979343   .024419414    .010108522    .05206577   .011064536
2015  -.05911325   -.19696306  -.29779744  -.035499208   -.04279067   -.006380766   .000571963    .01064037
2016   .04435107   -.04277896   -.1709661    -.0870742  -.017588003    .018067393    .03630019   .007348059
2017   .04173888    -.0728008  .008480637   -.06906129  -.005159727   .0033942484   .027191086  .0019908443
end

Now, I need the pairwise correlation of the idiosyncratic industry cash flows of every possible industry combination on a 10-year rolling window basis. That is, e.g. for 2018: for industries 10 and 12 I get the correlation calculated over their according idiosyncratic cash flows over the years 2008-2017, for 2017 the correlation over 2007-2016 and so on. In the end, I want to obtain a dataset that looks like this in order to merge it with my actual dataset:

Click image for larger version

Name: Corr.PNG
Views: 1
Size: 8.6 KB
ID: 1509864

I really hope that this explanation is comprehensible and that someone might help me.

Thank you a lot in advance.

Best regards,
Christina

Comment

Christina Enders

Join Date: Jul 2019

Posts: 6
#10

31 Jul 2019, 09:56

Dear Statalist members,

according to the FAQ "polite bumping" is allowed after around 12 hours of no answer.

Since I posted the extended version of my question with explanation around two days ago, I now feel free to ask again if someone please might help me with my problem?

Thanks a lot for every kind of help.

Best regards,
Christina
Comment

Lovely Tolin

Join Date: Apr 2016
Posts: 10

#11

24 Jun 2020, 00:18

Hi all, first of all thanks for this thread.

I have a montly exchange rate dataset for all countries. I have adapted Nick's code (reply to Christina's query) to generate correlation coefficients for all possible combinations of countries in terms of exchange rates, and it worked perfectly well. However, I wanted to limit the computation to generate cross country correlation only between Asia (total of 40 countries) and all countries (total of 191 countries) in the world. As such I did some tweaking, but unfortunately, the results are not what I desired. Instead, the x variable reports only 40 countries which are not necessarily from Asia. I can't seem to understand where I have gone wrong and I was wondering if it's before of my forval loop?

Thanks in advance for the help!

Code:

forval y = 2006/2019  {  
    use er`y', clear
    quietly describe, varlist
    local vars `r(varlist)'  // provides a list of all countries in the world
    unab omit:  year year1 year_act
    local allcountry: list vars - omit
    
    local asia AFG ARM AUS AZE BGD BTN KHM CHN FJI GEO HKG IND IDN JPN KAZ KIR KGZ LAO MYS MDV FSM MNG MMR NPL NZL PAK PNG PHL WSM SGP SLB KOR LKA TWN TJK THA TON UZB VUT VNM
        * local asia used to tease out only countries in asia
    
    local nallcountry : word count `allcountry'
    local nasia : word count `asia'
    local N = `nasia' * `nallcountry'  
    if `N' > _N set obs `N'

    gen x = ""
    gen y = ""
    gen r = .
    local k = 1  //place holder
    tokenize "`asia'"
    tokenize "`allcountry'"

    forval i = 1/`nasia' {  
        forval j = 1/`nallcountry' {
            quietly {
            capture corr ``i'' ``j''
            replace x = "``i''" in `k'
            replace y = "``j''" in `k'
            replace r = r(rho) in `k'
                }
        local ++k  //increment to the next macro
            }
    }
        keep x y r
        gen remove = (x==y)
        keep if remove != 1
        
*        collapse (mean) r, by (x)
        export excel x r using "er_outputasia.xlsx", sheet(`y') sheetmodify
}

Last edited by Lovely Tolin; 24 Jun 2020, 00:22.

Announcement