Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generate correlation as variable


    I need to generate a variable representing the correlation between variables x and y, for different countries and years. I want to store that variable in my dataset. There was a thread on this in 2008, where it was suggested that I this would work ssc install egenmore egenmore corrxy = corr(x,y) , by(country year) For me, though, that doesn't work. The ssc installation appears to run OK, but the next command generates an error "unrecognized command: egenmore" If instead I type egen corrxy = corr(x,y) , by(country year) I get a different error: "invalid syntax". If I type help egenmore I get a help page for the egen command, which contains no description of a corr() function.

  • #2
    Never mind. The command does work as
    egen corrxy = corr(x y) , by(country year)
    My mistake, which I inherited from the original thread, was inserting a comma between x and y.

    Comment


    • #3
      The help for egenmore does include documentation of the corr() function and explains that the syntax is to use egen as a command. egenmore is no more than a paclage name. Your syntax error is to include a comma. The syntax is documented as

      corr(varname1 varname2) [ , covariance spearman taua taub by(byvarlist) ]

      The problem is that any comma between variable names is echoed to the command being called, in this case correlate.

      Comment


      • #4
        Hi,

        I have another question regarding the "egen corrxy = corr(x y)" command. I have a pairwise correlation matrix with 74 different variables and I want to store every possible combination of correlations between all 74 variables as a single variable so that I get a dataset with (74 * 74 - 74) / 2 = 2701 variables.

        Now my question is, do I really have to use the "egen corrxy = corr(x y)" for 2701 times or is there a smarter solution to my problem?

        Thanks a lot in advance!

        Best regards,
        Christina

        Comment


        • #5
          No; you don't. Let's illustrate with a token script.

          Code:
          sysuse auto, clear 
          
          local varlist headroom trunk length displacement weight 
          
          local nvars : word count `varlist' 
          
          local N = `nvars' * (`nvars' - 1) / 2 
          
          if `N' > _N set obs `N' 
          
          gen x = "" 
          gen y = "" 
          gen r = . 
          local k = 1 
          tokenize "`varlist'" 
          
          forval i = 1/`nvars' { 
              local J = `i' + 1 
              forval j = `J'/`nvars' { 
                  quietly {
                      corr ``i'' ``j'' 
                      replace x = "``i''" in `k' 
                      replace y = "``j''" in `k' 
                      replace r = r(rho) in `k' 
                  }
                  local ++k 
              }
          }
          
          list x y r if !missing(x) 
          
          
               +----------------------------------------+
               |            x              y          r |
               |----------------------------------------|
            1. |     headroom          trunk   .6620111 |
            2. |     headroom         length   .5162955 |
            3. |     headroom   displacement   .4744915 |
            4. |     headroom         weight   .4834558 |
            5. |        trunk         length   .7265956 |
               |----------------------------------------|
            6. |        trunk   displacement   .6086351 |
            7. |        trunk         weight   .6722057 |
            8. |       length   displacement     .83514 |
            9. |       length         weight   .9460086 |
           10. | displacement         weight   .8948958 |
               +----------------------------------------+

          Comment


          • #6
            Hi Nick,

            first of all, thanks a lot for your code, that really brings me much closer to what I want. Now I have two subsequent questions.


            1) Is there a possibility to tell Stata to just use all variables of the dataset in code line "local varlist headroom trunk length displacement weight" or do I need to list every single variable in that line (wouldn't be that bad but maybe a bit prone to errors)? I tried it with var1 - var74 but somehow that didn't work.


            2) Is there a possibility to reshape the resulting table in that way, that the variable name ist just the combination of the names of x and y and the first row consists of the r values? So that it looks like this:

            headroom_trunk headroom_length headroom_displacement
            1 0.6620111 0.5162955 0.4744915

            and so on.
            I tried several solutions with the commands -reshape- , -xpose- and -sxpose- but somehow I don't get what I want here.


            Thank you so much!

            Best regards,
            Christina

            Comment


            • #7
              Sorry for the bad illustration of the example in 2). Actually it should look like this:

              Click image for larger version

Name:	Unbenannt.PNG
Views:	1
Size:	3.9 KB
ID:	1509617

              Comment


              • #8
                In #4 you asked for (a) 1 new variable (b) 2701 new variables. That was contradictory but (a) made more sense to me.

                What you now ask for is still puzzling as a request as it is of no obvious use for any subsequent analysis. If you explain what you are going to do with it someone might be minded to suggest some code.

                This will no doubt seem harsh or unhelpful but my charity on Statalist does not go beyond questions that seem to me interesting or useful.

                Comment


                • #9
                  Hi Nick,

                  ok, sorry for that confusion. You're right, maybe it is very confusing if one doesn't know the overall context of the question. So I try to shortly explain that. I want to replicate the coinsurance/diversification measure by Hann, Ogneva and Ozbas (2013): "Corporate Diversification and the Cost of Capital" (Journal of Finance, Vol. 68 (p. 1969-1970)), Link: https://onlinelibrary.wiley.com/doi/10.1111/jofi.12067 .

                  I got idiosyncratic industry cash flows per 2-digit SIC Code industry (as residuals from a regression, that is why the variable names are res_SIC) for 27 years. These are illustrated in this data example with idiosyncratic cash flows (residuals) for 8 industries (in my complete dataset I have 74 industries):


                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input int year float(res_10 res_12 res_13 res_14 res_15 res_16 res_17 res_20)
                  1990  .033254102   -.04792337   .05700066   -.09138067    .02794227 -.00026059974  .0039374474 .00008333124
                  1991  -.01919841   -.04014556 -.008848097   -.04181518  -.008211095    .032521505    .08611627    .02692385
                  1992   .02821145  -.006532365  -.04394998 -.0080966065  -.022054717     .03218632     .0575919   -.03121326
                  1993 .0038291214   -.02539229 -.013553325   -.14529903  -.007691163   -.007535936    .07557896    .02676738
                  1994   .07251374  -.032627285   .01114261   -.04820225    .05871338   -.018943405  -.007243626   .010144584
                  1995   .04367205   -.04133857   .03054319    .02854056 -.0015061008  -.0045481613   -.01230913 -.0005895875
                  1996  .003749058    .06145516   .04461166    .07190183   .035624944    -.01982356   .009847625   -.03091387
                  1997  -.07819264   .001658237   .01981391    .07108616   -.00518429    -.02092511 -.0014665262  -.003491783
                  1998  -.04154119     .0699422 -.010625497   -.02252452    .05107028    .021999307   .005278377  .0042764693
                  1999  .011160394 -.0038090604  .065049924   -.02648608    .06857794   .0017683457  .0019574217  -.013661997
                  2000   .02061287    .02761672   .04835914   -.06898598   .003824244     .01472899   -.03069713  .0008440256
                  2001  .005321359 -.0017550978  .067692034    .04415399   .017593222  -.0039787963   -.04837864      .005605
                  2002   .04406443   .029889535  .023144836     .0215849   .036073036   .0021540255  -.029622594 -.0027596336
                  2003  -.03246744   -.02577336   .03097895   .022476805   .035536606   -.031978518 -.0015668035  -.007414372
                  2004 .0033156725     .0757577   .05927413   .018581428    .05933197   -.031759653  -.015944071   .007566282
                  2005 -.031742893  -.005693389   .07974613   .009071746    .11699577   -.009109915    .02401482   .013660272
                  2006  -.03639871  -.005954036   .05664376    .07321639   -.05554803    .003499886   .005934403  -.002626764
                  2007   .03727423    .14695072   .09391564    .07616765   -.11068815    -.01322465  -.013029888  -.003010965
                  2008  -.10562408    .04227998 -.010594637    .10868854   -.28537837     .02062974   .011361933   -.02128578
                  2009  .025218315    .08033428  -.04314794    .04224764  -.013182837     .01676502  -.009288081   -.00630211
                  2010   .01658875    .06149259  -.02837122   -.03007945 .00005215796     .02538311   -.04897374  -.008060934
                  2011   .03678665    .08610797   .01109371   .004797227   .001603199   -.015716467   -.05856934  -.014376557
                  2012   .04163102   .002470567  -.04690592    .09402638   .020188507   .0083638625   -.04846392 -.0015014635
                  2013  -.06729205 -.0013210756  -.01913452  -.018016137   .017436218   -.027384736   -.07219469   .020294074
                  2014  -.04172251   -.13514738 -.013596268   .005979343   .024419414    .010108522    .05206577   .011064536
                  2015  -.05911325   -.19696306  -.29779744  -.035499208   -.04279067   -.006380766   .000571963    .01064037
                  2016   .04435107   -.04277896   -.1709661    -.0870742  -.017588003    .018067393    .03630019   .007348059
                  2017   .04173888    -.0728008  .008480637   -.06906129  -.005159727   .0033942484   .027191086  .0019908443
                  end

                  Now, I need the pairwise correlation of the idiosyncratic industry cash flows of every possible industry combination on a 10-year rolling window basis. That is, e.g. for 2018: for industries 10 and 12 I get the correlation calculated over their according idiosyncratic cash flows over the years 2008-2017, for 2017 the correlation over 2007-2016 and so on. In the end, I want to obtain a dataset that looks like this in order to merge it with my actual dataset:

                  Click image for larger version

Name:	Corr.PNG
Views:	1
Size:	8.6 KB
ID:	1509864



                  I really hope that this explanation is comprehensible and that someone might help me.

                  Thank you a lot in advance.

                  Best regards,
                  Christina

                  Comment


                  • #10
                    Dear Statalist members,

                    according to the FAQ "polite bumping" is allowed after around 12 hours of no answer.

                    Since I posted the extended version of my question with explanation around two days ago, I now feel free to ask again if someone please might help me with my problem?

                    Thanks a lot for every kind of help.

                    Best regards,
                    Christina

                    Comment


                    • #11
                      Hi all, first of all thanks for this thread.

                      I have a montly exchange rate dataset for all countries. I have adapted Nick's code (reply to Christina's query) to generate correlation coefficients for all possible combinations of countries in terms of exchange rates, and it worked perfectly well. However, I wanted to limit the computation to generate cross country correlation only between Asia (total of 40 countries) and all countries (total of 191 countries) in the world. As such I did some tweaking, but unfortunately, the results are not what I desired. Instead, the x variable reports only 40 countries which are not necessarily from Asia. I can't seem to understand where I have gone wrong and I was wondering if it's before of my forval loop?

                      Thanks in advance for the help!

                      Code:
                      forval y = 2006/2019  {  
                          use er`y', clear
                          quietly describe, varlist
                          local vars `r(varlist)'  // provides a list of all countries in the world
                          unab omit:  year year1 year_act
                          local allcountry: list vars - omit
                          
                          local asia AFG ARM AUS AZE BGD BTN KHM CHN FJI GEO HKG IND IDN JPN KAZ KIR KGZ LAO MYS MDV FSM MNG MMR NPL NZL PAK PNG PHL WSM SGP SLB KOR LKA TWN TJK THA TON UZB VUT VNM
                              * local asia used to tease out only countries in asia
                          
                          local nallcountry : word count `allcountry'
                          local nasia : word count `asia'
                          local N = `nasia' * `nallcountry'  
                          if `N' > _N set obs `N'
                      
                          gen x = ""
                          gen y = ""
                          gen r = .
                          local k = 1  //place holder
                          tokenize "`asia'"
                          tokenize "`allcountry'"
                      
                          forval i = 1/`nasia' {  
                              forval j = 1/`nallcountry' {
                                  quietly {
                                  capture corr ``i'' ``j''
                                  replace x = "``i''" in `k'
                                  replace y = "``j''" in `k'
                                  replace r = r(rho) in `k'
                                      }
                              local ++k  //increment to the next macro
                                  }
                          }
                              keep x y r
                              gen remove = (x==y)
                              keep if remove != 1
                              
                      *        collapse (mean) r, by (x)
                              export excel x r using "er_outputasia.xlsx", sheet(`y') sheetmodify
                      }
                      Last edited by Lovely Tolin; 24 Jun 2020, 00:22.

                      Comment

                      Working...
                      X