
  • correlations between and within units

    Hi,

    I'm reading a paper that uses panel data. The authors present a correlation table in which the correlations below the diagonal are overall correlations between units and the correlations above the diagonal are the average within-unit correlations. How can I create such a table for y, x1, and x2 using the panel data below?

    I really appreciate your help with it.






    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(unit_id yearmonth y x1 x2)
    3 648   -1.673604  -.7442273 -1.0363818
    3 649    .4038841  -.8535328 -1.0363818
    3 650     .562588 -1.1877229 -1.0363818
    3 651    1.291389  -1.146664 -1.0363818
    3 652    .8052794 -1.1743647 -1.0363818
    3 653  -.03675416  -.7409243 -1.0363818
    3 654   1.0632977  -.8755629 -1.0363818
    3 655   -.5068667  -.8839995 -1.0363818
    3 656    .5646673  -1.376372 -1.0363818
    3 657     .934234  -.8519906 -1.0363818
    3 658   .32183555 -.59005296   .7075664
    4 648   -2.053771  -.8607342 -1.0363818
    4 649  -.28470853  -.8418642 -1.0363818
    4 650    .1102932   -.457932 -1.0363818
    4 651    .5085503  -.2997211 -1.0363818
    4 652    .4732249 -.05104671 -1.0363818
    4 653   -.3396801  -.3438883 -1.0363818
    4 654   -.8023075 -.59110814 -1.0363818
    4 655     .524676  -.6341597 -1.0363818
    4 656   -.0806455   -.442387 -1.0363818
    4 657    .7601319  -.9101606 -1.0363818
    4 658  -.29708236  -.7848284   .7053064
    5 648  -.09884956 -2.3671591 -1.0363818
    5 649    .3637376 -2.0098896 -1.0363818
    5 650    .5272369 -2.1749105 -1.0363818
    5 651    .4010301 -2.4400935 -1.0363818
    5 652   .03462299 -1.6249136 -1.0363818
    5 653   -.8958375  -1.634501 -1.0363818
    5 654   -.8420179  -1.736356 -1.0363818
    5 655   -1.972452 -2.0890148 -1.0363818
    5 656   -.8649931 -1.9769305 -1.0363818
    5 657  -.13952091 -1.4404632 -1.0363818
    5 658 -.035976883 -.23603535   .8973646
    6 648    .6677211  .25581893 -1.0363818
    6 649     .997246   .4341331 -1.0363818
    6 650    .6545075  .20891613 -1.0363818
    6 651    .8990768  .20612007 -1.0363818
    6 652    .4727888   .3011284 -1.0363818
    6 653   -.5000833   .2332805 -1.0363818
    6 654   1.8032534  .33466095 -1.0363818
    6 655   -.9693494  .03520535 -1.0363818
    6 656   .05690836 -.14496662 -1.0363818
    6 657     .463342   .1866461 -1.0363818
    6 658   .16892995   .3850788   .6388407
    7 648    .7216926   .3505641 -1.0363818
    7 649    .8041119  .12411392 -1.0363818
    7 650    .7418164  -.1596837 -1.0363818
    7 651    .8701565 -.15722057 -1.0363818
    7 652    .7349644 .025661064 -1.0363818
    7 653   .12824593  .10673832 -1.0363818
    7 654   .58063346  .29550174 -1.0363818
    7 655   .10464822   .3558027 -1.0363818
    7 656    .4740608   .3864914 -1.0363818
    7 657    .4891589   .3713816 -1.0363818
    7 658   .21748453   1.371178   1.101524
    end
    format %tm yearmonth

  • #2
    Code:
    //    GET OVERALL CORRELATIONS
    corr y x1 x2
    matrix result = r(C)
    
    //    NOW DO WITHIN-ID CORRELATIONS
    foreach v of varlist y x1 x2 {
        by unit_id, sort: egen `v'_mean = mean(`v')
        gen `v'_within_id = `v' - `v'_mean
    }
    corr y_within_id x1_within_id x2_within_id
    matrix within = r(C)
    
    //    OVERWRITE THE UPPER TRIANGLE OF RESULT WITH THE WITHIN-ID CORRELATIONS
    forvalues i = 1/3 {
        forvalues j = `=`i'+1'/3 {
            matrix result[`i', `j'] = within[`j', `i']
        }
    }
    
    matrix list result



    • #3
      Thanks so much, Clyde. Just a quick question to make sure I understand correctly. I got the within part, but for the between-unit correlations, should I use the overall correlations between the variables, or the correlations between the group means of the variables?
      Last edited by Monica Muller; 02 Feb 2018, 15:35.



      • #4
        I understood you to want within-unit correlations above the diagonal, and overall correlations below. That's what the code does. It does not calculate between-unit correlations: I don't see that you asked for that anywhere. If you want between-unit correlations:

        Code:
        egen flag = tag(unit_id)
        corr y_mean x1_mean x2_mean if flag
        and you can copy those into the lower part of the result matrix using code similar to what I showed in #2, but reversing the roles of i and j.
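
        A minimal sketch of that copying step, assuming the matrix result and the *_mean variables from #2 and the flag variable above already exist. The matrix name between is chosen here for illustration only:

        ```stata
        //    STORE THE BETWEEN-UNIT CORRELATIONS (ONE OBSERVATION PER UNIT)
        quietly corr y_mean x1_mean x2_mean if flag
        matrix between = r(C)

        //    OVERWRITE THE LOWER TRIANGLE OF RESULT (ROW i > COLUMN j)
        forvalues i = 2/3 {
            forvalues j = 1/`=`i'-1' {
                matrix result[`i', `j'] = between[`i', `j']
            }
        }

        matrix list result
        ```

        After this runs, result should hold between-unit correlations below the diagonal and the within-unit correlations from #2 above it.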

        As for which you should be interested in, overall or between, I have no idea. If you're trying to replicate a paper, then you have to look at the paper and figure out which one the authors did. If the paper is unclear, then you should contact one of its authors.



        • #5
          Thank you very much. It's super helpful, as always.



          • #6
            Hi,

            Just to add to this discussion: A reference page on calculating within-unit (group) correlations using R explains that there are two ways to do it. One way is the way shown above (within-group-center all variables and then calculate the correlation coefficient). The other way is to calculate the correlation coefficient for each group separately, and then calculate a weighted average, weighted by the size of each group. I tested it with some multilevel data I have (individuals nested within provinces), and the results differ if the group sizes are unequal. Here is the source:
            https://personality-project.org/r/ps...p/statsBy.html
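
            For comparison, the second approach could be sketched in Stata roughly as follows, using the variable names from #1 and a single pair (y, x1). The scalar names sum_wr and sum_w are hypothetical, and weighting by the number of observations in each unit is an assumption based on the description above, not something checked against the linked page:

            ```stata
            //    SIZE-WEIGHTED AVERAGE OF PER-UNIT CORRELATIONS BETWEEN y AND x1
            scalar sum_wr = 0
            scalar sum_w  = 0

            levelsof unit_id, local(units)
            foreach u of local units {
                quietly corr y x1 if unit_id == `u'
                //    SKIP UNITS WHERE THE CORRELATION IS UNDEFINED (E.G. A CONSTANT VARIABLE)
                if !missing(r(rho)) {
                    scalar sum_wr = sum_wr + r(N) * r(rho)
                    scalar sum_w  = sum_w  + r(N)
                }
            }

            display "weighted average within-unit correlation: " sum_wr / sum_w
            ```

            The same loop can be repeated for each pair of variables and the results placed in the upper triangle of a matrix, as in #2.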

