Inputting multiple 2x2 matrices into a correlation matrix

Robert Shaw

#1

30 Jun 2023, 10:35

Background:
I have several binary variables and am working out the correlation coefficients between a number of them. As the data are binary, I am using Cramér's V as my correlation coefficient, which, handily, for a 2x2 table gives the same numeric value as Pearson's correlation coefficient. This allows me to use pwcorr to produce a correlation matrix, which I can turn into a heatmap using the command heatplot (ssc install heatplot), overlaying the correlation coefficient and p-value.
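
The equivalence mentioned above can be checked directly. The following is a minimal sketch, assuming the auto dataset shipped with Stata; the indicator highrep and the scalar names cramers_v and pearson_r are made up for illustration and are not part of the original post. For a 2x2 table, the Cramér's V returned by tabulate and the Pearson correlation returned by correlate should agree.

```stata
* Illustration only: highrep, cramers_v and pearson_r are hypothetical names.
sysuse auto, clear
generate byte highrep = rep78 > 3 if !missing(rep78)

* Cramer's V for the 2x2 table
quietly tabulate foreign highrep, V
scalar cramers_v = r(CramersV)

* Pearson correlation on the same two 0/1 variables
quietly correlate foreign highrep
scalar pearson_r = r(rho)

display "Cramer's V = " %6.4f cramers_v "   Pearson r = " %6.4f pearson_r
```

Both commands drop the observations with missing rep78, so they work on the same sample.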

Question: There is a risk that a high correlation coefficient can be driven by most of the results falling into one cell of the 2x2 table. The only way I can think of to give the reader confidence that the result is not skewed in this way is to also show the 2x2 frequency table for each cell in the correlation matrix heatplot. Rather than presenting a long list of 2x2 frequency tables, it would be easier for the reader to have them presented in an identically laid out matrix in which each cell, instead of a correlation coefficient and p-value, contains the relevant 2x2 frequency table. I am at a complete loss as to how it might be possible to do this, so any advice would be hugely appreciated.

Kind regards
Robert Shaw

Andrew Musau


01 Jul 2023, 14:22

If you are using heatplot from SSC, you can enter these as marker labels of an added scatter plot or by using several -text()- options. I show the latter. You could write a program that loops over the variable pairs and stores these values in locals, which I do not do here.

Code:

clear
set obs 200
set seed 07012023
forval i=1/4{
    gen var`i'= rnormal()<0.`i'*2
}
quietly corr var1-var4
matrix W = r(C)
tab var1 var2
tab var1 var4
local offset 0.1
heatplot W,  color(hcl diverging, intensity(.6)) legend(off) aspectratio(1) lower drop(0) ///
text(`=2-`offset'' 1 "25 59`=char(13)'`=char(10)'42 74", size(medsmall)) ///
text(`=4-`offset'' 1 "23 61`=char(13)'`=char(10)'20 96", size(medsmall)) ///
plotregion(color(white) margin(zero)) xlab(, noticks) ylab(, noticks)

Code:

. tab var1 var2

           |         var2
      var1 |         0          1 |     Total
-----------+----------------------+----------
         0 |        25         59 |        84
         1 |        42         74 |       116
-----------+----------------------+----------
     Total |        67        133 |       200

. tab var1 var4

           |         var4
      var1 |         0          1 |     Total
-----------+----------------------+----------
         0 |        23         61 |        84
         1 |        20         96 |       116
-----------+----------------------+----------
     Total |        43        157 |       200

[Image: Graph.png (heatplot produced by the code above)]

Last edited by Andrew Musau; 01 Jul 2023, 14:37.
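
The loop alluded to above could look roughly like the following. This is a minimal sketch rather than a tested program: it assumes the simulated var1-var4 from the example, that heatplot is installed from SSC, and that every pairwise table is a full 2x2. The local names vars, textopts, top, bottom and ncells and the matrix name cells are made up for illustration. Each table is written as two quoted strings inside text(), which Stata draws as two lines.

```stata
* Sketch only: vars, cells, textopts, top, bottom, ncells are hypothetical names.
clear
set obs 200
set seed 07012023
forvalues i = 1/4 {
    generate byte var`i' = rnormal() < 0.`i'*2
}

local vars var1 var2 var3 var4
local k : word count `vars'
quietly correlate `vars'
matrix W = r(C)

local offset 0.1
local textopts
local ncells 0
forvalues col = 1/`=`k'-1' {
    forvalues row = `=`col'+1'/`k' {
        local xv : word `col' of `vars'
        local yv : word `row' of `vars'
        * same orientation as -tab var1 var2- in the example above
        quietly tabulate `xv' `yv', matcell(cells)
        * skip degenerate tables (a variable that takes only one value)
        if rowsof(cells) == 2 & colsof(cells) == 2 {
            local top    "`=cells[1,1]' `=cells[1,2]'"
            local bottom "`=cells[2,1]' `=cells[2,2]'"
            local textopts `textopts' text(`=`row'-`offset'' `col' "`top'" "`bottom'", size(medsmall))
            local ++ncells
        }
    }
}

heatplot W, color(hcl diverging, intensity(.6)) legend(off) aspectratio(1) lower drop(0) ///
    `textopts' ///
    plotregion(color(white) margin(zero)) xlab(, noticks) ylab(, noticks)
```

Accumulating the text() options in a single local means the heatplot call does not have to change when the variable list grows; only the font size and offset may need adjusting.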


Robert Shaw

#3

01 Jul 2023, 15:29

Dear Andrew,
Once again you have provided invaluable advice. This appears to be exactly what I want. I will work on this to get the exact result that I want and post back here. I should also mention that the advice you previously gave another user (https://www.statalist.org/forums/for...about-heatplot) on overlaying p-values and correlation coefficients was supremely useful.
Thanks
Rob

Robert Shaw


01 Jul 2023, 17:21

Dear Andrew,
I have been gratefully working off your code to achieve what I want to achieve and am part way there, but am running into some errors:
Dataset:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(TrendB_D0_D56_1_2_2 TrendB_D0_D56_1_2_1 TrendB_D0_D56_1_1_2 TrendB_D0_D56_1_1_1 TrendB_D0_D56_3_2_2 TrendB_D0_D56_3_2_1 TrendB_D0_D56_3_1_2 TrendB_D0_D56_3_1_1)
1 1 1 1 1 0 1 1
0 1 1 1 0 1 1 1
1 1 1 1 0 1 1 1
0 1 1 1 0 1 0 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
0 1 1 1 0 0 1 1
1 1 1 1 0 1 0 0
1 1 1 1 1 1 0 0
1 1 1 1 1 1 1 1
0 1 1 1 0 1 0 1
0 1 1 1 0 1 0 0
1 1 1 1 1 1 0 1
1 1 0 1 1 1 0 0
1 1 1 1 1 1 1 1
1 1 1 1 1 1 0 1
0 1 1 1 0 0 0 1
1 1 1 1 1 1 1 1
0 1 0 1 0 1 0 0
0 1 1 1 0 1 0 0
1 1 1 1 0 1 0 0
0 1 1 1 0 0 0 0
1 1 1 1 1 1 0 0
0 1 1 1 0 1 0 0
0 1 1 1 0 1 0 1
1 1 1 1 1 1 1 1
0 0 1 1 0 0 0 1
0 1 1 1 0 0 0 1
1 1 1 1 1 1 1 0
1 1 1 1 0 1 1 1
1 1 1 1 1 1 0 0
1 1 1 1 1 1 0 1
0 1 1 1 0 1 0 0
1 1 1 1 1 1 1 1
0 1 1 1 0 0 1 1
1 1 1 1 1 1 1 1
0 0 1 1 1 1 1 1
0 1 1 1 0 1 1 1
1 1 1 1 1 1 1 1
0 1 1 1 0 1 0 1
1 1 1 1 1 1 0 1
1 1 1 1 1 1 0 1
1 1 1 1 1 1 1 1
1 1 1 1 0 1 0 0
0 1 1 1 0 1 1 1
1 1 1 1 0 0 1 1
1 1 1 1 1 1 1 1
0 0 0 1 0 0 0 0
0 1 1 1 0 1 0 0
1 1 1 1 0 1 0 0
0 1 1 1 0 1 1 1
0 1 1 1 1 1 0 1
1 1 1 1 0 1 0 1
1 1 1 1 1 1 0 1
0 1 1 1 0 1 0 0
0 1 1 1 0 0 0 1
0 1 0 1 0 0 0 0
0 1 1 1 0 1 0 1
1 1 1 1 1 1 1 1
0 1 1 1 0 0 1 0
0 1 1 1 0 0 1 1
0 1 1 1 0 0 1 1
1 0 1 1 0 0 1 1
1 1 1 1 1 1 1 0
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 0
1 1 1 1 1 1 1 1
0 1 1 1 0 1 1 1
0 1 1 1 1 1 1 0
1 1 1 1 1 1 0 0
0 1 1 1 0 1 0 1
0 1 1 1 0 1 0 0
1 1 1 1 0 1 0 1
1 1 1 1 0 1 1 0
1 1 1 1 1 1 1 1
0 1 1 1 0 1 0 1
1 1 1 1 1 1 1 1
1 1 1 1 0 1 0 0
1 1 1 1 1 1 0 0
1 1 1 1 1 1 0 1
0 1 1 1 0 0 0 1
1 1 1 1 1 1 0 0
1 1 1 1 1 1 0 0
1 1 1 1 1 1 1 1
0 1 1 1 0 1 1 1
1 1 1 1 1 1 1 1
end

I am using the following code to achieve multiple different heat plots (for simplicity, the first loop has been reduced to a single term):

Code:

capture program drop _2rowmat
program define _2rowmat, rclass

if r(r)==1 & k[1,1]==0{
    mat cellcounts= cellcounts\ [0,0]
    mat rowname cellcounts= r1 r2
}
if r(r)==1 & k[1,1]==1{
    mat cellcounts= [0,0]\cellcounts
    mat rowname cellcounts= r1 r2
}
end

local counter=1
local a `" "TrendB_D0_D56" "TrendB_D56_D84" "TrendB_D84_D112" "TrendB_D112_D182" "'
foreach d of local a {
    qui corr `d'_1_2_2 `d'_1_2_1 `d'_1_1_2 `d'_1_1_1 `d'_3_2_2 `d'_3_2_1 `d'_3_1_2 `d'_3_1_1
    matrix W = r(C)
    local b `" "_1_2_2" "_1_2_1" "_1_1_2" "_1_1_1" "_3_2_2" "_3_2_1" "_3_1_2" "_3_1_1" "'
    local c `" "_1_2_2" "_1_2_1" "_1_1_2" "_1_1_1" "_3_2_2" "_3_2_1" "_3_1_2" "_3_1_1" "'
    foreach e of local b {
        local c: list c - e
        foreach f of local c {
            tab `d'`e' `d'`f' if "`e'"!="`f'" , matcell(cellcounts) matrow(k)
            _2rowmat
            local a=cellcounts[1,1]
            local b=cellcounts[1,2]
            local a`counter'= "`a'" + " " + "`b'"
            local a=cellcounts[2,1]
            local b=cellcounts[2,2]
            local b`counter'= "`a'" + " " + "`b'"
            di `"`a`counter''"'
            di `"`b`counter''"'
            local counter=`counter'+1
        }
    }
    local offset 0.1
    heatplot W,  colors(blue white red, ipolate(110)) cuts(-1.02(0.02)1.02) lower nodiagonal aspectratio(1) legend(off) ///
    text(`=2-`offset'' 1 "`a1'`=char(13)'`=char(10)'`b1'", size(2)) ///
    text(`=3-`offset'' 1 "`a2'`=char(13)'`=char(10)'`b2'", size(2)) ///
    text(`=4-`offset'' 1 "`a3'`=char(13)'`=char(10)'`b3'", size(2)) ///
    text(`=5-`offset'' 1 "`a4'`=char(13)'`=char(10)'`b4'", size(2)) ///
    text(`=6-`offset'' 1 "`a5'`=char(13)'`=char(10)'`b5'", size(2)) ///
    text(`=7-`offset'' 1 "`a6'`=char(13)'`=char(10)'`b6'", size(2)) ///
    text(`=8-`offset'' 1 "`a7'`=char(13)'`=char(10)'`b7'", size(2)) ///
    text(`=3-`offset'' 2 "`a8'`=char(13)'`=char(10)'`b8'", size(2)) ///
    text(`=4-`offset'' 2 "`a9'`=char(13)'`=char(10)'`b9'", size(2)) ///
    text(`=5-`offset'' 2 "`a10'`=char(13)'`=char(10)'`b10'", size(2)) ///
    text(`=6-`offset'' 2 "`a11'`=char(13)'`=char(10)'`b11'", size(2)) ///
    text(`=7-`offset'' 2 "`a12'`=char(13)'`=char(10)'`b12'", size(2)) ///
    text(`=8-`offset'' 2 "`a13'`=char(13)'`=char(10)'`b13'", size(2)) ///
    text(`=4-`offset'' 3 "`a14'`=char(13)'`=char(10)'`b14'", size(2)) ///
    text(`=5-`offset'' 3 "`a15'`=char(13)'`=char(10)'`b15'", size(2)) ///
    text(`=6-`offset'' 3 "`a16'`=char(13)'`=char(10)'`b16'", size(2)) ///
    text(`=7-`offset'' 3 "`a17'`=char(13)'`=char(10)'`b17'", size(2)) ///
    text(`=8-`offset'' 3 "`a18'`=char(13)'`=char(10)'`b18'", size(2)) ///
    text(`=5-`offset'' 4 "`a19'`=char(13)'`=char(10)'`b19'", size(2)) ///
    text(`=6-`offset'' 4 "`a20'`=char(13)'`=char(10)'`b20'", size(2)) ///
    text(`=7-`offset'' 4 "`a21'`=char(13)'`=char(10)'`b21'", size(2)) ///
    text(`=8-`offset'' 4 "`a22'`=char(13)'`=char(10)'`b22'", size(2)) ///
    text(`=6-`offset'' 5 "`a23'`=char(13)'`=char(10)'`b23'", size(2)) ///
    text(`=7-`offset'' 5 "`a24'`=char(13)'`=char(10)'`b24'", size(2)) ///
    text(`=8-`offset'' 5 "`a25'`=char(13)'`=char(10)'`b25'", size(2)) ///
    text(`=7-`offset'' 6 "`a26'`=char(13)'`=char(10)'`b26'", size(2)) ///
    text(`=8-`offset'' 6 "`a27'`=char(13)'`=char(10)'`b27'", size(2)) ///
    text(`=8-`offset'' 7 "`a28'`=char(13)'`=char(10)'`b28'", size(2)) ///
    plotregion(color(white) margin(zero)) xlab(, noticks) ylab(, noticks)  
}

The problems I face are that:
1) only the first column's values are displayed (if I change the "1" to a 2, the value is displayed in the appropriate column, but this is not the case for subsequent lines of data);
2) the second line of values (i.e. the bottom half of the 2x2 matrix) is not being displayed;
3) whatever the problem is, it stops the code without Stata throwing a clear error, as the outermost loop does not cycle through "TrendB_D56_D84" "TrendB_D84_D112" "TrendB_D112_D182".

Kind regards
Robert Shaw

Last edited by sladmin; 06 Jul 2023, 10:02. Reason: Remove sensitive data.


Bruce Weaver

#5

02 Jul 2023, 05:57

Robert Shaw, since reading your first post about this, I've been wondering if there is some way to superimpose a 2x2 clustered bar chart of the cell counts instead of the 4 cell counts as text. I think the information you are trying to convey would pop out much more quickly if there was. Given that the point (I think) is to display imbalance in the cell counts, you could suppress the Y-axis, and perhaps show the correlation (i.e., the phi-coefficient) in the title. Something like this:

Code:

clear
sysuse auto
generate byte r78 = rep78 > 3
pwcorr foreign r78
local r = string(r(rho), "%04.3f")
*local r = r(rho)
graph bar (count), over(foreign, gap(2)) over(r78, gap(30)) ///
    title("r{sub:{&phi}} = `r'", size(vhuge)) yscale(of
* https://www.stata.com/bookstore/pdf/g_text.pdf

I currently have no idea how easy (or not) it would be to superimpose bar charts like this onto a heatmap graph.
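
One possible workaround, which does not superimpose anything on the heatmap, is to draw one small bar chart per variable pair and arrange them in the same lower-triangular layout with graph combine. The following is only a sketch under stated assumptions: it uses the auto dataset, the indicators r78 and heavy and all local and graph names are made up for illustration, and the holes() option of graph combine is used to leave the upper-triangle positions empty.

```stata
* Sketch only: heavy, vars, graphs, holes, pos and the graph names are hypothetical.
sysuse auto, clear
generate byte r78   = rep78 > 3 if !missing(rep78)
generate byte heavy = weight > 3000

local vars foreign r78 heavy
local k : word count `vars'
local graphs
local holes
local pos 0
forvalues row = 2/`k' {
    forvalues col = 1/`=`k'-1' {
        local ++pos
        * positions on or above the diagonal stay empty
        if `col' >= `row' {
            local holes `holes' `pos'
            continue
        }
        local yv : word `row' of `vars'
        local xv : word `col' of `vars'
        quietly correlate `yv' `xv'
        local r = string(r(rho), "%05.3f")
        graph bar (count), over(`xv', gap(2)) over(`yv', gap(30)) ///
            title("`yv' by `xv': r = `r'") yscale(off) ///
            name(g`row'_`col', replace) nodraw
        local graphs `graphs' g`row'_`col'
    }
}

* (k-1) x (k-1) grid, filled row by row, with the holes skipped
graph combine `graphs', cols(`=`k'-1') holes(`holes')
```

With three variables this gives a 2x2 grid with the top-right position left blank, mirroring the lower triangle of the correlation matrix.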

--
Bruce Weaver
Version: Stata/MP 19.5 (Windows)
Robert Shaw

#6

02 Jul 2023, 15:22

Thanks for your suggestion, Bruce. It's a nice way of showing it, but I think I will stick with the 2x2 table for the moment, not least because of programming complexity.

Andrew, I have played further with my code to ascertain what is going on and have solved the issues:

To solve the issue of the second line not appearing, I used the notation "line 1" "line 2" to get a line break, rather than inserting `=char(13)' and `=char(10)', which I gather are the carriage-return and line-feed characters used for line endings (both together on Windows, line feed alone on Linux).
The issue of further columns not appearing (and the code stopping) was one of supreme naivety: I used the same letter twice for locals and confused everything.
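
Both fixes can be seen in miniature in the following sketch. It is illustrative only: the counts are placeholders, the local names top, bottom, prefixes and n11 are made up, and a plain twoway graph stands in for heatplot so that nothing needs to be installed.

```stata
* Sketch only: the counts and all local names here are illustrative.
local top    "25 59"
local bottom "42 74"

* Two quoted strings inside text() are drawn as two lines of text
twoway scatteri 1 1, msymbol(none) ///
    text(1 1 "`top'" "`bottom'", size(medsmall)) ///
    xscale(range(0 2)) yscale(range(0 2))

* Keep the list local (prefixes) and the scratch local (n11) under distinct names
local prefixes TrendB_D0_D56 TrendB_D56_D84
foreach p of local prefixes {
    local n11 = 25
    display "`p': top-left count `n11'"
}
```

Using descriptive names for scratch values makes it much harder to overwrite a list that an enclosing loop still depends on.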

Last edited by Robert Shaw; 02 Jul 2023, 15:32.
