  • Inputting multiple 2x2 matrices into a correlation matrix

    Background:
    I have several binary variables and am working out the correlation coefficients between a number of them. As the data are binary, I am using Cramér's V as my correlation coefficient, which, handily, in this case gives the same numeric value as Pearson's correlation coefficient. This allows me to use pwcorr to produce a correlation matrix, which I can turn into a heatmap using the command heatplot (ssc install heatplot), overlaying the correlation coefficient and p-value.
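
    As a quick check of that equivalence, here is a minimal sketch on Stata's shipped auto dataset (not the poster's data; r78 is an illustrative binary variable). It assumes that the V option of tabulate stores r(CramersV) and that, for a 2x2 table, Stata reports V with a sign, so it should match the Pearson (phi) coefficient from correlate.

    Code:
    sysuse auto, clear
    * Illustrative binary variable; observations with missing rep78 stay missing
    generate byte r78 = rep78 > 3 if !missing(rep78)
    * Cramer's V for the 2x2 table
    quietly tabulate foreign r78, V
    local v = r(CramersV)
    * Pearson correlation of the same two 0/1 variables
    quietly correlate foreign r78
    display "Cramer's V = " %6.4f `v' "   Pearson r = " %6.4f r(rho)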

    Question: There is a risk that a high correlation coefficient can be driven by most of the results falling into one cell of the 2x2 matrix. The only way I can think of to give the reader confidence that the result is not skewed in this way is also to show the 2x2 frequency table for each cell in the correlation matrix heatplot. Rather than presenting a long list of 2x2 frequency matrices, it would be easier for the reader to have these 2x2 frequency tables presented in an identically laid out correlation matrix, which, instead of containing correlation coefficients and p-values, each would contain the relevant 2x2 frequency matrix. I am at a complete loss as to how it might be possible to do this - any advice would be hugely appreciated.

    Kind regards
    Robert Shaw

  • #2
    If you are using heatplot from SSC, you can enter these as marker labels of an added scatter plot or by using several -text()- options. I show the latter below with hand-typed counts; you could instead write a program that loops over the variable pairs and stores these values in locals, which I do not do here.

    Code:
    clear
    set obs 200
    set seed 07012023
    forval i=1/4{
        gen var`i'= rnormal()<0.`i'*2
    }
    quietly corr var1-var4
    matrix W = r(C)
    tab var1 var2
    tab var1 var4
    local offset 0.1
    heatplot W,  color(hcl diverging, intensity(.6)) legend(off) aspectratio(1) lower drop(0) ///
    text(`=2-`offset'' 1 "25 59`=char(13)'`=char(10)'42 74", size(medsmall)) ///
    text(`=4-`offset'' 1 "23 61`=char(13)'`=char(10)'20 96", size(medsmall)) ///
    plotregion(color(white) margin(zero)) xlab(, noticks) ylab(, noticks)


    Code:
    . tab var1 var2
    
               |         var2
          var1 |         0          1 |     Total
    -----------+----------------------+----------
             0 |        25         59 |        84
             1 |        42         74 |       116
    -----------+----------------------+----------
         Total |        67        133 |       200
    
    . tab var1 var4
    
               |         var4
          var1 |         0          1 |     Total
    -----------+----------------------+----------
             0 |        23         61 |        84
             1 |        20         96 |       116
    -----------+----------------------+----------
         Total |        43        157 |       200
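
    The loop mentioned above could be sketched as follows. This is only a sketch, assuming the simulated var1-var4 from the example are in memory and that every pair produces a full 2x2 table (otherwise matcell() returns fewer rows or columns). The local names vars, txt, top, bot and the matrix name cnt are illustrative. Cell positions follow the text() calls above, and the two quoted strings inside text() are meant to produce two lines of text.

    Code:
    local vars var1 var2 var3 var4
    local k : word count `vars'
    quietly corr `vars'
    matrix W = r(C)
    local offset 0.1
    local txt
    * j indexes heatmap columns, i indexes rows below the diagonal
    forvalues j = 1/`=`k'-1' {
        forvalues i = `=`j'+1'/`k' {
            local cv : word `j' of `vars'
            local rv : word `i' of `vars'
            quietly tab `cv' `rv', matcell(cnt)
            * top and bottom rows of the 2x2 table as strings
            local top "`=cnt[1,1]' `=cnt[1,2]'"
            local bot "`=cnt[2,1]' `=cnt[2,2]'"
            * accumulate one text() option per cell
            local txt `txt' text(`=`i'-`offset'' `j' "`top'" "`bot'", size(medsmall))
        }
    }
    heatplot W, color(hcl diverging, intensity(.6)) legend(off) aspectratio(1) lower drop(0) ///
        `txt' ///
        plotregion(color(white) margin(zero)) xlab(, noticks) ylab(, noticks)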
    [Image: Graph.png]

    Last edited by Andrew Musau; 01 Jul 2023, 14:37.


    • #3
      Dear Andrew,
      Once again you have provided invaluable advice. This appears to be exactly what I want. I will work on this to get the exact result that I want and post back here. I should also mention that the advice you previously gave another user at https://www.statalist.org/forums/for...about-heatplot on overlaying p-values and correlation coefficients was supremely useful.
      Thanks
      Rob


      • #4
        Dear Andrew,
        I have been gratefully working off your code to achieve what I want to achieve and am part way there, but am running into some errors:
        Dataset:

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input float(TrendB_D0_D56_1_2_2 TrendB_D0_D56_1_2_1 TrendB_D0_D56_1_1_2 TrendB_D0_D56_1_1_1 TrendB_D0_D56_3_2_2 TrendB_D0_D56_3_2_1 TrendB_D0_D56_3_1_2 TrendB_D0_D56_3_1_1)
        1 1 1 1 1 0 1 1
        0 1 1 1 0 1 1 1
        1 1 1 1 0 1 1 1
        0 1 1 1 0 1 0 1
        1 1 1 1 1 1 1 1
        1 1 1 1 1 1 1 1
        0 1 1 1 0 0 1 1
        1 1 1 1 0 1 0 0
        1 1 1 1 1 1 0 0
        1 1 1 1 1 1 1 1
        0 1 1 1 0 1 0 1
        0 1 1 1 0 1 0 0
        1 1 1 1 1 1 0 1
        1 1 0 1 1 1 0 0
        1 1 1 1 1 1 1 1
        1 1 1 1 1 1 0 1
        0 1 1 1 0 0 0 1
        1 1 1 1 1 1 1 1
        0 1 0 1 0 1 0 0
        0 1 1 1 0 1 0 0
        1 1 1 1 0 1 0 0
        0 1 1 1 0 0 0 0
        1 1 1 1 1 1 0 0
        0 1 1 1 0 1 0 0
        0 1 1 1 0 1 0 1
        1 1 1 1 1 1 1 1
        0 0 1 1 0 0 0 1
        0 1 1 1 0 0 0 1
        1 1 1 1 1 1 1 0
        1 1 1 1 0 1 1 1
        1 1 1 1 1 1 0 0
        1 1 1 1 1 1 0 1
        0 1 1 1 0 1 0 0
        1 1 1 1 1 1 1 1
        0 1 1 1 0 0 1 1
        1 1 1 1 1 1 1 1
        0 0 1 1 1 1 1 1
        0 1 1 1 0 1 1 1
        1 1 1 1 1 1 1 1
        0 1 1 1 0 1 0 1
        1 1 1 1 1 1 0 1
        1 1 1 1 1 1 0 1
        1 1 1 1 1 1 1 1
        1 1 1 1 0 1 0 0
        0 1 1 1 0 1 1 1
        1 1 1 1 0 0 1 1
        1 1 1 1 1 1 1 1
        0 0 0 1 0 0 0 0
        0 1 1 1 0 1 0 0
        1 1 1 1 0 1 0 0
        0 1 1 1 0 1 1 1
        0 1 1 1 1 1 0 1
        1 1 1 1 0 1 0 1
        1 1 1 1 1 1 0 1
        0 1 1 1 0 1 0 0
        0 1 1 1 0 0 0 1
        0 1 0 1 0 0 0 0
        0 1 1 1 0 1 0 1
        1 1 1 1 1 1 1 1
        0 1 1 1 0 0 1 0
        0 1 1 1 0 0 1 1
        0 1 1 1 0 0 1 1
        1 0 1 1 0 0 1 1
        1 1 1 1 1 1 1 0
        1 1 1 1 1 1 1 1
        1 1 1 1 1 1 1 0
        1 1 1 1 1 1 1 1
        0 1 1 1 0 1 1 1
        0 1 1 1 1 1 1 0
        1 1 1 1 1 1 0 0
        0 1 1 1 0 1 0 1
        0 1 1 1 0 1 0 0
        1 1 1 1 0 1 0 1
        1 1 1 1 0 1 1 0
        1 1 1 1 1 1 1 1
        0 1 1 1 0 1 0 1
        1 1 1 1 1 1 1 1
        1 1 1 1 0 1 0 0
        1 1 1 1 1 1 0 0
        1 1 1 1 1 1 0 1
        0 1 1 1 0 0 0 1
        1 1 1 1 1 1 0 0
        1 1 1 1 1 1 0 0
        1 1 1 1 1 1 1 1
        0 1 1 1 0 1 1 1
        1 1 1 1 1 1 1 1
        end
        I am using the following code to achieve multiple different heat plots (for simplicity, the first loop has been reduced to a single term):
        Code:
        capture program drop _2rowmat
        program define _2rowmat, rclass
        
        if r(r)==1 & k[1,1]==0{
            mat cellcounts= cellcounts\ [0,0]
            mat rowname cellcounts= r1 r2
        }
        if r(r)==1 & k[1,1]==1{
            mat cellcounts= [0,0]\cellcounts
            mat rowname cellcounts= r1 r2
        }
        end
        
        local counter=1
        local a `" "TrendB_D0_D56" "TrendB_D56_D84" "TrendB_D84_D112" "TrendB_D112_D182" "'
        foreach d of local a {
            qui corr `d'_1_2_2 `d'_1_2_1 `d'_1_1_2 `d'_1_1_1 `d'_3_2_2 `d'_3_2_1 `d'_3_1_2 `d'_3_1_1
            matrix W = r(C)
            local b `" "_1_2_2" "_1_2_1" "_1_1_2" "_1_1_1" "_3_2_2" "_3_2_1" "_3_1_2" "_3_1_1" "'
            local c `" "_1_2_2" "_1_2_1" "_1_1_2" "_1_1_1" "_3_2_2" "_3_2_1" "_3_1_2" "_3_1_1" "'
            foreach e of local b {
                local c: list c - e
                foreach f of local c {
                    tab `d'`e' `d'`f' if "`e'"!="`f'" , matcell(cellcounts) matrow(k)
                    _2rowmat
                    local a=cellcounts[1,1]
                    local b=cellcounts[1,2]
                    local a`counter'= "`a'" + " " + "`b'"
                    local a=cellcounts[2,1]
                    local b=cellcounts[2,2]
                    local b`counter'= "`a'" + " " + "`b'"
                    di `"`a`counter''"'
                    di `"`b`counter''"'
                    local counter=`counter'+1
                }
            }
            local offset 0.1
            heatplot W,  colors(blue white red, ipolate(110)) cuts(-1.02(0.02)1.02) lower nodiagonal aspectratio(1) legend(off) ///
            text(`=2-`offset'' 1 "`a1'`=char(13)'`=char(10)'`b1'", size(2)) ///
            text(`=3-`offset'' 1 "`a2'`=char(13)'`=char(10)'`b2'", size(2)) ///
            text(`=4-`offset'' 1 "`a3'`=char(13)'`=char(10)'`b3'", size(2)) ///
            text(`=5-`offset'' 1 "`a4'`=char(13)'`=char(10)'`b4'", size(2)) ///
            text(`=6-`offset'' 1 "`a5'`=char(13)'`=char(10)'`b5'", size(2)) ///
            text(`=7-`offset'' 1 "`a6'`=char(13)'`=char(10)'`b6'", size(2)) ///
            text(`=8-`offset'' 1 "`a7'`=char(13)'`=char(10)'`b7'", size(2)) ///
            text(`=3-`offset'' 2 "`a8'`=char(13)'`=char(10)'`b8'", size(2)) ///
            text(`=4-`offset'' 2 "`a9'`=char(13)'`=char(10)'`b9'", size(2)) ///
            text(`=5-`offset'' 2 "`a10'`=char(13)'`=char(10)'`b10'", size(2)) ///
            text(`=6-`offset'' 2 "`a11'`=char(13)'`=char(10)'`b11'", size(2)) ///
            text(`=7-`offset'' 2 "`a12'`=char(13)'`=char(10)'`b12'", size(2)) ///
            text(`=8-`offset'' 2 "`a13'`=char(13)'`=char(10)'`b13'", size(2)) ///
            text(`=4-`offset'' 3 "`a14'`=char(13)'`=char(10)'`b14'", size(2)) ///
            text(`=5-`offset'' 3 "`a15'`=char(13)'`=char(10)'`b15'", size(2)) ///
            text(`=6-`offset'' 3 "`a16'`=char(13)'`=char(10)'`b16'", size(2)) ///
            text(`=7-`offset'' 3 "`a17'`=char(13)'`=char(10)'`b17'", size(2)) ///
            text(`=8-`offset'' 3 "`a18'`=char(13)'`=char(10)'`b18'", size(2)) ///
            text(`=5-`offset'' 4 "`a19'`=char(13)'`=char(10)'`b19'", size(2)) ///
            text(`=6-`offset'' 4 "`a20'`=char(13)'`=char(10)'`b20'", size(2)) ///
            text(`=7-`offset'' 4 "`a21'`=char(13)'`=char(10)'`b21'", size(2)) ///
            text(`=8-`offset'' 4 "`a22'`=char(13)'`=char(10)'`b22'", size(2)) ///
            text(`=6-`offset'' 5 "`a23'`=char(13)'`=char(10)'`b23'", size(2)) ///
            text(`=7-`offset'' 5 "`a24'`=char(13)'`=char(10)'`b24'", size(2)) ///
            text(`=8-`offset'' 5 "`a25'`=char(13)'`=char(10)'`b25'", size(2)) ///
            text(`=7-`offset'' 6 "`a26'`=char(13)'`=char(10)'`b26'", size(2)) ///
            text(`=8-`offset'' 6 "`a27'`=char(13)'`=char(10)'`b27'", size(2)) ///
            text(`=8-`offset'' 7 "`a28'`=char(13)'`=char(10)'`b28'", size(2)) ///
            plotregion(color(white) margin(zero)) xlab(, noticks) ylab(, noticks)  
        }
        The problems I face are:
        1) only the first column's values are displayed (if I change the "1" to a "2", the value is displayed in the appropriate column, but this is not the case for subsequent lines of data);
        2) the second line of values (i.e., the bottom half of the 2x2 matrix) is not displayed;
        3) whatever the problem is, it stops the code without Stata throwing a clear error, as the outermost loop does not cycle through "TrendB_D56_D84" "TrendB_D84_D112" "TrendB_D112_D182".

        Kind regards
        Robert Shaw
        Last edited by sladmin; 06 Jul 2023, 10:02. Reason: Remove sensitive data.


        • #5
          Robert Shaw, since reading your first post about this, I've been wondering if there is some way to superimpose a 2x2 clustered bar chart of the cell counts instead of the 4 cell counts as text. I think the information you are trying to convey would pop out much more quickly if there was. Given that the point (I think) is to display imbalance in the cell counts, you could suppress the Y-axis, and perhaps show the correlation (i.e., the phi-coefficient) in the title. Something like this:

          Code:
          clear
          sysuse auto
          generate byte r78 = rep78 > 3
          pwcorr foreign r78
          local r = string(r(rho), "%04.3f")
          *local r = r(rho)
          graph bar (count), over(foreign, gap(2)) over(r78, gap(30)) ///
          title("r{sub:{&phi}} = `r'", size(vhuge)) yscale(off)
          
          * https://www.stata.com/bookstore/pdf/g_text.pdf


          [Image: Shaw_barchart_phi.png]


          I currently have no idea how easy (or not) it would be to superimpose bar charts like this onto a heatmap graph.
          --
          Bruce Weaver
          Version: Stata/MP 18.5 (Windows)


          • #6
            Thanks for your suggestion, Bruce. It's a nice way of showing it, but I think I will stick with the 2x2 table for the moment, not least because of programming complexity.

            Andrew, I have played further with my code to ascertain what is going on and have solved the issues:

            To solve the missing second line, I used the notation "line 1" "line 2" to get a line break, rather than inserting `=char(13)' and `=char(10)', which I gather are the carriage-return and line-feed characters used for line breaks on Windows and Linux.
            The issue of further columns not appearing (and the code stopping) was one of supreme naivety: I used the same letters (a and b) twice for locals, so the inner assignments overwrote the lists that the foreach loops were meant to cycle through.
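
            For anyone following along, the two fixes could be sketched roughly as below. This is not the exact code I ended up with; it assumes the example dataset from #4 is in memory, and the names stub, sfx, txt, cv, rv, and n00-n11 are chosen purely for illustration so that nothing clashes with the loop lists. Counting each cell with count is a substitute for matcell() here, so a table with an empty row or column simply yields zeros.

            Code:
            local stub TrendB_D0_D56
            local sfx _1_2_2 _1_2_1 _1_1_2 _1_1_1 _3_2_2 _3_2_1 _3_1_2 _3_1_1
            local k : word count `sfx'
            local offset 0.1
            local txt
            * j indexes heatmap columns, i indexes rows below the diagonal
            forvalues j = 1/`=`k'-1' {
                forvalues i = `=`j'+1'/`k' {
                    local cs : word `j' of `sfx'
                    local rs : word `i' of `sfx'
                    local cv `stub'`cs'
                    local rv `stub'`rs'
                    * count the four cells directly (rows: cv, columns: rv)
                    forvalues r = 0/1 {
                        forvalues c = 0/1 {
                            quietly count if `cv' == `r' & `rv' == `c'
                            local n`r'`c' = r(N)
                        }
                    }
                    * two quoted strings give two lines of text
                    local txt `txt' text(`=`i'-`offset'' `j' "`n00' `n01'" "`n10' `n11'", size(2))
                }
            }
            quietly corr `stub'_1_2_2 `stub'_1_2_1 `stub'_1_1_2 `stub'_1_1_1 `stub'_3_2_2 `stub'_3_2_1 `stub'_3_1_2 `stub'_3_1_1
            matrix W = r(C)
            heatplot W, colors(blue white red, ipolate(110)) cuts(-1.02(0.02)1.02) lower nodiagonal aspectratio(1) legend(off) ///
                `txt' ///
                plotregion(color(white) margin(zero)) xlab(, noticks) ylab(, noticks)

            Wrapping this in a foreach over the four stubs should then cycle through all of them, since the inner locals no longer overwrite the outer lists.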
            Last edited by Robert Shaw; 02 Jul 2023, 15:32.
