Background:
I have several binary variables and am working out the correlation coefficients between a number of them. As the data are binary, I am using Cramér's V as my correlation coefficient, which, handily, for a 2x2 table gives the same numeric value (up to sign) as Pearson's correlation coefficient. This allows me to use pwcorr to produce a correlation matrix, which I can turn into a heatmap using the command heatplot (ssc install heatplot), overlaying the correlation coefficient and p-value.
Question: There is a risk that a high correlation coefficient is driven by most of the observations falling into one cell of the 2x2 table. The only way I can think of to give the reader confidence that the result is not skewed in this way is to also show the 2x2 frequency table behind each cell of the correlation matrix heatplot. Rather than presenting a long list of 2x2 frequency tables, it would be easier for the reader to have them laid out identically to the correlation matrix, with each cell containing the relevant 2x2 frequency table instead of a correlation coefficient and p-value. I am at a complete loss as to how to do this; any advice would be hugely appreciated.
Kind regards
Robert Shaw
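One possible starting point, offered only as a rough sketch: loop over every pair of variables, capture each 2x2 frequency table with tabulate's matcell() option, and paste it into one large block matrix laid out like the correlation matrix. The variable names x1 x2 x3 are hypothetical placeholders, and the sketch assumes every variable takes both of its values in each pairwise sample, so that each table really is 2x2.

```stata
* Hypothetical variable names; replace with your own 0/1 variables
local vars x1 x2 x3
local k : word count `vars'

* 2k x 2k matrix: block (i,j) holds the 2x2 counts for variable i by variable j
matrix B = J(2*`k', 2*`k', .)

local i = 0
foreach r of local vars {
    local ++i
    local j = 0
    foreach c of local vars {
        local ++j
        * matcell() stores the cell frequencies of the two-way table in F
        quietly tabulate `r' `c', matcell(F)
        local r0 = 2*`i' - 1
        local c0 = 2*`j' - 1
        * place F with its upper-left corner at (r0, c0)
        matrix B[`r0', `c0'] = F
    }
}

matrix list B
```

Rows 2i-1 and 2i of B correspond to the lower and higher value of the i-th variable, and likewise for the columns. Because tabulate drops observations with a missing value on either variable, the counts are pairwise, which should match the samples pwcorr uses. Whether B can then be drawn in the same layout as the heatplot is left open here.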