table of correlations between same two variables, but on different subsamples

Max Piper

Join Date: Dec 2015

Posts: 61
#1

04 Sep 2018, 12:50

My goal is to have Stata display (ideally, export to LaTeX) an M-by-N "correlation table" in which each cell is not the correlation between variables m and n, but always the correlation between the same two variables x and y, where x is in categories 1...M and y is in categories 1...N. (The cells should contain the correlation coefficient and t- or p-value/significance stars.)

In the example below, I want to investigate how the correlation between price and mpg varies between longer and shorter cars, and between foreign and domestic cars. The categories M and N are not categories of x and y, but of other variables, as below:

Code:

set more off
clear*
sysuse auto

gen large = 0
replace large = 1 if length > 190

sort large foreign
by large foreign: pwcorr price mpg

Effectively, I need just the results from

Code:

by large foreign: pwcorr price mpg

but displayed in table form and exported.
I followed the suggestion here: https://www.statalist.org/forums/for...ed-as-a-matrix

Code:

clear*
sysuse auto

capture program drop mycorr
program define mycorr
    corr price weight
    gen rho = r(rho)
    keep rep78 rho
    keep in 1
    exit
end

runby mycorr, by(rep78)

list, noobs clean

and then outputting to excel. But this is for categories along one dimension, and I cannot figure out how this would be extended to two dimensions.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#2

04 Sep 2018, 12:55

You can put more than one variable in the -by()- option of -runby-. Read -help runby-. You will see that illustrated in the example "partitioning a file into subfiles."

Last edited by Clyde Schechter; 04 Sep 2018, 13:01.
Max Piper

Join Date: Dec 2015

Posts: 61
#3

04 Sep 2018, 13:34

Thank you. Yes, this works:

Code:

set more off
clear*
sysuse auto

gen large = 0
replace large = 1 if length > 190

capture program drop mycorr
program define mycorr
    corr price mpg
    gen rho = r(rho)
    keep large foreign rho
    keep in 1
    exit
end

runby mycorr, by(large foreign)

list, noobs clean

I am now wondering whether there is a way to use, e.g., a modified -corrtex- command to get a 2x2 table instead of a 4x1 column of coefficients.
I see that the "partitioning a file into subfiles" section explains how to save two 2x1 columns that I would then combine manually, but that seems impractical, especially as the number of categories increases.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#4

04 Sep 2018, 14:38

My reference to that example in -help runby- was just to illustrate that you can use more than one variable in the -by()- option.

I don't know of any -corrtex- command. There's no such command in official Stata. If it's community contributed, you could modify it if you know how to do that. But you can get your correlations re-arranged without that as follows:

Code:

set more off
clear*
sysuse auto

gen large = 0
replace large = 1 if length > 190

capture program drop mycorr
program define mycorr
    corr price mpg
    gen rho = r(rho)
    keep large foreign rho
    keep in 1
    exit
end

runby mycorr, by(large foreign)

list, noobs clean

reshape wide rho, i(foreign) j(large)
forvalues i = 0/1 {
    rename rho`i' large`i'
}

If your goal here is to create a cross-tabulation for human eyes, this will do it. You can export it to a text file or a spreadsheet if you like (-help export delimited-, -help export excel-).

If, however, you plan to do additional analysis on these results, it is probably better to leave them the way they were at the end of -runby-, because most Stata commands work better (or only) with data in long layout.
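
The export step mentioned above might look like the following. This is only a sketch under stated assumptions: it uses official -statsby- in place of -runby- so that it runs without community-contributed commands, and the output file names corr_table.csv and corr_table.xlsx are hypothetical.

```stata
* Sketch only: statsby (official Stata) stands in for runby here.
sysuse auto, clear
gen byte large = length > 190

* One row per large x foreign cell, holding the price-mpg correlation.
statsby rho=r(rho), by(large foreign) clear: correlate price mpg

* Wide layout: one row per level of foreign, one column per level of large.
reshape wide rho, i(foreign) j(large)
rename (rho0 rho1) (large0 large1)

* Hypothetical output file names; replace overwrites existing files.
export delimited using "corr_table.csv", replace
export excel using "corr_table.xlsx", firstrow(variables) replace
```

The firstrow(variables) option writes the variable names as the header row of the spreadsheet; without it only the values are written.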
Max Piper

Join Date: Dec 2015

Posts: 61
#5

06 Sep 2018, 09:03

Thank you, that works. Can you also identify how to print the significance of the correlation coefficient, similar to -rho- and -N-?
I can replace -corr- with -pwcorr- in your code, which has the -sig- option, but that seems not to be stored under -r-:

Code:

. pwcorr ind_HHI $fam , sig

             |  ind_HHI   ownfam
-------------+------------------
     ind_HHI |   1.0000
             |
             |
      ownfam |   0.0475   1.0000
             |   0.0000
             |

. return list

scalars:
                  r(N) =  20595
                r(rho) =  .0474957089324448

Clyde Schechter

Join Date: Apr 2014
Posts: 30117

06 Sep 2018, 09:33

Are you using an old version of Stata? In my setup, -pwcorr- also returns two matrices, r(C) and r(sig), the latter containing the p-value.

Code:

. sysuse auto, clear
(1978 Automobile Data)

. pwcorr price mpg, sig

             |    price      mpg
-------------+------------------
       price |   1.0000 
             |
             |
         mpg |  -0.4686   1.0000 
             |   0.0000
             |

. return list

scalars:
                  r(N) =  74
                r(rho) =  -.4685966881951871

matrices:
                  r(C) :  2 x 2
                r(sig) :  2 x 2

. matrix list r(sig)

symmetric r(sig)[2,2]
           price        mpg
price          0
  mpg  .00002546

So, you can do this:

Code:

set more off
clear*
sysuse auto

gen large = 0
replace large = 1 if length > 190

capture program drop mycorr
program define mycorr
    pwcorr price mpg, sig
    return list
    matrix S = r(sig)
    gen rho = r(rho)
    keep large foreign rho
    gen pvalue = S[2, 1]
    keep in 1
    exit
end

runby mycorr, by(large foreign)

list, noobs clean

reshape wide rho pvalue, i(foreign) j(large)
forvalues i = 0/1 {
    rename rho`i' large`i'_rho
    rename pvalue`i' large`i'_p
}
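
Since the original question also asked for significance stars, one way to attach them to each cell could be sketched as follows. The assumptions here are not from the thread: -statsby- stands in for -runby-, the p-value is computed by hand from r(rho) and r(N) with the usual t test for a zero correlation, and the thresholds 0.01/0.05/0.10 are just one common convention.

```stata
* Sketch only: statsby (official Stata) stands in for runby here.
sysuse auto, clear
gen byte large = length > 190

* One row per large x foreign cell: correlation and number of observations.
statsby rho=r(rho) n=r(N), by(large foreign) clear: correlate price mpg

* Two-sided p-value from t = |rho|*sqrt((n-2)/(1-rho^2)) with n-2 df.
* Left missing when n <= 2 or |rho| = 1, where the test is undefined.
gen pvalue = 2*ttail(n - 2, abs(rho)*sqrt((n - 2)/(1 - rho^2))) ///
    if n > 2 & abs(rho) < 1

* Assumed star thresholds; a missing p-value compares as large, so no stars.
gen str3 stars = cond(pvalue < 0.01, "***", ///
    cond(pvalue < 0.05, "**", cond(pvalue < 0.10, "*", "")))

* Build one display string per cell, then lay the cells out as a 2x2 table.
gen cell = string(rho, "%5.3f") + stars
keep large foreign cell
reshape wide cell, i(foreign) j(large)

list, noobs clean
```

The result is a string table meant for display or export only; for further analysis the numeric rho and pvalue in long layout are the better starting point.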
