New on SSC: -pwcorrf- module, a more powerful version of pwcorr (with within-variable correlation option)

Jesse Wursten

Join Date: Jan 2016

Posts: 915
#1

22 Jul 2016, 05:18

Dear all

-pwcorrf- is now available on SSC. It has three advantages over the standard pwcorr command.
It is much faster (often 10x or more).

It can calculate within-variable correlations, e.g. correlations across panel units. Previously you had to reshape the data to wide format first, which was not always possible (variable limit) and was very slow.

It returns the matrix r(T), which shows the number of observations used to calculate each pairwise correlation.

Demo

Code:

*** Correlation across variables
sysuse citytemp.dta, clear
pwcorrf heatdd cooldd tempjan tempjuly, showt
qui replace heatdd = . if runiform() < 0.3
qui replace tempjan = . if runiform() < 0.8
pwcorrf heatdd cooldd tempjan tempjuly, showt

*** Correlation within variables
sysuse xtline1.dta, clear
pwcorrf calories, reshape
qui reshape wide calories, i(day) j(person)
pwcorrf calories*
pwcorr calories*

*** Returns r(T)
pwcorrf calories*
return list
pwcorr calories*
return list

Todo list
return r(P), a matrix with the significance of each pairwise correlation

return r(Pd), the same matrix with Dunnett's test-based p-values. Note that in my understanding, Bonferroni and Sidak corrections are not valid for pairwise correlations as they assume the tests are independent? I.e. different "control groups" for each correlation.

Comments, feedback, bug reports and so on are always welcome.
Jesse Wursten
KU Leuven

Last edited by Jesse Wursten; 22 Jul 2016, 05:38.
Nick Cox

Join Date: Mar 2014

Posts: 35639
#2

22 Jul 2016, 05:25

Looks helpful, but "the number of time periods used" presupposes that the user has time series data, which is very often not so. Number of observations is presumably the right generalisation.
Jesse Wursten

Join Date: Jan 2016

Posts: 915
#3

22 Jul 2016, 05:38

Originally posted by Nick Cox

Looks helpful, but "the number of time periods used" presupposes that the user has time series data, which is very often not so. Number of observations is presumably the right generalisation.

Good point, I have edited the post. Thanks!
Nick Cox

Join Date: Mar 2014

Posts: 35639
#4

22 Jul 2016, 05:44

Thanks for the quick edit, but for the program and its help the same point applies. T is a common and congenial notation to (many) economists thinking of a time index running t = 1, ..., T, but the Stata convention is that r(N) is the number of observations in question and n is pretty standard for sample size.

It's your program! But to maximize compatibility with Stata conventions and majority conventions across statistical science, I'd suggest a quick syntax change.
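
For readers less familiar with the convention referred to here, a minimal illustration using only built-in commands. The dataset and variables are an arbitrary choice for the example; both summarize and correlate leave the number of observations they used in r(N).

```stata
sysuse auto, clear

* summarize stores the number of nonmissing observations in r(N)
quietly summarize rep78
display r(N)

* correlate stores the number of observations used (casewise nonmissing) in r(N)
quietly correlate price rep78
display r(N)
```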
Jesse Wursten

Join Date: Jan 2016

Posts: 915
#5

22 Jul 2016, 05:55

Originally posted by Nick Cox

Thanks for the quick edit, but for the program and its help the same point applies. T is a common and congenial notation to (many) economists thinking of a time index running t = 1, ..., T, but the Stata convention is that r(N) is the number of observations in question and n is pretty standard for sample size.

It's your program! But to maximize compatibility with Stata conventions and majority conventions across statistical science, I'd suggest a quick syntax change.

The problem with using N is that in a panel setting, N stands for the number of panel units rather than the total number of observations. I can perhaps change it to "nobs", which I think is also used in some other Stata programs to indicate the relevant number of observations?
Nick Cox

Join Date: Mar 2014

Posts: 35639
#6

22 Jul 2016, 06:13

I can't think that r(nobs) would be better at all, and indeed I don't recollect ever seeing it.

As I say, it's your program. All I want to emphasise from my experience over some while in putting Stata programs into the public domain is that being as consistent as possible with other Stata conventions helps your users and is in your interests too in cutting down on the email and forum questions that may be asked.
Mazen Mourad

Join Date: Dec 2020

Posts: 2
#7

25 Dec 2020, 23:37

Hi Jesse.

If I have fixed effects panel data that's already reshaped to long format and want to test correlation between a handful of variables, how do I input the code?
Is it simply pwcorrf var1 var2 var3 ...?

Thanks
Jesse Wursten

Join Date: Jan 2016

Posts: 915
#8

29 Dec 2020, 06:28

Originally posted by Mazen Mourad

Hi Jesse.

If I have fixed effects panel data that's already reshaped to long format and want to test correlation between a handful of variables, how do I input the code?
Is it simply pwcorrf var1 var2 var3 ...?

Thanks

Yes.
ericmelse

Join Date: May 2014

Posts: 434
#9

29 Dec 2020, 06:50

Hi Jesse,

Are you going to follow up on your(?):

Todo list
return r(P), a matrix with the significance of each pairwise correlation

return r(Pd), the same matrix with Dunnett's test-based p-values. Note that in my understanding, Bonferroni and Sidak corrections are not valid for pairwise correlations as they assume the tests are independent? I.e. different "control groups" for each correlation.

I think that it would be helpful to have these return matrices.

http://publicationslist.org/eric.melse
Mazen Mourad

Join Date: Dec 2020

Posts: 2
#10

30 Dec 2020, 00:53

Originally posted by Jesse Wursten

Yes.

Thanks for your response.
Is there a way to get the significance with pwcorrf? It doesn't seem to work as it would with pwcorr.
Jesse Wursten

Join Date: Jan 2016

Posts: 915
#11

04 Jan 2021, 03:48

I don't have time for it at the moment, though maybe I can look into it in the next few weeks.
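
In the meantime, one possible workaround is sketched here. It uses built-in commands only and is not part of pwcorrf. The built-in pwcorr reports p-values with its sig option, and for a single pair the p-value can be computed by hand from the correlation r and the number of observations n, using the usual statistic t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The dataset and variables are an arbitrary choice for illustration.

```stata
sysuse auto, clear

* Built-in route: pwcorr shows p-values (sig) and pairwise sample sizes (obs)
pwcorr price mpg rep78, sig obs

* By hand for a single pair, from r and n
quietly correlate price rep78
local r = r(rho)
local n = r(N)
local t = `r' * sqrt((`n' - 2) / (1 - `r'^2))
display "r = " %6.4f `r' "  n = " `n' "  p = " %6.4f 2 * ttail(`n' - 2, abs(`t'))
```

The same formula could in principle be applied element by element to a correlation matrix together with the matching entries of r(T). The name pwcorrf uses for its correlation matrix is not stated in this thread, so check return list before relying on that.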
Maxence Morlet

Join Date: Mar 2021

Posts: 653
#12

03 Jun 2021, 13:06

Originally posted by Mazen Mourad

Thanks for your response.
Is there a way to get the significance with pwcorrf? It doesn't seem to work as it would with pwcorr.

Apologies for revisiting five and a half years after the last post, but that would indeed be very interesting to find out!

On another note, is it possible to use pwcorrf to study cross-sectional correlation of residuals? In other words, could pwcorrf be used as a postestimation command?

Thanks in advance!

Last edited by Maxence Morlet; 03 Jun 2021, 13:11.
Jesse Wursten

Join Date: Jan 2016

Posts: 915
#13

07 Jun 2021, 01:10

Nothing prevents you from using the residuals as variables (with the reshape option). However, I would suggest running a cross-sectional dependence test instead, e.g. Pesaran's CD test, implemented in xtcdf (and the main motivation to write this command).
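
A rough sketch of both routes, under some assumptions: pwcorrf and xtcdf are installed from SSC, and xtcdf accepts a variable name on xtset data (check help xtcdf for the exact syntax). The Grunfeld data, the model, and the variable name ehat are purely illustrative.

```stata
* ssc install pwcorrf
* ssc install xtcdf
webuse grunfeld, clear
xtset company year

* Fixed-effects model, then store the idiosyncratic residuals
xtreg invest mvalue kstock, fe
predict double ehat, e

* Pairwise correlations of the residuals across panel units
pwcorrf ehat, reshape

* Cross-sectional dependence test on the same residuals (syntax assumed)
xtcdf ehat
```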
