  • New on SSC: -pwcorrf- module, a more powerful version of pwcorr (with within-variable correlation option)

    Dear all

    -pwcorrf- is now available on SSC. It has three advantages over the standard pwcorr command.
    1. It is much faster (often 10x or more)
    2. It can calculate within-variable correlations, e.g. correlations of one variable across panel units. Previously you had to reshape the data to wide format first, which was not always possible (because of the variable limit) and very slow.
    3. It returns the matrix r(T) which shows the number of observations used to calculate each pairwise correlation.
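For readers who want to see what a pairwise observation-count matrix like r(T) contains, here is a sketch that uses only built-in Stata commands. It does not call pwcorrf and is not how pwcorrf computes r(T). It assumes that r(T) counts, for each pair, the observations where both variables are non-missing (the pairwise deletion that pwcorr uses). The matrix name Npair and the seed are arbitrary choices for illustration.

```stata
* Count, for every pair of variables, the observations where both are non-missing
sysuse citytemp, clear
set seed 12345
quietly replace heatdd  = . if runiform() < 0.3
quietly replace tempjan = . if runiform() < 0.8

local vars heatdd cooldd tempjan tempjuly
local k : word count `vars'
matrix Npair = J(`k', `k', .)
matrix rownames Npair = `vars'
matrix colnames Npair = `vars'

forvalues i = 1/`k' {
    local vi : word `i' of `vars'
    forvalues j = 1/`k' {
        local vj : word `j' of `vars'
        * pairwise deletion: both variables must be observed
        quietly count if !missing(`vi', `vj')
        matrix Npair[`i', `j'] = r(N)
    }
}
matrix list Npair
```

These counts should match the per-pair numbers printed by `pwcorr heatdd cooldd tempjan tempjuly, obs` on the same data, which gives a quick cross-check.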

    Demo
    Code:
    *** Correlation across variables
    sysuse citytemp.dta, clear
    pwcorrf heatdd cooldd tempjan tempjuly, showt
    
    qui replace heatdd = . if runiform() < 0.3
    qui replace tempjan = . if runiform() < 0.8
    pwcorrf heatdd cooldd tempjan tempjuly, showt
    
    *** Correlation within variables
    sysuse xtline1.dta, clear
    pwcorrf calories, reshape
    
    qui reshape wide calories, i(day) j(person)
    pwcorrf calories*
    pwcorr calories*
    
    *** Returns r(T)
    pwcorrf calories*
    return list
    
    pwcorr calories*
    return list
    Todo list
    1. return r(P), a matrix with the significance of each pairwise correlation
    2. return r(Pd), the same matrix with Dunnett's test-based p-values. Note that, if I understand correctly, Bonferroni and Sidak corrections are not valid for pairwise correlations, as they assume the tests are independent (i.e. a different "control group" for each correlation).
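Until something like r(P) exists, the significance of each pairwise correlation can be computed by hand. The sketch below uses built-in commands only and applies the usual t-test for a correlation coefficient, t = r·sqrt(n−2)/sqrt(1−r²) with n−2 degrees of freedom (the test behind pwcorr's sig option), plus a Bonferroni adjustment min(1, m·p) over the m = k(k−1)/2 pairs. The matrix names P and Pb, the auto data and the variable list are illustrative assumptions; this is not pwcorrf code. For context, the Bonferroni bound holds under any dependence between tests, whereas the Sidak adjustment is derived assuming independent tests.

```stata
* Pairwise p-values for correlations, with a Bonferroni adjustment
sysuse auto, clear
local vars price mpg weight length
local k : word count `vars'
local m = `k' * (`k' - 1) / 2            // number of distinct pairs
matrix P  = J(`k', `k', .)
matrix Pb = J(`k', `k', .)

forvalues i = 1/`k' {
    local vi : word `i' of `vars'
    forvalues j = 1/`k' {
        if `i' == `j' continue           // leave the diagonal missing
        local vj : word `j' of `vars'
        * correlate on two variables uses the obs where both are non-missing
        quietly correlate `vi' `vj'
        local n   = r(N)
        local rho = r(rho)
        * parentheses around `rho' so a negative value is squared correctly
        matrix P[`i', `j'] = 2 * ttail(`n' - 2, abs(`rho') * sqrt(`n' - 2) / sqrt(1 - (`rho')^2))
        matrix Pb[`i', `j'] = min(1, `m' * el(P, `i', `j'))
    }
}
matrix rownames P = `vars'
matrix colnames P = `vars'
matrix list P, format(%9.4f)
```

Running `pwcorr price mpg weight length, sig bonferroni` on the same data should report matching adjusted values.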

    Comments, feedback, bug reports and so on are always welcome.
    Jesse Wursten
    KU Leuven
    Last edited by Jesse Wursten; 22 Jul 2016, 05:38.

  • #2
    Looks helpful, but "the number of time periods used" presupposes that the user has time series data, which is very often not so. Number of observations is presumably the right generalisation.

    • #3
      Originally posted by Nick Cox
      Looks helpful, but "the number of time periods used" presupposes that the user has time series data, which is very often not so. Number of observations is presumably the right generalisation.
      Good point, I have edited the post. Thanks!

      • #4
        Thanks for the quick edit, but for the program and its help the same point applies. T is a common and congenial notation to (many) economists thinking of a time index running t = 1, ..., T, but the Stata convention is that r(N) is the number of observations in question and n is pretty standard for sample size.

        It's your program! But to maximize compatibility with Stata conventions and majority conventions across statistical science, I'd suggest a quick syntax change.

        • #5
          Originally posted by Nick Cox
          Thanks for the quick edit, but for the program and its help the same point applies. T is a common and congenial notation to (many) economists thinking of a time index running t = 1, ..., T, but the Stata convention is that r(N) is the number of observations in question and n is pretty standard for sample size.

          It's your program! But to maximize compatibility with Stata conventions and majority conventions across statistical science, I'd suggest a quick syntax change.
          The problem with using N is that in a panel setting, N stands for the number of panel units rather than the total number of observations. I can perhaps change it to "nobs", which I think is also used in some other Stata programs to indicate the relevant number of observations?

          • #6
            I can't think that r(nobs) would be better at all, and indeed I don't recollect ever seeing it.

            As I say, it's your program. All I want to emphasise from my experience over some while in putting Stata programs into the public domain is that being as consistent as possible with other Stata conventions helps your users and is in your interests too in cutting down on the email and forum questions that may be asked.

            • #7
              Hi Jesse.

              If I have fixed effects panel data that's already in long format and want to test the correlation between a handful of variables, how do I write the command?
              Is it simply pwcorrf var1 var2 var3 ...?

              Thanks

              • #8
                Originally posted by Mazen Mourad
                Hi Jesse.

                If I have fixed effects panel data that's already reshaped to long format and want to test correlation between a handful of variables, how do I input the code?
                Is it simply pwcorrf var1 var2 var3 ...?

                Thanks
                Yes.

                • #9
                  Hi Jesse,

                  Are you going to follow up on your(?):

                  Todo list
                  1. return r(P), a matrix with the significance of each pairwise correlation
                  2. return r(Pd), the same matrix with Dunnett's test-based p-values. Note that, if I understand correctly, Bonferroni and Sidak corrections are not valid for pairwise correlations, as they assume the tests are independent (i.e. a different "control group" for each correlation).
                  I think that it would be helpful to have these return matrices.

                  http://publicationslist.org/eric.melse

                  • #10
                    Originally posted by Jesse Wursten

                    Yes.
                    Thanks for your response.
                    Is there a way to get the significance with pwcorrf? It doesn't seem to work as it would with pwcorr.
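One workaround, sketched here under the assumption that the panel is small enough to reshape (the variable-limit caveat from the first post still applies), is to reshape to wide and let the built-in pwcorr report significance levels and pairwise counts. It mirrors the reshape step of the demo in the first post and uses only official Stata commands, not pwcorrf.

```stata
* Significance of correlations of one variable across panel units (built-in pwcorr)
sysuse xtline1, clear
keep person day calories
* one calories variable per person, one row per day
quietly reshape wide calories, i(day) j(person)
pwcorr calories*, sig obs
* same table with Bonferroni-adjusted significance levels
pwcorr calories*, sig obs bonferroni
```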

                    • #11
                      I don't have time for it at the moment, but maybe I can look into it in the next few weeks.

                      • #12
                        Originally posted by Mazen Mourad

                        Thanks for your response.
                        Is there a way to get the significance with pwcorrf? It doesn't seem to work as it would with pwcorr.
                        Apologies for reviving this thread five and a half years after the last post, but that would indeed be very interesting to find out!

                        On another note, is it possible to use pwcorrf to study the cross-sectional correlation of residuals? In other words, could pwcorrf be used as a postestimation command?

                        Thanks in advance!
                        Last edited by Maxence Morlet; 03 Jun 2021, 13:11.

                        • #13
                          Nothing prevents you from using the residuals as a variable (with the reshape option). However, I would suggest running a cross-sectional dependence test instead, e.g. Pesaran's CD test, which is implemented in xtcdf (and was the main motivation for writing this command).
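To illustrate how pairwise residual correlations feed into a cross-sectional dependence test, here is a sketch of Pesaran's CD statistic written with built-in commands. It uses the form that allows unbalanced panels, CD = sqrt(2/(N(N−1))) · Σ_{i<j} sqrt(T_ij)·ρ_ij, where T_ij is the number of periods units i and j have in common (the role r(T) plays in pwcorrf). Under the null of no cross-sectional dependence, CD is asymptotically standard normal. This is not the xtcdf implementation; the model (calories on day with person dummies) and the xtline1 data are arbitrary choices for illustration, and for real work xtcdf is the tool recommended above.

```stata
* Sketch of Pesaran's CD statistic from pairwise correlations of residuals
sysuse xtline1, clear
keep person day calories
* fixed-effects (within) residuals via LSDV: one dummy per person
quietly regress calories day i.person
predict double e, residuals
drop calories

* one residual variable per panel unit: e<person>
quietly reshape wide e, i(day) j(person)
unab evars : e*
local N : word count `evars'

local sum = 0
forvalues i = 1/`=`N' - 1' {
    local vi : word `i' of `evars'
    forvalues j = `=`i' + 1'/`N' {
        local vj : word `j' of `evars'
        quietly correlate `vi' `vj'          // r(N) = T_ij common periods
        local sum = `sum' + sqrt(r(N)) * r(rho)
    }
}
scalar CD  = sqrt(2 / (`N' * (`N' - 1))) * `sum'
scalar CDp = 2 * normal(-abs(scalar(CD)))
display "CD = " %6.3f scalar(CD) "   p = " %6.4f scalar(CDp)
```

With only a handful of panel units, as in this toy dataset, the normal approximation is rough; the statistic is intended for panels with many units.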

                          Comment

                          Working...
                          X