Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • pairwise covariance matrix

    Is there a convenient way to get a pairwise covariance matrix in Stata?
    corr x y z, cov
    returns a listwise covariance matrix, and
    pwcorr x yz
    returns a pairwise correlation matrix, but
    pwcorr x y z, cov
    returns an error because pwcorr doesn't have a cov option, though corr does.

    Is there anything out there, short of calculating it "by hand," that would return the pairwise covariance matrix.

    (BTW I do know that pairwise covariance matrices are often a bad idea, and in fact my goal here is to conduct a simulation that would show that clearly.)


  • #2
    Yes, there’s a user-contributed command called -pwcov- or similar on SSC that does this.

    Comment


    • #3
      Thanks! Next round: how do I pull a certain covariate out of the matrix in the return list from pwcov.

      I often wonder how to pull values out of the matrix in a return list.

      I'm hoping there's someway to make pwcov compatible with statsby....
      Last edited by paulvonhippel; 17 May 2023, 13:35.

      Comment


      • #4
        Thanks! Next round: how do I pull a certain covariate out of the matrix in the return list from pwcov.

        I often wonder how to pull values out of the matrix in a return list.

        I'm hoping there's someway to make pwcov compatible with statsby....

        Comment


        • #5
          Sometimes like this will work. Just wrap the matrix expression in parentheses. Unfortunately, the matrix returned by pwcov is not labeled by the variable names, so you'll have to manage with the numeric indices yourself, or write a wrapper that adequately labels the matrix.

          Code:
          sysuse auto
          statsby  cov21=(r(pwcov)[2,1]) ,by(foreign) : pwcov price mpg

          Comment


          • #6
            Nice, thanks! But I'm noticing that -pwcov- only returns the covariances, not the variances along the diagonal of the covariance matrix. Is there a command that returns both?

            Maybe KitBaum , who wrote -pwcov-, knows?

            Comment


            • #7
              Might I suggest rolling your own program in this case? Here's a quick and dirty wrapper I wrote:

              Code:
              program makepwcov, rclass
                version 18
                syntax varlist [if] [in]
                marksample touse
              
                unab vrows : `varlist'
                local vcols `vrows'
                mat pwcov = J(`: word count `vrows'', `: word count `vrows'', .)
                forval j = 1/`: word count `vrows'' {
                  local x : word `j' of `vrows'
                  qui summ `x' if `touse'
                  mat pwcov[`j', `j'] = `r(Var)'
                }
               
                forval j = 1/`: word count `vrows'' {
                  forval k = 1/`: word count `vcols'' {
                    if `k' > `j' {
                      local x : word `j' of `vrows'
                      local y : word `k' of `vcols'
                      qui corr `x' `y' if `touse', cov
                      mat pwcov[`j', `k'] = `r(cov_12)'
                      mat pwcov[`k', `j'] = `r(cov_12)'
                    }
                  }
                }
                assert issymmetric(pwcov)
               
                matrix rownames pwcov = `vrows'
                matrix colnames pwcov = `vcols'
                return matrix pwcov = pwcov
              end

              Comment


              • #8
                Wow, Leonardo Guizzetti ! I'm impressed with how quickly you whipped that up! Even more thanks if you'd like to fix the little bug I demonstrate in the code below:
                Code:
                webuse mheart5, clear
                
                /* Here's the covariance matrix from -makepwcov- */
                makepwcov age bmi attack
                matrix list r(pwcov)
                
                /* And here's the covariance matrix from -pwcov- */
                pwcov age bmi attack
                matrix list r(pwcov)
                
                /* They don't agree on the pairwise covariance between age and attack.
                   The next command shows that pwcov is right about that covariance. */
                corr age attack, cov

                Comment


                • #9
                  In fact, Leonardo Guizzetti , I think your command actually produces the listwise covariance matrix rather than the pairwise covariance matrix.
                  Because the output from

                  Code:
                  makepwcov age bmi attack
                  matrix list r(pwcov)
                  is the same as the output from
                  Code:
                  corr age bmi attack, cov
                  I think this is because you calculate all variances and covariances from the same sample, defined near the top of your code with
                  Code:
                  marksample touse
                  That's what listwise deletion does. Pairwise deletion calculates each variance and covariance from a different sample, each sample consisting of the cases with observed values for the one or two variables being covaried.

                  Comment


                  • #10
                    I think I debugged the -pwcov- program by removing every use of -marksample- and -touse-.
                    After debugging, -makepwcov- returns the same covariances as -pwcov-.
                    The debugged code reads:
                    Code:
                    program makepwcov, rclass
                      version 16
                      syntax varlist [if] [in]
                    *  marksample touse
                    
                      unab vrows : `varlist'
                      local vcols `vrows'
                      mat pwcov = J(`: word count `vrows'', `: word count `vrows'', .)
                      forval j = 1/`: word count `vrows'' {
                        local x : word `j' of `vrows'
                        qui summ `x' /* if `touse' */
                        mat pwcov[`j', `j'] = `r(Var)'
                      }
                     
                      forval j = 1/`: word count `vrows'' {
                        forval k = 1/`: word count `vcols'' {
                          if `k' > `j' {
                            local x : word `j' of `vrows'
                            local y : word `k' of `vcols'
                            qui corr `x' `y' /* if `touse' */, cov
                            mat pwcov[`j', `k'] = `r(cov_12)'
                            mat pwcov[`k', `j'] = `r(cov_12)'
                          }
                        }
                      }
                      assert issymmetric(pwcov)
                     
                      matrix rownames pwcov = `vrows'
                      matrix colnames pwcov = `vcols'
                      return matrix pwcov = pwcov
                    end
                    Last edited by paulvonhippel; 17 May 2023, 19:13.

                    Comment


                    • #11
                      Hi Paul, well-spotted with the bug. I agree that by marking the sample out based on the varlist, it would indeed be equivalent to list-wise deletion. I did say it was quickly written.

                      Comment


                      • #12
                        OK, round 3 is to make -pwcov- work with -statsby-. The following output isn't right because it's the same for both -by- groups:
                        Code:
                        webuse mheart5, clear
                        statsby  cov21=(r(pwcov)[3,1]), by(female) saving(pwcov, replace): makepwcov age bmi attack
                        use pwcov
                        list
                        What change to the -pwcov- program would make this work right? I suspect this involves -marksample- again. Or maybe -byable-...
                        Last edited by paulvonhippel; 17 May 2023, 21:28.

                        Comment


                        • #13
                          So the revised version of my program should be:

                          Code:
                          program makepwcov, rclass
                            version 18
                            syntax varlist [if] [in]
                            marksample touse, novarlist
                           
                            list
                           
                            unab vrows : `varlist'
                            local vcols `vrows'
                            mat pwcov = J(`: word count `vrows'', `: word count `vrows'', .)
                            forval j = 1/`: word count `vrows'' {
                              local x : word `j' of `vrows'
                              qui summ `x' if `touse'
                              mat pwcov[`j', `j'] = `r(Var)'
                            }
                           
                            forval j = 1/`: word count `vrows'' {
                              forval k = 1/`: word count `vcols'' {
                                if `k' > `j' {
                                  local x : word `j' of `vrows'
                                  local y : word `k' of `vcols'
                                  qui corr `x' `y' if `touse', cov
                                  mat pwcov[`j', `k'] = `r(cov_12)'
                                  mat pwcov[`k', `j'] = `r(cov_12)'
                                }
                              }
                            }
                            assert issymmetric(pwcov)
                           
                            matrix rownames pwcov = `vrows'
                            matrix colnames pwcov = `vcols'
                            return matrix pwcov = pwcov
                          end
                          This change to marksample will not exclude observations in the varlist which have missing values. The 'if' condition is nevertheless required for -statsby- with a by-group.

                          Note that since the row and column names of the returned -r(pwcov)- matrix are labelled, you can address elements by name.

                          Code:
                          webuse mheart5, clear
                          sort female
                          by female: corr age bmi, cov
                          by female: corr age attack, cov
                          by female: corr bmi attack, cov
                          statsby  cov21=(r(pwcov)["bmi","age"]), by(female) clear: makepwcov age bmi attack
                          list

                          Comment


                          • #14
                            This is a great solution for small datasets, but it runs slowly on big ones. Today KitBaum kindly updated his fast, Mata-based command -pwcov- to include variances on the diagonal. You can install it with
                            ssc install pwcov

                            Comment


                            • #15
                              The updated pwcov, available from SSC, presents variances on the diagonal of the return matrix and, being based on Mata, can handle variables of any N (works fine with set obs 1000000). The non-Mata solution posted earlier would not work on Stata/BE, for example, for N>800. The Mata code also runs much faster than the ado-file code.

                              Comment

                              Working...
                              X