Spearman correlation - 95% confidence interval?

Robin Fatch

Join Date: Oct 2015

Posts: 36
#1

Spearman correlation - 95% confidence interval?

01 Jun 2017, 10:30

Hi,

I have calculated a Spearman correlation between 2 variables (see example below). Is there a way to compute the 95% CI for the correlation?

Thanks,
Robin

spearman var1 var2, pw

Number of obs = 38
Spearman's rho = 0.6750

Test of Ho: var1 and var2 are independent
Prob > |t| = 0.0000
Tags: None

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17707

01 Jun 2017, 10:57

Robin:
you may want to consider -bootstrap-:

Code:

. sysuse auto.dta
(1978 Automobile Data)

. spearman mpg rep78

 Number of obs =      69
Spearman's rho =       0.3098

Test of Ho: mpg and rep78 are independent
    Prob > |t| =       0.0096


. bootstrap r(rho), reps(1000) nodots : spearman mpg rep78

Warning:  Because spearman is not an estimation command or does not set e(sample), bootstrap has no way to
          determine which observations are used in calculating the statistics and so assumes that all
          observations are used.  This means that no observations will be excluded from the resampling because
          of missing values or other reasons.

          If the assumption is not true, press Break, save the data, and drop the observations that are to be
          excluded.  Be sure that the dataset in memory contains only the relevant data.

Bootstrap results                               Number of obs     =         74
                                                Replications      =      1,000

      command:  spearman mpg rep78
        _bs_1:  r(rho)

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   .3098267   .1220576     2.54   0.011     .0705982    .5490552
------------------------------------------------------------------------------

. estat bootstrap, all

Bootstrap results                               Number of obs     =         74
                                                Replications      =       1000

      command:  spearman mpg rep78
        _bs_1:  r(rho)

------------------------------------------------------------------------------
             |    Observed               Bootstrap
             |       Coef.       Bias    Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   .30982668  -.0000306    .1220576    .0705982   .5490552   (N)
             |                                       .0525013   .5468293   (P)
             |                                       .0427392   .5305987  (BC)
------------------------------------------------------------------------------
(N)    normal confidence interval
(P)    percentile confidence interval
(BC)   bias-corrected confidence interval

.

Kind regards,
Carlo
(Stata 19.0)

Comment

Robin Fatch

Join Date: Oct 2015

Posts: 36
#3

01 Jun 2017, 17:29

Thank you, Carlo! This is great.
Comment
Maarten Buis

Join Date: Mar 2014

Posts: 3456
#4

02 Jun 2017, 01:45

Alternatively, you can use the fact that a Spearman correlation is just a regular correlation on variables transformed to be their rank. You can do the transformation with the egen function rank(), and you can compute the confidence interval using Nick Cox's corrci (type search corrci to find it):

Code:

// open and prepare some example data sysuse auto, clear gen byte touse = !missing(price, weight) egen iprice = rank(price) if touse egen iweight = rank(weight) if touse // user writen, type -search corrci- to find it corrci iprice iweight // check if I indeed compute a Spearman correlation spearman price weight // check ci against the bootstrap keep if touse bootstrap r(rho), reps(20000) nodots bca : spearman price weight estat bootstrap, bca

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
1 like
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35694
#5

02 Jun 2017, 06:56

I like Maarten's approach for a mix of obvious and not so obvious reasons.

There is small print as usual. In particular, the sampling distribution of Spearman rho is in principle discrete, so watch out with small samples. (The definition of a small sample is that its size can bite you.)
Comment
Robin Fatch

Join Date: Oct 2015

Posts: 36
#6

02 Jun 2017, 16:30

Thanks! We do have a small sample (n = 38). I assume that I should expect the corrci CI to be similar but not identical to the bootstrap CI?
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35694
#7

02 Jun 2017, 23:36

I wouldn't assume it; I would check whether it's true. If there is a clash, I would go with bootstrap.
Comment
Tiago Munhoz

Join Date: Mar 2015

Posts: 16
#8

08 Apr 2018, 12:51

Dear all,

Could I estimate Spearman's rho adjusted for a third variable? ( "confunding variable" ) . If so, how could I do that?

Thanks in advance. Best wishes.
Comment
Bruce Weaver

Join Date: May 2014

Posts: 1132
#9

08 Apr 2018, 13:30

Hi Tiago. I think your question is different enough from Robin's to merit a new topic. I recommend that you post your question again with a topic something like:
Spearman correlation: How can I adjust for a 3rd variable?
If everyone starts answering in this thread, others who are searching for an answer to the same question in future will have trouble finding this discussion.

HTH.

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 19.5 (Windows)
2 likes
Comment
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#10

08 Apr 2018, 15:32

I agree wirh Bruce. That said, you may check - npregress -, for a nonparametric regression, which is available in Stata 15.

Best regards,

Marcos
1 like
Comment
Erum Hartung

Join Date: Dec 2018

Posts: 1
#11

10 Dec 2018, 13:43

Carlo, your bootstrap solution was very helpful, thank you! One question - what is the correct p value to use when reporting the data? As in the example above, when I ran this using my data, the p value I get from running the regular Spearman is different than the one that is reported when using bootstrap, e.g.

Regular Spearman:
. spearman var1 var2, stats(rho obs p)

Number of obs = 25
Spearman's rho = -0.4640

Test of Ho: var1 and var2 are independent
Prob > |t| = 0.0195

Bootstrap:

Bootstrap results Number of obs = 25
Replications = 1000

command: spearman var1 var2, stats(rho obs p)
_bs_1: r(rho)

------------------------------------------------------------------------------
| Observed Bootstrap Normal-based
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_bs_1 | -.4640247 .1759925 -2.64 0.008 -.8089636 -.1190858
------------------------------------------------------------------------------

Thanks in advance for any advice!

Last edited by Erum Hartung; 10 Dec 2018, 13:46. Reason: Sorry the spacing doesn't show up properly when I tried to cut and paste my answer. The bootstrap gives the same rho (-0.4640), but the p value is 0.008 (rather than 0.0195)
Comment

Maarten Buis

Join Date: Mar 2014
Posts: 3456

#12

11 Dec 2018, 03:56

If you want to report a bootstrap p-value for a (Spearman's) correlation, then I would not report the results from bootstrap. The problem with that is that it only uses the bootstrap to estimate the standard error, but the p-value and confidence interval still assume a normally distributed sampling distribution for the correlation coefficients, which we know can be problematic.

To get more meaningful bootstrap confidence intervals you would add the option bca to the bootstrap command, and after the bootstrap command you type estat bootstrap, bca

Getting meaningful bootstrap p-values (achieved significance levels, or ASL) is a bit more tricky. In the manual there is example on how an ASL can be computed. It is still a bit tricky, so I packaged that up in a program

Code:

*! version 1.0.0 11Dec2018 MLB
program define asl_spearman, rclass
    syntax varlist(min=2 max=2 numeric) [if] [in], ///
           [nodots reps(numlist max=1 integer >0)  ///
           mclevel(cilevel)]

    preserve
    tempvar touse ivar1 ivar2 resid
    tempname observed p mcalpha asl a b lb ub
    tempfile bs    

    if "`reps'" == "" {
        local reps = 1000
    }

    quietly {
        gettoken var1 var2 : varlist
        marksample touse
        keep if `touse'

        egen `ivar1' = rank(`var1')
        egen `ivar2' = rank(`var2')

        corr `ivar1' `ivar2'
        scalar `observed' = r(rho)

        reg `ivar1' `ivar2'
        scalar `p' =  2*ttail(e(df_r), abs(_b[`ivar2']/_se[`ivar2']))
        predict double `resid' , resid


        noi bootstrap rho = r(rho), reps(`reps') saving(`bs') ///
            `dots' nowarn noheader notable: ///
            corr `resid' `ivar2'
            
        use `bs', clear
        count if abs(rho) >= abs(`observed')
        scalar `a'   = r(N) + 1

        count if !missing(rho)
        scalar `b'   = r(N) + 2 - `a'
        scalar `asl' = (`a') / (r(N) + 1)
        local valid = r(N)
    }
    
    scalar `mcalpha' = (100-`mclevel')/200
    local ndecimal = min(ceil(log10(`reps'+1)),4)
    local aslfmt "%`=`ndecimal'+2'.`ndecimal'f"
    
    scalar `lb' = invibeta(`a', `b', `mcalpha')
    scalar `ub' = invibetatail(`a', `b', `mcalpha')

    display _n
    display as txt "Spearman's rho                    = " as result  %-6.4f `observed'
    display as txt "asymptotic significance level     = " as result `aslfmt' `p'
    display as txt "achieved significance level (ASL) = " as result `aslfmt' `asl'
    display as txt "`mclevel'% Monte Carlo CI for ASL {col 35}= [" _c
    display as result `aslfmt' `lb' as txt ", " as result `aslfmt' `ub' as txt "]"

    return scalar ub         = `ub'
    return scalar lb         = `lb'
    return scalar reps       = `reps'
    return scalar valid_resp = `valid'
    return scalar p_asymp    = `p'
    return scalar asl        = `asl'
    return scalar rho        = `observed'
end

(as a minor detail I tend to prefer (k+1)/(B+1) over k/B as used in the manual entry as an estimate for the ASL, for two justifications of my choice see: http://maartenbuis.nl/presentations/gsug13.pdf in the section "alternative")

I added in the output a Monte Carlo confidence interval for the p-value. There is by design randomness in the Bootstrap and its results, the Monte Carlo confidence interval tells you the kind of variability you can expect when rerunning the bootstrap with a different seed. If the MCCI is too large, you can make it smaller by increasing the number of replications. For a "first look" at the results I typically use something like a 1,000 replications, and 20,000 typically works well for me for final results.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------

Comment

Bruce Weaver

Join Date: May 2014
Posts: 1132

#13

10 Nov 2022, 15:39

My apologies for resurrecting a thread that has been asleep for about 4 years, but something I saw earlier today prompted me to do a search on <Stata Spearman rho confidence intervals>, and I landed here. I then proceeded to tinker around with Marten's example in #4. Briefly, I computed the CI for Spearman's rho "by hand", but using two different SEs. The two SEs were:

sqrt(1/(n-3))
sqrt((1+rs^2/2)/(n-3))

The second SE is recommended by Bonett & Wright (2000) for Spearman correlations with absolute values < 0.95.

Nick Cox, how feasible would it be to add an option to -corrci- to use this second SE when one is using ranked data?

My code can be found at the end of this post. Here are the main results.

Code:

. list zcrit-SE in 1/2

     +--------------------------------------------------------------------------------------------------+
     |    zcrit    n           r         zr        SEzr          lb          ub                      SE |
     |--------------------------------------------------------------------------------------------------|
  1. | 1.959964   74   .48653732   .5315137   .11867817   .29031367   .64349652           sqrt(1/(n-3)) |
  2. | 1.959964   74   .48653732   .5315137   .12550514   .27801376     .651269   sqrt((1+r^2/2)/(n-3)) |
     +--------------------------------------------------------------------------------------------------+

.  
. // Compare to -corrci- using ranks
. corrci iprice iweight

(obs=74)

                  correlation and 95% limits
iprice  iweight      0.487    0.290    0.643


. // Notice that the CI I computed in row 1 matches the CI obtained via
. // -corrci-.  This confirms that my "hand" calcuations are correct.
.
. // check ci against the bootstrap
. set seed 1234

. bootstrap r(rho), reps(20000) nodots bca : spearman price weight

warning: spearman does not set e(sample), so no observations will be excluded from the resampling because of missing values or other
         reasons. To exclude observations, press Break, save the data, drop any observations that are to be excluded, and rerun
         bootstrap.

Bootstrap results                                       Number of obs =     74
                                                        Replications  = 20,000

      Command: spearman price weight
        _bs_1: r(rho)

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _bs_1 |   .4865373   .0961411     5.06   0.000     .2981042    .6749705
------------------------------------------------------------------------------

. estat bootstrap, bca

Bootstrap results                               Number of obs     =         74
                                                Replications      =      20000

      Command: spearman price weight
        _bs_1: r(rho)

------------------------------------------------------------------------------
             |    Observed               Bootstrap
             | coefficient       Bias    std. err.  [95% conf. interval]
-------------+----------------------------------------------------------------
       _bs_1 |   .48653732  -.0055842   .09614112    .2745595   .6518462 (BCa)
------------------------------------------------------------------------------
Key: BCa: Bias-corrected and accelerated

// Notice that my row 2 CI is very similar to the BCa interval.

Now here is the code.

Code:

// Modification of Maarten Buis's code in #4.
// open and prepare some example data
sysuse auto, clear
keep if !missing(price, weight)
egen iprice = rank(price)
egen iweight = rank(weight)

// Bonett and Wright (2000) say that for absolute values of Spearman
// correlations < .95, one should use sqrt((1+r^2/2)/(n-3)) as the SE
// of atanh(rs), not sqrt(1/(n-3)), as one does for Pearson r.

// Let's compute the 95% CI for Spearman's rho "by hand" using
// both of those SEs.
// On row 1, use SEzr = sqrt(1/(n-3)) to match the results from -corrci-.
// On row 2, use SEzr = sqrt((1+rs^2/2)/(n-3)), as suggested by B&W (2000).

quietly {
pwcorr iprice iweight // Pearson r on the ranks = Spearman rho
generate double zcrit = invnormal(0.975) in 1/2
generate n = r(N) in 1/2
generate double r = r(rho) in 1
generate double zr = atanh(r) in 1
generate double SEzr = sqrt(1/(n-3)) in 1
generate double lb = tanh(zr - (zcrit*SEzr)) in 1
generate double ub = tanh(zr + (zcrit*SEzr)) in 1
spearman price weight
replace r = r(rho) in 2
replace zr = atanh(r) in 2
// Now use the SE recommended by B&W (2000)
replace SEzr = sqrt((1+r^2/2)/(n-3)) in 2
replace lb = tanh(zr - (zcrit*SEzr)) in 2
replace ub = tanh(zr + (zcrit*SEzr)) in 2
generate str25 SE = "sqrt(1/(n-3))" in 1
replace SE = "sqrt((1+r^2/2)/(n-3))" in 2
}

list zcrit-SE in 1/2
 
// Compare to -corrci- using ranked data
corrci iprice iweight
matrix list r(lb)
matrix list r(ub)
// Notice that the CI I computed in row 1 matches the CI obtained via
// -corrci-.  This confirms that my "hand" calcuations are correct.

// check CIs against the bootstrap
set seed 1234
bootstrap r(rho), reps(20000) nodots bca : spearman price weight
estat bootstrap, bca
list zcrit-SE in 1/2
// Notice that my row 2 CI is very similar to the BCa interval.

/*
References

Bonett, D. G., & Wright, T. A. (2000). Sample size requirements
  for estimating Pearson, Kendall and Spearman correlations.
  Psychometrika, 65(1), 23-28.
 
https://www.youtube.com/watch?v=gKHgCq5E864

https://www.statalist.org/forums/forum/general-stata-discussion/general/1395923-spearman-correlation-95-confidence-interval

*/

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 19.5 (Windows)

Comment

Nick Cox

Join Date: Mar 2014

Posts: 35694
#14

10 Nov 2022, 17:16

It’s feasible, but I am reluctant to add it.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35694
#15

11 Nov 2022, 11:44

Just to expand on the previous:

corrci wraps around official Stata code for Pearson correlation. It neither checks for, nor cares about, whether the original data are ranks.

So, nothing stops a user applying it to ranks and thereby getting a Spearman correlation as a Pearson correlation on ranks. But that's a user decision and whether the inferential machinery carries over to Spearman correlations is a question for the user.

Also, I don't want to get into adding options for complications. Yet another example might be the application of correlations to time or spatial series, where adjustments for dependence might well be urged. If so, that is an issue that needs different code.

So, this is the program author throwing responsibility back on the user, where it does belong.
Comment

Announcement