Spearman correlation - 95% confidence interval?

Bruce Weaver

Join Date: May 2014
Posts: 1133

#16

05 Jan 2023, 09:47

After reading this blog post by Thom Baguley (see also this R code), I tidied up my code from #13 a bit and added the method due to Fieller et al. (1957). I'm posting it in case someone finds it useful.

Code:

* File:  CI_for_Spearman_rho.do
* Date:  5-Jan-2023
* Name:  Bruce Weaver, [email protected]

// Modification of Maarten Buis's code in #4 in the Statalist
// thread shown in the References below.
// Open and prepare some example data
sysuse auto, clear
keep if !missing(price, weight)
egen iprice = rank(price)
egen iweight = rank(weight)

// For Pearson's r, it is customary to use sqrt1/(n - 3) as the SE
// when computing a CI for Zr.  But (at least) two other suggestions
// have been made when one is computing Spearman's rho.

// Fieller et al. (1957) suggested that for |Spearman rho| <= .8,
// one should use sqrt(1.06/(n - 3)) as the SE.

// Bonett and Wright (2000) say that for absolute values of Spearman
// correlations < .95, one should use sqrt((1+r^2/2)/(n-3)) as the SE
// of atanh(rs), not sqrt(1/(n-3)), as one does for Pearson r.

// Let's compute the 95% CI for Spearman's rho "by hand" using
// all 3 of those SEs for Zr.
// On row 1, use SEzr = sqrt(1/(n-3)) to match the results from -corrci-.
// On row 2, use SEzr = sqrt(1.06/(n - 3)), as suggested by Fieller (1957).
// On row 3, use SEzr = sqrt((1+rs^2/2)/(n-3)), as suggested by B&W (2000).

quietly {
pwcorr iprice iweight // Pearson r on the ranks = Spearman rho
generate double zcrit = invnormal(0.975) in 1/3
generate n = r(N) in 1/3
generate double r = r(rho) in 1/3
generate double zr = atanh(r) in 1/3
// Make no adjustment to SEzr in row 1
generate double SEzr = sqrt(1/(n-3)) in 1
// Use Fieller's (1957) correction in row 2
replace SEzr = sqrt(1.06/(n - 3)) in 2
// Use the Bonnet & Write (2000) correction in row 3
replace SEzr = sqrt((1+r^2/2)/(n-3)) in 3
generate double lb = tanh(zr - (zcrit*SEzr)) in 1/3
generate double ub = tanh(zr + (zcrit*SEzr)) in 1/3
generate str25 SE = "sqrt(1/(n-3))" in 1
replace SE = "sqrt(1.06/(n - 3))" in 2
replace SE = "sqrt((1+r^2/2)/(n-3))" in 3
generate str25 Correction = "none"
replace Correction = "Fieller (1957)" in 2
replace Correction = "B&W (2000)" in 3
}

format r-ub %9.6f
list zcrit-Correction in 1/3
 
// Compare to -corrci- using ranked data
corrci iprice iweight
matrix list r(lb)
matrix list r(ub)
// Notice that the CI I computed in row 1 matches the CI obtained via
// -corrci-.  This confirms that my "hand" calcuations are correct.

// check CIs against the bootstrap
set seed 1234
bootstrap r(rho), reps(20000) nodots bca : spearman price weight
estat bootstrap, bca
list zcrit-Correction in 1/3
// Notice that the Bonett & Wright (2000) CI in row 3
// is very similar to the BCa interval.

/*
References

Bonett, D. G., & Wright, T. A. (2000). Sample size requirements
  for estimating Pearson, Kendall and Spearman correlations.
  Psychometrika, 65(1), 23-28.
 
Fieller, E. C., Hartley, H. O., & Pearson, E. S. (1957). Tests for
  rank correlation coefficients: I. Biometrika, 44, 470–481.
 
https://www.youtube.com/watch?v=gKHgCq5E864

https://www.statalist.org/forums/forum/general-stata-discussion/general/1395923-spearman-correlation-95-confidence-interval

https://seriousstats.wordpress.com/2020/05/18/cis-for-spearmans-rho-and-kendalls-tau/

https://rpubs.com/seriousstats/616206

*/

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 19.5 (Windows)

Announcement

Comment