Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Spearman correlation - 95% confidence interval?

    Hi,

    I have calculated a Spearman correlation between 2 variables (see example below). Is there a way to compute the 95% CI for the correlation?

    Thanks,
    Robin


    spearman var1 var2, pw

    Number of obs = 38
    Spearman's rho = 0.6750

    Test of Ho: var1 and var2 are independent
    Prob > |t| = 0.0000



  • #2
    Robin:
    you may want to consider -bootstrap-:
    Code:
    . sysuse auto.dta
    (1978 Automobile Data)
    
    . spearman mpg rep78
    
     Number of obs =      69
    Spearman's rho =       0.3098
    
    Test of Ho: mpg and rep78 are independent
        Prob > |t| =       0.0096
    
    
    . bootstrap r(rho), reps(1000) nodots : spearman mpg rep78
    
    Warning:  Because spearman is not an estimation command or does not set e(sample), bootstrap has no way to
              determine which observations are used in calculating the statistics and so assumes that all
              observations are used.  This means that no observations will be excluded from the resampling because
              of missing values or other reasons.
    
              If the assumption is not true, press Break, save the data, and drop the observations that are to be
              excluded.  Be sure that the dataset in memory contains only the relevant data.
    
    Bootstrap results                               Number of obs     =         74
                                                    Replications      =      1,000
    
          command:  spearman mpg rep78
            _bs_1:  r(rho)
    
    ------------------------------------------------------------------------------
                 |   Observed   Bootstrap                         Normal-based
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _bs_1 |   .3098267   .1220576     2.54   0.011     .0705982    .5490552
    ------------------------------------------------------------------------------
    
    . estat bootstrap, all
    
    Bootstrap results                               Number of obs     =         74
                                                    Replications      =       1000
    
          command:  spearman mpg rep78
            _bs_1:  r(rho)
    
    ------------------------------------------------------------------------------
                 |    Observed               Bootstrap
                 |       Coef.       Bias    Std. Err.  [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _bs_1 |   .30982668  -.0000306    .1220576    .0705982   .5490552   (N)
                 |                                       .0525013   .5468293   (P)
                 |                                       .0427392   .5305987  (BC)
    ------------------------------------------------------------------------------
    (N)    normal confidence interval
    (P)    percentile confidence interval
    (BC)   bias-corrected confidence interval
    
    .
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      Thank you, Carlo! This is great.

      Comment


      • #4
        Alternatively, you can use the fact that a Spearman correlation is just a regular correlation on variables transformed to be their rank. You can do the transformation with the egen function rank(), and you can compute the confidence interval using Nick Cox's corrci (type search corrci to find it):

        Code:
        // open and prepare some example data
        sysuse auto, clear
        gen byte touse = !missing(price, weight)
        egen iprice = rank(price) if touse
        egen iweight = rank(weight) if touse
        
        // user writen, type -search corrci- to find it
        corrci iprice iweight
        
        // check if I indeed compute a Spearman correlation
        spearman price weight
        
        // check ci against the bootstrap
        keep if touse
        bootstrap r(rho), reps(20000) nodots bca : spearman price weight
        estat bootstrap, bca
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------

        Comment


        • #5
          I like Maarten's approach for a mix of obvious and not so obvious reasons.

          There is small print as usual. In particular, the sampling distribution of Spearman rho is in principle discrete, so watch out with small samples. (The definition of a small sample is that its size can bite you.)

          Comment


          • #6
            Thanks! We do have a small sample (n = 38). I assume that I should expect the corrci CI to be similar but not identical to the bootstrap CI?

            Comment


            • #7
              I wouldn't assume it; I would check whether it's true. If there is a clash, I would go with bootstrap.

              Comment


              • #8
                Dear all,

                Could I estimate Spearman's rho adjusted for a third variable? ( "confunding variable" ) . If so, how could I do that?

                Thanks in advance. Best wishes.

                Comment


                • #9
                  Hi Tiago. I think your question is different enough from Robin's to merit a new topic. I recommend that you post your question again with a topic something like:
                  Spearman correlation: How can I adjust for a 3rd variable?
                  If everyone starts answering in this thread, others who are searching for an answer to the same question in future will have trouble finding this discussion.

                  HTH.
                  --
                  Bruce Weaver
                  Email: [email protected]
                  Version: Stata/MP 18.5 (Windows)

                  Comment


                  • #10
                    I agree wirh Bruce. That said, you may check - npregress -, for a nonparametric regression, which is available in Stata 15.
                    Best regards,

                    Marcos

                    Comment


                    • #11
                      Carlo, your bootstrap solution was very helpful, thank you! One question - what is the correct p value to use when reporting the data? As in the example above, when I ran this using my data, the p value I get from running the regular Spearman is different than the one that is reported when using bootstrap, e.g.

                      Regular Spearman:
                      . spearman var1 var2, stats(rho obs p)

                      Number of obs = 25
                      Spearman's rho = -0.4640

                      Test of Ho: var1 and var2 are independent
                      Prob > |t| = 0.0195


                      Bootstrap:

                      Bootstrap results Number of obs = 25
                      Replications = 1000

                      command: spearman var1 var2, stats(rho obs p)
                      _bs_1: r(rho)

                      ------------------------------------------------------------------------------
                      | Observed Bootstrap Normal-based
                      | Coef. Std. Err. z P>|z| [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                      _bs_1 | -.4640247 .1759925 -2.64 0.008 -.8089636 -.1190858
                      ------------------------------------------------------------------------------


                      Thanks in advance for any advice!
                      Last edited by Erum Hartung; 10 Dec 2018, 13:46. Reason: Sorry the spacing doesn't show up properly when I tried to cut and paste my answer. The bootstrap gives the same rho (-0.4640), but the p value is 0.008 (rather than 0.0195)

                      Comment


                      • #12
                        If you want to report a bootstrap p-value for a (Spearman's) correlation, then I would not report the results from bootstrap. The problem with that is that it only uses the bootstrap to estimate the standard error, but the p-value and confidence interval still assume a normally distributed sampling distribution for the correlation coefficients, which we know can be problematic.

                        To get more meaningful bootstrap confidence intervals you would add the option bca to the bootstrap command, and after the bootstrap command you type estat bootstrap, bca

                        Getting meaningful bootstrap p-values (achieved significance levels, or ASL) is a bit more tricky. In the manual there is example on how an ASL can be computed. It is still a bit tricky, so I packaged that up in a program

                        Code:
                        *! version 1.0.0 11Dec2018 MLB
                        program define asl_spearman, rclass
                            syntax varlist(min=2 max=2 numeric) [if] [in], ///
                                   [nodots reps(numlist max=1 integer >0)  ///
                                   mclevel(cilevel)]
                        
                            preserve
                            tempvar touse ivar1 ivar2 resid
                            tempname observed p mcalpha asl a b lb ub
                            tempfile bs    
                        
                            if "`reps'" == "" {
                                local reps = 1000
                            }
                        
                            quietly {
                                gettoken var1 var2 : varlist
                                marksample touse
                                keep if `touse'
                        
                                egen `ivar1' = rank(`var1')
                                egen `ivar2' = rank(`var2')
                        
                                corr `ivar1' `ivar2'
                                scalar `observed' = r(rho)
                        
                                reg `ivar1' `ivar2'
                                scalar `p' =  2*ttail(e(df_r), abs(_b[`ivar2']/_se[`ivar2']))
                                predict double `resid' , resid
                        
                        
                                noi bootstrap rho = r(rho), reps(`reps') saving(`bs') ///
                                    `dots' nowarn noheader notable: ///
                                    corr `resid' `ivar2'
                                    
                                use `bs', clear
                                count if abs(rho) >= abs(`observed')
                                scalar `a'   = r(N) + 1
                        
                                count if !missing(rho)
                                scalar `b'   = r(N) + 2 - `a'
                                scalar `asl' = (`a') / (r(N) + 1)
                                local valid = r(N)
                            }
                            
                            scalar `mcalpha' = (100-`mclevel')/200
                            local ndecimal = min(ceil(log10(`reps'+1)),4)
                            local aslfmt "%`=`ndecimal'+2'.`ndecimal'f"
                            
                            scalar `lb' = invibeta(`a', `b', `mcalpha')
                            scalar `ub' = invibetatail(`a', `b', `mcalpha')
                        
                            display _n
                            display as txt "Spearman's rho                    = " as result  %-6.4f `observed'
                            display as txt "asymptotic significance level     = " as result `aslfmt' `p'
                            display as txt "achieved significance level (ASL) = " as result `aslfmt' `asl'
                            display as txt "`mclevel'% Monte Carlo CI for ASL {col 35}= [" _c
                            display as result `aslfmt' `lb' as txt ", " as result `aslfmt' `ub' as txt "]"
                        
                            return scalar ub         = `ub'
                            return scalar lb         = `lb'
                            return scalar reps       = `reps'
                            return scalar valid_resp = `valid'
                            return scalar p_asymp    = `p'
                            return scalar asl        = `asl'
                            return scalar rho        = `observed'
                        end
                        (as a minor detail I tend to prefer (k+1)/(B+1) over k/B as used in the manual entry as an estimate for the ASL, for two justifications of my choice see: http://maartenbuis.nl/presentations/gsug13.pdf in the section "alternative")

                        I added in the output a Monte Carlo confidence interval for the p-value. There is by design randomness in the Bootstrap and its results, the Monte Carlo confidence interval tells you the kind of variability you can expect when rerunning the bootstrap with a different seed. If the MCCI is too large, you can make it smaller by increasing the number of replications. For a "first look" at the results I typically use something like a 1,000 replications, and 20,000 typically works well for me for final results.
                        ---------------------------------
                        Maarten L. Buis
                        University of Konstanz
                        Department of history and sociology
                        box 40
                        78457 Konstanz
                        Germany
                        http://www.maartenbuis.nl
                        ---------------------------------

                        Comment


                        • #13
                          My apologies for resurrecting a thread that has been asleep for about 4 years, but something I saw earlier today prompted me to do a search on <Stata Spearman rho confidence intervals>, and I landed here. I then proceeded to tinker around with Marten's example in #4. Briefly, I computed the CI for Spearman's rho "by hand", but using two different SEs. The two SEs were:
                          • sqrt(1/(n-3))
                          • sqrt((1+rs^2/2)/(n-3))
                          The second SE is recommended by Bonett & Wright (2000) for Spearman correlations with absolute values < 0.95.

                          Nick Cox, how feasible would it be to add an option to -corrci- to use this second SE when one is using ranked data?

                          My code can be found at the end of this post. Here are the main results.

                          Code:
                          . list zcrit-SE in 1/2
                          
                               +--------------------------------------------------------------------------------------------------+
                               |    zcrit    n           r         zr        SEzr          lb          ub                      SE |
                               |--------------------------------------------------------------------------------------------------|
                            1. | 1.959964   74   .48653732   .5315137   .11867817   .29031367   .64349652           sqrt(1/(n-3)) |
                            2. | 1.959964   74   .48653732   .5315137   .12550514   .27801376     .651269   sqrt((1+r^2/2)/(n-3)) |
                               +--------------------------------------------------------------------------------------------------+
                          
                          .  
                          . // Compare to -corrci- using ranks
                          . corrci iprice iweight
                          
                          (obs=74)
                          
                                            correlation and 95% limits
                          iprice  iweight      0.487    0.290    0.643
                          
                          
                          . // Notice that the CI I computed in row 1 matches the CI obtained via
                          . // -corrci-.  This confirms that my "hand" calcuations are correct.
                          .
                          . // check ci against the bootstrap
                          . set seed 1234
                          
                          . bootstrap r(rho), reps(20000) nodots bca : spearman price weight
                          
                          warning: spearman does not set e(sample), so no observations will be excluded from the resampling because of missing values or other
                                   reasons. To exclude observations, press Break, save the data, drop any observations that are to be excluded, and rerun
                                   bootstrap.
                          
                          Bootstrap results                                       Number of obs =     74
                                                                                  Replications  = 20,000
                          
                                Command: spearman price weight
                                  _bs_1: r(rho)
                          
                          ------------------------------------------------------------------------------
                                       |   Observed   Bootstrap                         Normal-based
                                       | coefficient  std. err.      z    P>|z|     [95% conf. interval]
                          -------------+----------------------------------------------------------------
                                 _bs_1 |   .4865373   .0961411     5.06   0.000     .2981042    .6749705
                          ------------------------------------------------------------------------------
                          
                          . estat bootstrap, bca
                          
                          Bootstrap results                               Number of obs     =         74
                                                                          Replications      =      20000
                          
                                Command: spearman price weight
                                  _bs_1: r(rho)
                          
                          ------------------------------------------------------------------------------
                                       |    Observed               Bootstrap
                                       | coefficient       Bias    std. err.  [95% conf. interval]
                          -------------+----------------------------------------------------------------
                                 _bs_1 |   .48653732  -.0055842   .09614112    .2745595   .6518462 (BCa)
                          ------------------------------------------------------------------------------
                          Key: BCa: Bias-corrected and accelerated
                          
                          // Notice that my row 2 CI is very similar to the BCa interval.
                          Now here is the code.
                          Code:
                          // Modification of Maarten Buis's code in #4.
                          // open and prepare some example data
                          sysuse auto, clear
                          keep if !missing(price, weight)
                          egen iprice = rank(price)
                          egen iweight = rank(weight)
                          
                          // Bonett and Wright (2000) say that for absolute values of Spearman
                          // correlations < .95, one should use sqrt((1+r^2/2)/(n-3)) as the SE
                          // of atanh(rs), not sqrt(1/(n-3)), as one does for Pearson r.
                          
                          // Let's compute the 95% CI for Spearman's rho "by hand" using
                          // both of those SEs.
                          // On row 1, use SEzr = sqrt(1/(n-3)) to match the results from -corrci-.
                          // On row 2, use SEzr = sqrt((1+rs^2/2)/(n-3)), as suggested by B&W (2000).
                          
                          quietly {
                          pwcorr iprice iweight // Pearson r on the ranks = Spearman rho
                          generate double zcrit = invnormal(0.975) in 1/2
                          generate n = r(N) in 1/2
                          generate double r = r(rho) in 1
                          generate double zr = atanh(r) in 1
                          generate double SEzr = sqrt(1/(n-3)) in 1
                          generate double lb = tanh(zr - (zcrit*SEzr)) in 1
                          generate double ub = tanh(zr + (zcrit*SEzr)) in 1
                          spearman price weight
                          replace r = r(rho) in 2
                          replace zr = atanh(r) in 2
                          // Now use the SE recommended by B&W (2000)
                          replace SEzr = sqrt((1+r^2/2)/(n-3)) in 2
                          replace lb = tanh(zr - (zcrit*SEzr)) in 2
                          replace ub = tanh(zr + (zcrit*SEzr)) in 2
                          generate str25 SE = "sqrt(1/(n-3))" in 1
                          replace SE = "sqrt((1+r^2/2)/(n-3))" in 2
                          }
                          
                          list zcrit-SE in 1/2
                           
                          // Compare to -corrci- using ranked data
                          corrci iprice iweight
                          matrix list r(lb)
                          matrix list r(ub)
                          // Notice that the CI I computed in row 1 matches the CI obtained via
                          // -corrci-.  This confirms that my "hand" calcuations are correct.
                          
                          // check CIs against the bootstrap
                          set seed 1234
                          bootstrap r(rho), reps(20000) nodots bca : spearman price weight
                          estat bootstrap, bca
                          list zcrit-SE in 1/2
                          // Notice that my row 2 CI is very similar to the BCa interval.
                          
                          /*
                          References
                          
                          Bonett, D. G., & Wright, T. A. (2000). Sample size requirements
                            for estimating Pearson, Kendall and Spearman correlations.
                            Psychometrika, 65(1), 23-28.
                           
                          https://www.youtube.com/watch?v=gKHgCq5E864
                          
                          https://www.statalist.org/forums/forum/general-stata-discussion/general/1395923-spearman-correlation-95-confidence-interval
                          
                          */
                          --
                          Bruce Weaver
                          Email: [email protected]
                          Version: Stata/MP 18.5 (Windows)

                          Comment


                          • #14
                            It’s feasible, but I am reluctant to add it.

                            Comment


                            • #15
                              Just to expand on the previous:

                              corrci wraps around official Stata code for Pearson correlation. It neither checks for, nor cares about, whether the original data are ranks.

                              So, nothing stops a user applying it to ranks and thereby getting a Spearman correlation as a Pearson correlation on ranks. But that's a user decision and whether the inferential machinery carries over to Spearman correlations is a question for the user.

                              Also, I don't want to get into adding options for complications. Yet another example might be the application of correlations to time or spatial series, where adjustments for dependence might well be urged. If so, that is an issue that needs different code.

                              So, this is the program author throwing responsibility back on the user, where it does belong.

                              Comment

                              Working...
                              X