  • Meta-analysis of correlation coefficients in Stata

    Dear Statalist

    I'm currently working on a meta-analysis of correlation coefficients and am looking at the commands available in Stata. Imagine I have data like this:

    Code:
    input str10 study year r n
    Natak    1992 .40  50
    Bundhi   1998 .50 100
    Rashnam  2001 .40  18
    Chetram  2002 .20 730
    Sankaram 2008 .70  44
    Chetty   2016 .45  28
    end
    I would then apply Fisher's z transformation to r and calculate its standard error like this:

    Code:
    generate z = .5 * ln((1 + r) / (1 - r))
    generate sez = sqrt(1/(n - 3))

    And then I can use -metan-:

    Code:
    metan z sez, label(namevar = study, yearvar = year)

    Which would give me results like this:

    [Image: Graph.png (forest plot from -metan-)]

    However, the convention in meta-analyses seems to be to transform the Fisher's z effect sizes back into correlations for presentation purposes.

    My questions are:
    • Is there any option in -metan- that would do that for me?
    • Is there perhaps any other Stata command for meta-analysis that would do that for me? So far I have the impression that none of the available commands does.
    Thanks for your consideration
    Go

  • #2
    I would recommend using the Fisher's z transform or converting the correlations to Cohen's d effect sizes.

    However, if you want to present the results as correlations in the forest plot, the standard error of r can be calculated as:

    SE of r = sqrt((1 - r^2)/(n - 2))
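
    A minimal sketch of that calculation in Stata, assuming variables r and n as in the example data above:

    Code:
    * SE of r = sqrt((1 - r^2)/(n - 2))
    generate SEr = sqrt((1 - r^2)/(n - 2))

    * quick check with the first study above (r = .40, n = 50)
    display sqrt((1 - .40^2)/(50 - 2))   // about .1323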

    Red Owl
    Stata/IC 15.1 (Windows 10, 64-bit)

    Comment


    • #3
      Thanks for your response, Red Owl. Does that mean you'd suggest not using any of the available Stata commands for meta-analysis of correlation coefficients?

      Cheers
      Go

      Comment


      • #4
        No, I was just offering the formula for estimating a standard error of a Pearson's correlation coefficient to use if you wanted to use r as your effect size in the forest plot.

        I personally would convert the r to d (Cohen's d) and use that in the forest plot, but you may have good reasons to use r instead. I wasn't offering an opinion about what is best, just what is possible.

        Red Owl
        Stata/IC 15.1, Windows 10 (64-bit)

        Comment


        • #5
          Hi Gobinda. Note that the r-to-Z transformation is really the inverse hyperbolic tangent, so if you meta-analyze the Zr values, the hyperbolic tangent can be used to transform the pooled estimate back to the original scale. The Stata functions are atanh() and tanh().

          Here's an example using your data.

          Code:
          clear *
          input str10 study year r n
          Natak    1992 .40  50
          Bundhi   1998 .50 100
          Rashnam  2001 .40  18
          Chetram  2002 .20 730
          Sankaram 2008 .70  44
          Chetty   2016 .45  28
          end
          generate z = atanh(r) // r-to-z = inverse hyperbolic tangent
          generate sez = sqrt(1/(n - 3))
          metan z sez, label(namevar = study, yearvar = year)
          display _newline ///
          " Pooled estimate of r = " tanh(r(ES)) _newline ///
          "Lower Limit of 95% CI = " tanh(r(ci_low)) _newline ///
          "Upper Limit of 95% CI = " tanh(r(ci_upp))
          Output from -metan- and -display-:

          Code:
          . metan z sez, label(namevar = study, yearvar = year)
          
                     Study     |     ES    [95% Conf. Interval]     % Weight
          ---------------------+---------------------------------------------------
          Natak (1992)         |  0.424       0.138     0.710          4.94
          Bundhi (1998)        |  0.549       0.350     0.748         10.19
          Rashnam (2001)       |  0.424      -0.082     0.930          1.58
          Chetram (2002)       |  0.203       0.130     0.275         76.37
          Sankaram (2008)      |  0.867       0.561     1.173          4.31
          Chetty (2016)        |  0.485       0.093     0.877          2.63
          ---------------------+---------------------------------------------------
          I-V pooled ES        |  0.288       0.225     0.352        100.00
          ---------------------+---------------------------------------------------
          
            Heterogeneity chi-squared =  27.78 (d.f. = 5) p = 0.000
            I-squared (variation in ES attributable to heterogeneity) =  82.0%
          
            Test of ES=0 : z=   8.90 p = 0.000
          
          . display _newline ///
          > " Pooled estimate of r = " tanh(r(ES)) _newline ///
          > "Lower Limit of 95% CI = " tanh(r(ci_low)) _newline ///
          > "Upper Limit of 95% CI = " tanh(r(ci_upp))
          
           Pooled estimate of r = .28071524
          Lower Limit of 95% CI = .22121715
          Upper Limit of 95% CI = .33813133
          HTH.
          --
          Bruce Weaver
          Email: [email protected]
          Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
          Version: Stata/MP 18.0 (Windows)

          Comment


          • #6
            Thanks both for helping me out here. What I learned from your comments and a bit more research is that I wanted to ultimately end up with this here, for which I needed the commands -admetan- and -forestplot-:

            Code:
            clear
            
            input str10 study year r n
            Natak    1992 .40  50
            Bundhi   1998 .50 100
            Rashnam  2001 .40  18
            Chetram  2002 .20 730
            Sankaram 2008 .70  44
            Chetty   2016 .45  28
            end
            
            generate z = atanh(r) // r-to-z = inverse hyperbolic tangent
            generate sez = sqrt(1/(n - 3))
            
            admetan z sez
            
            display _newline ///
            " Pooled estimate of r = " tanh(r(eff)) _newline ///
            "Lower Limit of 95% CI = " tanh(r(eff) - (1.96 * r(se_eff))) _newline ///
            "Upper Limit of 95% CI = " tanh(r(eff) + (1.96 * r(se_eff)))
            
            // Prepare data for -forestplot-
            generate _USE = 1
            
              // Generate CI's for r
            generate lb = tanh(_LCI)
            generate ub = tanh(_UCI)
            
              // Generate study labels for -forestplot-
            generate _LABELS = study + " (" + string(year, "%02.0f") + ")"
            
            label var n "Sample size"
            
              // Add effect size to data set
            local new = _N + 1
            set obs `new'
            replace _LABELS = "{bf:Overall}"                    if _n == _N
            replace r       = tanh(r(eff))                      if _LABELS == "{bf:Overall}"
            replace lb      = tanh(r(eff) - (1.96 * r(se_eff))) if _LABELS == "{bf:Overall}"
            replace ub      = tanh(r(eff) + (1.96 * r(se_eff))) if _LABELS == "{bf:Overall}"
            replace _USE    = 5                                 if _LABELS == "{bf:Overall}"
            
              // Forest plot
            forestplot r lb ub, nonull effect("Correlation") rcol(n) leftjustify nowt
            So that I could end up with this plot:

    [Image: Graph.png (forest plot of correlations produced by -forestplot-)]


            This looks roughly like what I was after (for future reference: this post https://www.statalist.org/forums/for...34#post1375334 seems to contain some deep knowledge on how to format -forestplot-).

            Thanks again
            Go

            Comment


            • #7
              Gobinda Natak and Bruce Weaver

              I tested my approach of calculating the SE of r directly, to compare the results to those produced by the r-to-z transform approach suggested by Bruce Weaver.

              I got similar (but not identical) results with fixed effects meta-analysis. I noticed, however, that the heterogeneity is relatively high (I-squared = 81.8%), suggesting that a random effects approach is probably warranted.

              With the fixed effects approach, my overall mean r effect size is .296 [.236,.356], and the one produced with the r-to-z transform method is .288 [.225,.352].

              With the random effects approach my overall mean r effect size rises to .435 [.240,.631].
              Code:
              clear
              
              input str10 study year r n
              Natak    1992 .40  50
              Bundhi   1998 .50 100
              Rashnam  2001 .40  18
              Chetram  2002 .20 730
              Sankaram 2008 .70  44
              Chetty   2016 .45  28
              end
              
              * Estimate SE of r and format r and SEr
              gen SEr = sqrt((1 - r^2)/(n - 2))
              format r SEr %4.3f
              
              * Generate Study var and labels
              generate Study = study + " (" + string(year, "%02.0f") + ")"
              label var n "Sample size"
              
              * Fixed effects meta-analysis assuming homogeneity of effects
              metan r SEr, lcols(Study) rcols(SEr n) astext(85) xlabels(0(.25)1) name(forestfixd, replace)
              
              * Random effects meta-analysis assuming heterogeneity of effects
              metan r SEr, random lcols(Study) rcols(SEr n) astext(85) xlabels(0(.25)1) name(forestrand, replace)
              
              * Combined fixed and random effects forest plots
              graph combine forestfixd forestrand, ysize(3) xsize(6) name(ROcombined, replace)
              [Image: forestplots.png (combined fixed and random effects forest plots)]



              Red Owl
              Stata/IC 15.1, Windows 10 (64-bit)

              Comment


              • #8
                I'm trying to use -metan- to run a meta-analysis of Pearson correlations with inverse variance weighting, in the presence of high sample heterogeneity. I supply Fisher's z and its SE to -metan-. How do I get -metan- to recognize n as the sample size for the weighting? If I use randomi or fixedi, it doesn't seem to recognize the sample sizes.

                Comment


                • #9
                  Messmer George, I'm confused. How can you use inverse variance weighting and weighting by sample size at the same time? You have to pick one or the other, I think. Please clarify.
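
                    One way to see the overlap for Fisher's z specifically: since var(z) = 1/(n - 3), the inverse-variance weight is just n - 3, so inverse variance weighting already reflects sample size. A minimal sketch using the example data from earlier in the thread:

                    Code:
                    generate sez = sqrt(1/(n - 3))
                    generate ivweight = 1/sez^2
                    list study n ivweight   // ivweight equals n - 3 for every study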
                  --
                  Bruce Weaver
                  Email: [email protected]
                  Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
                  Version: Stata/MP 18.0 (Windows)

                  Comment


                  • #10
                    Hi All,

                    I am doing a meta-analysis of the relationship between negative symptoms and functioning in youth with psychosis. My initial instinct was to take the Pearson r reported in the studies, convert it to the Fisher's z scale, perform random effects meta-analyses on the transformed values as suggested in this forum, and then transform back, and your advice worked great.

                    But now I am second-guessing myself: since the negative symptom scales are different (SANS, SIPS, etc.) and the functioning scales are different (GAF, SOFAS, etc.), would you instead opt to transform Pearson r to SMD? Any suggestions/thoughts? And if you recommend this, how would one go about doing it? Or does the difference in scales not matter when using z transformations? Thanks so much, Dan

                    Comment


                    • #11
                      Originally posted by Daniel Devoe View Post
                      But now I am second guessing myself and I am wondering since the negative scales are different (SANS, SIPS, etc) and the functioning scales are different (GAF, SOFAS, etc) would you instead opt to transform Pearson r to SMD? Any suggestions/ thoughts? And if you recommend this how would one go about doing this? Or does the difference in scales not matter when using z transformations? Thanks so much, Dan
                      Hi Daniel. I can't speak for others, but I don't know what you mean when you say "the negative scales are different (SANS, SIPS, etc) and the functioning scales are different (GAF, SOFAS, etc)". Different from what? Please provide more information about the scales and the correlations you are trying to pool. And what is SMD? Standardized mean difference, perhaps?

                      Thanks for clarifying.

                      Bruce
                      --
                      Bruce Weaver
                      Email: [email protected]
                      Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
                      Version: Stata/MP 18.0 (Windows)

                      Comment


                      • #12
                        Hi Bruce, different from each other, in that they measure very similar concepts (either negative symptoms or functioning) but do so on different scales. The correlations I am trying to pool are the coefficients r between functioning and negative symptoms reported across studies, regardless of the scales used.

                        For example, I have studies reporting this relationship between negative symptoms and functioning using a variety of scales: "coefficient r between functioning scale a and negative symptom scale b", "coefficient r between functioning scale b and negative symptom scale c", "coefficient r between functioning scale z and negative symptom scale e", etc.

                        For SMD, in this case I was referring to Cohen's d.
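
                        For what it's worth, the conversion I have seen for going from r to Cohen's d is d = 2r/sqrt(1 - r^2). A minimal Stata sketch, assuming a variable r as in the example data earlier in the thread:

                        Code:
                        * convert Pearson r to Cohen's d
                        generate d = 2*r / sqrt(1 - r^2)

                        * quick check: r = .5 gives d = 1/sqrt(.75), about 1.1547
                        display 2*.5 / sqrt(1 - .5^2)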

                        Thanks so much, Dan


                        Dan Devoe (BA, MSc, PhD Candidate)
                        Dept. of Psychiatry | Cumming School of Medicine | University of Calgary
                        TRW Building | Mathison Centre for Mental Health Research & Education
                        email:[email protected]

                        Comment
