  • Interpreting mean difference and concordance correlation

    Hello,

    I am looking at the agreement between several methods of measuring the same thing in one population. The methods produce percentages of future disease risk. I attach two examples below and am wondering how to interpret them. I can see from rho_c that there is poor agreement (which is what I was expecting), but I am unclear whether there are other parts of the output I should be looking at. I am also unsure how to interpret the mean differences: what is considered poor or good agreement? Do the mean differences and limits of agreement show good agreement and thereby contradict the concordance correlation? Do I need to bootstrap?

    Thank you

    Code:
    . concord risk_logit_1 risk_logit_3
    
    Concordance correlation coefficient (Lin, 1989, 2000):
    
     rho_c   SE(rho_c)   Obs    [   95% CI   ]     P        CI type
    ---------------------------------------------------------------
     0.503     0.085      25     0.336  0.669    0.000   asymptotic
                                 0.318  0.650    0.000  z-transform
    
    Pearson's r =  0.901  Pr(r = 0) = 0.000  C_b = rho_c/r =  0.558
    Reduced major axis:   Slope =     1.255   Intercept =     2.541
    
    Difference = risk_logit_1 - risk_logit_3
    
            Difference                 95% Limits Of Agreement
       Average     Std Dev.             (Bland & Altman, 1986)
    ---------------------------------------------------------------
         1.460       0.601                  0.283      2.637
    
    Correlation between difference and mean = 0.468
    
    Bradley-Blackwood F = 93.869 (P = 0.00000)
    Code:
    . concord risk_logit_3 risk_logit_4
    
    Concordance correlation coefficient (Lin, 1989, 2000):
    
     rho_c   SE(rho_c)   Obs    [   95% CI   ]     P        CI type
    ---------------------------------------------------------------
     0.879     0.038      27     0.805  0.953    0.000   asymptotic
                                 0.781  0.935    0.000  z-transform
    
    Pearson's r =  0.959  Pr(r = 0) = 0.000  C_b = rho_c/r =  0.917
    Reduced major axis:   Slope =     1.093   Intercept =    -0.116
    
    Difference = risk_logit_3 - risk_logit_4
    
            Difference                 95% Limits Of Agreement
       Average     Std Dev.             (Bland & Altman, 1986)
    ---------------------------------------------------------------
        -0.482       0.357                 -1.181      0.217
    
    Correlation between difference and mean = 0.298
    
    Bradley-Blackwood F = 27.291 (P = 0.00000)
    [Attached: Graph1.png]

    [Attached: Graph2.png]


  • #2
    So, I'm inferring that what you are comparing here is the agreement among several different logistic regression models of risk of a certain disease or condition.

    If you had a strong correlation between the measures, then you might want to also focus on the mean difference. Let's say we had two good kitchen scales, but one of them was not correctly tared. The correlation between their weights of the same items would be quite high, but the mean difference would pick up the tare problem--the mean difference estimates the fixed bias. But when the correlation is poor in the first place, the mean difference loses its importance. For example, I could devise a new logistic model for disease risk: it contains only a constant and the constant is calculated so that the predicted risk from that model (for every person, no matter what you know about them) is precisely the mean risk in the population generated by the other risk model you already had. The mean difference between these measures would be zero: and it doesn't get any better than that. But clearly this second model is pretty useless. Its correlation with actual risk or any other model of risk would be zero. And it does nothing to distinguish the risk of one person from another. So mean difference really only matters when the two measures are fairly well correlated.
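
    To make the kitchen-scale example concrete, here is a minimal simulation sketch. The variable names, the 0.5 tare offset, and the noise level are all invented for illustration; -concord- is the same user-written SSC command used above.

    Code:
    * Two noisy "scales" weigh the same items; scale2 is mis-tared
    clear
    set obs 100
    set seed 12345
    generate true   = 1 + 9*runiform()
    generate scale1 = true + rnormal(0, 0.1)
    generate scale2 = true + 0.5 + rnormal(0, 0.1)   // fixed bias of 0.5
    concord scale1 scale2
    * Pearson's r comes out near 1, but the mean difference (about 0.5)
    * picks up the tare problem and pulls rho_c below r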

    Assuming that among your measures there are some that really do show high correlation, so that the mean difference becomes important, there is another issue here. You are using the logit of risk rather than the risk itself. The logit is a highly non-linear transformation, and while it has many desirable properties, it also raises some problems. For example, a difference of, say, 2 on the logit scale means a great deal if the "baseline" risk is, say, .5: in that case a difference of 2 on the logit scale corresponds to the difference between risk .5 and risk .88, which is huge. But if the baseline risk is .95, that same difference of 2 on the logit scale corresponds to the difference between risk .95 and .993, which is modest at best. And if the baseline risk were .99, adding 2 on the logit scale only brings you up to about .999--which in most contexts would be a negligible difference. So I think it is very difficult to say anything general about a difference in logit(risk) without saying what risk level you are starting from.
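
    You can check those numbers directly with Stata's built-in logit() and invlogit() functions:

    Code:
    * The same +2 shift on the logit scale, at different baseline risks
    display invlogit(logit(.50) + 2)   // ~.88   huge change from .50
    display invlogit(logit(.95) + 2)   // ~.993  modest change from .95
    display invlogit(logit(.99) + 2)   // ~.9986 negligible change from .99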



    • #3
      Thanks so much for taking the time to reply; I really appreciate it. So I began with a percentage for each participant, which was their risk of developing the disease within a given time. I was then advised it would be better to consider the percentage risks on a logit scale, partly because of how those quantities behave and partly because of their substantive interpretation. I see what you are saying; moving forward, do I need to repeat these analyses using the original percentages, or the percentages converted to proportions?
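
      For what it's worth, the conversions are one-liners either way; this is just a sketch and the variable names are illustrative:

      Code:
      * Percent -> logit scale, and back
      generate risk_logit = logit(risk_pct/100)        // percent to logit
      generate risk_pct2  = 100*invlogit(risk_logit)   // logit back to percent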



      • #4
        I think that for certain purposes the use of the logit-transformed risks is better. But I don't think comparing means is among them. And, I'm ambivalent about using the logits for the correlational analysis: the logit really exaggerates the differences among very close probabilities when they are close to 0 or 1. This will tend to degrade the correlations compared to what you would see with a direct correlation of the risks themselves. Whether that is a bug or a feature depends, I think, on your specific goals and the context of how your results might be used.
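
        A quick numerical illustration of that exaggeration, using the built-in logit() function: the same absolute gap of .001 is enormous on the logit scale near 0 but trivial near .5.

        Code:
        display logit(.002) - logit(.001)   // ~.69 : large logit gap near 0
        display logit(.502) - logit(.501)   // ~.004: same raw gap near .5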



        • #5
          Thank you. If I use the original percentages and re-run the analyses, these are the results. Do you feel this is easier to interpret in terms of the correlational analysis and comparing means?

          Code:
          concord measure1 measure3
          
          Concordance correlation coefficient (Lin, 1989, 2000):
          
           rho_c   SE(rho_c)   Obs    [   95% CI   ]     P        CI type
          ---------------------------------------------------------------
           0.224     0.051      27     0.123  0.324    0.000   asymptotic
                                       0.121  0.321    0.000  z-transform
          
          Pearson's r =  0.841  Pr(r = 0) = 0.000  C_b = rho_c/r =  0.266
          Reduced major axis:   Slope =     4.350   Intercept =     0.212
          
          Difference = measure1 - measure3
          
                  Difference                 95% Limits Of Agreement
             Average     Std Dev.             (Bland & Altman, 1986)
          ---------------------------------------------------------------
               7.236       7.322                 -7.115     21.588
          
          Correlation between difference and mean = 0.967
          
          Bradley-Blackwood F = 376.729 (P = 0.00000)
          Code:
           concord measure3 measure4
          
          Concordance correlation coefficient (Lin, 1989, 2000):
          
           rho_c   SE(rho_c)   Obs    [   95% CI   ]     P        CI type
          ---------------------------------------------------------------
           0.767     0.071      29     0.627  0.907    0.000   asymptotic
                                       0.587  0.875    0.000  z-transform
          
          Pearson's r =  0.845  Pr(r = 0) = 0.000  C_b = rho_c/r =  0.908
          Reduced major axis:   Slope =     0.963   Intercept =    -0.815
          
          Difference = measure3 - measure4
          
                  Difference                 95% Limits Of Agreement
             Average     Std Dev.             (Bland & Altman, 1986)
          ---------------------------------------------------------------
              -0.920       1.168                 -3.209      1.368
          
          Correlation between difference and mean = -0.070
          
          Bradley-Blackwood F = 8.797 (P = 0.00114)



          • #6
            Yes and no. The earlier results were presented as logit-transforms of risk. So when you removed the logit-transform, I was expecting to see probabilities, which necessarily range between 0 and 1. But here I am seeing numbers that are on a different scale altogether. Did you multiply by 100 to convert probabilities to percents?

            Assuming that is what happened, then I think these results are clear and easy to work with. Measures 1 and 3 clearly have a pretty large difference on average. But you can also see from the graph that when the measures predict a value close to zero, they tend to agree, whereas when they predict large values they tend to disagree greatly. By contrast, measures 3 and 4 have a pretty small average difference, and the graph shows no obvious relationship between the predicted values themselves and the extent to which they disagree. So the disagreement appears to be more random and less systematic. The concordance correlation between measures 3 and 4 is 0.767 here, which, though lower than the 0.879 you saw with the logit-transformed values, is probably the more credible figure: I think the correlation of the logit-transforms was being distorted by a small number of outliers, outliers whose impact on the correlation the logit-transform exaggerates.
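
            Incidentally, that pattern is what the "Correlation between difference and mean" line of your output quantifies; you can reproduce it by hand (a sketch using your variable names):

            Code:
            * Bland-Altman quantities computed manually
            generate diff = measure1 - measure3
            generate avg  = (measure1 + measure3)/2
            correlate diff avg   // ~0.97 for measures 1 & 3; near 0 for 3 & 4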



            • #7
              Oh yes, sorry, I used percents for this analysis. Would you prefer me to post probabilities? I would have thought the correlation between measures 3 and 4 would have been higher (I need > 0.9 for "good agreement"). Here is an example of my data for measures 3 and 4 (in percentages).

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input float(measure3 measure4)
               .8863378 1.4922162
              1.1049236  1.942504
                      .  3.949243
              2.7212884  6.405684
               .1496917 .18035504
               .5019374 1.6671507
                .754011 1.2927463
              4.1563263  4.874775
                6.19238  6.048964
              .38537505   .802256
               .7400888 1.5751708
               4.738522 4.6955466
              .22875555  4.372667
               8.531375  7.435046
               .2420856  .6580936
              .22165443  .3152621
               3.139279  4.037205
              1.1763887 2.2554524
               4.188088  4.721145
               1.971524  3.730975
              2.0938213  3.327607
              1.1606071 2.1429203
              .08769826  .1733915
              .31622165  .6414833
              2.1718347  4.859337
               .3661532   .354345
              .29062825   .966186
                2.47116  5.645855
               3.129016  3.417343
               2.829525 3.7610774
              end



              • #8
                Of course, it'll be the same whether you use probabilities or percents. I just wasn't expecting to see percents, so I wanted to verify that we were still working with the same data and that you hadn't done any other transformations.
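
                If you want to confirm that for yourself (a sketch): rho_c, like Pearson's r, is unchanged when both variables are multiplied by the same positive constant, so probabilities and percents give identical results.

                Code:
                generate p3 = measure3/100   // percent back to probability
                generate p4 = measure4/100
                concord p3 p4                // same rho_c as -concord measure3 measure4-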

                All I can say is that .9 is a pretty high bar. Not knowing what these measures are, what they are predicting the risk of, nor what your research goals are, I can't really comment on whether that is too stringent a criterion. I will say this much: over my career I have grown ever more skeptical of the value of threshold criteria for anything that is inherently continuous. Of course, sometimes one has to make a decision to do one thing or another depending on the results of some observation, and that imposes a threshold. But such thresholds are properly defined by an actual decision analysis that looks at maximizing expected utility. My experience is that thresholds in use, especially in medicine, often lack any such underlying rationale: they are often just round numbers postulated by "experts" working in a data-free environment, arrived at without any explicit consideration of utility at all.

                Again, I do not know your goals here. But I imagine that if I were in your situation I might well disregard any arbitrary criteria for what is "good agreement" and simply report the results that I find. Deciding whether that is "good enough" would then await a separate analysis that explicitly and quantitatively accounts for the consequences of whatever decisions hang on that.

