  • Interpreting mean difference and concordance correlation

    Hello,

    I am looking at the agreement between several methods of measuring the same thing in one population. The methods produce percentages of future disease risk. I attach two examples below and am wondering how to interpret them. I can see from rho_c that there is poor agreement (which is what I was expecting), but I am unclear whether there are other parts of the output I should be looking at. I am also unsure how to interpret the mean differences: what is considered poor or good agreement? Do the mean differences and limits of agreement show good agreement and thereby contradict the concordance correlation? Do I need to bootstrap?

    Thank you

    Code:
    . concord risk_logit_1 risk_logit_3
    
    Concordance correlation coefficient (Lin, 1989, 2000):
    
     rho_c   SE(rho_c)   Obs    [   95% CI   ]     P        CI type
    ---------------------------------------------------------------
     0.503     0.085      25     0.336  0.669    0.000   asymptotic
                                 0.318  0.650    0.000  z-transform
    
    Pearson's r =  0.901  Pr(r = 0) = 0.000  C_b = rho_c/r =  0.558
    Reduced major axis:   Slope =     1.255   Intercept =     2.541
    
    Difference = risk_logit_1 - risk_logit_3
    
            Difference                 95% Limits Of Agreement
       Average     Std Dev.             (Bland & Altman, 1986)
    ---------------------------------------------------------------
         1.460       0.601                  0.283      2.637
    
    Correlation between difference and mean = 0.468
    
    Bradley-Blackwood F = 93.869 (P = 0.00000)
    Code:
    . concord risk_logit_3 risk_logit_4
    
    Concordance correlation coefficient (Lin, 1989, 2000):
    
     rho_c   SE(rho_c)   Obs    [   95% CI   ]     P        CI type
    ---------------------------------------------------------------
     0.879     0.038      27     0.805  0.953    0.000   asymptotic
                                 0.781  0.935    0.000  z-transform
    
    Pearson's r =  0.959  Pr(r = 0) = 0.000  C_b = rho_c/r =  0.917
    Reduced major axis:   Slope =     1.093   Intercept =    -0.116
    
    Difference = risk_logit_3 - risk_logit_4
    
            Difference                 95% Limits Of Agreement
       Average     Std Dev.             (Bland & Altman, 1986)
    ---------------------------------------------------------------
        -0.482       0.357                 -1.181      0.217
    
    Correlation between difference and mean = 0.298
    
    Bradley-Blackwood F = 27.291 (P = 0.00000)
    [Attached: Graph1.png]

    [Attached: Graph2.png]


  • #2
    So, I'm inferring that what you are comparing here is the agreement among several different logistic regression models of risk of a certain disease or condition.

    If you had a strong correlation between the measures, then you might want to also focus on the mean difference. Let's say we had two good kitchen scales, but one of them was not correctly tared. The correlation between their weights of the same items would be quite high, but the mean difference would pick up the tare problem--the mean difference estimates the fixed bias. But when the correlation is poor in the first place, the mean difference loses its importance. For example, I could devise a new logistic model for disease risk: it contains only a constant and the constant is calculated so that the predicted risk from that model (for every person, no matter what you know about them) is precisely the mean risk in the population generated by the other risk model you already had. The mean difference between these measures would be zero: and it doesn't get any better than that. But clearly this second model is pretty useless. Its correlation with actual risk or any other model of risk would be zero. And it does nothing to distinguish the risk of one person from another. So mean difference really only matters when the two measures are fairly well correlated.
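
    To make the kitchen-scale example concrete, here is a minimal simulation sketch. The variable names, the 0.5 tare offset, and the noise level are all invented for illustration; -concord- is the same user-written SSC command used above.

    Code:
    * Two noisy "scales" weigh the same items; scale2 is mis-tared
    clear
    set obs 100
    set seed 12345
    generate true   = 1 + 9*runiform()
    generate scale1 = true + rnormal(0, 0.1)
    generate scale2 = true + 0.5 + rnormal(0, 0.1)   // fixed bias of 0.5
    concord scale1 scale2
    * Pearson's r comes out near 1, but the mean difference (about 0.5)
    * picks up the tare problem and pulls rho_c below r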

    Assuming that among your measures there are some that really do show high correlation, so that the mean difference becomes important, there is another issue here. You are using the logit of risk rather than the risk itself. The logit is a highly non-linear transformation, and while it has many desirable properties, it also raises some problems. For example, a difference of, say, 2 on the logit scale means a great deal if the "baseline" risk is, say, .5: in that case a difference of 2 on the logit scale corresponds to the difference between risk .5 and risk .88, which is huge. But if the baseline risk is .95, that same difference of 2 on the logit scale corresponds to the difference between risk .95 and .993, which is modest at best. And if the baseline risk were .99, adding 2 on the logit scale only brings you up to about .999--which in most contexts would be a negligible difference. So I think it is very difficult to say anything general about a difference in logit(risk) without saying what risk level you are starting from.
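
    You can check those numbers directly with Stata's built-in logit() and invlogit() functions:

    Code:
    * The same +2 shift on the logit scale, at different baseline risks
    display invlogit(logit(.50) + 2)   // ~.88   huge change from .50
    display invlogit(logit(.95) + 2)   // ~.993  modest change from .95
    display invlogit(logit(.99) + 2)   // ~.9986 negligible change from .99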



    • #3
      Thanks so much for taking the time to reply; I really appreciate it. So I began with a percentage for each participant, which was their risk of developing the disease within a given time. I was then advised it would be better to consider the percentage risks on a logit scale, partly because of how those quantities behave and partly because of their substantive interpretation. I see what you are saying; moving forward, do I need to repeat these analyses using the original percentages, or the percentages converted to proportions?
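
      For what it's worth, the conversions are one-liners either way; this is just a sketch and the variable names are illustrative:

      Code:
      * Percent -> logit scale, and back
      generate risk_logit = logit(risk_pct/100)        // percent to logit
      generate risk_pct2  = 100*invlogit(risk_logit)   // logit back to percent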



      • #4
        I think that for certain purposes the use of the logit-transformed risks is better. But I don't think comparing means is among them. And, I'm ambivalent about using the logits for the correlational analysis: the logit really exaggerates the differences among very close probabilities when they are close to 0 or 1. This will tend to degrade the correlations compared to what you would see with a direct correlation of the risks themselves. Whether that is a bug or a feature depends, I think, on your specific goals and the context of how your results might be used.
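
        A quick numerical illustration of that exaggeration, using the built-in logit() function: the same absolute gap of .001 is enormous on the logit scale near 0 but trivial near .5.

        Code:
        display logit(.002) - logit(.001)   // ~.69 : large logit gap near 0
        display logit(.502) - logit(.501)   // ~.004: same raw gap near .5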



        • #5
          Thank you. If I use the original percentages and re-run the analyses, these are the results. Do you feel this is easier to interpret in terms of the correlational analysis and comparing means?

          Code:
          concord measure1 measure3
          
          Concordance correlation coefficient (Lin, 1989, 2000):
          
           rho_c   SE(rho_c)   Obs    [   95% CI   ]     P        CI type
          ---------------------------------------------------------------
           0.224     0.051      27     0.123  0.324    0.000   asymptotic
                                       0.121  0.321    0.000  z-transform
          
          Pearson's r =  0.841  Pr(r = 0) = 0.000  C_b = rho_c/r =  0.266
          Reduced major axis:   Slope =     4.350   Intercept =     0.212
          
          Difference = measure1 - measure3
          
                  Difference                 95% Limits Of Agreement
             Average     Std Dev.             (Bland & Altman, 1986)
          ---------------------------------------------------------------
               7.236       7.322                 -7.115     21.588
          
          Correlation between difference and mean = 0.967
          
          Bradley-Blackwood F = 376.729 (P = 0.00000)
          Code:
           concord measure3 measure4
          
          Concordance correlation coefficient (Lin, 1989, 2000):
          
           rho_c   SE(rho_c)   Obs    [   95% CI   ]     P        CI type
          ---------------------------------------------------------------
           0.767     0.071      29     0.627  0.907    0.000   asymptotic
                                       0.587  0.875    0.000  z-transform
          
          Pearson's r =  0.845  Pr(r = 0) = 0.000  C_b = rho_c/r =  0.908
          Reduced major axis:   Slope =     0.963   Intercept =    -0.815
          
          Difference = measure3 - measure4
          
                  Difference                 95% Limits Of Agreement
             Average     Std Dev.             (Bland & Altman, 1986)
          ---------------------------------------------------------------
              -0.920       1.168                 -3.209      1.368
          
          Correlation between difference and mean = -0.070
          
          Bradley-Blackwood F = 8.797 (P = 0.00114)



          • #6
            Yes and no. The earlier results were presented as logit-transforms of risk. So when you removed the logit-transform, I was expecting to see probabilities, which necessarily range between 0 and 1. But here I am seeing numbers that are on a different scale altogether. Did you multiply by 100 to convert probabilities to percents?

            Assuming that is what happened, then I think these results are clear and easy to work with. Measures 1 and 3 clearly have a pretty large difference on average. But you can also see from the graph that when the measures predict a value close to zero, they tend to agree, whereas when they predict large values they tend to disagree greatly. By contrast, measures 3 and 4 have a pretty small average difference, and the graph shows no obvious relationship between the predicted values themselves and the extent to which they disagree. So the disagreement appears to be more random and less systematic. The concordance correlation between measures 3 and 4 is 0.767 here, which, though lower than the 0.879 you saw with the logit-transformed values, is probably the more credible figure: I think the correlation of the logit-transforms was being distorted by a small number of outliers, outliers whose impact on the correlation the logit-transform exaggerates.
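
            Incidentally, that pattern is what the "Correlation between difference and mean" line of your output quantifies; you can reproduce it by hand (a sketch using your variable names):

            Code:
            * Bland-Altman quantities computed manually
            generate diff = measure1 - measure3
            generate avg  = (measure1 + measure3)/2
            correlate diff avg   // ~0.97 for measures 1 & 3; near 0 for 3 & 4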



            • #7
              Oh yes, sorry, I used percents for this analysis. Would you prefer me to post probabilities? I would have thought the correlation between measures 3 and 4 would have been higher (I need > 0.9 for "good agreement"). Here is an example of my data for measures 3 and 4 (in percentages).

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input float(measure3 measure4)
               .8863378 1.4922162
              1.1049236  1.942504
                      .  3.949243
              2.7212884  6.405684
               .1496917 .18035504
               .5019374 1.6671507
                .754011 1.2927463
              4.1563263  4.874775
                6.19238  6.048964
              .38537505   .802256
               .7400888 1.5751708
               4.738522 4.6955466
              .22875555  4.372667
               8.531375  7.435046
               .2420856  .6580936
              .22165443  .3152621
               3.139279  4.037205
              1.1763887 2.2554524
               4.188088  4.721145
               1.971524  3.730975
              2.0938213  3.327607
              1.1606071 2.1429203
              .08769826  .1733915
              .31622165  .6414833
              2.1718347  4.859337
               .3661532   .354345
              .29062825   .966186
                2.47116  5.645855
               3.129016  3.417343
               2.829525 3.7610774
              end



              • #8
                Of course, it'll be the same whether you use probabilities or percents. I just wasn't expecting to see percents, so I wanted to verify that we were still working with the same data and that you hadn't done any other transformations.
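
                If you want to confirm that for yourself (a sketch): rho_c, like Pearson's r, is unchanged when both variables are multiplied by the same positive constant, so probabilities and percents give identical results.

                Code:
                generate p3 = measure3/100   // percent back to probability
                generate p4 = measure4/100
                concord p3 p4                // same rho_c as -concord measure3 measure4-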

                All I can say is that .9 is a pretty high bar. Not knowing what these measures are, what they are predicting the risk of, nor what your research goals are, I can't really comment on whether that is too stringent a criterion. I will say this much: over my career I have grown ever more skeptical of the value of threshold criteria for anything that is inherently continuous. Of course, sometimes one has to make a decision to do one thing or another depending on the results of some observation, and that imposes a threshold. But such thresholds are properly defined by an actual decision analysis that looks at maximizing expected utility. My experience is that thresholds in use, especially in medicine, often lack any such underlying rationale: they are often just round numbers postulated by "experts" working in a data-free environment, arrived at without any explicit consideration of utility at all.

                Again, I do not know your goals here. But I imagine that if I were in your situation I might well disregard any arbitrary criteria for what is "good agreement" and simply report the results that I find. Deciding whether that is "good enough" would then await a separate analysis that explicitly and quantitatively accounts for the consequences of whatever decisions hang on that.

