Which test should I use in Stata?

Raph Selenite

Join Date: Oct 2020
Posts: 65

25 Nov 2022, 01:38

Hello,

I would like to know which test to use in Stata for my setup.

I want to compare three ways of collecting data. I have three groups, and in each group I collect a variable Y using one of the methods A, B, or C. For each person I also collect Y using the reference method (called y_base).

Here is what it looks like with randomly generated data (in my real database I have about 500 people in each group):

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float id str1 module float(y_base y)
1 "A" 6 2
2 "A" 7 3
3 "A" 6 4
4 "A" 4 4
5 "A" 6 2
6 "A" 4 2
7 "A" 4 5
8 "A" 3 2
9 "A" 6 2
10 "A" 4 6
11 "A" 7 3
12 "A" 5 1
13 "A" 3 3
14 "A" 5 5
15 "A" 1 5
16 "A" 6 4
17 "A" 4 4
18 "A" 3 3
19 "A" 3 5
20 "A" 4 5
21 "B" 4 5
22 "B" 5 3
23 "B" 2 3
24 "B" 3 5
25 "B" 1 5
26 "B" 4 5
27 "B" 1 8
28 "B" 4 6
29 "B" 7 3
30 "B" 8 5
31 "B" 2 5
32 "B" 8 4
33 "B" 7 4
34 "B" 8 1
35 "B" 4 5
36 "B" 6 4
37 "B" 5 7
38 "B" 5 2
39 "B" 5 6
40 "B" 5 5
41 "C" 6 3
42 "C" 8 4
43 "C" 6 3
44 "C" 5 4
45 "C" 6 4
46 "C" 6 4
47 "C" 6 4
48 "C" 3 2
49 "C" 6 3
50 "C" 7 2
51 "C" 5 2
52 "C" 7 4
53 "C" 7 4
54 "C" 4 2
55 "C" 5 2
56 "C" 5 5
57 "C" 6 2
58 "C" 3 5
59 "C" 4 1
60 "C" 5 3
end

Which test in Stata should I use to check which method is closest to y_base?

Thank you

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17712

25 Nov 2022, 05:22

Raph:
I'd go:

Code:

. encode module, g(num_module)
. ttest y_base == y if num_module==1, unpaired unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
  y_base |      20        4.55    .3515005    1.571958    3.814301    5.285699
       y |      20         3.5    .3120391    1.395481    2.846895    4.153105
---------+--------------------------------------------------------------------
Combined |      40       4.025    .2467416    1.560531    3.525918    4.524082
---------+--------------------------------------------------------------------
    diff |                1.05    .4700224                .0980503     2.00195
------------------------------------------------------------------------------
    diff = mean(y_base) - mean(y)                                 t =   2.2339
H0: diff = 0                     Satterthwaite's degrees of freedom =  37.4736

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.9842         Pr(|T| > |t|) = 0.0315          Pr(T > t) = 0.0158

. ttest y_base == y if num_module==2, unpaired unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
  y_base |      20         4.7    .4925765    2.202869    3.669026    5.730974
       y |      20        4.55    .3661679    1.637553    3.783602    5.316398
---------+--------------------------------------------------------------------
Combined |      40       4.625    .3031618    1.917363    4.011797    5.238203
---------+--------------------------------------------------------------------
    diff |                 .15    .6137675               -1.095904    1.395904
------------------------------------------------------------------------------
    diff = mean(y_base) - mean(y)                                 t =   0.2444
H0: diff = 0                     Satterthwaite's degrees of freedom =  35.0866

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.5958         Pr(|T| > |t|) = 0.8084          Pr(T > t) = 0.4042

. ttest y_base == y if num_module==3, unpaired unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
  y_base |      20         5.5    .2946898    1.317893    4.883207    6.116793
       y |      20        3.15    .2541757    1.136708    2.618004    3.681996
---------+--------------------------------------------------------------------
Combined |      40       4.325    .2688711     1.70049    3.781157    4.868843
---------+--------------------------------------------------------------------
    diff |                2.35    .3891624                1.561624    3.138376
------------------------------------------------------------------------------
    diff = mean(y_base) - mean(y)                                 t =   6.0386
H0: diff = 0                     Satterthwaite's degrees of freedom =  37.1981

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000



You may want to consider adjusting for multiple comparisons (y_base vs "A"; y_base vs "B"; y_base vs "C").
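
A minimal sketch of one such adjustment, Bonferroni, which multiplies each two-sided p-value by the number of comparisons (3 here) and caps the result at 1. It assumes the example data are in memory and that num_module was created by the encode command above; r(p) is the two-sided p-value that ttest leaves behind. Other adjustments (Holm, Sidak) are equally defensible, so take this as an illustration rather than a recommendation.

```stata
* Sketch: Bonferroni adjustment for the three comparisons above.
* Assumes num_module (1 = A, 2 = B, 3 = C) already exists from -encode-.
forvalues m = 1/3 {
    quietly ttest y_base == y if num_module == `m', unpaired unequal
    * r(p) is the two-sided p-value stored by -ttest-
    display "module `m': raw p = " %6.4f r(p) ///
        "   Bonferroni p = " %6.4f min(1, 3 * r(p))
}
```

Run this from a do-file, since the /// line continuation is not understood in the Command window.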

Kind regards,
Carlo
(Stata 19.0)

Nick Cox

Join Date: Mar 2014
Posts: 35724

26 Nov 2022, 10:43

The problem seems one of measurement rather than testing, i.e. which "module" is most nearly in agreement with the base measurements?

Two tools are (1) a plot of observed versus fitted (or vice versa if you prefer) with a reference line of equality, and (2) concordance correlation, which measures agreement rather than linearity. Some people call the first a calibration plot, and there are yet other names.

For a brief survey of the territory you may have access to https://www.sciencedirect.com/scienc...69555X05003740

Concordance correlation is not a difficult calculation but concord from the Stata Journal may be convenient.

Code:

. search concord, sj historical

Search of official help files, FAQs, Examples, and Stata Journals

SJ-10-4 st0015_6  . . . . . . . . . . . . . . . .  Software update for concord
        (help concord if installed) . . . . . . . T. J. Steichen and N. J. Cox
        Q4/10   SJ 10(4):691
        update explicitly supporting plot() and addplot() and
        allowing systematic control of the reference line

SJ-8-4  st0015_5  . . . . . . . . . . . . . . . .  Software update for concord
        (help concord if installed) . . . . . . . T. J. Steichen and N. J. Cox
        Q4/08   SJ 8(4):594
        dialog box modified to correct visual layout problems

SJ-7-3  st0015_4  . . . . . . . . . . . . . . . .  Software update for concord
        (help concord if installed) . . . . . . . T. J. Steichen and N. J. Cox
        Q3/07   SJ 7(3):444
        now compatible with Stata 10; help file updated

SJ-6-2  st0015_3  . . . . . . . . . . . . . . . .  Software update for concord
        (help concord if installed) . . . . . . . T. J. Steichen and N. J. Cox
        Q2/06   SJ 6(2):284
        updated for compatibility with Stata 9 by() option

SJ-5-3  st0015_2  . . . . . . . . . . . . . . . .  Software update for concord
        (help concord if installed) . . . . . . . T. J. Steichen and N. J. Cox
        Q3/05   SJ 5(3):470
        minor bug fix for concord

SJ-4-4  st0015_1  . . . . . . . . . . . . . . . .  Software update for concord
        (help concord if installed) . . . . . . . T. J. Steichen and N. J. Cox
        Q4/04   SJ 4(4):491
        rewritten to provide dialog, Stata 8 graphs, and two
        new tests

SJ-2-2  st0015  . . . . . .  A note on the concordance correlation coefficient
        (help concord if installed) . . . . . . . T. J. Steichen and N. J. Cox
        Q2/02   SJ 2(2):183--189
        correction based on an erratum for Lin's concordance
        correlation coefficient, and assessment of the impact of
        the change

STB-58  sg84.3  . . . . Concordance correlation coefficient: minor corrections
        (help concord if installed) . . . . . . . T. J. Steichen and N. J. Cox
        11/00   p.9; STB Reprints Vol 10, p.137
        small bug fixes affecting user control through the connect(),
        symbol(), and pen() options

STB-54  sg84.2  . . .  Concordance correlation coefficient: update for Stata 6
        (help concord if installed) . . . . . . . T. J. Steichen and N. J. Cox
        3/00    pp.25--26; STB Reprints Vol 9, pp.169--170
        updated to version 6 with corrections and new option for
        saving the standard normal plot

STB-45  sg84.1  . . . . . . . . Concordance correlation coefficient, revisited
        (help concord if installed) . . . . . . . .  T. Steichen and N. J. Cox
        9/98    pp.21--23; STB Reprints Vol 8, pp.143--145
        improvements to the concord command

STB-43  sg84  . . . . . . . . . . . . . .  Concordance correlation coefficient
        (help concord if installed) . . . . . . . .  T. Steichen and N. J. Cox
        5/98    pp.35--39; STB Reprints Vol 8, pp.137--143
        computes Lin's (1989) concordance correlation coefficient for
        agreement on a continuous measure obtained by two persons or
        methods

The results for fake data are naturally not of inherent interest, but here is some technique.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float id str1 module float(y_base y)
1 "A" 6 2
2 "A" 7 3
3 "A" 6 4
4 "A" 4 4
5 "A" 6 2
6 "A" 4 2
7 "A" 4 5
8 "A" 3 2
9 "A" 6 2
10 "A" 4 6
11 "A" 7 3
12 "A" 5 1
13 "A" 3 3
14 "A" 5 5
15 "A" 1 5
16 "A" 6 4
17 "A" 4 4
18 "A" 3 3
19 "A" 3 5
20 "A" 4 5
21 "B" 4 5
22 "B" 5 3
23 "B" 2 3
24 "B" 3 5
25 "B" 1 5
26 "B" 4 5
27 "B" 1 8
28 "B" 4 6
29 "B" 7 3
30 "B" 8 5
31 "B" 2 5
32 "B" 8 4
33 "B" 7 4
34 "B" 8 1
35 "B" 4 5
36 "B" 6 4
37 "B" 5 7
38 "B" 5 2
39 "B" 5 6
40 "B" 5 5
41 "C" 6 3
42 "C" 8 4
43 "C" 6 3
44 "C" 5 4
45 "C" 6 4
46 "C" 6 4
47 "C" 6 4
48 "C" 3 2
49 "C" 6 3
50 "C" 7 2
51 "C" 5 2
52 "C" 7 4
53 "C" 7 4
54 "C" 4 2
55 "C" 5 2
56 "C" 5 5
57 "C" 6 2
58 "C" 3 5
59 "C" 4 1
60 "C" 5 3
end

scatter y y_base, ms(Oh) l1title("y", orient(horiz)) ///
    || line y_base y_base, sort ///
    by(module, note("") legend(off) row(1)) aspect(1)

foreach module in A B C { 
    di "{title:Module `module'}"
    concord y y_base if module == "`module'"
    di _n 
}

Module A

Concordance correlation coefficient (Lin, 1989, 2000):

 rho_c   SE(rho_c)   Obs    [   95% CI   ]     P        CI type
---------------------------------------------------------------
-0.274     0.179      20    -0.624  0.077    0.126   asymptotic
                            -0.578  0.098    0.146  z-transform

Pearson's r = -0.348  Pr(r = 0) = 0.133  C_b = rho_c/r =  0.786
Reduced major axis:   Slope =    -0.888   Intercept =     7.539

Difference = y - y_base

        Difference                 95% Limits Of Agreement
   Average     Std Dev.             (Bland & Altman, 1986)
---------------------------------------------------------------
    -1.050       2.438                 -5.829      3.729

Correlation between difference and mean = -0.126

Bradley-Blackwood F = 1.931 (P = 0.17383)


Module B

Concordance correlation coefficient (Lin, 1989, 2000):

 rho_c   SE(rho_c)   Obs    [   95% CI   ]     P        CI type
---------------------------------------------------------------
-0.442     0.180      20    -0.793 -0.090    0.014   asymptotic
                            -0.722 -0.037    0.034  z-transform

Pearson's r = -0.463  Pr(r = 0) = 0.040  C_b = rho_c/r =  0.955
Reduced major axis:   Slope =    -0.743   Intercept =     8.044

Difference = y - y_base

        Difference                 95% Limits Of Agreement
   Average     Std Dev.             (Bland & Altman, 1986)
---------------------------------------------------------------
    -0.150       3.297                 -6.612      6.312

Correlation between difference and mean = -0.321

Bradley-Blackwood F = 1.059 (P = 0.36756)


Module C

Concordance correlation coefficient (Lin, 1989, 2000):

 rho_c   SE(rho_c)   Obs    [   95% CI   ]     P        CI type
---------------------------------------------------------------
 0.077     0.081      20    -0.081  0.236    0.338   asymptotic
                            -0.082  0.233    0.340  z-transform

Pearson's r =  0.228  Pr(r = 0) = 0.333  C_b = rho_c/r =  0.339
Reduced major axis:   Slope =     0.863   Intercept =    -1.594

Difference = y - y_base

        Difference                 95% Limits Of Agreement
   Average     Std Dev.             (Bland & Altman, 1986)
---------------------------------------------------------------
    -2.350       1.531                 -5.351      0.651

Correlation between difference and mean = -0.151

Bradley-Blackwood F = 23.041 (P = 0.00001)
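
To see what concord is computing, rho_c for module A can be reproduced by hand from Lin's definition, rho_c = 2 s_xy / (s_x^2 + s_y^2 + (mean_y - mean_x)^2), where the variances and the covariance use divisor n rather than n - 1. The sketch below assumes the example data above are in memory; the scalar names are made up for illustration. It should agree with the -0.274 reported for module A.

```stata
* Sketch: Lin's concordance correlation for module A, computed by hand.
quietly correlate y y_base if module == "A", covariance

* -correlate, covariance- uses divisor n - 1; rescale to divisor n
scalar k   = (r(N) - 1) / r(N)
scalar sxy = r(cov_12) * k
scalar vy  = r(Var_1) * k
scalar vx  = r(Var_2) * k

* Means of the two measurements
quietly summarize y if module == "A", meanonly
scalar my = r(mean)
quietly summarize y_base if module == "A", meanonly
scalar mx = r(mean)

* Covariance scaled by variances plus the squared difference in means
scalar rho_c = 2 * sxy / (vx + vy + (my - mx)^2)
display "rho_c = " %6.3f rho_c
```

The squared difference in means in the denominator is what penalizes a systematic shift away from the line y = x, which Pearson correlation ignores.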

[Figure: concord.png. Scatter plots of y versus y_base by module (A, B, C), each with a reference line of equality.]

Raph Selenite

Join Date: Oct 2020

Posts: 65
#4

28 Nov 2022, 08:37

Thanks Nick for your response.

I'm sorry, but how should I interpret the results of "concord"?

Raph

Nick Cox

Join Date: Mar 2014

Posts: 35724
#5

28 Nov 2022, 08:52

You have several references there!

Raph Selenite

Join Date: Oct 2020

Posts: 65
#6

05 Dec 2022, 00:49

Thanks Nick, that helped me a lot.
I just have one question: is there any problem in working with many zero values?
In my database, in every module about 50% of people have 0 in y or y_base.

Nick Cox

Join Date: Mar 2014

Posts: 35724
#7

05 Dec 2022, 03:57

As implied by #3 and explained in much more detail in the references, concordance correlation is about assessing closeness to full agreement (how far are data described by y = x?) rather than closeness to full linearity of relationship (how far are data described by y = a + bx?). The second is naturally the question answered by correlation.

So, zeros are not a problem as such.

The issues for you are to do with subject-matter. At one extreme, zeros are just a possible value in the range. At another extreme, zeros mark a qualitative difference too.

Here is a silly example, and necessarily I have no idea how close it is to your set-up. Suppose the question was about different methods of measuring amount of smoking (tobacco). Then many people don't smoke at all and different methods might well agree that such people are recorded as zeros. You might decide that smoking among non-smokers was, or alternatively was not, part of what you are trying to assess.

The issue is age-old. A celebrated example in the 19th century was forecasting tornados, which mostly didn't happen but occasionally did. It was quickly pointed out that always predicting "no tornado" is quite a successful method because most days didn't experience a tornado even in areas that did experience them sometimes. However, the matter doesn't end there.
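
One way to see how much that decision matters in practice is to run concord twice, once on everyone and once leaving out people recorded as zero by both methods. A sketch for module A, assuming the variable names from the example data at the top of the thread (whether dropping joint zeros is sensible is a subject-matter call, not a statistical one):

```stata
* How many people are zero on both measures?
count if module == "A" & y == 0 & y_base == 0

* Agreement using everyone in module A
concord y y_base if module == "A"

* Agreement leaving out the joint zeros
concord y y_base if module == "A" & !(y == 0 & y_base == 0)
```

If the two sets of results differ a lot, the joint zeros are driving much of the apparent agreement.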

Last edited by Nick Cox; 05 Dec 2022, 04:03.

Inaamul Haq

Join Date: Feb 2019
Posts: 57

31 Dec 2022, 19:22

I have a related query.
I have used concord to assess concordance.
I have a variable LMP (which is the true gestational age). I am using six more methods to measure it: y, HC, AC, FL, COMP, and BPD.
For each of these six methods, I calculate concordance using concord with the loa option.
How can I compare the concordance of different tests? Specifically, I'm interested in comparing the concordance of y vs. HC, y vs. AC, y vs. FL, y vs. COMP, and y vs. BPD.
Here is the data.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id LMP y HC AC FL COMP BPD)
 1 153 149.87616 155 153 153 154 155
 2 167  165.8013 183 180 187 175 183
 3 258 244.56856 224 226 253 224 230
 4 160  160.5986 161 174 157 164 167
 5 165  160.5986 178 170 179 174 175
 6 220  223.8171 224 219 215 219 221
 7 167  160.5986 168 168 168 167 168
 8 275 263.96915 256 257 250 255 256
 9 135  138.7309 123 131 127 127 132
10 220  223.8171 219 217 225 219 217
11 226  227.5399 209 217 216 213 213
12 196 212.01436 193 216 205 203 201
13 237 219.98854 215 239 226 229 238
14 258  250.6401 255 251 231 248 255
15 125 127.16283 116 115 112 115 124
16 260 261.51474 251 232 250 250 265
17 227 231.15704 167 215 228 226 233
18 272 263.96915 238 267 261 259 268
19 252  250.6401 237 244 237 235 223
20 275 258.95462 249 279 252 254 246
end

Nick Cox

Join Date: Mar 2014
Posts: 35724

03 Jan 2023, 03:58

On #8 I just note that I have tended to regard concordance correlation as a measurement procedure, not a testing procedure. Here is the result of getting an overall concordance correlation matrix. The order of variables has been tweaked after looking at initial results.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id LMP y HC AC FL COMP BPD)
 1 153 149.87616 155 153 153 154 155
 2 167  165.8013 183 180 187 175 183
 3 258 244.56856 224 226 253 224 230
 4 160  160.5986 161 174 157 164 167
 5 165  160.5986 178 170 179 174 175
 6 220  223.8171 224 219 215 219 221
 7 167  160.5986 168 168 168 167 168
 8 275 263.96915 256 257 250 255 256
 9 135  138.7309 123 131 127 127 132
10 220  223.8171 219 217 225 219 217
11 226  227.5399 209 217 216 213 213
12 196 212.01436 193 216 205 203 201
13 237 219.98854 215 239 226 229 238
14 258  250.6401 255 251 231 248 255
15 125 127.16283 116 115 112 115 124
16 260 261.51474 251 232 250 250 265
17 227 231.15704 167 215 228 226 233
18 272 263.96915 238 267 261 259 268
19 252  250.6401 237 244 237 235 223
20 275 258.95462 249 279 252 254 246
end

matrix concord = J(7,7,.)

tokenize LMP y COMP AC FL  BPD HC 

forval i = 1/7 { 
    forval j = 1/7 { 
        quietly concord ``i'' ``j''
        matrix concord[`i', `j']  = r(rho_c)
        if `i' != `j' matrix concord[`j', `i']  = r(rho_c)
    } 
}

matrix rownames concord = LMP y COMP AC FL BPD HC
matrix colnames concord = LMP y COMP AC FL BPD HC 

matrix li concord , format(%4.3f)

symmetric concord[7,7]
        LMP      y   COMP     AC     FL    BPD     HC
 LMP  1.000
   y  0.985  1.000
COMP  0.961  0.974  1.000
  AC  0.960  0.959  0.978  1.000
  FL  0.958  0.967  0.980  0.957  1.000
 BPD  0.958  0.966  0.987  0.955  0.969  1.000
  HC  0.897  0.904  0.938  0.920  0.915  0.910  1.000

Although plotting difference versus mean is a good idea, so is a plain scatter plot matrix.
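
Both plots can be sketched in a few lines, assuming the data above are in memory. The names diff_y and mean_y are made up for illustration, and the difference versus mean plot is shown for y against LMP only; the same steps apply to any other pair.

```stata
* Plain scatter plot matrix of the reference and all six methods
graph matrix LMP y COMP AC FL BPD HC, half msymbol(Oh)

* Difference versus mean for one pair (y against LMP)
generate diff_y = y - LMP
generate mean_y = (y + LMP) / 2
scatter diff_y mean_y, msymbol(Oh) yline(0) ///
    ytitle("y - LMP") xtitle("mean of y and LMP")
```

On the second plot, points scattered evenly around the zero line with no trend suggest no systematic disagreement between the two measures.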
