  • Which test should I use in Stata?

    Hello,


    I would like to know which test to use in Stata for my setup.

    I want to compare 3 ways of collecting data.
    I have 3 groups, for which I collect a variable y using methods A, B, and C respectively. For each person I also collect the same variable using the reference method (called y_base).

    It looks like this, with randomly generated data (in my real database, I have about 500 people in each group):

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float id str1 module float(y_base y)
    1 "A" 6 2
    2 "A" 7 3
    3 "A" 6 4
    4 "A" 4 4
    5 "A" 6 2
    6 "A" 4 2
    7 "A" 4 5
    8 "A" 3 2
    9 "A" 6 2
    10 "A" 4 6
    11 "A" 7 3
    12 "A" 5 1
    13 "A" 3 3
    14 "A" 5 5
    15 "A" 1 5
    16 "A" 6 4
    17 "A" 4 4
    18 "A" 3 3
    19 "A" 3 5
    20 "A" 4 5
    21 "B" 4 5
    22 "B" 5 3
    23 "B" 2 3
    24 "B" 3 5
    25 "B" 1 5
    26 "B" 4 5
    27 "B" 1 8
    28 "B" 4 6
    29 "B" 7 3
    30 "B" 8 5
    31 "B" 2 5
    32 "B" 8 4
    33 "B" 7 4
    34 "B" 8 1
    35 "B" 4 5
    36 "B" 6 4
    37 "B" 5 7
    38 "B" 5 2
    39 "B" 5 6
    40 "B" 5 5
    41 "C" 6 3
    42 "C" 8 4
    43 "C" 6 3
    44 "C" 5 4
    45 "C" 6 4
    46 "C" 6 4
    47 "C" 6 4
    48 "C" 3 2
    49 "C" 6 3
    50 "C" 7 2
    51 "C" 5 2
    52 "C" 7 4
    53 "C" 7 4
    54 "C" 4 2
    55 "C" 5 2
    56 "C" 5 5
    57 "C" 6 2
    58 "C" 3 5
    59 "C" 4 1
    60 "C" 5 3
    end
    Which test in Stata should I use to check which method is closest to y_base?

    Thank you

  • #2
    Raph:
    I'd go:
    Code:
    . encode module, g(num_module)
    . ttest y_base == y if num_module==1, unpaired unequal
    
    Two-sample t test with unequal variances
    ------------------------------------------------------------------------------
    Variable |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
    ---------+--------------------------------------------------------------------
      y_base |      20        4.55    .3515005    1.571958    3.814301    5.285699
           y |      20         3.5    .3120391    1.395481    2.846895    4.153105
    ---------+--------------------------------------------------------------------
    Combined |      40       4.025    .2467416    1.560531    3.525918    4.524082
    ---------+--------------------------------------------------------------------
        diff |                1.05    .4700224                .0980503     2.00195
    ------------------------------------------------------------------------------
        diff = mean(y_base) - mean(y)                                 t =   2.2339
    H0: diff = 0                     Satterthwaite's degrees of freedom =  37.4736
    
        Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
     Pr(T < t) = 0.9842         Pr(|T| > |t|) = 0.0315          Pr(T > t) = 0.0158
    
    . ttest y_base == y if num_module==2, unpaired unequal
    
    Two-sample t test with unequal variances
    ------------------------------------------------------------------------------
    Variable |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
    ---------+--------------------------------------------------------------------
      y_base |      20         4.7    .4925765    2.202869    3.669026    5.730974
           y |      20        4.55    .3661679    1.637553    3.783602    5.316398
    ---------+--------------------------------------------------------------------
    Combined |      40       4.625    .3031618    1.917363    4.011797    5.238203
    ---------+--------------------------------------------------------------------
        diff |                 .15    .6137675               -1.095904    1.395904
    ------------------------------------------------------------------------------
        diff = mean(y_base) - mean(y)                                 t =   0.2444
    H0: diff = 0                     Satterthwaite's degrees of freedom =  35.0866
    
        Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
     Pr(T < t) = 0.5958         Pr(|T| > |t|) = 0.8084          Pr(T > t) = 0.4042
    
    . ttest y_base == y if num_module==3, unpaired unequal
    
    Two-sample t test with unequal variances
    ------------------------------------------------------------------------------
    Variable |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
    ---------+--------------------------------------------------------------------
      y_base |      20         5.5    .2946898    1.317893    4.883207    6.116793
           y |      20        3.15    .2541757    1.136708    2.618004    3.681996
    ---------+--------------------------------------------------------------------
    Combined |      40       4.325    .2688711     1.70049    3.781157    4.868843
    ---------+--------------------------------------------------------------------
        diff |                2.35    .3891624                1.561624    3.138376
    ------------------------------------------------------------------------------
        diff = mean(y_base) - mean(y)                                 t =   6.0386
    H0: diff = 0                     Satterthwaite's degrees of freedom =  37.1981
    
        Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
     Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000
    
    
    
    .
    You may want to consider adjusting for multiple comparisons (y_base vs. "A"; y_base vs. "B"; y_base vs. "C").
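
    The adjustment mentioned above could be sketched as follows. This is a minimal illustration rather than part of the original analysis: it applies a Bonferroni correction (each two-sided p-value multiplied by the 3 comparisons and capped at 1), which is only one of several possible adjustments, and it assumes num_module was created by the encode command shown earlier.

    ```stata
    * Bonferroni-adjusted two-sided p-values for the three comparisons
    * (assumption: Bonferroni is the chosen adjustment; others exist)
    forvalues m = 1/3 {
        quietly ttest y_base == y if num_module == `m', unpaired unequal
        display "Module `m': adjusted p = " %6.4f min(1, 3 * r(p))
    }
    ```

    Since y and y_base are recorded for the same person, the paired form (ttest y_base == y, without the unpaired option) may also be worth considering.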
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      The problem seems one of measurement rather than testing, i.e. which "module" is most nearly in agreement with the base measurements?

      Two tools are a plot of observed versus fitted (or vice versa if you prefer) with a reference line of equality and concordance correlation, which measures agreement rather than linearity. Some people call this a calibration plot, and there are yet other names.

      For a brief survey of the territory you may have access to https://www.sciencedirect.com/scienc...69555X05003740

      Concordance correlation is not a difficult calculation but concord from the Stata Journal may be convenient.

      Code:
      . search concord, sj historical
      
      Search of official help files, FAQs, Examples, and Stata Journals
      
      SJ-10-4 st0015_6  . . . . . . . . . . . . . . . .  Software update for concord
              (help concord if installed) . . . . . . . T. J. Steichen and N. J. Cox
              Q4/10   SJ 10(4):691
              update explicitly supporting plot() and addplot() and
              allowing systematic control of the reference line
      
      SJ-8-4  st0015_5  . . . . . . . . . . . . . . . .  Software update for concord
              (help concord if installed) . . . . . . . T. J. Steichen and N. J. Cox
              Q4/08   SJ 8(4):594
              dialog box modified to correct visual layout problems
      
      SJ-7-3  st0015_4  . . . . . . . . . . . . . . . .  Software update for concord
              (help concord if installed) . . . . . . . T. J. Steichen and N. J. Cox
              Q3/07   SJ 7(3):444
              now compatible with Stata 10; help file updated
      
      SJ-6-2  st0015_3  . . . . . . . . . . . . . . . .  Software update for concord
              (help concord if installed) . . . . . . . T. J. Steichen and N. J. Cox
              Q2/06   SJ 6(2):284
              updated for compatibility with Stata 9 by() option
      
      SJ-5-3  st0015_2  . . . . . . . . . . . . . . . .  Software update for concord
              (help concord if installed) . . . . . . . T. J. Steichen and N. J. Cox
              Q3/05   SJ 5(3):470
              minor bug fix for concord
      
      SJ-4-4  st0015_1  . . . . . . . . . . . . . . . .  Software update for concord
              (help concord if installed) . . . . . . . T. J. Steichen and N. J. Cox
              Q4/04   SJ 4(4):491
              rewritten to provide dialog, Stata 8 graphs, and two
              new tests
      
      SJ-2-2  st0015  . . . . . .  A note on the concordance correlation coefficient
              (help concord if installed) . . . . . . . T. J. Steichen and N. J. Cox
              Q2/02   SJ 2(2):183--189
              correction based on an erratum for Lin's concordance
              correlation coefficient, and assessment of the impact of
              the change
      
      STB-58  sg84.3  . . . . Concordance correlation coefficient: minor corrections
              (help concord if installed) . . . . . . . T. J. Steichen and N. J. Cox
              11/00   p.9; STB Reprints Vol 10, p.137
              small bug fixes affecting user control through the connect(),
              symbol(), and pen() options
      
      STB-54  sg84.2  . . .  Concordance correlation coefficient: update for Stata 6
              (help concord if installed) . . . . . . . T. J. Steichen and N. J. Cox
              3/00    pp.25--26; STB Reprints Vol 9, pp.169--170
              updated to version 6 with corrections and new option for
              saving the standard normal plot
      
      STB-45  sg84.1  . . . . . . . . Concordance correlation coefficient, revisited
              (help concord if installed) . . . . . . . .  T. Steichen and N. J. Cox
              9/98    pp.21--23; STB Reprints Vol 8, pp.143--145
              improvements to the concord command
      
      STB-43  sg84  . . . . . . . . . . . . . .  Concordance correlation coefficient
              (help concord if installed) . . . . . . . .  T. Steichen and N. J. Cox
              5/98    pp.35--39; STB Reprints Vol 8, pp.137--143
              computes Lin's (1989) concordance correlation coefficient for
              agreement on a continuous measure obtained by two persons or
              methods
      The results for fake data are naturally not of inherent interest, but here is some technique.


      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float id str1 module float(y_base y)
      1 "A" 6 2
      2 "A" 7 3
      3 "A" 6 4
      4 "A" 4 4
      5 "A" 6 2
      6 "A" 4 2
      7 "A" 4 5
      8 "A" 3 2
      9 "A" 6 2
      10 "A" 4 6
      11 "A" 7 3
      12 "A" 5 1
      13 "A" 3 3
      14 "A" 5 5
      15 "A" 1 5
      16 "A" 6 4
      17 "A" 4 4
      18 "A" 3 3
      19 "A" 3 5
      20 "A" 4 5
      21 "B" 4 5
      22 "B" 5 3
      23 "B" 2 3
      24 "B" 3 5
      25 "B" 1 5
      26 "B" 4 5
      27 "B" 1 8
      28 "B" 4 6
      29 "B" 7 3
      30 "B" 8 5
      31 "B" 2 5
      32 "B" 8 4
      33 "B" 7 4
      34 "B" 8 1
      35 "B" 4 5
      36 "B" 6 4
      37 "B" 5 7
      38 "B" 5 2
      39 "B" 5 6
      40 "B" 5 5
      41 "C" 6 3
      42 "C" 8 4
      43 "C" 6 3
      44 "C" 5 4
      45 "C" 6 4
      46 "C" 6 4
      47 "C" 6 4
      48 "C" 3 2
      49 "C" 6 3
      50 "C" 7 2
      51 "C" 5 2
      52 "C" 7 4
      53 "C" 7 4
      54 "C" 4 2
      55 "C" 5 2
      56 "C" 5 5
      57 "C" 6 2
      58 "C" 3 5
      59 "C" 4 1
      60 "C" 5 3
      end
      
      scatter y y_base, ms(Oh) l1title("y", orient(horiz))  || line y_base y_base, sort by(module, note("") legend(off) row(1) ) aspect(1) 
      
      foreach module in A B C { 
          di "{title:Module `module'}"
          concord y y_base if module == "`module'"
          di _n 
      }
      
      Module A
      
      Concordance correlation coefficient (Lin, 1989, 2000):
      
       rho_c   SE(rho_c)   Obs    [   95% CI   ]     P        CI type
      ---------------------------------------------------------------
      -0.274     0.179      20    -0.624  0.077    0.126   asymptotic
                                  -0.578  0.098    0.146  z-transform
      
      Pearson's r = -0.348  Pr(r = 0) = 0.133  C_b = rho_c/r =  0.786
      Reduced major axis:   Slope =    -0.888   Intercept =     7.539
      
      Difference = y - y_base
      
              Difference                 95% Limits Of Agreement
         Average     Std Dev.             (Bland & Altman, 1986)
      ---------------------------------------------------------------
          -1.050       2.438                 -5.829      3.729
      
      Correlation between difference and mean = -0.126
      
      Bradley-Blackwood F = 1.931 (P = 0.17383)
      
      
      Module B
      
      Concordance correlation coefficient (Lin, 1989, 2000):
      
       rho_c   SE(rho_c)   Obs    [   95% CI   ]     P        CI type
      ---------------------------------------------------------------
      -0.442     0.180      20    -0.793 -0.090    0.014   asymptotic
                                  -0.722 -0.037    0.034  z-transform
      
      Pearson's r = -0.463  Pr(r = 0) = 0.040  C_b = rho_c/r =  0.955
      Reduced major axis:   Slope =    -0.743   Intercept =     8.044
      
      Difference = y - y_base
      
              Difference                 95% Limits Of Agreement
         Average     Std Dev.             (Bland & Altman, 1986)
      ---------------------------------------------------------------
          -0.150       3.297                 -6.612      6.312
      
      Correlation between difference and mean = -0.321
      
      Bradley-Blackwood F = 1.059 (P = 0.36756)
      
      
      Module C
      
      Concordance correlation coefficient (Lin, 1989, 2000):
      
       rho_c   SE(rho_c)   Obs    [   95% CI   ]     P        CI type
      ---------------------------------------------------------------
       0.077     0.081      20    -0.081  0.236    0.338   asymptotic
                                  -0.082  0.233    0.340  z-transform
      
      Pearson's r =  0.228  Pr(r = 0) = 0.333  C_b = rho_c/r =  0.339
      Reduced major axis:   Slope =     0.863   Intercept =    -1.594
      
      Difference = y - y_base
      
              Difference                 95% Limits Of Agreement
         Average     Std Dev.             (Bland & Altman, 1986)
      ---------------------------------------------------------------
          -2.350       1.531                 -5.351      0.651
      
      Correlation between difference and mean = -0.151
      
      Bradley-Blackwood F = 23.041 (P = 0.00001)
      [Figure: concord.png. Scatter plots of y versus y_base by module, with a reference line of equality.]
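
      Since the concordance correlation is described above as not a difficult calculation, here is a rough sketch of computing it by hand for module A, using rho_c = 2 s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2). It assumes the example data above are in memory and that the variances and covariance use divisor n rather than n - 1 (an assumption about the estimator); the local macro names are purely illustrative.

      ```stata
      * Hand calculation of Lin's concordance correlation for module A
      * (sketch; assumes divisor n for the variances and covariance)
      quietly correlate y y_base if module == "A", covariance
      local n   = r(N)
      local sxy = r(cov_12) * (`n' - 1) / `n'
      local sx2 = r(Var_1)  * (`n' - 1) / `n'
      local sy2 = r(Var_2)  * (`n' - 1) / `n'

      * means of the two measurements
      quietly summarize y if module == "A", meanonly
      local mx = r(mean)
      quietly summarize y_base if module == "A", meanonly
      local my = r(mean)

      display "rho_c = " %6.3f 2 * `sxy' / (`sx2' + `sy2' + (`mx' - `my')^2)
      ```

      The result should be close to the rho_c that concord reports for module A; changing the if condition gives the other modules.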






      • #4
        Thanks Nick for your response.

        I'm sorry, but how should I interpret the results of "concord"?

        Raph



        • #5
          You have several references there!



          • #6
            Thanks Nick, that helped me a lot.
            I just have one question: is there any problem with having many 0 values?
            In my database, in each module about 50% of people have 0 in y or y_base.



            • #7
              As implied by #3 and explained in much more detail in the references, concordance correlation is about assessing closeness to full agreement (how far are data described by y = x?) rather than closeness to full linearity of relationship (how far are data described by y = a + bx?). The second is naturally the question answered by correlation.
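
              A small simulated example may make the distinction concrete. The data below are made up: y is almost exactly linear in x but far from y = x, so Pearson's r should be close to 1 while the concordance correlation should be much lower. The seed, coefficients, and sample size are arbitrary choices, and concord needs to be installed first.

              ```stata
              * Made-up data: nearly perfect linearity, poor agreement
              clear
              set obs 20
              set seed 2022
              generate x = _n
              generate y = 2 + 3 * x + rnormal()

              * linearity: how far are data described by y = a + bx?
              correlate y x

              * agreement: how far are data described by y = x?
              concord y x
              ```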

              So, zeros are not a problem as such.

              The issues for you are to do with subject-matter. At one extreme, zeros are just a possible value in the range. At another extreme, zeros mark a qualitative difference too.

              Here is a silly example, and necessarily I have no idea how close it is to your set-up. Suppose the question was about different methods of measuring amount of smoking (tobacco). Then many people don't smoke at all and different methods might well agree that such people are recorded as zeros. You might decide that smoking among non-smokers was, or alternatively was not, part of what you are trying to assess.

              The issue is age-old. A celebrated example in the 19th century was forecasting tornados, which mostly didn't happen but occasionally did. It was quickly pointed out that always predicting "no tornado" is quite a successful method, because most days didn't experience a tornado even in areas that did experience them sometimes. However, the matter doesn't end there.
              Last edited by Nick Cox; 05 Dec 2022, 04:03.



              • #8
                I have a related query.
                I have used concord to assess concordance.
                So I have a variable LMP (which is the true gestational age). I am using six more methods to measure LMP - y, HC, AC, FL, COMP, and BPD.
                For each of these six methods, I calculate concordance using concord with the loa option.
                How can I compare the concordance of different tests: specifically, I'm interested in comparing the concordance of y vs. HC, y vs. AC, y vs. FL, y vs. COMP, and y vs. BPD.
                Here is the data.

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input float(id LMP y HC AC FL COMP BPD)
                 1 153 149.87616 155 153 153 154 155
                 2 167  165.8013 183 180 187 175 183
                 3 258 244.56856 224 226 253 224 230
                 4 160  160.5986 161 174 157 164 167
                 5 165  160.5986 178 170 179 174 175
                 6 220  223.8171 224 219 215 219 221
                 7 167  160.5986 168 168 168 167 168
                 8 275 263.96915 256 257 250 255 256
                 9 135  138.7309 123 131 127 127 132
                10 220  223.8171 219 217 225 219 217
                11 226  227.5399 209 217 216 213 213
                12 196 212.01436 193 216 205 203 201
                13 237 219.98854 215 239 226 229 238
                14 258  250.6401 255 251 231 248 255
                15 125 127.16283 116 115 112 115 124
                16 260 261.51474 251 232 250 250 265
                17 227 231.15704 167 215 228 226 233
                18 272 263.96915 238 267 261 259 268
                19 252  250.6401 237 244 237 235 223
                20 275 258.95462 249 279 252 254 246
                end



                • #9
                  On #8, I just note that I have tended to regard concordance correlation as a measurement procedure, not a testing procedure. Here is the result of getting an overall concordance correlation matrix. The order of variables has been tweaked after looking at initial results.


                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input float(id LMP y HC AC FL COMP BPD)
                   1 153 149.87616 155 153 153 154 155
                   2 167  165.8013 183 180 187 175 183
                   3 258 244.56856 224 226 253 224 230
                   4 160  160.5986 161 174 157 164 167
                   5 165  160.5986 178 170 179 174 175
                   6 220  223.8171 224 219 215 219 221
                   7 167  160.5986 168 168 168 167 168
                   8 275 263.96915 256 257 250 255 256
                   9 135  138.7309 123 131 127 127 132
                  10 220  223.8171 219 217 225 219 217
                  11 226  227.5399 209 217 216 213 213
                  12 196 212.01436 193 216 205 203 201
                  13 237 219.98854 215 239 226 229 238
                  14 258  250.6401 255 251 231 248 255
                  15 125 127.16283 116 115 112 115 124
                  16 260 261.51474 251 232 250 250 265
                  17 227 231.15704 167 215 228 226 233
                  18 272 263.96915 238 267 261 259 268
                  19 252  250.6401 237 244 237 235 223
                  20 275 258.95462 249 279 252 254 246
                  end
                  
                  matrix concord = J(7,7,.)
                  
                  tokenize LMP y COMP AC FL  BPD HC 
                  
                  forval i = 1/7 { 
                      forval j = 1/7 { 
                          quietly concord ``i'' ``j''
                          matrix concord[`i', `j']  = r(rho_c)
                          if `i' != `j' matrix concord[`j', `i']  = r(rho_c)
                      } 
                  }
                  
                  matrix rownames concord = LMP y COMP AC FL BPD HC
                  matrix colnames concord = LMP y COMP AC FL BPD HC 
                  
                  matrix li concord , format(%4.3f)
                  
                  symmetric concord[7,7]
                          LMP      y   COMP     AC     FL    BPD     HC
                   LMP  1.000
                     y  0.985  1.000
                  COMP  0.961  0.974  1.000
                    AC  0.960  0.959  0.978  1.000
                    FL  0.958  0.967  0.980  0.957  1.000
                   BPD  0.958  0.966  0.987  0.955  0.969  1.000
                    HC  0.897  0.904  0.938  0.920  0.915  0.910  1.000
                  Although plotting difference versus mean is a good idea, so is a plain scatter plot matrix.
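
                  For the scatter plot matrix mentioned above, one possible call is sketched here. The half and ms(Oh) options are just presentation choices, and the variable order simply follows the matrix.

                  ```stata
                  * scatter plot matrix of the reference and the six methods
                  graph matrix LMP y COMP AC FL BPD HC, half ms(Oh)
                  ```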
