
  • How to visualize independent two-sample t-tests?

    A recent thread on Cross Validated http://stats.stackexchange.com/quest...-sample-t-test may be of interest to people here.

    Here I want to develop one of my answers in response to a suggestion from Frank Harrell:

    Besides the nice goal of presenting the results there should be some consideration about which graphics check the assumptions of the two-sample equal variance t-test for it to have excellent performance. That would be normal inverse functions of the two empirical cumulative distribution functions. To satisfy the test assumptions these two curves must be parallel straight lines.
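
    The graphic described in the quotation can be built by hand, which may help to show what is being plotted. What follows is only a minimal sketch: the plotting positions (i - 0.5)/n are an assumption made here (other choices are common), and the variable names pp and z are invented for illustration.

```stata
* By-hand normal quantile plot by group (a sketch, not the qplot source)
sysuse auto, clear

* plotting positions within each group, assuming (i - 0.5)/n
bysort foreign (mpg): gen double pp = (_n - 0.5) / _N

* standard normal deviate for each plotting position
gen double z = invnormal(pp)

* ordered values against normal deviates, one marker style per group
twoway (scatter mpg z if !foreign, ms(+) mc(red))                     ///
       (scatter mpg z if foreign, ms(Oh) mc(blue)),                   ///
       legend(order(2 "Foreign" 1 "Domestic") col(1) ring(0) pos(11)) ///
       xtitle(standard normal deviate) ytitle("`: var label mpg'")    ///
       yla(, ang(h)) aspect(1)
```

    If the assumptions of the test held exactly, each cloud of points would lie near a straight line and the two lines would be parallel.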

    qplot (Stata Journal) offers a way to approach this without too much pain. (Use search qplot to identify the source of the most recent revision and pertinent papers.)

    Here is a self-contained example.


    Code:
    sysuse auto, clear
    
    * group means of mpg, to be drawn as horizontal reference lines
    su mpg if foreign
    local m1 = r(mean)
    su mpg if !foreign
    local m0 = r(mean)
    
    * quantile plot by group on a normal probability scale; qplot must be installed
    qplot mpg, over(foreign) trscale(invnorm(@)) xtitle(standard normal deviate) ///
    aspect(1) legend(order(2 1) col(1) ring(0) pos(11)) mc(red blue) ms(+ Oh) ///
    addplot(scatteri `m1' -2.4 `m1' 2.4, recast(line) lcolor(blue) lw(thin) || ///
    scatteri `m0' -2.4 `m0' 2.4, recast(line) lcolor(red) lw(thin)) ///
    ytitle("`: var label mpg'") yla(, ang(h)) name(G1)
    
    * reciprocal scale: gallons per 1000 miles
    gen gptm = 1000/mpg
    
    * group means on the reciprocal scale
    su gptm if foreign
    local g1 = r(mean)
    su gptm if !foreign
    local g0 = r(mean)
    
    * same plot for the transformed variable
    qplot gptm, over(foreign) trscale(invnorm(@)) xtitle(standard normal deviate) ///
    aspect(1) legend(order(2 1) col(1) ring(0) pos(11)) mc(red blue) ms(+ Oh)  ///
    addplot(scatteri `g1' -2.4 `g1' 2.4, recast(line) lcolor(blue) lw(thin) || ///
    scatteri `g0' -2.4 `g0' 2.4, recast(line) lcolor(red) lw(thin)) ///
    ytitle("Gallons/1000 miles") yla(, ang(h)) name(G2)
    
    graph combine G1 G2
    [Graph: qplotboth.png (G1 and G2 combined)]



    Code:
    . ttest mpg, by(foreign)
    
    Two-sample t test with equal variances
    ------------------------------------------------------------------------------
       Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
    Domestic |      52    19.82692     .657777    4.743297    18.50638    21.14747
     Foreign |      22    24.77273     1.40951    6.611187    21.84149    27.70396
    ---------+--------------------------------------------------------------------
    combined |      74     21.2973    .6725511    5.785503     19.9569    22.63769
    ---------+--------------------------------------------------------------------
        diff |           -4.945804    1.362162               -7.661225   -2.230384
    ------------------------------------------------------------------------------
        diff = mean(Domestic) - mean(Foreign)                         t =  -3.6308
    Ho: diff = 0                                     degrees of freedom =       72
    
        Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
     Pr(T < t) = 0.0003         Pr(|T| > |t|) = 0.0005          Pr(T > t) = 0.9997
    
    . ttest gptm, by(foreign)
    
    Two-sample t test with equal variances
    ------------------------------------------------------------------------------
       Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
    Domestic |      52    53.18155    1.697862    12.24346    49.77295    56.59016
     Foreign |      22    43.12848    2.439844    11.44388    38.05455    48.20242
    ---------+--------------------------------------------------------------------
    combined |      74     50.1928    1.487802    12.79856    47.22762    53.15799
    ---------+--------------------------------------------------------------------
        diff |            10.05307    3.056002                3.961044     16.1451
    ------------------------------------------------------------------------------
        diff = mean(Domestic) - mean(Foreign)                         t =   3.2896
    Ho: diff = 0                                     degrees of freedom =       72
    
        Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
     Pr(T < t) = 0.9992         Pr(|T| > |t|) = 0.0016          Pr(T > t) = 0.0008
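
    As a check on what the equal-variance test computes, the standard error and t statistic for mpg can be rebuilt from the group summaries. This is a sketch only: it uses the usual pooled-variance formula together with the results that ttest leaves in r(), and the scalar names are invented for illustration.

```stata
sysuse auto, clear
quietly ttest mpg, by(foreign)

* copy stored results before another command overwrites r()
scalar n1  = r(N_1)
scalar n2  = r(N_2)
scalar m1  = r(mu_1)
scalar m2  = r(mu_2)
scalar sd1 = r(sd_1)
scalar sd2 = r(sd_2)

* pooled SD, standard error of the difference, and t statistic
scalar s_pooled = sqrt(((n1 - 1)*sd1^2 + (n2 - 1)*sd2^2) / (n1 + n2 - 2))
scalar se_diff  = s_pooled * sqrt(1/n1 + 1/n2)
scalar t_stat   = (m1 - m2) / se_diff

display "pooled SD = " %9.6f s_pooled
display "SE(diff)  = " %9.6f se_diff
display "t         = " %9.4f t_stat
```

    The last two values should agree with the 1.362162 and -3.6308 reported by ttest for mpg. The pooled SD is where the equal-variance assumption enters, which is why parallel lines on the quantile plot matter.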

    Key points:

    1. Any graph supplementing t test results should surely show the means themselves. This may not seem to deserve emphasis but many courses and texts illustrate t tests with box plots, which by default (in any software I know) don't show means at all. Much more on this in the thread cited.

    2. In this particular example the difference between the two groups is strong and systematic, but the graph is useful too. Even without diagonal lines showing proportionality between data and fitted normals, it is easy to see some divergence of the lines on the original scale and a better approximation to parallelism on the reciprocal scale. (Multiplication by 1000 to get convenient numbers is just cosmetic.)

    3. There is plenty of room for discussion about whether standard errors should be added too. It's naturally common to see plots of means and confidence intervals, but I prefer to see more of the data. Similarly some would want to sprinkle P-values on the graphs too as part of the sanctification.

    4. It's not an error, but a consequence of the transformation, that the groups are flipped round. Low miles per gallon means high gallons per mile.

    5. I didn't use yli() to add horizontal lines for the two means because they would be the same colour. Using scatteri twice allows matching of colours between the lines and the points. Clearly in any report a text caption to explain that would be a good idea.

    6. The use of red and blue has no political connotations in this case.
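
    On point 2, the slope of each point pattern on a normal quantile plot is roughly the group standard deviation, so divergence or parallelism can be cross-checked against the SDs in the ttest output. The following is a minimal sketch; the scalar names are invented for illustration.

```stata
sysuse auto, clear
gen gptm = 1000/mpg

* group sizes, means and SDs on both scales
tabstat mpg gptm, by(foreign) statistics(n mean sd)

* ratio of SDs (foreign/domestic) on each scale
foreach v in mpg gptm {
    quietly summarize `v' if foreign
    local sd_f = r(sd)
    quietly summarize `v' if !foreign
    scalar ratio_`v' = `sd_f' / r(sd)
    display "`v': SD ratio, foreign/domestic = " %5.3f ratio_`v'
}
```

    From the SDs listed earlier, the ratio is about 1.39 for mpg (6.611187/4.743297) and about 0.93 for gptm (11.44388/12.24346), consistent with the lines looking closer to parallel on the reciprocal scale.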

    I'll give the individual graphs one by one too, as combined they may seem a little small.

    I'd welcome literature references to uses of similar ideas.
    [Graph: qplotG1.png (quantile plot of mpg by foreign, normal probability scale)]

    [Graph: qplotG2.png (quantile plot of gallons/1000 miles by foreign, normal probability scale)]

    Last edited by Nick Cox; 13 Jan 2016, 07:14.