If we run a two-way ANOVA on rank-transformed data, the test for the interaction reportedly performs poorly when main effects are present. The aligned rank transform (ART) is reported to preserve power and control the type I error rate better. There are simulations in the literature comparing which measures of central tendency work best for the alignment under different conditions; the sample mean is a good choice. Below is code following the algorithm presented in Wobbrock et al. (e.g., http://dgergle.soc.northwestern.edu/...ns_CHI2011.pdf). Any comments or corrections are welcome.
Code:
/* Aligned Rank Transform
   After Wobbrock et al., "The aligned rank transform for nonparametric
   factorial analyses using only ANOVA procedures"
   URL: <depts.washington.edu/madlab/proj/art/index.html> */

// 1st example from Wobbrock et al.
clear
input s a b y
1 1 1 12
2 1 2 7
3 2 1 14
4 2 2 8
5 1 1 19
6 1 2 16
7 2 1 14
8 2 2 10
end

egen mean = mean(y)
bysort a b: egen cellmean = mean(y)
generate residual = y - cellmean
bysort a: egen a_i_mean = mean(y)
bysort b: egen b_j_mean = mean(y)
generate a_effect = a_i_mean - mean
generate b_effect = b_j_mean - mean
generate ab_effect = cellmean - a_i_mean - b_j_mean + mean
generate y_a = residual + a_effect
generate y_b = residual + b_effect
generate y_ab = residual + ab_effect

// sanity checks
list y_*, sum
anova y_a a b a#b
anova y_b a b a#b
anova y_ab a b a#b

egen art_a = rank(y_a)
egen art_b = rank(y_b)
egen art_ab = rank(y_ab)
anova art_a a b a#b   // test of a
anova art_b a b a#b   // test of b
anova art_ab a b a#b  // test of a#b

// 2nd example where it is
// easy to see looking at data how
// ranking loses interaction effect
clear
input s a b y
1 1 1 1
2 1 1 2
3 1 1 3
4 1 2 4
5 1 2 5
6 1 2 6
7 1 3 7
8 1 3 8
9 1 3 9
10 2 1 10
11 2 1 11
12 2 1 12
13 2 2 13
14 2 2 14
15 2 2 15
16 2 3 20
17 2 3 21
18 2 3 22
end

anova y a b a#b      // anova on raw data
egen ranks = rank(y)
anova ranks a b a#b  // anova on ranked data

egen mean = mean(y)
bysort a b: egen cellmean = mean(y)
generate residual = y - cellmean
bysort a: egen a_i_mean = mean(y)
bysort b: egen b_j_mean = mean(y)
generate a_effect = a_i_mean - mean
generate b_effect = b_j_mean - mean
generate ab_effect = cellmean - a_i_mean - b_j_mean + mean
generate y_a = residual + a_effect
generate y_b = residual + b_effect
generate y_ab = residual + ab_effect
list y_*, sum

egen art_a = rank(y_a)
egen art_b = rank(y_b)
egen art_ab = rank(y_ab)
anova art_a a b a#b
anova art_b a b a#b
anova art_ab a b a#b  // did not lose interaction with ART
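For readers who prefer symbols, the alignment step that the generate statements carry out can be written roughly as follows. The notation is my own shorthand, not taken from the paper: y_{ijk} is observation k in cell (a = i, b = j), a bar with dots denotes a mean over the dotted subscripts, and the marginal means are plain means over the observations at that level, as computed by the bysort/egen lines.

```latex
\begin{align*}
r_{ijk}      &= y_{ijk} - \bar{y}_{ij\cdot}
  && \text{(residual)} \\
y^{A}_{ijk}  &= r_{ijk} + \left(\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot}\right)
  && \text{(aligned for } a\text{)} \\
y^{B}_{ijk}  &= r_{ijk} + \left(\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot}\right)
  && \text{(aligned for } b\text{)} \\
y^{AB}_{ijk} &= r_{ijk} + \left(\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} + \bar{y}_{\cdot\cdot\cdot}\right)
  && \text{(aligned for } a \# b\text{)}
\end{align*}
```

Each aligned column is then ranked separately and run through the full factorial ANOVA, reading off only the effect it was aligned for. As a check on the arithmetic, subject 1 in the first example has y = 12, cell mean 15.5, a = 1 mean 13.5, and grand mean 12.5, so the residual is -3.5 and the a-aligned response is -3.5 + (13.5 - 12.5) = -2.5.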