Multiple Comparisons in Analysis of Variance

StatsDirect provides functions for multiple comparison (simultaneous inference), specifically all pairwise comparisons and all comparisons with a control. For k groups there are k(k-1)/2 possible pairwise comparisons.
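The count of pairwise comparisons can be checked by enumerating the pairs directly. The following is a minimal sketch, not part of StatsDirect; the function name and group labels are hypothetical.

```python
from itertools import combinations


def pairwise_contrasts(groups):
    """Return every unordered pair of group labels."""
    return list(combinations(groups, 2))


groups = ["A", "B", "C", "D"]  # hypothetical group labels
pairs = pairwise_contrasts(groups)
k = len(groups)

# The enumerated count should agree with the formula k(k-1)/2.
print(len(pairs), k * (k - 1) // 2)
for first, second in pairs:
    print(first, "vs", second)
```

With 10 groups the same function returns 45 pairs, which shows how quickly the number of contrasts grows.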

Tukey (Tukey-Kramer if unequal group sizes), Scheffé, Bonferroni and Newman-Keuls methods are provided for all pairwise comparisons (Armitage and Berry, 1994; Wallenstein, 1980; Miller, 1981; Hsu, 1996; Kleinbaum et al., 1998). Dunnett's method is used for multiple comparisons with a control group (Hsu, 1996).

For k groups, ANOVA can be used to look for a difference across k group means as a whole. If there is a statistically significant difference across k means then a multiple comparison method can be used to look for specific differences between pairs of groups. Two-sample methods should not be used to make multiple pairwise comparisons because they are not designed for repeated testing in a "data dredging" manner.

If 20 repeat pairwise tests are made then you cannot accept the conventional 1 in 20 chance of being wrong as a cut-off level for statistical inference, i.e. there is a higher risk of type I error. A simple solution to this problem is to reduce the cut-off for statistical significance with increasing numbers of contrasts made; Bonferroni's method does just this with multiple t tests. More sophisticated methods, such as Tukey(-Kramer), consider the statistical distributions associated with systematic repeated testing; both Tukey(-Kramer) and Newman-Keuls methods are based upon the Studentized range statistic. Scheffé's method gives a very conservative/cautious weighting against the risk of type I error and is therefore less powerful for the detection of true differences. The most acceptable general method for all pairwise comparisons is Tukey(-Kramer), the P values for which are exact with balanced designs (Hsu, 1996).
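The inflation of type I error with repeated testing, and the Bonferroni adjustment to the cut-off, can be illustrated with a short calculation. This is a sketch under a simplifying assumption: the tests are treated as independent, which pairwise comparisons sharing groups are not, so the figure is indicative only. The function names are hypothetical.

```python
def familywise_error(alpha, m):
    """Chance of at least one type I error across m independent tests."""
    return 1 - (1 - alpha) ** m


def bonferroni_cutoff(alpha, m):
    """Per-test significance cut-off after Bonferroni adjustment."""
    return alpha / m


alpha = 0.05  # conventional 1 in 20 cut-off
m = 20        # number of repeat tests

# Unadjusted: the risk of at least one false positive is far above 0.05.
print(round(familywise_error(alpha, m), 3))  # 0.642

# Adjusted: each test is judged against a stricter cut-off.
adjusted = bonferroni_cutoff(alpha, m)
print(adjusted)

# Under the independence assumption the overall risk is back below alpha.
print(familywise_error(adjusted, m) <= alpha)
```

The adjusted cut-off keeps the overall risk at or below the nominal level, at the cost of less power for each individual contrast.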

The outputs from the different multiple contrast methods are displayed in decreasing order of the absolute value of the difference between the means of the two groups compared for each contrast. The word "stop" is shown next to the first non-significant P value; this indicates that you should not consider further contrasts if you are making a simultaneous analysis (similar to the Shaffer-Holm method).
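The ordering and "stop" marker described above can be sketched as follows. This is an illustration of the display logic only, not StatsDirect's implementation; the function name, the tuple layout, the 0.05 cut-off and the example values are all assumptions made for the sketch.

```python
def order_contrasts(contrasts, alpha=0.05):
    """Sort contrasts by absolute mean difference and flag the stop point.

    contrasts: list of (label, mean_difference, p_value) tuples.
    Returns a list of (label, mean_difference, p_value, flag) tuples.
    """
    ordered = sorted(contrasts, key=lambda c: abs(c[1]), reverse=True)
    rows = []
    stopped = False
    for label, diff, p in ordered:
        flag = ""
        # Mark only the first contrast whose P value is not significant.
        if not stopped and p >= alpha:
            flag = "stop"
            stopped = True
        rows.append((label, diff, p, flag))
    return rows


# Made-up contrasts for three hypothetical groups.
example = [
    ("A vs B", 1.2, 0.30),
    ("A vs C", -4.5, 0.001),
    ("B vs C", 3.3, 0.02),
]
for row in order_contrasts(example):
    print(row)
```

In this made-up example the contrasts appear in the order A vs C, B vs C, A vs B, and only the last carries the "stop" flag.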

The following is a decision tree for selecting a multiple contrast method:

pairwise
- equal group sizes: Tukey
- unequal group sizes: Tukey-Kramer or Scheffé
not pairwise
- with a control: Dunnett
- planned: Bonferroni
- not planned: Scheffé
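The decision tree above can be written as a small function. This is a hedged sketch that simply encodes the branches as listed; the function and parameter names are hypothetical and it is not a substitute for statistical advice.

```python
def choose_method(pairwise, equal_sizes=True, with_control=False, planned=False):
    """Return the method suggested by the decision tree for a given design."""
    if pairwise:
        # All pairwise comparisons: the choice depends on group sizes.
        return "Tukey" if equal_sizes else "Tukey-Kramer or Scheffé"
    if with_control:
        # Comparisons of each group with a single control group.
        return "Dunnett"
    # Other contrasts: the choice depends on whether they were planned.
    return "Bonferroni" if planned else "Scheffé"


print(choose_method(pairwise=True, equal_sizes=False))
print(choose_method(pairwise=False, with_control=True))
print(choose_method(pairwise=False, planned=True))
```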

Note that Bonferroni and Scheffé methods are completely general; they can be used for unplanned (a posteriori) or planned (a priori) multiple comparisons.

This is a controversial area in statistics and you would be wise to seek the advice of a statistician at the design stage of your study. In general you should design experiments so that you can avoid having to "dredge" groups of data for differences; decide which contrasts you are interested in at the outset. Note that multiple independent comparisons (e.g. multiple t or Mann-Whitney tests) may be justified if you identify the comparisons as valid at the design stage of your investigation.

Other statistical software may refer to LSD (least significant difference) methods; please note that the Bonferroni technique described above is an LSD method.
