Kruskal-Wallis non-parametric test

edsonmontoro
Sep 4, 2020

Hi everyone, as we are following the "stay at home" recommendations, let's continue with our paper on non-parametric tests. Now it's time to talk about the Kruskal-Wallis test, the non-parametric equivalent of the parametric ANOVA test.

PS: You can read the article here on the blog or, if you prefer, download it.

Author: Edson Rui Montoro

Continuing with the non-parametric tests, this time we are going to talk about the non-parametric equivalent of one-way ANOVA.

As a reminder, non-parametric tests are a little “weaker” (less powerful) than parametric tests, but in return you do not have to worry about whether the data in each group follow a Normal probability distribution, which is important for ANOVA. As a further advantage, non-parametric tests can be used for data measured on an ordinal scale and even for data measured on a nominal scale. Overall, the math involved in these tests is much simpler.

The Kruskal-Wallis test was created by William Kruskal (1919 – 2005), an American mathematician and statistician, and W. Allen Wallis (1912 – 1998), an American economist and statistician.

The Kruskal-Wallis test does not compare parameters: it does not test the equality of means, nor, as many believe, the equality of medians. It is indicated for testing the hypothesis that three or more populations have the same distribution.

Thus, when reporting a Kruskal-Wallis test, means, medians, or graphs of these statistics should not be presented at first. The Kruskal-Wallis test works with ranks, not with the original data directly.

For the sake of clarity, there is another non-parametric test that also compares several groups: the Friedman test. The difference is that the Friedman test assumes the same individuals are measured under all treatments, a design called repeated measures, while Kruskal-Wallis assumes that the individuals in each group are different and independent.

Here is an example that shows in a practical way how the test works.

"A company wants to assess the effectiveness of an operational training module, as required by some ISO standards, such as 9001, and to assess whether there is a significant difference between work shifts. For this, 28 employees were chosen at random from the work shifts (4 groups on a rotating shift basis), 7 employees from each group. All employees took the training module with the same instructor and then took a practical test, related to the training, at the plant facilities. After the test, the performance of each individual in each group was evaluated. The test scores are shown in Table 1."

The hypotheses for this test are:

H0: M1 = M2 = M3 = M4;

H1: There is at least one different group.

Recall that the significance level, defined beforehand, is 5%. The H0 rejection criterion (as in ANOVA) is one-sided to the right: if the observed value is greater than or equal to the critical value, H0 is rejected.

To run the test, a rank (rij) must be assigned to each experimental value. To do this, all experimental results are pooled and sorted in ascending order (while keeping track of their original group), and each one is assigned its rank. If there is a tie, each of the tied values receives the mean of the ranks they would otherwise occupy. Table 2 shows the ranked values.
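The ranking step can be sketched in a few lines of Python. This is only an illustration: the function name `average_ranks` and the sample scores are hypothetical and are not the values of Table 1.

```python
def average_ranks(values):
    """Return 1-based ranks for `values`, giving tied values their mean rank."""
    # Indices of the observations, sorted by ascending value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend `end` over the run of observations tied with order[pos].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Mean of the positions pos+1 .. end+1 that the tied values occupy.
        mean_rank = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


# Hypothetical scores pooled from all groups (not the data of Table 1).
scores = [72, 65, 80, 65, 91]
print(average_ranks(scores))  # [3.0, 1.5, 4.0, 1.5, 5.0]
```

The two scores of 65 would occupy positions 1 and 2, so each receives the mean rank 1.5.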

There are two ways to calculate the H statistic:

when there are few or no ties in the ranks;
when there are many ties (there is no clear rule, but usually more than 3 ties).

For each group, the sum of its ranks rij, denoted Ri, is obtained by:
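In the usual notation, with rij the rank of the j-th observation in group i and ni the number of observations in that group, this sum can be written as:

```latex
R_i = \sum_{j=1}^{n_i} r_{ij}, \qquad i = 1, \dots, k
```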

When there are no ties in the observed values of the samples, or the number of ties is very small, as in the example (only two results tied), the test statistic is:
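In its standard textbook form, written with the symbols defined just below, the statistic is:

```latex
H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^{2}}{n_i} - 3(N+1)
```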

Where,

k = Number of groups;

N = Total number of experimental measurements;

Ri = Sum of the ranks in group i;

ni = Number of measurements in group i;

H = Value of the Kruskal-Wallis statistic.
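As a minimal sketch of how these quantities combine, the function below computes H (without tie correction) from the ranks of each group, using the standard textbook form of the statistic. The name `kruskal_wallis_h` and the ranks are hypothetical, chosen only so the arithmetic is easy to follow; they are not the ranks of Table 2.

```python
def kruskal_wallis_h(rank_groups):
    """H statistic (no tie correction) from the ranks of each group."""
    n_total = sum(len(g) for g in rank_groups)                  # N
    weighted = sum(sum(g) ** 2 / len(g) for g in rank_groups)   # sum of Ri^2 / ni
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical ranks for k = 3 groups of 3 measurements each.
rank_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(round(kruskal_wallis_h(rank_groups), 2))  # 7.2
```

Here R1 = 6, R2 = 15 and R3 = 24, so the sum of Ri²/ni is 12 + 75 + 192 = 279 and H = (12 / 90) × 279 - 30 = 7.2.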

Substituting the respective values in the formula:

The null hypothesis must be rejected if the observed value of the H statistic is greater than or equal to the critical value (one-sided test on the right). This critical value is looked up in the Chi-Square (χ2) table with (4 - 1) = 3 degrees of freedom, because there are 4 groups. For an alpha of 5%, this value is 7.81. As the observed H is above this value, the null hypothesis is rejected, and we conclude that at least one group is different.
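Instead of a printed table, the critical value can be obtained in software. The sketch below assumes SciPy is available and uses `scipy.stats.chi2.ppf`; the helper `reject_h0` is a hypothetical name introduced only for this illustration.

```python
from scipy.stats import chi2

alpha = 0.05   # significance level used in the article
k = 4          # number of groups (work shifts)

# Right-tail critical value of the Chi-Square distribution with k - 1 df.
critical_value = chi2.ppf(1 - alpha, df=k - 1)
print(round(critical_value, 2))  # 7.81


def reject_h0(h_observed, alpha=0.05, k=4):
    """True when the observed H reaches the right-tail critical value."""
    return h_observed >= chi2.ppf(1 - alpha, df=k - 1)
```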

It is worth mentioning that some care is needed, depending on the test conditions, such as the number of groups and the number of measurements per group. The following rules apply:

If k = 3 and ni ≤ 5, use the table of the exact distribution of the H statistic under H0.
If k > 3 or ni > 5, H approximately follows a Chi-Square (χ2) distribution; consult the table of that distribution.

When there are many ties in the observed values of the samples, the test statistic to be used should be:

Where,
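A minimal sketch of the usual tie correction follows. It assumes the standard textbook form, in which the uncorrected H is divided by 1 - Σ(t³ - t) / (N³ - N), with t the number of observations sharing each tied value and N the total number of measurements. The function names and the sample scores are hypothetical.

```python
from collections import Counter


def tie_correction_factor(pooled_values):
    """1 - sum(t^3 - t) / (N^3 - N), where t is the size of each tie group."""
    n_total = len(pooled_values)
    tie_sizes = Counter(pooled_values).values()
    # Values that appear only once have t = 1 and contribute 0.
    tie_term = sum(t ** 3 - t for t in tie_sizes)
    return 1 - tie_term / (n_total ** 3 - n_total)


def corrected_h(h_uncorrected, pooled_values):
    """Divide the uncorrected H by the tie correction factor."""
    return h_uncorrected / tie_correction_factor(pooled_values)


# Hypothetical pooled scores with one pair of tied values.
pooled = [72, 65, 80, 65, 91]
print(round(tie_correction_factor(pooled), 4))  # 0.95
```

Because the factor is at most 1, the corrected H is never smaller than the uncorrected one; with few ties the two values are almost identical.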

When the null hypothesis H0 is rejected in the Kruskal-Wallis test, it indicates that at least one of the groups is different from the others. However, there is no information about which one is different. In this case, a multiple comparison procedure makes it possible to determine which groups are different, in the same way as is done for ANOVA.

It is worth mentioning that, after the Kruskal-Wallis test rejects H0, the researcher normally wants to know which group or groups are different: since the probability distributions differ, it is interesting to know whether the parameters of these distributions differ as well.

But before that, it is useful to check the scores visually, using a technique that is also non-parametric: the Box Plot. The results are shown in Figure 1.

Figure 1 – Box Plot of the scores.

It can be seen in Figure 1 that there is probably no significant difference between Groups 1 and 2, as the notches marking the confidence intervals of their medians overlap (to learn more about Box Plots, see our other paper). Groups 3 and 4 differ from Groups 1 and 2, and there also seems to be a difference between Groups 3 and 4. However, since the notches of Groups 3 and 4 overlap slightly, that conclusion is open to doubt. It can only be answered with more certainty using a multiple comparison test, which is presented below. Before that, some further conclusions can be drawn from the Box Plots: Groups 1 and 2 obtained the lowest scores, while Group 4 seems to have a higher median score than the others.

One might ask: if the Box Plot were applied to the ranks, would the conclusions be the same?

It can be seen in Figure 2 that the groups behave very similarly to those in the Box Plot of the original values.

Figure 2 – Box Plot of the ranks.

After the visual analysis, you can perform multiple comparison tests by calculating the differences between the sums of the ranks of each pair of groups. Equality is rejected when the absolute difference is greater than or equal to a critical value (Cij), given by the formula:

Where,

ni and nj are the sample sizes of groups i and j, respectively;

N = Number of total measurements, N = n1 + n2 + ... + nk;

Ri. and Rj. are the sums of the ranks of the groups being compared, i and j, respectively;

|Ri. - Rj.| is the absolute difference observed in each pairwise comparison;

χ²(α, k-1) is the same critical value used in the Kruskal-Wallis test, which in our example is 12.837.
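The exact form of Cij is not reproduced in this text, so the sketch below assumes one common textbook version of this kind of comparison. Note that this version works on mean ranks (rank sum divided by group size) rather than on the rank sums described above, so it may not match the article's own formula term for term. The function names and the numbers are hypothetical; the Chi-Square critical value is passed in as an argument.

```python
from math import sqrt


def critical_difference(chi2_critical, n_total, n_i, n_j):
    """Assumed form: sqrt(chi2 * N(N+1)/12 * (1/ni + 1/nj)), for MEAN ranks."""
    return sqrt(chi2_critical * n_total * (n_total + 1) / 12 * (1 / n_i + 1 / n_j))


def groups_differ(mean_rank_i, mean_rank_j, chi2_critical, n_total, n_i, n_j):
    """Reject equality when |difference of mean ranks| >= critical difference."""
    c_ij = critical_difference(chi2_critical, n_total, n_i, n_j)
    return abs(mean_rank_i - mean_rank_j) >= c_ij


# Hypothetical case: k = 3 groups of 3 (N = 9), mean ranks 2, 5 and 8,
# and the 5% Chi-Square critical value for 2 degrees of freedom (5.991).
print(groups_differ(2, 8, 5.991, 9, 3, 3))  # True
print(groups_differ(2, 5, 5.991, 9, 3, 3))  # False
```

With equal group sizes the critical difference is the same for every pair, so it only needs to be computed once.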

Performing the calculations to test each sub-hypothesis (6 in total), we obtain the results presented below.

The critical value is the same for all pairs of groups, since every group has the same number of measurements, that is, 7 experimental measurements each. If the groups had different numbers of measurements, a critical value would have to be calculated for each sub-hypothesis.

1. H0: Median of Group 1 = Median of Group 2

As the absolute difference is smaller than the critical value, the first sub-hypothesis is not rejected; that is, there is no significant difference between Group 1 and Group 2 at a significance level of 5%.

This non-rejection is reinforced by the box plot comparison: the confidence intervals of the medians overlap reasonably (Figure 2).

2. H0: Median of Group 1 = Median of Group 3