5.5 Post-hoc tests

Let’s start with an example similar to the ones we have already met.

Imagine that you are a scientist and that in the Amazon rain forest you discovered two new species of mice (spB and spC). Now you want to compare their body masses with those of ordinary lab mice (spA), so you collect the data. If the body masses differ, then perhaps in the future they will become one of the criteria for species recognition.

# if you are in 'code_snippets' folder, then use: "./ch05/miceBwtABC.csv"
# if you are in 'ch05' folder, then use: "./miceBwtABC.csv"
miceBwtABC = Csv.read("./code_snippets/ch05/miceBwtABC.csv", Dfs.DataFrame)
Table 4: Body mass [g] of three mice species (fictitious data).
spA spB spC
18 21 23
21 25 27
20 26 25
23 24 28
22 21 27
19 24 26

Now, let us quickly look at the means and standard deviations in the three groups to get some impression about the data.

[
(n, Stats.mean(miceBwtABC[!, n]), Stats.std(miceBwtABC[!, n]))
    for n in Dfs.names(miceBwtABC) # n stands for name
]
("spA", 20.5, 1.8708286933869707)
("spB", 23.5, 2.073644135332772)
("spC", 26.0, 1.7888543819998317)

Here, the function Dfs.names returns a Vector{T} with the names of the columns. In connection with the comprehensions we met in Section 3.6.3, it allows us to quickly obtain the desired statistics without typing the names by hand. Otherwise we would have to type

[
("spA", Stats.mean(miceBwtABC[!, "spA"]), Stats.std(miceBwtABC[!, "spA"])),
("spB", Stats.mean(miceBwtABC[!, "spB"]), Stats.std(miceBwtABC[!, "spB"])),
("spC", Stats.mean(miceBwtABC[!, "spC"]), Stats.std(miceBwtABC[!, "spC"])),
]

It didn’t save us a lot of typing in this case, but imagine if we had 10, 30, or even 100 columns. The gain would be quite substantial.

Alternatively, if you read the documentation of the previously mentioned (Section 5.3) Dfs.describe, then you can go with:

Dfs.describe(miceBwtABC, :mean, :std)
Table 5: Selected summary statistics based on miceBwtABC data frame.
variable mean std
spA 20.5 1.8708286933869707
spB 23.5 2.073644135332772
spC 26.0 1.7888543819998317

Anyway, based on the means it appears that the three species differ in their body masses. Still, in connection with the standard deviations, we can see that the body masses of the groups overlap somewhat. So, is that enough to claim that they are statistically different at the cutoff level of 0.05 (\(\alpha\))? Let’s test that with the one-way ANOVA that we met in the previous chapter.

Let’s start by checking the assumptions. First, the normality assumption

[Pg.normality(miceBwtABC[!, n]).pval[1] for n in Dfs.names(miceBwtABC)] |>
pvals -> map(pv -> pv > 0.05, pvals) |>
all

true

All normal. Here we got the p-values from the Shapiro-Wilk test for all our groups. The documentation for Pingouin (and some trial and error) shows that to get the p-value alone you must type Pg.normality(vector).pval[1]. Then we pipe (compare with |> in getSortedKeysVals from Section 4.5) the result to map to check whether each p-value (pvals) is greater than 0.05 (if so, we do not reject the null hypothesis of normal distribution). Finally, we pipe (|>) the resulting Vector{Bool} to the function all. The function returns true only if all the elements of the vector are true.
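The |> / map / all combination may be easier to follow on plain numbers. Here is a minimal sketch (the p-values below are made up purely for demonstration, they are not from our mice data):

```julia
# made-up p-values, for demonstration only
fakePvals = [0.35, 0.78, 0.51]

# x |> f is just another way of writing f(x)
result = fakePvals |>
    pvs -> map(pv -> pv > 0.05, pvs) |> # -> Vector{Bool}: [true, true, true]
    all                                 # true only if every element is true

result # true
```

If any single p-value were below 0.05, map would produce a false somewhere in the vector, and all would return false.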

OK, time for the homogeneity of variance assumption

Htests.FlignerKilleenTest(
    [miceBwtABC[!, n] for n in Dfs.names(miceBwtABC)]...
    ) |> Htests.pvalue |> pv -> pv > 0.05

true

The variances are roughly equal. Here [miceBwtABC[!, n] for n in Dfs.names(miceBwtABC)] returns a Vector{Vector{<:Real}}, i.e. a vector of vectors, e.g. [[1, 2], [3, 4], [5, 6]], but Htests.FlignerKilleenTest expects separate vectors: [1, 2], [3, 4], [5, 6] (no outer square brackets). The splat operator (...) placed after the array removes the outer square brackets. Then we pipe the result of Htests.FlignerKilleenTest to Htests.pvalue because, according to the documentation, it extracts the p-value from the result of the test. Finally, we pipe (|>) the result to an anonymous function (pv -> pv > 0.05) to check if the p-value is greater than 0.05 (if so, we do not reject the null hypothesis of variance homogeneity).
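The effect of the splat operator can be seen with a toy function (firstElts is a made-up name, used here only for illustration):

```julia
vects = [[1, 2], [3, 4], [5, 6]] # a vector of vectors (one object)

# a toy function that accepts any number of separate vector arguments
firstElts(vs...) = [v[1] for v in vs]

# firstElts(vects) would treat vects as one argument (a vector of vectors);
# the splat (...) unpacks it into three separate vector arguments
firstElts(vects...) # [1, 3, 5]
```

This is exactly what happens in the Htests.FlignerKilleenTest call above: the comprehension builds one vector of vectors, and the splat hands its elements over as separate arguments.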

OK, and now for the one-way ANOVA.

Htests.OneWayANOVATest(
    [miceBwtABC[!, n] for n in Dfs.names(miceBwtABC)]...
    ) |> Htests.pvalue

0.0006608056579183928

Hmm, OK, the p-value is lower than the cutoff level of 0.05. What now? Well, by doing one-way ANOVA you ask your computer a very specific question: “Does at least one of the group means differ from the other(s)?”. The computer does exactly what you tell it, nothing more, nothing less. Here, it answers your question precisely with: “Yes” (since \(p \le 0.05\)). I assume that right now you are not satisfied with the answer. After all, what good is it if you still don’t know which group(s) differ from one another: spA vs. spB, and/or spA vs. spC, and/or spB vs. spC? If you want your computer to tell you that, then you must ask it directly to do so. That is what post-hoc tests are for (post hoc means “after the event”; here the event is one-way ANOVA).

The split into one-way ANOVA and post-hoc tests made perfect sense in the 1920s-30s and in the decades after the method was introduced. Back then you performed calculations with a pen and a piece of paper (and since the ~1970s with a pocket calculator as well). If one-way ANOVA produced a p-value greater than 0.05 you stopped (and saved time and energy on unnecessary additional calculations). Otherwise, and only then, you performed a post-hoc test (again with a pen and a piece of paper). Anyway, as mentioned in Section 4.9.4, the popular choices for post-hoc tests include Fisher’s LSD test and Tukey’s HSD test. Here we are going to use a more universal approach and apply a so-called pairwise t-test (which is just the t-test that you already know, done between every pair of groups). Ready? Here we go

evtt = Htests.EqualVarianceTTest
getPval = Htests.pvalue

# for "spA vs spB", "spA vs spC" and "spB vs spC", respectively
postHocPvals = [
evtt(miceBwtABC[!, "spA"], miceBwtABC[!, "spB"]) |> getPval,
evtt(miceBwtABC[!, "spA"], miceBwtABC[!, "spC"]) |> getPval,
evtt(miceBwtABC[!, "spB"], miceBwtABC[!, "spC"]) |> getPval,
]

postHocPvals
[0.025111501405268754, 0.0003985445257645916, 0.049332195639921715]

OK, here to save us some typing we assigned the long function names (Htests.EqualVarianceTTest and Htests.pvalue) to shorter ones (evtt and getPval). Then we used them to conduct the t-tests and extract the p-values for all the possible pairs to compare (we will develop some more user-friendly functions in the upcoming exercises, see Section 5.7.4). Anyway, it appears that each mouse species differs from the other two with respect to its average body mass (all p-values are below 0.05). Or does it?
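A side note on the scale of the problem: with k groups, the number of possible pairs to compare is “k choose 2”, i.e. k*(k-1)/2, which grows quickly with k. A minimal sketch, using Julia’s built-in binomial function (numOfPairs is a made-up name, used here only for illustration):

```julia
# number of unique pairs among k groups: k choose 2, i.e. k*(k-1)/2
numOfPairs(k::Int)::Int = binomial(k, 2)

numOfPairs(3)  # 3 pairs, as in our spA/spB/spC example
numOfPairs(10) # 45 pairs
```

So with only three groups we already performed three separate tests, and with ten groups we would perform forty-five of them, which is worth keeping in mind when looking at the p-values above.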



CC BY-NC-SA 4.0 Bartlomiej Lukaszuk