Imagine a friend who studies biology told you that he had conducted research in order to write a dissertation and earn a master’s degree. As part of the research he tested a new drug (drug X) on mice. He hopes the drug is capable of reducing the body weights of the animals (and if so, then in a distant future it might even be tested on humans). He asks you for help with the data analysis. The results he obtained are as follows.
import CSV as Csv
import DataFrames as Dfs
# if you are in 'code_snippets' folder, then use: "./ch05/miceBwt.csv"
# if you are in 'ch05' folder, then use: "./miceBwt.csv"
miceBwt = Csv.read("./code_snippets/ch05/miceBwt.csv", Dfs.DataFrame)
first(miceBwt, 3)
| noDrugX | drugX |
|---------|-------|
| 26      | 26    |
| 26      | 25    |
| 24      | 25    |
Note: The path specification above should work fine on GNU/Linux operating systems. I don’t know about other OSs.
Here, we opened a table with made-up data for mice body weight [g] (this data set can be found here). For that we used two new packages (`CSV` and `DataFrames`).
A `*.csv` file can be opened and created, e.g., with a spreadsheet program. Here, we read it as a `DataFrame`, i.e. a structure that resembles an array from Section 3.3.7. Since the `DataFrame` could potentially have thousands of rows, we displayed only the first three (to check that everything succeeded) using the `first` function.
Note: We can check the size of a `DataFrame` with the `size` function, which returns the information in a friendly `(numRows, numCols)` format.
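For instance (a quick check on our data frame; the result follows from the data set itself, ten mice in two columns):

size(miceBwt)

(10, 2)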
OK, let’s take a look at some descriptive statistics using the `describe` function.
Dfs.describe(miceBwt)
| variable | mean | min | median | max | nmissing | eltype |
|----------|------|-----|--------|-----|----------|--------|
| noDrugX  | 25.5 | 23  | 25.5   | 29  | 0        | Int64  |
| drugX    | 24.1 | 21  | 24.5   | 26  | 0        | Int64  |
It appears that mice from the `drugX` group have a somewhat lower body weight. But that could be just a coincidence. Anyway, how should we analyze this data? Well, it depends on the experiment design.
Since we have 10 rows (`size(miceBwt)[1]`), then either:

1. we had 10 mice; we measured their body weight before the drug administration (`noDrugX`), administered the drug and measured their body weight after, e.g. one week (`drugX`), or
2. we had 20 mice; the first 10 of them (1:10) did not receive the drug (`noDrugX`) and the other 10 (11:20) additionally received drugX (hence group `drugX`).

Interestingly, the experimental models deserve slightly different statistical methodology. In the first case we will perform a paired samples t-test, whereas in the other case we will use an unpaired samples t-test. Ready, let’s go.
Running a paired Student’s t-test with the `HypothesisTests` package is very simple. We just have to send the specific column(s) to the appropriate function. Column selection can be done in one of a few ways, e.g. `miceBwt[:, "noDrugX"]` (similarly to array indexing in Section 3.3.7, `:` means all rows; note that this form copies the column), `miceBwt[!, "noDrugX"]` (`!` instead of `:`, no copying), or `miceBwt.noDrugX` (again, no copying).
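To see the difference for ourselves, we might run something like this (a small sketch; the variable names are made up for the example, and `===` checks whether two names point to the very same object in memory):

colCopy = miceBwt[:, "noDrugX"]   # an independent copy of the column
colNoCopy = miceBwt[!, "noDrugX"] # the stored column itself, no copy
colCopy == colNoCopy          # true, the values are equal
colCopy === colNoCopy         # false, two different objects
colNoCopy === miceBwt.noDrugX # true, the very same vector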
Note: Copying a column is advantageous when a function may modify the input data, but it is less effective for big data frames. If you wonder whether a function changes its input, then for starters look at its name and compare it with the convention we discussed in Section 3.4.4. Still, to be sure you would have to examine the function’s code.
And now we can finally run the paired t-test.
import HypothesisTests as Ht

# miceBwt.noDrugX or miceBwt.drugX returns a column as a Vector
Ht.OneSampleTTest(miceBwt.noDrugX, miceBwt.drugX)
One sample t-test
-----------------
Population details:
parameter of interest: Mean
value under h_0: 0
point estimate: 1.4
95% confidence interval: (0.04271, 2.757)
Test summary:
outcome with 95% confidence: reject h_0
two-sided p-value: 0.0445
Details:
number of observations: 10
t-statistic: 2.3333333333333335
degrees of freedom: 9
empirical standard error: 0.6
And voila. We got the result. It seems that `drugX` actually does lower the body mass of the animals (\(p \le 0.05\)). But wait, didn’t we want to do a (paired) two-sample t-test and not a `OneSampleTTest`? Yes, we did. Interestingly enough, a paired t-test is actually a one-sample t-test for the difference. Observe.
# miceBwt.noDrugX and miceBwt.drugX return columns as Vectors
# hence we can do element-wise subtraction using the dot syntax
miceBwtDiff = miceBwt.noDrugX .- miceBwt.drugX
Ht.OneSampleTTest(miceBwtDiff)
One sample t-test
-----------------
Population details:
parameter of interest: Mean
value under h_0: 0
point estimate: 1.4
95% confidence interval: (0.04271, 2.757)
Test summary:
outcome with 95% confidence: reject h_0
two-sided p-value: 0.0445
Details:
number of observations: 10
t-statistic: 2.3333333333333335
degrees of freedom: 9
empirical standard error: 0.6
Here, we used the familiar dot syntax from Section 3.6.5 to obtain the differences and then fed the result to `OneSampleTTest` from the previous section (see Section 5.2). The output is the same as in the previous code snippet.

I don’t know about you, but when I was a student I often wondered when to choose a paired and when an unpaired t-test. Now I finally know, and it is so simple. Too bad that most statistical programs/packages separate the paired t-test from the one-sample t-test (unlike the authors of the `HypothesisTests` package).
Anyway, this also demonstrates an important feature of the data. The data points in both columns/groups need to be properly ordered, e.g. in our case it makes little sense to subtract the body mass of a mouse with 1 on its tail from a mouse with 5 on its tail, right? Doing so makes just as little sense as subtracting it from mouse number 6, 7, 8, etc. There is only one clearly good way to do this subtraction, and that is to subtract mouse number 1 (`drugX`) from mouse number 1 (`noDrugX`). So, if you ever wonder whether to use a paired or an unpaired t-test, think whether there is a clearly better way to subtract one column of data from the other. If so, then you should go with the paired t-test, otherwise choose the unpaired t-test.
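To make the point concrete, here is a hypothetical sketch (the `Rand` alias and the shuffling are illustration only, not part of the analysis): once one column is shuffled, each difference involves two different animals and stops being meaningful.

import Random as Rand

# shuffling drugX destroys the mouse-to-mouse pairing
shuffledDrugX = Rand.shuffle(miceBwt.drugX)
# these differences no longer compare a mouse with itself
miceBwt.noDrugX .- shuffledDrugX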
BTW, do you remember how in Section 5.2.2 we checked the assumptions of our `oneSampleTTest`? Well, it turns out that here we should do the same. However, this time instead of the Kolmogorov-Smirnov test I’m going to use the Shapiro-Wilk normality test from the `HypothesisTests` package (in general, Shapiro-Wilk is more powerful).
Ht.ShapiroWilkTest(miceBwtDiff)
Shapiro-Wilk normality test
---------------------------
Population details:
parameter of interest: Squared correlation of data and expected order
statistics of N(0,1) (W)
value under h_0: 1.0
point estimate: 0.94181
Test summary:
outcome with 95% confidence: fail to reject h_0
one-sided p-value: 0.5733
Details:
number of observations: 10
censored ratio: 0.0
W-statistic: 0.94181
There, all normal (p > 0.05). So, we were right to perform the test. Still, the order was incorrect; in general you should remember to check the assumptions first and then proceed with the test. In case the normality assumption did not hold we should consider doing a Wilcoxon test (a non-parametric test), e.g. like so: `Ht.SignedRankTest(miceBwt.noDrugX, miceBwt.drugX)` or `Ht.SignedRankTest(miceBwtDiff)`. More info on the test can be found in the link above or on the pages of the `HypothesisTests` package (see here).
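For instance, had the normality check failed, the non-parametric fallback might look like this (a sketch, extracting just the p-value as we did for Shapiro-Wilk above):

# Wilcoxon signed rank test on the paired differences
Ht.SignedRankTest(miceBwtDiff) |> Ht.pvalue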
OK, now it’s time to move to the other experimental model. A reminder, here we discuss the following situation: we had 20 mice; the first 10 of them (1:10) did not receive the drug (`noDrugX`) and the other 10 (11:20) additionally received drugX (hence group `drugX`). Here we will compare mice `noDrugX` (miceID: 1:10) with mice `drugX` (miceID: 11:20) using an unpaired samples t-test, but this time we will start by checking the assumptions.
First the normality assumption.
function getSWtestPval(v::Vector{<:Real})::Float64
    return Ht.ShapiroWilkTest(v) |> Ht.pvalue
end

# for brevity we will extract just the p-values
(
    getSWtestPval(miceBwt.noDrugX),
    getSWtestPval(miceBwt.drugX)
)
(0.6833331724399464, 0.3254417851120679)
OK, no reason to doubt the normality (p-values > 0.05). The other assumption that we may test is homogeneity of variance. Homogeneity means that the spread of data around the mean in each group is similar (var(gr1) ≈ var(gr2)). Here, we are going to use the Fligner-Killeen test from the `HypothesisTests` package.
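As a quick, informal check first, we might compare the sample variances directly (a sketch; `Stats` is the `Statistics` alias used elsewhere in the book):

# sample variances of the two groups; similar values
# informally support the homogeneity assumption
(
    Stats.var(miceBwt.noDrugX),
    Stats.var(miceBwt.drugX)
)

And now the formal test.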
Ht.FlignerKilleenTest(miceBwt.noDrugX, miceBwt.drugX)
Fligner-Killeen test
--------------------
Population details:
parameter of interest: Variances
value under h_0: "all equal"
point estimate: NaN
Test summary:
outcome with 95% confidence: fail to reject h_0
p-value: 1.0000
Details:
number of observations: [10, 10]
FK statistic: 4.76242e-31
degrees of freedom: 1
Also this time the assumption is fulfilled (p-value > 0.05), and now for the unpaired test.
Ht.EqualVarianceTTest(
    miceBwt.noDrugX, miceBwt.drugX)
Two sample t-test (equal variance)
----------------------------------
Population details:
parameter of interest: Mean difference
value under h_0: 0
point estimate: 1.4
95% confidence interval: (-0.1877, 2.988)
Test summary:
outcome with 95% confidence: fail to reject h_0
two-sided p-value: 0.0804
Details:
number of observations: [10,10]
t-statistic: 1.8525405838431677
degrees of freedom: 18
empirical standard error: 0.7557189365836423
It appears there is not enough evidence to reject \(H_{0}\) (the mean difference is equal to 0) at the cutoff level of 0.05. So, how could that be? The group means are the same as before, i.e. `Stats.mean(miceBwt.noDrugX)` = 25.5 and `Stats.mean(miceBwt.drugX)` = 24.1, yet we got different results (reject \(H_{0}\) from the paired t-test, fail to reject \(H_{0}\) from the unpaired t-test). Well, it is because we calculated slightly different things, and because using paired samples usually removes some between-subjects variability.
In the case of the unpaired t-test we:

1. assume that the difference between the means under \(H_{0}\) is equal to 0,
2. calculate the observed difference between the means, `Stats.mean(miceBwt.noDrugX) - Stats.mean(miceBwt.drugX)` = 1.4,
3. calculate the sem (this time pooled from the two groups),
4. obtain the z-score (in the case of a t-test it is called the t-statistic),
5. calculate the probability for the t-statistic (with the degrees of freedom based on both groups).

When compared with the methodology for the one-sample t-test from Section 5.2, it differs only with respect to points 3, 4 and 5 above. Observe. First the functions:
# the one-argument getSem and getDf come from Section 5.2
function getSem(v1::Vector{<:Real}, v2::Vector{<:Real})::Float64
    sem1::Float64 = getSem(v1)
    sem2::Float64 = getSem(v2)
    return sqrt((sem1^2) + (sem2^2))
end

function getDf(v1::Vector{<:Real}, v2::Vector{<:Real})::Int
    return getDf(v1) + getDf(v2)
end
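In symbols, these two helpers compute:

\[sem_{12} = \sqrt{sem_{1}^{2} + sem_{2}^{2}}, \qquad df_{12} = df_{1} + df_{2}\]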
There are different formulas for the pooled sem (standard error of the mean), but I only managed to remember this one because it reminded me of the famous Pythagorean theorem, i.e. \(c^2 = a^2 + b^2\), so \(c = \sqrt{a^2 + b^2}\), that I learned in primary school. As for the degrees of freedom, they are just the sum of the degrees of freedom for each of the vectors. OK, so now the calculations.
import Distributions as Dsts
import Statistics as Stats

meanDiffBwtH0 = 0 # difference between the means under H_0
meanDiffBwt = Stats.mean(miceBwt.noDrugX) - Stats.mean(miceBwt.drugX)
pooledSemBwt = getSem(miceBwt.noDrugX, miceBwt.drugX)
# getZScore was defined in Section 5.2
zScoreBwt = getZScore(meanDiffBwtH0, meanDiffBwt, pooledSemBwt)
dfBwt = getDf(miceBwt.noDrugX, miceBwt.drugX)
# zScoreBwt is negative, so cdf returns the left tail;
# times 2 gives the two-sided p-value
pValBwt = Dsts.cdf(Dsts.TDist(dfBwt), zScoreBwt) * 2
And finally the result, which you may compare with the output of the unpaired t-test above and with the methodology for the one-sample t-test from Section 5.2.
(
meanDiffBwtH0, # value under h_0
round(meanDiffBwt, digits = 4), # point estimate
round(pooledSemBwt, digits = 4), # empirical standard error
# to get a positive zScore we should have calculated it as:
# getZScore(meanDiffBwt, meanDiffBwtH0, pooledSemBwt)
round(zScoreBwt, digits = 4), # t-statistic
dfBwt, # degrees of freedom
round(pValBwt, digits=4) # two-sided p-value
)
(0, 1.4, 0.7557, -1.8525, 18, 0.0804)
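As a sanity check, we may compare our manual p-value with the package’s output (this snippet is an add-on for verification, not part of the original listing; it should return `true`):

# our two-sided p-value vs. the one from HypothesisTests
pValBwt ≈ (Ht.EqualVarianceTTest(miceBwt.noDrugX, miceBwt.drugX) |> Ht.pvalue)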
Amazing. In the case of the unpaired two-sample t-test we use the same methodology and reasoning as we did in the case of the one-sample t-test from Section 5.2 (only the functions for `sem` and `df` changed slightly). Given the above, I recommend you go back to Section 5.2 and make sure you understand the explanations presented there (if you haven’t done so already).
As an alternative to our unpaired t-test we should consider `Ht.UnequalVarianceTTest` (if the variances are not equal) or `Ht.MannWhitneyUTest` (if both the normality and homogeneity assumptions do not hold).
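For reference, both alternatives accept the two columns directly, just like `Ht.EqualVarianceTTest` (a sketch; not needed for our data, since the assumptions held):

# Welch's t-test: does not assume equal variances
Ht.UnequalVarianceTTest(miceBwt.noDrugX, miceBwt.drugX)

# Mann-Whitney U test: non-parametric alternative
Ht.MannWhitneyUTest(miceBwt.noDrugX, miceBwt.drugX)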