5.3 Two samples Student’s t-test

Imagine a friend who studies biology told you that he had conducted research in order to write a dissertation and earn a master’s degree. As part of the research he tested a new drug (drug X) on mice. He hopes the drug is capable of reducing the body weights of the animals (and if so, then in a distant future it might even be tested on humans). He asks you for help with the data analysis. The results he obtained are as follows.

import CSV as Csv
import DataFrames as Dfs

# if you are in 'code_snippets' folder, then use: "./ch05/miceBwt.csv"
# if you are in 'ch05' folder, then use: "./miceBwt.csv"
miceBwt = Csv.read("./code_snippets/ch05/miceBwt.csv", Dfs.DataFrame)
first(miceBwt, 3)
Table 1: Body mass [g] of mice (fictitious data).
noDrugX  drugX
26       26
26       25
24       25

Note: The path specification above should work fine on GNU/Linux operating systems. I don’t know about other OSs.

Here, we opened a table with made-up data for mice body weight [g] (this data set can be found here). For that we used two new packages (CSV and DataFrames).

A *.csv file can be opened and created, e.g. with a spreadsheet program. Here, we read it as a DataFrame, i.e. a structure that resembles an array from Section 3.3.7. Since the DataFrame could potentially have thousands of rows, we displayed only the first three (to check that everything succeeded) using the first function.

Note: We can check the size of a DataFrame with the size function, which returns the information in a friendly (numRows, numCols) format.
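For example, for our data frame (which, as we are about to see, has 10 rows and 2 columns):

size(miceBwt) # (numRows, numCols), here: (10, 2)
size(miceBwt)[1] # number of rows, here: 10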

OK, let’s take a look at some descriptive statistics using the describe function.

Dfs.describe(miceBwt)
Table 2: Body mass of mice. Descriptive statistics.
variable  mean  min  median  max  nmissing  eltype
noDrugX   25.5  23   25.5    29   0         Int64
drugX     24.1  21   24.5    26   0         Int64

It appears that mice from the drugX group got a somewhat lower body weight. But that could be just a coincidence. Anyway, how should we analyze this data? Well, it depends on the experiment design.

Since we have 10 rows (size(miceBwt)[1]), then either:

  1. we had 10 mice in total, and the body weight of each mouse was registered twice: once without the drug (noDrugX) and once after the drug treatment (drugX), or
  2. we had 20 mice in total (miceID: 1:10 in group noDrugX, miceID: 11:20 in group drugX), and the body weight of each mouse was registered only once.

Interestingly, the two experimental models deserve slightly different statistical methodology. In the first case we will perform a paired samples t-test, whereas in the other case we will use an unpaired samples t-test. Ready? Let’s go.

5.3.1 Paired samples Student’s t-test

Running a paired Student’s t-test with the HypothesisTests package is very simple. We just have to send the specific column(s) to the appropriate function. Column selection can be done in one of a few ways, e.g. miceBwt[:, "noDrugX"] (similarly to array indexing in Section 3.3.7, : means all rows; note that this form copies the column), miceBwt[!, "noDrugX"] (! instead of :, no copying), or miceBwt.noDrugX (again, no copying).

Note: Copying a column is advantageous when a function may modify the input data, but it is less efficient for big data frames. If you wonder whether a function changes its input, then for starters look at its name and compare it with the convention we discussed in Section 3.4.4. Still, to be sure you would have to examine the function’s code.
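To illustrate the difference (a minimal sketch, run against the miceBwt data frame defined above):

colCopy = miceBwt[:, "noDrugX"] # an independent copy of the column
colNoCopy = miceBwt[!, "noDrugX"] # the stored column itself, no copying
colCopy === miceBwt.noDrugX # false (different objects in memory)
colNoCopy === miceBwt.noDrugX # true (the very same vector)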

And now we can finally run the paired t-test.

import HypothesisTests as Htests

# miceBwt.noDrugX (or miceBwt[!, "noDrugX"]) returns a column as a Vector
Htests.OneSampleTTest(miceBwt.noDrugX, miceBwt.drugX)
One sample t-test
-----------------
Population details:
    parameter of interest:   Mean
    value under h_0:         0
    point estimate:          1.4
    95% confidence interval: (0.04271, 2.757)

Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           0.0445

Details:
    number of observations:   10
    t-statistic:              2.3333333333333335
    degrees of freedom:       9
    empirical standard error: 0.6

And voila. We got the result. It seems that drugX actually does lower the body mass of the animals (\(p \le 0.05\)). But wait, didn’t we want to do a (paired) two-samples t-test and not OneSampleTTest? Yes, we did. Interestingly enough, a paired t-test is actually a one-sample t-test for the difference. Observe.

# miceBwt.noDrugX (or miceBwt[!, "noDrugX"]) returns a column as a Vector
# hence we can do element-wise subtraction using dot syntax
miceBwtDiff = miceBwt.noDrugX .- miceBwt.drugX
Htests.OneSampleTTest(miceBwtDiff)
One sample t-test
-----------------
Population details:
    parameter of interest:   Mean
    value under h_0:         0
    point estimate:          1.4
    95% confidence interval: (0.04271, 2.757)

Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           0.0445

Details:
    number of observations:   10
    t-statistic:              2.3333333333333335
    degrees of freedom:       9
    empirical standard error: 0.6

Here, we used the familiar dot syntax from Section 3.6.5 to obtain the differences and then fed the result to OneSampleTTest from the previous section (see Section 5.2). The output is the same as in the previous code snippet.

I don’t know about you, but when I was a student I often wondered when to choose a paired and when an unpaired t-test. Now I finally know, and it is so simple. Too bad that most statistical programs/packages separate the paired t-test from the one-sample t-test (unlike the authors of the HypothesisTests package).

Anyway, this also demonstrates an important feature of the data. The data points in both columns/groups need to be properly ordered, e.g. in our case it makes little sense to subtract the body mass of a mouse with 1 on its tail from that of a mouse with 5 on its tail, right? Doing so makes just as little sense as subtracting it from mouse number 6, 7, 8, etc. There is only one clearly good way to do this subtraction, and that is to subtract the value for mouse number 1 (drugX) from the value for mouse number 1 (noDrugX). So, if you ever wonder whether to use a paired or an unpaired t-test, think whether there is one clearly better way to subtract one column of data from the other. If so, then you should go with the paired t-test; otherwise choose the unpaired t-test.
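You can convince yourself of that with a quick experiment (a sketch using Julia’s Random standard library): shuffle one column and the paired test will almost surely give a different result, whereas an unpaired test, which looks only at the two groups as wholes, would be unaffected.

import Random as Rand

Rand.seed!(301) # the exact seed value is arbitrary

# shuffling one column breaks the mouse-to-mouse pairing
shuffledDrugX = Rand.shuffle(miceBwt.drugX)
# the paired test now compares unrelated animals, so its result
# will almost surely differ from the one we obtained above
Htests.OneSampleTTest(miceBwt.noDrugX .- shuffledDrugX)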

BTW, do you remember how in Section 5.2.2 we checked the assumptions of our oneSampleTTest? Well, it turns out that here we should do the same. However, this time instead of the Kolmogorov-Smirnov test I’m going to use the Shapiro-Wilk normality test from the Pingouin package (Shapiro-Wilk is usually more powerful, plus the syntax and output of the function are nicer here).

import Pingouin as Pg
Pg.normality(miceBwtDiff)
Table 3: Shapiro-Wilk’s normality test.
W                   pval                normal
0.9418099829866745  0.5733239948410133  true

Note: At the time I’m writing these words (29-08-2023) Pingouin package is still under development. This may cause some inconveniences, warnings, etc. Proceed with caution.

There, all normal (p > 0.05). So, we were right to perform the test. Still, the order was incorrect; in general you should remember to check the assumptions first and only then proceed with the test. In case the normality assumption did not hold we should consider using the Wilcoxon test (a non-parametric test), e.g. like so: Htests.SignedRankTest(miceBwt.noDrugX, miceBwt.drugX) or Htests.SignedRankTest(miceBwtDiff). More info on the test can be found in the link above or on the pages of the HypothesisTests package (see here).

5.3.2 Unpaired samples Student’s t-test

OK, now it’s time to move to the other experimental model. A reminder: here we discuss the second situation from the list above, i.e. 20 mice in total, 10 per group, each weighed only once.

We will compare mice noDrugX (miceID: 1:10) with mice drugX (miceID: 11:20) using an unpaired samples t-test, but this time we will start by checking the assumptions.

First the normality assumption.

# for brevity we will extract just the p-values
(
    Pg.normality(miceBwt.noDrugX).pval,
    Pg.normality(miceBwt.drugX).pval
)
([0.6833331724399464], [0.3254417851120679])

OK, no reason to doubt the normality (p-values > 0.05). The other assumption that we may test is the homogeneity of variance. Homogeneity means that the spread of data around the mean in each group is similar (var(gr1) ≈ var(gr2)). Here, we are going to use the Fligner-Killeen test from the HypothesisTests package.
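But before the formal test we may simply eyeball the two sample variances (here I assume, as earlier in the book, import Statistics as Stats):

# the spread of the data around the mean in each group
(Stats.var(miceBwt.noDrugX), Stats.var(miceBwt.drugX))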

Htests.FlignerKilleenTest(miceBwt.noDrugX, miceBwt.drugX)
Fligner-Killeen test
--------------------
Population details:
    parameter of interest:   Variances
    value under h_0:         "all equal"
    point estimate:          NaN

Test summary:
    outcome with 95% confidence: fail to reject h_0
    p-value:                     1.0000

Details:
    number of observations: [10, 10]
    FK statistic:           4.76242e-31
    degrees of freedom:     1

This time the assumption also holds (p-value > 0.05). And now for the unpaired test.

Htests.EqualVarianceTTest(
    miceBwt.noDrugX, miceBwt.drugX)
Two sample t-test (equal variance)
----------------------------------
Population details:
    parameter of interest:   Mean difference
    value under h_0:         0
    point estimate:          1.4
    95% confidence interval: (-0.1877, 2.988)

Test summary:
    outcome with 95% confidence: fail to reject h_0
    two-sided p-value:           0.0804

Details:
    number of observations:   [10,10]
    t-statistic:              1.8525405838431677
    degrees of freedom:       18
    empirical standard error: 0.7557189365836423

It appears there is not enough evidence to reject the \(H_{0}\) (the mean difference is equal to 0) at the cutoff level of 0.05. So, how could that be? The group means are the same as before, i.e. Stats.mean(miceBwt.noDrugX) = 25.5 and Stats.mean(miceBwt.drugX) = 24.1, and yet we got different conclusions (reject \(H_{0}\) from the paired t-test, fail to reject \(H_{0}\) from the unpaired t-test). Well, it is because we calculated slightly different things, and because using paired samples usually removes some of the between-subjects variability.

In the case of the unpaired t-test we:

  1. assume that the difference between the means under \(H_{0}\) is equal to 0,
  2. calculate the observed difference between the means, Stats.mean(miceBwt.noDrugX) - Stats.mean(miceBwt.drugX) = 1.4,
  3. calculate the sem (with a slightly different formula than for the one-sample/paired t-test),
  4. obtain the z-score (in the case of a t-test it is called the t-score or t-statistic),
  5. calculate the probability for the t-statistic (with a slightly different calculation of the degrees of freedom).

When compared with the methodology for the one-sample t-test from Section 5.2, it differs only with respect to points 3, 4 and 5 above. Observe. First, the functions:

function getSem(v1::Vector{<:Real}, v2::Vector{<:Real})::Float64
    sem1::Float64 = getSem(v1)
    sem2::Float64 = getSem(v2)
    return sqrt((sem1^2) + (sem2^2))
end

function getDf(v1::Vector{<:Real}, v2::Vector{<:Real})::Int
    return getDf(v1) + getDf(v2)
end
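
The one-vector versions of getSem and getDf (and the getZScore used below) were defined back in Section 5.2. In case you do not have them at hand, here is a minimal sketch consistent with how they are used in this chapter (the parameter names are mine):

import Distributions as Dsts # used below for the t-distribution
import Statistics as Stats

# standard error of the mean for a single sample
function getSem(v::Vector{<:Real})::Float64
    return Stats.std(v) / sqrt(length(v))
end

# degrees of freedom for a single sample
function getDf(v::Vector{<:Real})::Int
    return length(v) - 1
end

# how many standard errors a value lies from the mean
function getZScore(value::Real, mu::Real, sem::Real)::Float64
    return (value - mu) / sem
end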

There are different formulas for the pooled sem (standard error of the mean), but I only managed to remember this one because it reminded me of the famous Pythagorean theorem, i.e. \(c^2 = a^2 + b^2\), so \(c = \sqrt{a^2 + b^2}\), that I learned in primary school. As for the degrees of freedom, they are just the sum of the degrees of freedom for each of the vectors. OK, so now the calculations.

meanDiffBwtH0 = 0
meanDiffBwt = Stats.mean(miceBwt.noDrugX) - Stats.mean(miceBwt.drugX)
pooledSemBwt = getSem(miceBwt.noDrugX, miceBwt.drugX)
zScoreBwt = getZScore(meanDiffBwtH0, meanDiffBwt, pooledSemBwt)
dfBwt = getDf(miceBwt.noDrugX, miceBwt.drugX)
pValBwt = Dsts.cdf(Dsts.TDist(dfBwt), zScoreBwt) * 2

And finally the result that you may compare with the output of the unpaired t-test above and the methodology for the one-sample t-test from Section 5.2.

(
    meanDiffBwtH0, # value under h_0
    round(meanDiffBwt, digits = 4), # point estimate
    round(pooledSemBwt, digits = 4), # empirical standard error
    # to get a positive zScore we should have calculated it as:
    # getZScore(meanDiffBwt, meanDiffBwtH0, pooledSemBwt)
    round(zScoreBwt, digits = 4), # t-statistic
    dfBwt, # degrees of freedom
    round(pValBwt, digits=4) # two-sided p-value
)
(0, 1.4, 0.7557, -1.8525, 18, 0.0804)

Amazing. In the case of the unpaired two-sample t-test we use the same methodology and reasoning as we did in the case of the one-sample t-test from Section 5.2 (only the functions for sem and df changed slightly). Given the above, I recommend you go back to Section 5.2 and make sure you understand the explanations presented there (if you haven’t done so already).
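By the way, the outputs above also show in numbers why the paired test was the more sensitive one: the standard error for the paired differences (0.6) is smaller than the pooled standard error for the two independent groups (ca. 0.756), because pairing removed some of the between-subjects variability. You may verify it with the functions we already have:

getSem(miceBwt.noDrugX .- miceBwt.drugX) # 0.6 (paired differences)
getSem(miceBwt.noDrugX, miceBwt.drugX) # ca. 0.7557 (pooled, unpaired)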

As an alternative to our unpaired t-test we should consider Htests.UnequalVarianceTTest (if the variances are not equal) or Htests.MannWhitneyUTest (if both the normality and homogeneity assumptions do not hold).
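Both are called just like the equal-variance version above, e.g.:

# Welch's t-test, does not assume equal variances
Htests.UnequalVarianceTTest(miceBwt.noDrugX, miceBwt.drugX)
# a non-parametric alternative
Htests.MannWhitneyUTest(miceBwt.noDrugX, miceBwt.drugX)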



CC BY-NC-SA 4.0 Bartlomiej Lukaszuk