26.2 Solution

The following solution is based on a computer simulation and relies on a few assumptions, namely:

  1. there are always exactly 365 days in a year (no country for leap years)
  2. a person is equally likely to have been born on any day of a year (no seasonal, yearly and other patterns, etc.)
  3. the birth date of one person does not influence the birth date of another person (not twins, siblings, etc.)

None of the above is exactly true, still, those are some reasonable assumptions that will allow us to knack the problem.

OK, let’s begin.

import Random as Rnd

function getBdaysAtParty(nPeople::Int)::Vec{Int}
    @assert 3 < nPeople < 366 "nPeople must be in range [4-365]"
    return Rnd.rand(1:365, nPeople)
end

function getCounts(v::Vector{T})::Dict{T,Int} where T
    counts::Dict{T,Int} = Dict()
    for elt in v
        counts[elt] = get(counts, elt, 0) + 1
    end
    return counts
end

We start by defining two simple functions. getBdaysAtParty returns a random set of birthdays (Vec{Int}) drawn from all the possible days in a year (1:365, with possible repetitions). On the other hand, getCounts is a function that I copied-pasted from my previous book. It does what it promises, i.e. it returns a summary statistics (counts) that tells us how many times a given day in the birthdays (result of getBdaysAtParty) appears.

Time to find a way to determine does the event that we are looking for occured during the party.

function nSameBdays(counts::Dict{Int, Int}, n::Int=2)::Bool
    @assert 1 < n < 366 "n must be in range [2-365]"
    return isnothing(findfirst((>=)(n), counts)) ? false : true
end

function anyMyBday(counts::Dict{Int, Int}, myBday::Int=1)::Bool
    @assert 0 < myBday < 366 "myBDay must be in range [1-365]"
    return haskey(counts, myBday)
end

First, we check for nSameBdays with the build in findfirst. That last function accepts a predicate and a collection (like a vector or a dictionary). The predicate is a function that takes an element of a collection (for a dictionary one of its values) as an input and returns a Bool. findfirst brings back the first index (for vectors) or the first key (for dictionaries) for which predicate(vector's element) or predicate(dictionary's value) is true, respectively. Often a predicate is just an anonymous function, which in our case could be key -> key >= n. Instead, here I used a partial function application. The (>=)(n) is an equivalent of the >= n mentioned in the previous sentence. It is just a function applied to one argument. The function still lacks an argument on its left site, just before the >. This missing argument will be any element of our collection that is currently being tested. Anyway, one more point to notice, if findfirst finds no element for which the predicate is true then it returns nothing. Therefore, to get our final answer we send the possible result through isnothing and a ternary expression . Technically, we could have shorten the last line to return !isnothing(firndfirst((>=)(n), counts)) (or even without the return), but I thought that this might be too much condensation for a one line of code.

Our second function, anyMyBday, is much simpler. It just uses haskey to check if our day of birth occurred in the birthdays of the guests at our party (counts). Fun, fact, under our assumptions, it does not matter whether you were born on the first or 365th day of the year. Go ahead, change the default value to your actual day of birth and compare the result with the one provided later in this chapter.

OK, time to estimate the probability of success, i.e. the ratio between the number of times the event took place to the total trials (number of simulations).

# isEventFn(Dict{Int, Int}) -> Bool
function getProbSuccess(nPeople::Int, isEventFn::Function,
                        nSimulations::Int=100_000)::Flt
    @assert 3 < nPeople < 366 "nPeople must be in range [4-365]"
    @assert 1e4 <= nSimulations <= 1e6 "nSimulations not in range [1e4-1e6]"
    successes::Vec{Bool} = Vec{Bool}(undef, nSimulations)
    for i in 1:nSimulations
        peopleBdays = getBdaysAtParty(nPeople)
        counts = getCounts(peopleBdays)
        eventOccured = isEventFn(counts)
        successes[i] = eventOccured
    end
    return sum(successes)/nSimulations
end

First, we initialize the vector that will hold the results of our simulations (successes). The Vec{Bool}(undef, nSimulations) creates a vector of size specified in nSimulations that is currently occupied by some unspecified (undef like undefined) values (basically garbage that is currently in a specific place of our computer’s memory). We will fill the successes with the values of interest in the for loop. For each simulation, we throw a party and get the guests’ birthdays (peopleBdays). We obtain counts for them and test did the event that we consider a success occurred that time (with isEventFn). We place the occurrence into the proper spot ([i]) of our vector of successes. Notice, that here peopleBdays, counts, and eventOccured are local (helper) variables that are visible only in the for loop. Anyway, all that’s left to do is to calculate the probability. The sum function treats any true as 1 and false as 0, hence it returns the number of successes, which we divide by the number of simulations.

Let’s see how it works.

Rnd.seed!(101)
peopleAtParty = 5:30
probsAnySameBdays = [getProbSuccess(n, nSameBdays) for n in peopleAtParty]
probsMyBday = [getProbSuccess(n, anyMyBday) for n in peopleAtParty]
Figure 7: Birthday paradox probabilities.

So, with 23 people at a party the probability that any 2 of them have a birthday on the same day is roughly 50% or 0.5. The probability, that any person was born the same day that I have been born is around 8% or 0.08 for 30 people at a party. Actually, this last probability is fairly straightforward to calculate. For the reasons explained here it is just \(\frac{1}{365}*{n\ people\ at\ party}\) so our mean absolute error is roughly equal to:

import Statistics as St

probsMyBdayTheor = (1/365) .* peopleAtParty
(probsMyBday .- probsMyBdayTheor) .|> abs |> St.mean

0.0013415015806111702

which isn’t all that bad (we differ by a tenth part of a percent).

As a mini-exercise you may tweak the code a little and estimate the probability that any 3 people at a party were born on the same day.



CC BY-NC-SA 4.0 Bartlomiej Lukaszuk