Sample Spaces and Conditional Probability

Videos (~35 mins)

Reading (~35 mins)

Warmup (~40 mins)

Problem 1

Imagine that we form a string of \(5\) characters, with each character chosen randomly from the set \(\{a, b, c, d, e, f\}\). We might get a random string like \(adcca\) or \(fbfcd\).

Please answer the following questions:

Part A

Consider the strings \(aaaaa\) and \(cddeb\). Which of these two strings is more likely to occur from this process, and why?

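The two strings are equally likely. Each character is chosen independently and uniformly from six letters, so any specific string of length \(5\), whether \(aaaaa\) or \(cddeb\), has probability \((1/6)^5 = 1/6^5\).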
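A minimal brute-force check: enumerate the whole sample space with `itertools.product` and confirm that each of the two strings is exactly one of the \(6^5\) outcomes. This sketch assumes that characters are drawn independently and uniformly, so that every string in the sample space is equally likely.

```python
from fractions import Fraction
from itertools import product

ALPHABET = "abcdef"
LENGTH = 5

# Sample space: every string of length 5 over the six-letter alphabet.
sample_space = ["".join(chars) for chars in product(ALPHABET, repeat=LENGTH)]

# Assumption: uniform, independent draws, so each outcome has the same probability.
p_outcome = Fraction(1, len(sample_space))

for target in ["aaaaa", "cddeb"]:
    # Each specific string occurs exactly once in the sample space.
    count = sample_space.count(target)
    print(target, count, count * p_outcome)
```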

Part B

What is the probability of obtaining a string which contains no repeated characters?

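There are \(6^5\) possible strings in all. To build a string with no repeated characters, we have \(6\) choices for the first character, \(5\) for the second, and so on down to \(2\) for the fifth, giving \(6\cdot 5\cdot 4\cdot 3\cdot 2 = 6!/1! = 6!\) such strings. So, the probability is \(6!/6^5 = 720/7776 \approx 0.093\).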
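The count can be checked by brute force. In this sketch a string has no repeated characters exactly when the set of its characters has size \(5\); `Fraction` keeps the probability exact.

```python
from fractions import Fraction
from itertools import product

ALPHABET = "abcdef"
LENGTH = 5

strings = list(product(ALPHABET, repeat=LENGTH))

# No repeats <=> all five characters are distinct <=> the set has full size.
no_repeats = sum(1 for s in strings if len(set(s)) == LENGTH)

prob = Fraction(no_repeats, len(strings))
print(no_repeats, prob, float(prob))
```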

Part C

What is the probability of obtaining a string which does not contain the substring \(bad\)?

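There are \(6^5\) possible strings. Of these, \(3\times 6^2\) contain \(bad\): there are 3 places where we can put the initial \(b\), and then 2 remaining blanks to fill in, each with 6 possibilities. No string of length 5 can contain \(bad\) twice (since \(bad\) cannot overlap itself, two copies would need 6 characters), so this count has no double counting. So, the probability that the string contains \(bad\) is \(\frac{3\times 6^2}{6^5} = \frac{3}{6^3} = \frac{1}{72}\), and the probability that the string does not contain \(bad\) is \(1 - \frac{3}{6^3} = \frac{71}{72}\).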
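A brute-force count over all \(6^5\) strings is a useful check on the \(3\times 6^2\) count, since it tests for the substring directly and makes no assumption about overlaps. This is a minimal sketch using Python's `in` operator for substring search.

```python
from fractions import Fraction
from itertools import product

ALPHABET = "abcdef"
LENGTH = 5

strings = ["".join(s) for s in product(ALPHABET, repeat=LENGTH)]

# Count the strings in which "bad" appears as a contiguous substring.
with_bad = sum(1 for s in strings if "bad" in s)

p_with_bad = Fraction(with_bad, len(strings))
p_without_bad = 1 - p_with_bad
print(with_bad, p_with_bad, p_without_bad)
```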

Problem 2

Here is a list of probabilities describing the likelihood that a person in the UK was born in each of the 12 months of the year:

probs = [0.083, 0.078, 0.081, 0.081, 0.085, 0.083, 0.087, 0.086, 0.086, 0.084, 0.082, 0.083]
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

For example, there is an 8.3% chance that a UK individual was born in January, a 7.8% chance that they were born in February, etc.

I compiled this data from a chart produced by Niall McCarthy on the website Statista. Technically, the data describes UK residents born during the 20-year period 1995-2014.

You may answer Parts A and B below using any tools you wish, although my advice is to write short Python programs.

Part A

Suppose that two random citizens of the UK (born between 1995 and 2014) meet each other on the street. What is the probability that they were born in the same month?

Assuming the two people's birthmonths are independent, the probability that both were born in January is \(p_{\text{January}}^2\), the probability that both were born in February is \(p_{\text{February}}^2\), and so on. The events "both born in January", "both born in February", and so on through "both born in December" are disjoint, so we can add their probabilities across all twelve months. So, we need to calculate

\[ \begin{aligned} p_{\text{January}}^2 + p_{\text{February}}^2 + \cdots + p_{\text{December}}^2 \end{aligned} \]

Here’s some code that lets us do this:

probs = [0.083, 0.078, 0.081, 0.081, 0.085, 0.083, 0.087, 0.086, 0.086, 0.084, 0.082, 0.083]
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


prob_same_month = 0
for i in range(len(months)):
    # add the probability that both were born in the specific month months[i]
    prob_same_month += probs[i]**2
print(prob_same_month)
0.08323900000000001

This is also a nice opportunity to use list comprehensions if you know that syntax:

print(sum([probs[i]**2 for i in range(len(months))]))
0.08323900000000001
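As a sanity check, a Monte Carlo simulation should land close to this value. The sketch below draws pairs of birthmonths independently with `random.Random.choices`; note that `choices` treats `weights` as relative, so the listed probabilities (which sum to 0.999 after rounding) are effectively rescaled to sum to 1. That rescaling shifts the target only in the fourth decimal place, well inside the simulation's noise. The function name `estimate_same_month` is hypothetical, chosen for illustration.

```python
import random

probs = [0.083, 0.078, 0.081, 0.081, 0.085, 0.083, 0.087, 0.086, 0.086, 0.084, 0.082, 0.083]
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def estimate_same_month(n_trials, seed=0):
    """Monte Carlo estimate of the probability that two people share a birthmonth."""
    rng = random.Random(seed)
    matches = 0
    for _ in range(n_trials):
        # Draw two birthmonths independently, weighted by probs.
        first, second = rng.choices(months, weights=probs, k=2)
        if first == second:
            matches += 1
    return matches / n_trials


estimate = estimate_same_month(200_000)
exact = sum(p**2 for p in probs)
print(estimate, exact)
```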

Part B

Suppose that two people meet and tell you that they have the same birthmonth. What is the probability that their birthmonth is April, given this information?

Let \(E\) be the event that both people are born in April, and let \(F\) be the event that they both share a birthmonth. We want to compute

\[ \begin{aligned} p(E|F) = \frac{p(E \cap F)}{p(F)}\;. \end{aligned} \]

Let’s compute each of these two terms. First, \(p(F)\) is exactly what we computed in Part A. Second, \(p(E\cap F)\) is the probability that these people were born in the same month and that this month is April. Two people who were both born in April necessarily share a birthmonth, so \(E \subseteq F\) and \(E \cap F = E\). So, \(p(E\cap F) = p_{\text{April}}^2\).

Here’s some Python code to calculate this:

# probs[3] is April; the denominator is p(F) from Part A
print(probs[3]**2 / sum([probs[i]**2 for i in range(len(months))]))
0.07882122562741022
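Conditioning can also be simulated by rejection: generate many pairs, discard every pair that does not share a birthmonth, and record how often the shared month is April among the pairs that remain. In this sketch the rescaling done by `random.Random.choices` does not matter, because \(p_{\text{April}}^2 / \sum_m p_m^2\) is unchanged when all the weights are scaled by the same factor. The function name `estimate_april_given_same` is hypothetical.

```python
import random

probs = [0.083, 0.078, 0.081, 0.081, 0.085, 0.083, 0.087, 0.086, 0.086, 0.084, 0.082, 0.083]
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def estimate_april_given_same(n_trials, seed=1):
    """Monte Carlo estimate of p(both born in April | same birthmonth)."""
    rng = random.Random(seed)
    same = 0          # pairs in event F (shared birthmonth)
    same_april = 0    # pairs in E and F (both born in April)
    for _ in range(n_trials):
        first, second = rng.choices(months, weights=probs, k=2)
        if first == second:
            same += 1
            if first == "Apr":
                same_april += 1
    # Conditioning on F: only pairs that share a month enter the denominator.
    return same_april / same


estimate = estimate_april_given_same(400_000)
exact = probs[3]**2 / sum(p**2 for p in probs)
print(estimate, exact)
```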

Part C

Suppose that someone tells you that they share a birthmonth with the first random person they met on the street today. Does that make it more, less, or equally likely that they were born in April?

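Less likely. By Part B, the probability of being born in April given a shared birthmonth is about \(0.0788\), which is lower than the overall rate of April births, \(p_{\text{April}} = 0.081\).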
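The same calculation can be run for every month. Writing \(S = \sum_m p_m^2\) for the answer to Part A, the conditional probability \(p_m^2 / S\) exceeds \(p_m\) exactly when \(p_m > S\). April's rate of \(0.081\) is below \(S \approx 0.0832\), so conditioning on a shared birthmonth lowers it, while months with rates above \(S\) become more likely. This sketch tabulates the comparison for all twelve months.

```python
probs = [0.083, 0.078, 0.081, 0.081, 0.085, 0.083, 0.087, 0.086, 0.086, 0.084, 0.082, 0.083]
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# S: probability that two random people share a birthmonth (Part A).
S = sum(p**2 for p in probs)

# p(shared month is m | shared birthmonth) = p_m**2 / S, as in Part B.
conditionals = {m: p**2 / S for m, p in zip(months, probs)}

for m, p in zip(months, probs):
    direction = "more" if conditionals[m] > p else "less"
    print(f"{m}: {p:.3f} -> {conditionals[m]:.4f} ({direction} likely)")
```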

Part D

Are the events “being born in April” and “sharing a birthmonth with the first person you meet on the street” independent?

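No. Conditioning on the second event lowers the probability of being born in April from \(0.081\) to about \(0.0788\), whereas conditioning on an independent event would leave it unchanged.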
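Equivalently, two events \(E\) and \(F\) are independent exactly when \(p(E \cap F) = p(E)\,p(F)\). The sketch below checks this product rule numerically, taking \(E\) to be "the person was born in April" and \(F\) to be "the person shares a birthmonth with the stranger", and assuming, as in Part A, that the stranger's birthmonth is drawn independently from the same distribution.

```python
import math

probs = [0.083, 0.078, 0.081, 0.081, 0.085, 0.083, 0.087, 0.086, 0.086, 0.084, 0.082, 0.083]
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

p_april = probs[months.index("Apr")]   # p(E): the person was born in April
p_same = sum(p**2 for p in probs)      # p(F): shares a month with the stranger (Part A)
p_april_and_same = p_april**2          # p(E and F): both were born in April

# Independence would require p(E and F) == p(E) * p(F).
independent = math.isclose(p_april_and_same, p_april * p_same)
print(p_april_and_same, p_april * p_same, independent)
```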



© Phil Chodrow, 2023