Develop an Intuition for Bayes Theorem With Worked Examples

Author: Jason Brownlee

Bayes Theorem provides a principled way for calculating a conditional probability.

It is a deceptively simple calculation, providing a method that is easy to use for scenarios where our intuition often fails.

The best way to develop an intuition for Bayes Theorem is to think about the meaning of the terms in the equation and to apply the calculation many times in a range of different real-world scenarios. This will provide the context for what is being calculated and examples that can be used as a starting point when applying the calculation in new scenarios in the future.

In this tutorial, you will discover an intuition for calculating Bayes Theorem by working through multiple realistic scenarios.

After completing this tutorial, you will know:

Bayes Theorem is a technique for calculating a conditional probability.
The common and helpful names used for the terms in the Bayes Theorem equation.
How to work through three realistic scenarios using Bayes Theorem to find a solution.

Let’s get started.

How to Develop an Intuition for Bayes Theorem With Worked Examples
Photo by Bureau of Land Management, some rights reserved.

Tutorial Overview

This tutorial is divided into five parts; they are:

Introduction to Bayes Theorem
Naming the Terms in the Theorem
Example 1: Elderly Fall and Death
Example 2: Email and Spam Detection
Example 3: Liars and Lie Detectors

Introduction to Bayes Theorem

Conditional probability is the probability of one event given the occurrence of another event, often described in terms of events A and B from two dependent random variables e.g. X and Y.

Conditional Probability: Probability of one (or more) event given the occurrence of another event, e.g. P(A given B) or P(A | B).

The conditional probability can be calculated using the joint probability; for example:

P(A | B) = P(A and B) / P(B)

The conditional probability is not symmetrical; for example:

P(A | B) != P(B | A)
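
To make the asymmetry concrete, here is a minimal sketch that computes both conditional probabilities from a joint probability table. The four joint probabilities are hypothetical values chosen for illustration, not taken from any real data.

```python
# hypothetical joint probabilities for two binary events A and B
# (made-up values for illustration; the four entries sum to 1)
joint = {
    (True, True): 0.08,    # P(A and B)
    (True, False): 0.02,   # P(A and not B)
    (False, True): 0.12,   # P(not A and B)
    (False, False): 0.78,  # P(not A and not B)
}

# marginal probabilities: sum the joint over the other event
p_a = sum(p for (a, _), p in joint.items() if a)
p_b = sum(p for (_, b), p in joint.items() if b)

# conditional probabilities calculated from the joint probability
p_a_given_b = joint[(True, True)] / p_b
p_b_given_a = joint[(True, True)] / p_a

print('P(A) = %.2f, P(B) = %.2f' % (p_a, p_b))
print('P(A|B) = %.2f' % p_a_given_b)
print('P(B|A) = %.2f' % p_b_given_a)
```

With these made-up numbers, P(A|B) comes out at about 0.40 while P(B|A) comes out at about 0.80: the same joint probability is divided by two different marginals.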

Nevertheless, one conditional probability can be calculated using the other conditional probability.

This is called Bayes Theorem, named for Reverend Thomas Bayes, and can be stated as follows:

P(A|B) = P(B|A) * P(A) / P(B)

Bayes Theorem provides a principled way for calculating a conditional probability and an alternative to using the joint probability.

This alternate approach to calculating the conditional probability is useful either when the joint probability is challenging to calculate, or when the reverse conditional probability is available or easy to calculate.

Bayes Theorem: Principled way of calculating a conditional probability without the joint probability.

It is often the case that we do not have access to the denominator directly, e.g. P(B).

We can calculate it an alternative way; for example:

P(B) = P(B|A) * P(A) + P(B|not A) * P(not A)

This gives a formulation of Bayes Theorem that uses the alternate calculation of P(B), described below:

P(A|B) = P(B|A) * P(A) / (P(B|A) * P(A) + P(B|not A) * P(not A))

Note: the denominator is simply the expansion we gave above.
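
As a quick sanity check, the sketch below calculates P(B) from its expansion and confirms that the two ways of writing Bayes Theorem agree. The three input probabilities are hypothetical values chosen only for illustration.

```python
# hypothetical inputs, made up for illustration
p_a = 0.10              # P(A)
p_b_given_a = 0.80      # P(B|A)
p_b_given_not_a = 0.15  # P(B|not A)

# P(B) = P(B|A) * P(A) + P(B|not A) * P(not A)
p_not_a = 1 - p_a
p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a

# Bayes Theorem with P(B) calculated separately
direct = (p_b_given_a * p_a) / p_b
# Bayes Theorem with the denominator written out in full;
# the parentheses around the whole denominator are required
expanded = (p_b_given_a * p_a) / (p_b_given_a * p_a + p_b_given_not_a * p_not_a)

print('P(B) = %.3f' % p_b)
print('direct = %.3f, expanded = %.3f' % (direct, expanded))
```

Both forms give the same posterior, because the expanded denominator is just P(B) written in terms of quantities we are more likely to have.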

As such, if we have P(A), then we can calculate P(not A) as its complement; for example:

P(not A) = 1 – P(A)

Additionally, if we have P(not B|not A), then we can calculate P(B|not A) as its complement; for example:

P(B|not A) = 1 – P(not B|not A)

Now that we are familiar with the calculation of Bayes Theorem, let’s take a closer look at the meaning of the terms in the equation.

Naming the Terms in the Theorem

The terms in the Bayes Theorem equation are given names depending on the context where the equation is used.

It can be helpful to think about the calculation from these different perspectives, as they can help you map your problem onto the equation.

Firstly, in general, the result P(A|B) is referred to as the posterior probability and P(A) is referred to as the prior probability.

P(A|B): Posterior probability.
P(A): Prior probability.

Sometimes P(B|A) is referred to as the likelihood and P(B) is referred to as the evidence.

P(B|A): Likelihood.
P(B): Evidence.

This allows Bayes Theorem to be restated as:

Posterior = Likelihood * Prior / Evidence

We can make this clear with a smoke and fire case.

What is the probability that there is fire given that there is smoke?

Where P(Fire) is the Prior, P(Smoke|Fire) is the Likelihood, and P(Smoke) is the evidence:

P(Fire|Smoke) = P(Smoke|Fire) * P(Fire) / P(Smoke)

You can imagine the same situation with rain and clouds.

We can also think about the calculation in the terms of a binary classifier.

For example, P(B|A) may be referred to as the True Positive Rate (TPR) or the sensitivity, P(B|not A) may be referred to as the False Positive Rate (FPR), the complement P(not B|not A) may be referred to as the True Negative Rate (TNR) or specificity, and the value we are calculating P(A|B) may be referred to as the Positive Predictive Value (PPV) or precision.

P(not B|not A): True Negative Rate or TNR (specificity).
P(B|not A): False Positive Rate or FPR.
P(not B|A): False Negative Rate or FNR.
P(B|A): True Positive Rate or TPR (sensitivity or recall).
P(A|B): Positive Predictive Value or PPV (precision).

For example, we may re-state the calculation using these terms as follows:

PPV = (TPR * P(A)) / (TPR * P(A) + FPR * P(not A))
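
The classifier view can be sketched in a few lines of Python. The function name positive_predictive_value and the classifier rates used below (90 percent sensitivity, 5 percent false positive rate) are hypothetical, chosen only to show how the precision depends on the base rate P(A).

```python
# calculate PPV (precision) from the TPR, the FPR and the base rate P(A)
def positive_predictive_value(tpr, fpr, base_rate):
    # numerator: TPR * P(A)
    true_positives = tpr * base_rate
    # second term of the denominator: FPR * P(not A)
    false_positives = fpr * (1 - base_rate)
    return true_positives / (true_positives + false_positives)

# hypothetical classifier: 90% sensitivity, 5% false positive rate
for base_rate in [0.01, 0.10, 0.50]:
    ppv = positive_predictive_value(0.90, 0.05, base_rate)
    print('P(A) = %.2f, PPV = %.3f' % (base_rate, ppv))
```

Holding the classifier fixed, the PPV rises as the base rate rises, which is why the same test can look poor on a rare event and good on a common one.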

This is a useful perspective on Bayes Theorem and is elaborated further in the tutorial:

A Gentle Introduction to Bayes Theorem for Machine Learning

Now that we are familiar with Bayes Theorem and the meaning of the terms, let’s look at some scenarios where we can calculate it.

Note that all of the following examples are contrived; they are not based on real-world probabilities.

Example 1: Elderly Fall and Death

Consider the case where an elderly person (over 80 years of age) falls; what is the probability that they will die from the fall?

Let’s assume that the base rate of an elderly person dying P(A) is 10 percent, the base rate of an elderly person falling P(B) is 5 percent, and that 7 percent of the elderly people who die had a fall P(B|A).

Let’s plug what we know into the theorem:

P(A|B) = P(B|A) * P(A) / P(B)
P(Die|Fall) = P(Fall|Die) * P(Die) / P(Fall)

P(Die|Fall) = 0.07 * 0.10 / 0.05
P(Die|Fall) = 0.14

That is, if an elderly person falls, then there is a 14 percent probability that they will die from the fall.

To make this concrete, we can perform the calculation in Python, first defining what we know, then using Bayes Theorem to calculate the outcome.

The complete example is listed below.

# calculate P(A|B) given P(B|A), P(A) and P(B)
def bayes_theorem(p_a, p_b, p_b_given_a):
	# calculate P(A|B) = P(B|A) * P(A) / P(B)
	p_a_given_b = (p_b_given_a * p_a) / p_b
	return p_a_given_b

# P(A)
p_a = 0.10
# P(B)
p_b = 0.05
# P(B|A)
p_b_given_a = 0.07
# calculate P(A|B)
result = bayes_theorem(p_a, p_b, p_b_given_a)
# summarize
print('P(A|B) = %.3f%%' % (result * 100))

Running the example confirms the value we calculated manually.

P(A|B) = 14.000%
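
As a cross-check, the same answer can be reached through the joint probability, using P(A | B) = P(A and B) / P(B). The sketch below uses Python's fractions module so the arithmetic is exact; the variable names are our own, and the consistency assertion is an extra check that is not part of the original calculation.

```python
from fractions import Fraction

# values from the example, written as exact fractions
p_die = Fraction(10, 100)            # P(A), base rate of dying
p_fall = Fraction(5, 100)            # P(B), base rate of falling
p_fall_given_die = Fraction(7, 100)  # P(B|A)

# joint probability: P(Die and Fall) = P(Fall|Die) * P(Die)
p_die_and_fall = p_fall_given_die * p_die
# consistency check: a joint probability cannot exceed either marginal
assert p_die_and_fall <= p_fall and p_die_and_fall <= p_die

# conditional probability from the joint: P(Die|Fall) = P(Die and Fall) / P(Fall)
p_die_given_fall = p_die_and_fall / p_fall

print('P(Die and Fall) = %s' % p_die_and_fall)
print('P(Die|Fall) = %s = %.2f' % (p_die_given_fall, float(p_die_given_fall)))
```

The joint probability is 7/1000, and dividing by P(Fall) gives 7/50, which is the same 0.14 obtained from Bayes Theorem.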

Example 2: Email and Spam Detection

Consider the case where we receive an email and the spam detector puts it in the spam folder; what is the probability it was spam?

Let’s assume some details, such as that 2 percent of the email we receive is spam P(A). Let’s assume that the spam detector is really good: when an email is spam, it detects it P(B|A) 99 percent of the time, and when an email is not spam, it wrongly marks it as spam at a very low rate of 0.1 percent P(B|not A).

Let’s plug what we know into the theorem:

P(A|B) = P(B|A) * P(A) / P(B)
P(Spam|Detected) = P(Detected|Spam) * P(Spam) / P(Detected)

P(Spam|Detected) = 0.99 * 0.02 / P(Detected)

We don’t know P(B), that is P(Detected), but we can calculate it using:

P(B) = P(B|A) * P(A) + P(B|not A) * P(not A)

Or in terms of our problem:

P(Detected) = P(Detected|Spam) * P(Spam) + P(Detected|not Spam) * P(not Spam)

We know P(Detected|not Spam), which is 0.1 percent and we can calculate P(not Spam) as 1 – P(Spam); for example:

P(not Spam) = 1 – P(Spam)
P(not Spam) = 1 – 0.02
P(not Spam) = 0.98

Therefore, we can calculate P(Detected) as:

P(Detected) = 0.99 * 0.02 + 0.001 * 0.98
P(Detected) = 0.0198 + 0.00098
P(Detected) = 0.02078

That is, about 2 percent of all emails are detected as spam, regardless of whether they are spam or not.

Now we can calculate the answer as:

P(Spam|Detected) = 0.99 * 0.02 / 0.02078
P(Spam|Detected) = 0.0198 / 0.02078
P(Spam|Detected) = 0.95283926852743

That is, if an email is in the spam folder, there is about a 95.3 percent probability that it is, in fact, spam.

Again, let’s confirm this result by calculating it with an example in Python.

The complete example is listed below.

# calculate the probability of an email in the spam folder being spam

# calculate P(A|B) given P(A), P(B|A), P(B|not A)
def bayes_theorem(p_a, p_b_given_a, p_b_given_not_a):
	# calculate P(not A)
	not_a = 1 - p_a
	# calculate P(B)
	p_b = p_b_given_a * p_a + p_b_given_not_a * not_a
	# calculate P(A|B)
	p_a_given_b = (p_b_given_a * p_a) / p_b
	return p_a_given_b

# P(A)
p_a = 0.02
# P(B|A)
p_b_given_a = 0.99
# P(B|not A)
p_b_given_not_a = 0.001
# calculate P(A|B)
result = bayes_theorem(p_a, p_b_given_a, p_b_given_not_a)
# summarize
print('P(A|B) = %.3f%%' % (result * 100))

Running the example gives the same result, confirming our manual calculation.

P(A|B) = 95.284%
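
Another way to build intuition for this result is to count emails instead of multiplying probabilities. The sketch below assumes a hypothetical batch of 100,000 emails (the batch size is our own choice, picked so that every count is a whole number) and applies the rates from the example.

```python
# imagine a hypothetical batch of 100,000 emails
total = 100000
# 2 percent of emails are spam
spam = total * 2 // 100
not_spam = total - spam
# 99 percent of spam emails are detected as spam
true_positives = spam * 99 // 100
# 0.1 percent of non-spam emails are wrongly detected as spam
false_positives = not_spam * 1 // 1000

# everything that ends up in the spam folder
detected = true_positives + false_positives
# fraction of the spam folder that really is spam
p_spam_given_detected = true_positives / detected

print('Spam: %d, Not spam: %d' % (spam, not_spam))
print('Detected: %d (%d spam + %d not spam)' % (detected, true_positives, false_positives))
print('P(Spam|Detected) = %.3f%%' % (p_spam_given_detected * 100))
```

Of the 2,078 emails in the spam folder, 1,980 are spam and 98 are not, which gives the same ratio as the probability calculation.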

Example 3: Liars and Lie Detectors

Consider the case where a person is tested with a lie detector and the test suggests they are lying. What is the probability that the person is indeed lying?

Let’s assume some details, such as that most people who are tested are telling the truth, say 98 percent, meaning (1 – 0.98) or 2 percent are liars P(A). Let’s also assume that when someone is lying, the test detects them reasonably well, but not great, such as 72 percent of the time P(B|A). Let’s also assume that when someone is not lying, the machine says they are not lying 97 percent of the time P(not B | not A).

Let’s plug what we know into the theorem:

P(A|B) = P(B|A) * P(A) / P(B)
P(Lying|Positive) = P(Positive|Lying) * P(Lying) / P(Positive)

Or:

P(Lying|Positive) = 0.72 * 0.02 / P(Positive)

Again, we don’t know P(B), or in this case how often the detector returns a positive result in general.

We can calculate this using the formula:

P(B) = P(B|A) * P(A) + P(B|not A) * P(not A)

Or:

P(Positive) = P(Positive|Lying) * P(Lying) + P(Positive|not Lying) * P(not Lying)

Or, with numbers:

P(Positive) = 0.72 * 0.02 + P(Positive|not Lying) * (1 – 0.02)
P(Positive) = 0.72 * 0.02 + P(Positive|not Lying) * 0.98

In this case, we don’t know the probability of a positive detection result given that the person was not lying; that is we don’t know the false positive rate or the false alarm rate.

This can be calculated as follows:

P(B|not A) = 1 – P(not B|not A)

Or:

P(Positive|not Lying) = 1 – P(not Positive|not Lying)
P(Positive|not Lying) = 1 – 0.97
P(Positive|not Lying) = 0.03

Therefore, we can calculate P(B) or P(Positive) as:

P(Positive) = 0.72 * 0.02 + 0.03 * 0.98
P(Positive) = 0.0144 + 0.0294
P(Positive) = 0.0438

That is, the test returns a positive result about 4 percent of the time, regardless of whether the person is lying or not.

We can now calculate Bayes Theorem for this scenario:

P(Lying|Positive) = 0.72 * 0.02 / 0.0438
P(Lying|Positive) = 0.0144 / 0.0438
P(Lying|Positive) = 0.328767123287671

That is, if the lie detector test comes back with a positive result, then there is about a 32.9 percent probability that they are, in fact, lying. It’s a poor test!

Finally, let’s confirm this calculation in Python.

The complete example is listed below.

# calculate the probability of a person lying given a positive lie detector result

# calculate P(A|B) given P(A), P(B|A), P(not B|not A)
def bayes_theorem(p_a, p_b_given_a, p_not_b_given_not_a):
	# calculate P(not A)
	not_a = 1 - p_a
	# calculate P(B|not A)
	p_b_given_not_a = 1 - p_not_b_given_not_a
	# calculate P(B)
	p_b = p_b_given_a * p_a + p_b_given_not_a * not_a
	# calculate P(A|B)
	p_a_given_b = (p_b_given_a * p_a) / p_b
	return p_a_given_b

# P(A), base rate
p_a = 0.02
# P(B|A)
p_b_given_a = 0.72
# P(not B| not A)
p_not_b_given_not_a = 0.97
# calculate P(A|B)
result = bayes_theorem(p_a, p_b_given_a, p_not_b_given_not_a)
# summarize
print('P(A|B) = %.3f%%' % (result * 100))

Running the example gives the same result, confirming our manual calculation.

P(A|B) = 32.877%
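
If the result still feels surprising, we can also estimate it by simulation. The sketch below is not part of the original calculation: the function name simulate, the sample size and the random seed are all our own assumptions. It draws many hypothetical test subjects using the same rates and counts how many of the positive results came from people who were lying.

```python
import random

# estimate P(Lying|Positive) by simulating many hypothetical test subjects
def simulate(n, p_lying, tpr, tnr, seed=1):
    rng = random.Random(seed)
    positives = 0
    lying_and_positive = 0
    for _ in range(n):
        # is this person lying? (base rate)
        lying = rng.random() < p_lying
        if lying:
            # liars test positive at the true positive rate
            positive = rng.random() < tpr
        else:
            # truthful people test positive at the false positive rate
            positive = rng.random() < (1 - tnr)
        if positive:
            positives += 1
            if lying:
                lying_and_positive += 1
    # fraction of positive results that came from liars
    return lying_and_positive / positives

estimate = simulate(200000, 0.02, 0.72, 0.97)
print('Estimated P(Lying|Positive) = %.3f' % estimate)
```

The estimate varies slightly with the seed and sample size, but it should land close to the value calculated with Bayes Theorem, since most positive results come from the much larger group of truthful people.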

Summary

In this tutorial, you discovered an intuition for calculating Bayes Theorem by working through multiple realistic scenarios.

Specifically, you learned:

Bayes Theorem is a technique for calculating a conditional probability.
The common and helpful names used for the terms in the Bayes Theorem equation.
How to work through three realistic scenarios using Bayes Theorem to find a solution.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

The post Develop an Intuition for Bayes Theorem With Worked Examples appeared first on Machine Learning Mastery.
