*Disclaimer: I am writing this because I finally understood how Bayes’ Theorem works. Therefore, this is much more a reference note to myself than anything else, but I am publishing the information here because I think it might be useful for other people. I am no math expert (as can be deduced from the fact that I only understood the theorem now, after having seen it many times), so please let me know if you find any mistakes in my explanation.*

Bayes’ Theorem states that

$$P(B|A) = \frac{P(A|B)\,P(B)}{P(A)}$$

where P(B|A) is the probability of event B occurring, given that event A occurred; P(A|B) is the probability of event A occurring, given that event B occurred; P(B) is the probability of event B occurring, whether or not A also occurs; and P(A) is the probability of event A occurring, whether or not B also occurs.

I have seen, and even used, this formula a number of times in my life, but I had never really understood what it was saying. But before I explain it, I will introduce a few basic concepts.

**Relative frequency**

Consider an experiment with two non-mutually exclusive possible outcomes, A and B, repeated n times. Let n_{1} be the number of occurrences of event A alone, n_{2} the number of occurrences of event B alone, and n_{3} the number of occurrences of events A and B simultaneously. The relative frequency of the occurrence of event A, or probability P(A) of event A, is given by

$$P(A) = \frac{n_{A}}{n} = \frac{n_{1} + n_{3}}{n}$$

where n_{A} = n_{1} + n_{3} is the number of times event A occurred. Likewise, the relative frequency of the occurrence of event B, or probability P(B) of event B, is given by

$$P(B) = \frac{n_{B}}{n} = \frac{n_{2} + n_{3}}{n}$$

where n_{B} = n_{2} + n_{3} is the number of times event B occurred. Finally, the relative frequency of the occurrence of both events, or probability P(AB) of events A and B, is given by

$$P(AB) = \frac{n_{3}}{n}$$
**Conditional probability**

The relative frequency of event A occurring, given event B occurred, is given by

$$n_{A|B} = \frac{n_{3}}{n_{B}} = \frac{n_{3}}{n_{2} + n_{3}}$$

Notice that in the denominator, we account for the occurrences of event B alone and of event B alongside event A. The lower the value of n_{2} (recall n_{2} is the number of occurrences of event B alone), the higher the ratio above will be (yielding 1 when n_{2} is zero, i.e., event B occurs only when event A occurs). If B occurs more often alongside A than alone, then the probability of A occurring when B occurs will be greater than one half.

Likewise,

$$n_{B|A} = \frac{n_{3}}{n_{A}} = \frac{n_{3}}{n_{1} + n_{3}}$$

n_{A|B} and n_{B|A} may also be denoted P(A|B) and P(B|A), respectively. P(A|B) is read as “the probability of event A, given B”.
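To make the counts concrete, here is a small Python sketch that turns them into the probabilities defined so far. The numbers are hypothetical and were chosen only for illustration; the variable names simply mirror the notation of this post.

```python
from fractions import Fraction

# Hypothetical counts, chosen only for illustration.
n = 100   # total repetitions of the experiment
n1 = 10   # A occurred alone
n2 = 30   # B occurred alone
n3 = 20   # A and B occurred together
# In the remaining n - n1 - n2 - n3 repetitions, neither event occurred.

n_A = n1 + n3   # every occurrence of A, alone or with B
n_B = n2 + n3   # every occurrence of B, alone or with A

p_A = Fraction(n_A, n)
p_B = Fraction(n_B, n)
p_AB = Fraction(n3, n)

# Conditional relative frequencies: restrict attention to the
# repetitions in which the "given" event occurred.
p_A_given_B = Fraction(n3, n_B)
p_B_given_A = Fraction(n3, n_A)

print("P(A)   =", p_A)          # 3/10
print("P(B)   =", p_B)          # 1/2
print("P(AB)  =", p_AB)         # 1/5
print("P(A|B) =", p_A_given_B)  # 2/5
print("P(B|A) =", p_B_given_A)  # 2/3
```

Exact fractions are used instead of floats so the values can be compared by eye with the formulas.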

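Given the above equations, we observe that

$$P(A|B) = \frac{n_{3}}{n_{B}} = \frac{n_{3}/n}{n_{B}/n} = \frac{P(AB)}{P(B)}$$

or

$$P(AB) = P(A|B)\,P(B)$$

Now we’re ready to understand Bayes’ Theorem.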

**Bayes’ Theorem**

As mentioned in the beginning of this post, Bayes’ Theorem states that:

$$P(B|A) = \frac{P(A|B)\,P(B)}{P(A)}$$

Now, here’s how you should interpret it. A better way to visualize it is to write it as

$$P(B|A) = \frac{P(AB)}{P(A)}$$

given the last equation from the previous section, P(AB) = P(A|B) P(B). The explanation is similar to the one given for the equation for n_{A|B} in the previous section. If P(A) is close to P(AB) (recall P(A) is the probability of event A alone **or** alongside event B), that means most occurrences of event A happen when event B also occurs. From that, we can intuitively conclude that event B will very likely occur given that event A occurred, since the occurrence of event A is strongly related to the occurrence of **both** A and B.
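As a quick sanity check, the two ways of writing the theorem can be compared numerically. The sketch below assumes some hypothetical counts (not taken from any real experiment), and the helper name `bayes` is my own. It computes P(B|A) once through Bayes’ Theorem and once directly as P(AB)/P(A).

```python
from fractions import Fraction

def bayes(p_a_given_b, p_b, p_a):
    """Return P(B|A) computed as P(A|B) * P(B) / P(A)."""
    return p_a_given_b * p_b / p_a

# Hypothetical counts, for illustration only.
n, n1, n2, n3 = 100, 10, 30, 20

p_A = Fraction(n1 + n3, n)           # A alone or with B
p_B = Fraction(n2 + n3, n)           # B alone or with A
p_AB = Fraction(n3, n)               # A and B together
p_A_given_B = Fraction(n3, n2 + n3)

via_bayes = bayes(p_A_given_B, p_B, p_A)
via_joint = p_AB / p_A

print(via_bayes, via_joint)  # both print 2/3
```

Both routes give the same value because P(A|B) P(B) is just another way of writing P(AB).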

The explanation above can be expressed in terms of very informal (but I believe reasonable) logical statements:

1. A and B may occur

2. A tends to occur only when B also occurs

3. A occurred

4. It’s likely B will also occur
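The four statements above can also be tried out with a small simulation. This is only a sketch under assumptions of my own: B occurs in half of the repetitions, and A occurs often when B occurs but rarely otherwise. The function name `simulate` and all the parameter values are hypothetical.

```python
import random

def simulate(trials, p_b=0.5, p_a_if_b=0.6, p_a_if_not_b=0.05, seed=0):
    """Repeat the experiment and count A alone, B alone, and A with B."""
    rng = random.Random(seed)
    n1 = n2 = n3 = 0
    for _ in range(trials):
        b = rng.random() < p_b
        # A tends to occur only when B also occurs.
        a = rng.random() < (p_a_if_b if b else p_a_if_not_b)
        if a and b:
            n3 += 1
        elif a:
            n1 += 1
        elif b:
            n2 += 1
    return n1, n2, n3

n1, n2, n3 = simulate(100_000)

# Relative frequency of B among the repetitions in which A occurred.
p_b_given_a = n3 / (n1 + n3)
print(f"Estimated P(B|A) = {p_b_given_a:.3f}")
```

With these assumed parameters the exact value is 0.3 / 0.325, roughly 0.92, so the estimate should land close to that: once A has occurred, B is very likely to have occurred as well.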

**Note:** this explanation is strongly based on the online tutorials for Digital Image Processing, by Gonzalez & Woods. I used the same notation and terms. However, my objective here was to add the clarifying (at least for me!) explanations.