Conditional Probability

Sources:

  1. Joseph K. Blitzstein & Jessica Hwang. (2019). Conditional probability. Introduction to Probability (2nd ed., pp. 45-79). CRC Press.

Conditional Probability

Notation

| Symbol | Type | Description |
|---|---|---|
| \(\Omega\) | Set | Sample space, or the set of all possible outcomes |
| \(\mathcal{F}\) | Set | Sigma-algebra on the sample space \(\Omega\) |
| \(P\) | Function \(\mathcal{F} \to [0, 1]\) | Probability measure that assigns probabilities to events in \(\mathcal{F}\) |
| \(A, B\) | \(\in \mathcal{F}\) | Events in the probability space |
| \(P(A)\) | \(\in [0, 1]\) | Probability of event \(A\) |
| \(P(A \mid B)\) | \(\in [0, 1]\) | Conditional probability of \(A\) given \(B\) |
| \(A_1, A_2, \ldots, A_n\) | \(\in \mathcal{F}\) | Events forming a partition of the sample space |
| \(\cap\) | Operation | Intersection of events, e.g., \(A \cap B\) |
| \(\{A_1, A_2, \ldots, A_n\}\) | Partition | Collection of disjoint events whose union is the sample space |
| \(\infty\) | Symbol | Denotes infinity |

Abbreviations

| Abbreviation | Description |
|---|---|
| r.v. | Random variable |
| LOTP | Law of Total Probability |

Definition

Conditional Probability: Given two events \(A\) and \(B\) with \(P(B) > 0\), the conditional probability of \(A\) given \(B\), denoted \(P(A \mid B)\), is defined as:

\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} \]

Here:

  - \(A\) represents the event whose uncertainty we want to update.
  - \(B\) represents the evidence we observe or treat as given.

In this context:

  - \(P(A)\) is referred to as the prior probability of \(A\), and
  - \(P(A \mid B)\) is referred to as the posterior probability of \(A\),

where "prior" and "posterior" signify before and after updating based on the evidence, respectively.
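
As a quick check of the definition (an illustration, not an example from the source), the Python sketch below estimates \(P(A \mid B)\) by simulation for two fair dice, with \(A\) = "the sum is 7" and \(B\) = "the first die shows 4"; the events and trial count are chosen purely for illustration.

```python
import random

def estimate_conditional(trials: int = 100_000, seed: int = 0) -> float:
    """Estimate P(A | B) = P(A ∩ B) / P(B) by simulation.

    A = "the two dice sum to 7", B = "the first die shows 4".
    The exact answer is 1/6: given the first die shows 4, the sum
    is 7 exactly when the second die shows 3.
    """
    rng = random.Random(seed)
    count_B = 0
    count_A_and_B = 0
    for _ in range(trials):
        d1, d2 = rng.randint(1, 6), rng.randint(1, 6)
        if d1 == 4:                # event B occurred
            count_B += 1
            if d1 + d2 == 7:       # event A also occurred
                count_A_and_B += 1
    return count_A_and_B / count_B

print(estimate_conditional())  # ≈ 0.1667
```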

Bayes' Rule and the Law of Total Probability

Probability of the Intersection of Two Events

Theorem: For any events \(A\) and \(B\) with \(P(A) > 0\) and \(P(B) > 0\), \[ P(A \cap B) = P(B) P(A \mid B) = P(A) P(B \mid A) \]

Probability of the Intersection of \(n\) Events

Theorem: For events \(A_1, \ldots, A_n\) with \(P(A_1 \cap A_2 \cap \cdots \cap A_{n-1}) > 0\), \[ P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1) P(A_2 \mid A_1) P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid A_1 \cap \cdots \cap A_{n-1}) \]

Note: Here, the commas denote intersections. For example, \(P(A_3 \mid A_1, A_2)\) is shorthand for \(P(A_3 \mid A_1 \cap A_2)\).
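To see how the factors compose, here is a minimal sketch of the chain rule under an assumed setup (drawing cards without replacement is an illustrative choice, not taken from the source): the probability of drawing four aces in a row from a standard 52-card deck is built up one conditional factor at a time.

```python
from fractions import Fraction

def prob_all_aces_in_a_row(draws: int = 4) -> Fraction:
    """Chain rule: P(A1 ∩ ... ∩ An) = P(A1) P(A2 | A1) ... P(An | A1 ∩ ... ∩ A(n-1)).

    A_i = "the i-th card drawn (without replacement) is an ace".
    Each conditional factor is (aces left) / (cards left).
    """
    prob = Fraction(1)
    aces_left, cards_left = 4, 52
    for _ in range(draws):
        prob *= Fraction(aces_left, cards_left)  # P(A_i | A_1 ∩ ... ∩ A_{i-1})
        aces_left -= 1
        cards_left -= 1
    return prob

print(prob_all_aces_in_a_row())  # 4/52 * 3/51 * 2/50 * 1/49 = 1/270725
```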

Bayes' Rule

Theorem: For events \(A\) and \(B\) with \(P(A) > 0\) and \(P(B) > 0\), Bayes' rule states that \[ P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)} \] This theorem follows directly from the probability of the intersection of two events: dividing \(P(A) P(B \mid A) = P(B) P(A \mid B)\) by \(P(B)\) gives the result.
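
A minimal sketch of Bayes' rule as a function follows; the numeric inputs in the usage line are hypothetical placeholders.

```python
def bayes(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Bayes' rule: P(A | B) = P(B | A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive to condition on B.")
    return p_b_given_a * p_a / p_b

# Hypothetical numbers: P(B | A) = 0.9, P(A) = 0.01, P(B) = 0.05.
print(bayes(0.9, 0.01, 0.05))  # 0.18
```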

Law of Total Probability

Theorem: Let \(A_1, \ldots, A_n\) be a partition of the sample space \(\Omega\) (i.e., the events \(A_i\) are disjoint and their union is \(\Omega\)), with \(P(A_i) > 0\) for all \(i\). Then, for any event \(B\), \[ P(B) = \sum_{i=1}^n P(B \mid A_i) P(A_i) \]
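
The sketch below applies the LOTP to an assumed three-event partition (the numbers are hypothetical); in practice this is often how the denominator \(P(B)\) in Bayes' rule is computed.

```python
def total_probability(p_b_given_a: list[float], p_a: list[float]) -> float:
    """LOTP: P(B) = sum_i P(B | A_i) P(A_i), for a partition A_1, ..., A_n."""
    if abs(sum(p_a) - 1.0) > 1e-9:
        raise ValueError("P(A_i) must sum to 1 for a partition.")
    return sum(pb * pa for pb, pa in zip(p_b_given_a, p_a))

# Hypothetical partition with three events:
p_a = [0.5, 0.3, 0.2]            # P(A_1), P(A_2), P(A_3)
p_b_given_a = [0.1, 0.4, 0.8]    # P(B | A_i)
print(total_probability(p_b_given_a, p_a))  # 0.5*0.1 + 0.3*0.4 + 0.2*0.8 = 0.33
```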

Conditional Probabilities Are Probabilities

It can be shown that conditional probabilities satisfy the axioms of probability. Thus:

Conditional probabilities are probabilities.

Note: When we write \(P(A \mid E)\), it does not imply that \(A \mid E\) is an event. Rather, \(P(\cdot \mid E)\) is a probability function that assigns probabilities based on the knowledge that \(E\) has occurred. This is distinct from \(P(\cdot)\), which assigns probabilities without accounting for whether \(E\) has occurred.

Bayes' Rule with Extra Conditioning

Theorem: Given that \(P(A \cap E) > 0\) and \(P(B \cap E) > 0\), we have \[ P(A \mid B, E) = \frac{P(B \mid A, E) P(A \mid E)}{P(B \mid E)} \]
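
One way to see why this holds (a short derivation, not quoted from the source) is to apply the definition of conditional probability inside the probability function \(P(\cdot \mid E)\):

\[ P(A \mid B, E) = \frac{P(A \cap B \mid E)}{P(B \mid E)} = \frac{P(B \mid A, E)\, P(A \mid E)}{P(B \mid E)} \]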

LOTP with Extra Conditioning

Theorem: Let \(A_1, \ldots, A_n\) be a partition of \(\Omega\). If \(P(A_i \cap E) > 0\) for all \(i\), then \[ P(B \mid E) = \sum_{i=1}^n P(B \mid A_i, E) P(A_i \mid E) \]

Independence of Events

We often encounter situations where conditioning on one event changes our beliefs about another event's probability. However, if events provide no information about each other, they are said to be independent.

Definition: Events \(A\) and \(B\) are independent if \[ P(A \cap B) = P(A) P(B) \]

If \(P(A) > 0\) and \(P(B) > 0\), this is equivalent to \[ P(A \mid B) = P(A) \] and also to \(P(B \mid A) = P(B)\).
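
As a sketch under assumed events (two fair dice again; not an example from the source), independence can be verified by listing the sample space and comparing \(P(A \cap B)\) with \(P(A) P(B)\):

```python
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
omega = set(product(range(1, 7), repeat=2))

A = {(d1, d2) for (d1, d2) in omega if d1 + d2 == 7}  # "the sum is 7"
B = {(d1, d2) for (d1, d2) in omega if d1 == 4}       # "the first die shows 4"

def prob(event: set) -> float:
    """Probability of an event under the uniform (naive) definition."""
    return len(event) / len(omega)

# A and B are independent: P(A ∩ B) equals P(A) P(B)  (1/36 = 1/6 * 1/6).
print(prob(A & B), prob(A) * prob(B))  # 0.02777... 0.02777...
```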

Prosecutor's Fallacy

Misunderstanding conditional probabilities can lead to significant errors in reasoning. A well-known example is the Prosecutor's Fallacy.

The Prosecutor's Fallacy is the confusion of \(P(A \mid B)\) with \(P(B \mid A)\). By contrast, the Defense Attorney's Fallacy involves neglecting to condition on all relevant evidence.

Example: Sally Clark Case

In 1998, Sally Clark was tried for murder after the sudden deaths of her two sons shortly after birth. During the trial, an expert witness for the prosecution claimed that the probability of a newborn dying from sudden infant death syndrome (SIDS) was \(1 / 8500\), so the probability of two deaths due to SIDS in one family would be \((1 / 8500)^2 \approx 1\) in 73 million. Based on this, the expert concluded that the probability of Clark's innocence was one in 73 million.

Issues with This Reasoning

  1. Independence Assumption: The expert assumed that the two deaths due to SIDS were independent. This assumption would not hold if there were genetic or familial risk factors affecting both children.

  2. Confusion of Conditional Probabilities: The expert confused \(P(\text{evidence} \mid \text{innocence})\) with \(P(\text{innocence} \mid \text{evidence})\). Specifically, the expert calculated \(P(\text{evidence} \mid \text{innocence})\), but what is needed is \(P(\text{innocence} \mid \text{evidence})\), which by Bayes' rule is:

    \[ P(\text{innocence} \mid \text{evidence}) = \frac{P(\text{evidence} \mid \text{innocence}) \cdot P(\text{innocence})}{P(\text{evidence})} \]
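
To make the gap between the two conditional probabilities concrete, the sketch below plugs numbers into Bayes' rule, using the LOTP over the partition {innocence, guilt} for the denominator. The 1 in 73 million figure is the one quoted at trial; the prior probability of innocence and the likelihood of the evidence under guilt are purely hypothetical assumptions for illustration, not figures from the case or the source.

```python
def posterior_innocence(p_ev_given_innocent: float,
                        p_ev_given_guilty: float,
                        p_innocent: float) -> float:
    """P(innocence | evidence) via Bayes' rule, with P(evidence) computed by
    the LOTP over the partition {innocent, guilty}."""
    p_guilty = 1.0 - p_innocent
    p_evidence = (p_ev_given_innocent * p_innocent
                  + p_ev_given_guilty * p_guilty)
    return p_ev_given_innocent * p_innocent / p_evidence

# All numbers below are hypothetical, for illustration only.
print(posterior_innocence(
    p_ev_given_innocent=1 / 73e6,   # two SIDS deaths, as claimed at trial
    p_ev_given_guilty=1.0,          # two deaths are certain if both were murdered
    p_innocent=1 - 1e-8,            # assumed prior: double infant murder is extremely rare
))  # ≈ 0.58
```

Even with \(P(\text{evidence} \mid \text{innocence})\) on the order of 1 in 73 million, the posterior probability of innocence comes out above one half under these assumptions, because the prior probability of a double infant murder is itself assumed to be extremely small.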