Conditional Probability
Sources:
- Joseph K. Blitzstein & Jessica Hwang. (2019). Conditional probability. Introduction to Probability (2nd ed., pp. 45-79). CRC Press.
Notation
Symbol | Type | Description |
---|---|---|
\(\Omega\) | Set | Sample space, or the set of all possible outcomes |
\(\mathcal{F}\) | Set | Sigma-algebra on the sample space \(\Omega\) |
\(P\) | Function | Probability measure assigning to each event in \(\mathcal{F}\) a number in \([0, 1]\) |
\(A, B\) | \(\in \mathcal{F}\) | Events in the probability space |
\(P(A)\) | \(\in [0, 1]\) | Probability of event \(A\) |
\(P(A \mid B)\) | \(\in [0, 1]\) | Conditional probability of \(A\) given \(B\) |
\(A_1, A_2, \ldots, A_n\) | \(\in \mathcal{F}\) | Events forming a partition of the sample space |
\(\cap\) | Operation | Intersection of events, e.g., \(A \cap B\) |
\(\{A_1, A_2, \ldots, A_n\}\) | Partition | Collection of events that are disjoint and whose union is the sample space |
\(\infty\) | Symbol | Denotes infinity |
Abbreviations
Abbreviation | Description |
---|---|
r.v. | Random variable |
LOTP | Law of Total Probability |
Definition
Conditional Probability: Given two events \(A\) and \(B\) with \(P(B) > 0\), the conditional probability of \(A\) given \(B\), denoted \(P(A \mid B)\), is defined as:
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} \]
Here:
- \(A\) represents the event whose uncertainty we want to update.
- \(B\) represents the evidence we observe or treat as given.
In this context:
- \(P(A)\) is referred to as the prior probability of \(A\), and
- \(P(A \mid B)\) is the posterior probability of \(A\), where "prior" and "posterior" signify before and after updating based on the evidence, respectively.
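The definition can be checked by simulation: among repeated trials, the relative frequency of \(A\) within the trials where \(B\) occurred approaches \(P(A \cap B) / P(B)\). Below is a minimal Python sketch using two fair dice, with the events chosen purely for illustration.

```python
import random

random.seed(0)
trials = 100_000
count_B = 0          # trials where the conditioning event B occurred
count_A_and_B = 0    # trials where both A and B occurred

for _ in range(trials):
    d1, d2 = random.randint(1, 6), random.randint(1, 6)
    A = (d1 + d2 == 7)   # event A: the dice sum to 7
    B = (d1 >= 4)        # event B: the first die shows 4, 5, or 6
    if B:
        count_B += 1
        if A:
            count_A_and_B += 1

# Relative frequency of A among trials where B occurred, mirroring
# P(A | B) = P(A ∩ B) / P(B); the exact value for these events is 1/6.
print(count_A_and_B / count_B)
```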
Bayes' Rule and the Law of Total Probability
Probability of the Intersection of Two Events
Theorem: For any events \(A\) and \(B\) with \(P(B) > 0\), \[ P(A \cap B) = P(B) P(A \mid B) = P(A) P(B \mid A) \]
Probability of the Intersection of \(n\) Events
Theorem: For events \(A_1, \ldots, A_n\) with \(P(A_1 \cap A_2 \cap \cdots \cap A_{n-1}) > 0\), \[ P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1) P(A_2 \mid A_1) P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid A_1 \cap \cdots \cap A_{n-1}) \]
Note: Here, the commas denote intersections. For example, \(P(A_3 \mid A_1, A_2)\) is shorthand for \(P(A_3 \mid A_1 \cap A_2)\).
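As an illustration of this chain rule, consider drawing three cards from a standard 52-card deck without replacement and asking for the probability that all three are aces. The worked example below is not from the source; the events \(A_i\) are "the \(i\)-th card drawn is an ace."

```python
from fractions import Fraction

# P(A1 ∩ A2 ∩ A3) = P(A1) · P(A2 | A1) · P(A3 | A1 ∩ A2)
p_a1 = Fraction(4, 52)             # 4 aces among 52 cards
p_a2_given_a1 = Fraction(3, 51)    # 3 aces left among 51 remaining cards
p_a3_given_a1_a2 = Fraction(2, 50) # 2 aces left among 50 remaining cards

print(p_a1 * p_a2_given_a1 * p_a3_given_a1_a2)   # 1/5525
```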
Bayes' Rule
Theorem: For events \(A\) and \(B\) with \(P(A) > 0\) and \(P(B) > 0\), Bayes' rule states that \[ P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)} \] This follows directly from the two expressions for the probability of the intersection: dividing \(P(A \cap B) = P(A) P(B \mid A)\) by \(P(B)\) gives the result.
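A quick numerical check of the identity, using made-up probabilities chosen only for illustration (any consistent choice works):

```python
# Hypothetical inputs, chosen only to check the identity numerically.
p_a = 0.30           # P(A)
p_b_given_a = 0.60   # P(B | A)
p_b = 0.45           # P(B), assumed; it must be at least P(A ∩ B)

p_a_and_b = p_a * p_b_given_a     # P(A ∩ B) = P(A) P(B | A)
lhs = p_a_and_b / p_b             # P(A | B) by the definition
rhs = p_b_given_a * p_a / p_b     # right-hand side of Bayes' rule

print(lhs, rhs)                   # both ≈ 0.4
```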
Law of Total Probability
Theorem: Let \(A_1, \ldots, A_n\) be a partition of the sample space \(\Omega\) (i.e., the events \(A_i\) are disjoint and their union is \(\Omega\)), with \(P(A_i) > 0\) for all \(i\). Then, for any event \(B\), \[ P(B) = \sum_{i=1}^n P(B \mid A_i) P(A_i) \]
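A standard application combines LOTP with Bayes' rule: partition on an unknown status to compute the marginal probability of the evidence, then update. The test accuracy and prevalence figures below are hypothetical, used only to illustrate the calculation.

```python
# Hypothetical numbers: 1% prevalence, 95% sensitivity, 99% specificity.
p_disease = 0.01
p_pos_given_disease = 0.95
p_pos_given_healthy = 0.01   # 1 - specificity

# LOTP over the partition {disease, no disease}:
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' rule for P(disease | positive test)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))   # ≈ 0.49, despite the accurate-looking test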
Conditional Probabilities Are Probabilities
It can be shown that conditional probabilities satisfy the axioms of probability. Thus:
Conditional probabilities are probabilities.
Note: When we write \(P(A \mid E)\), it does not imply that \(A \mid E\) is an event. Rather, \(P(\cdot \mid E)\) is a probability function that assigns probabilities based on the knowledge that \(E\) has occurred. This is distinct from \(P(\cdot)\), which assigns probabilities without accounting for whether \(E\) has occurred.
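This can be checked concretely: fixing a conditioning event \(E\) with \(P(E) > 0\), the function \(P(\cdot \mid E)\) assigns probability 1 to the whole sample space, and complementary events get probabilities summing to 1. A small enumeration sketch with two fair dice, where \(E\) and \(A\) are arbitrary events chosen for illustration:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes

def prob(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

E = lambda o: o[0] + o[1] >= 9   # conditioning event: sum is at least 9
A = lambda o: o[0] == 6          # first die shows 6

def cond(X):
    """P(X | E) = P(X ∩ E) / P(E)."""
    return prob(lambda o: X(o) and E(o)) / prob(E)

print(cond(lambda o: True) == 1)                 # whole space gets probability 1
print(cond(A) + cond(lambda o: not A(o)) == 1)   # complements sum to 1 under P(· | E)
```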
Bayes' Rule with Extra Conditioning
Theorem: Given that \(P(A \cap E) > 0\) and \(P(B \cap E) > 0\), we have \[ P(A \mid B, E) = \frac{P(B \mid A, E) P(A \mid E)}{P(B \mid E)} \]
LOTP with Extra Conditioning
Theorem: Let \(A_1, \ldots, A_n\) be a partition of \(\Omega\). If \(P(A_i \cap E) > 0\) for all \(i\), then \[ P(B \mid E) = \sum_{i=1}^n P(B \mid A_i, E) P(A_i \mid E) \]
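Both extra-conditioning identities can be verified by exhaustive enumeration. The sketch below uses two fair dice with arbitrary choices of \(A\), \(B\), and \(E\) (illustrative only) and checks Bayes' rule with extra conditioning directly.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes

def prob(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond(X, given):
    """P(X | given) = P(X ∩ given) / P(given)."""
    return prob(lambda o: X(o) and given(o)) / prob(given)

A = lambda o: o[0] == 6          # first die shows 6
B = lambda o: o[1] >= 4          # second die shows 4, 5, or 6
E = lambda o: o[0] + o[1] >= 9   # extra conditioning event

lhs = cond(A, lambda o: B(o) and E(o))                          # P(A | B, E)
rhs = cond(B, lambda o: A(o) and E(o)) * cond(A, E) / cond(B, E)
print(lhs == rhs)   # True: both sides equal 1/3 for these events
```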
Independence of Events
We often encounter situations where conditioning on one event changes our beliefs about another event's probability. However, if events provide no information about each other, they are said to be independent.
Definition: Events \(A\) and \(B\) are independent if \[ P(A \cap B) = P(A) P(B) \]
If \(P(A) > 0\) and \(P(B) > 0\), this is equivalent to \[ P(A \mid B) = P(A) \] and also to \(P(B \mid A) = P(B)\).
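Independence can be checked by comparing \(P(A \cap B)\) with \(P(A) P(B)\). In the small enumeration below (events chosen for illustration), the sum of two fair dice being 7 is independent of the first die being even, while the sum being 8 is not.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes

def prob(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

A = lambda o: o[0] % 2 == 0      # first die is even
B = lambda o: o[0] + o[1] == 7   # sum is 7
C = lambda o: o[0] + o[1] == 8   # sum is 8

print(prob(lambda o: A(o) and B(o)) == prob(A) * prob(B))   # True: independent
print(prob(lambda o: A(o) and C(o)) == prob(A) * prob(C))   # False: not independent
```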
Prosecutor's Fallacy
Misunderstanding conditional probabilities can lead to significant errors in reasoning. A well-known example is the Prosecutor's Fallacy.
The Prosecutor's Fallacy is the confusion of \(P(A \mid B)\) with \(P(B \mid A)\). A related error, the Defense Attorney's Fallacy, involves neglecting to condition on all relevant evidence.
Example: Sally Clark Case
In 1998, Sally Clark was tried for murder after the sudden deaths of her two sons shortly after birth. During the trial, an expert witness for the prosecution claimed that the probability of a newborn dying from sudden infant death syndrome (SIDS) was \(1/8500\), so the probability of two deaths due to SIDS in one family would be \((1/8500)^2\), or about 1 in 73 million. Based on this, the expert concluded that the probability of Clark's innocence was one in 73 million.
Issues with This Reasoning
Independence Assumption: The expert assumed that the two deaths due to SIDS were independent. This assumption would not hold if there were genetic or familial risk factors affecting both children.
Confusion of Conditional Probabilities: The expert confused \(P(\text{evidence} \mid \text{innocence})\) with \(P(\text{innocence} \mid \text{evidence})\). Specifically, the expert calculated \(P(\text{evidence} \mid \text{innocence})\), but what is needed is \(P(\text{innocence} \mid \text{evidence})\), which by Bayes' rule is:
\[ P(\text{innocence} \mid \text{evidence}) = \frac{P(\text{evidence} \mid \text{innocence}) \cdot P(\text{innocence})}{P(\text{evidence})} \]
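To see how far apart the two quantities can be, the sketch below plugs hypothetical values into Bayes' rule: the expert's 1-in-73-million figure for \(P(\text{evidence} \mid \text{innocence})\), together with made-up values for the prior and for \(P(\text{evidence} \mid \text{guilt})\). The numbers are illustrative only, not estimates for the actual case.

```python
# Hypothetical, illustrative inputs (not case estimates).
p_evidence_given_innocent = 1 / 73_000_000   # the expert's figure for two SIDS deaths
p_guilty_prior = 1e-8                        # assumed prior probability of a double murder
p_innocent_prior = 1 - p_guilty_prior
p_evidence_given_guilty = 1.0                # two deaths are certain under the murder hypothesis

# LOTP for the denominator, then Bayes' rule for the posterior probability of innocence.
p_evidence = (p_evidence_given_innocent * p_innocent_prior
              + p_evidence_given_guilty * p_guilty_prior)
p_innocent_given_evidence = p_evidence_given_innocent * p_innocent_prior / p_evidence
print(round(p_innocent_given_evidence, 2))   # ≈ 0.58: nowhere near 1 in 73 million
```

With these inputs, innocence remains more likely than not even after conditioning on the evidence; the fallacy lies in quoting \(P(\text{evidence} \mid \text{innocence})\) as if it were this posterior.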