Mutual Information
Sources:
- Elements of Information Theory
- An Introduction to Single-User Information Theory
Definition
Given two random variables $X$ and $Y$ with joint pmf $p(x,y)$ and marginal pmfs $p(x)$ and $p(y)$, the mutual information between $X$ and $Y$ is defined as
$$I(X;Y) = \sum_{x}\sum_{y} p(x,y)\log\frac{p(x,y)}{p(x)p(y)}.$$
It can be expressed in terms of the relative entropy between their joint distribution and the product of their marginal distributions:
$$I(X;Y) = D\big(p(x,y)\,\|\,p(x)p(y)\big).$$
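As a quick sanity check, here is a minimal numerical sketch (the joint pmf `p_xy` below is an arbitrary illustrative choice, not taken from the sources) that evaluates $I(X;Y)$ both as the double sum and as the relative entropy $D\big(p(x,y)\,\|\,p(x)p(y)\big)$:

```python
import numpy as np

# Arbitrary illustrative joint pmf p(x, y); rows index x, columns index y.
p_xy = np.array([[0.30, 0.20],
                 [0.10, 0.40]])

p_x = p_xy.sum(axis=1)   # marginal p(x)
p_y = p_xy.sum(axis=0)   # marginal p(y)

# I(X;Y) as the double sum over the joint alphabet (in bits).
mi_sum = sum(
    p_xy[i, j] * np.log2(p_xy[i, j] / (p_x[i] * p_y[j]))
    for i in range(p_xy.shape[0])
    for j in range(p_xy.shape[1])
    if p_xy[i, j] > 0
)

# I(X;Y) as the relative entropy D(p_xy || p_x * p_y).
product = np.outer(p_x, p_y)
mi_kl = np.sum(p_xy * np.log2(p_xy / product))

print(mi_sum, mi_kl)   # the two computations agree (about 0.1245 bits here)
```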
Properties of Mutual Information
Basics
1. $I(X;Y) = I(Y;X)$.
2. $I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)$.
3. $I(X;Y) = H(X) + H(Y) - H(X,Y)$.
4. $I(X;X) = H(X)$.
5. $I(X;Y) \le \min\{H(X), H(Y)\}$, with equality holding iff one variable is a function of the other (i.e., $X = f(Y)$ or $Y = g(X)$ for some function $f$ or $g$).
6. $I(X;Y) \ge 0$, with equality holding iff $X$ and $Y$ are independent.
Proof:
Properties 1, 2, 3, and 4 follow immediately from the definition.
Property 5 is a direct consequence of property 2 and the non-negativity of conditional entropy: $H(X|Y) \ge 0$, with equality iff $X$ is a function of $Y$, and likewise for $H(Y|X)$.
Property 6 holds because $I(X;Y) = D\big(p(x,y)\,\|\,p(x)p(y)\big) \ge 0$, and equality in this relative entropy holds iff $p(x,y) = p(x)p(y)$ for all $(x,y)$, i.e., iff $X$ and $Y$ are independent (see the theorem proved below).
To show the first equality in property 2, we write
$$I(X;Y) = \sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)p(y)} = \sum_{x,y} p(x,y)\log\frac{p(x|y)}{p(x)} = -\sum_{x,y} p(x,y)\log p(x) + \sum_{x,y} p(x,y)\log p(x|y) = H(X) - H(X|Y).$$
The relationships between $I(X;Y)$ and the entropies $H(X)$, $H(Y)$, $H(X|Y)$, and $H(Y|X)$ all follow from this type of expansion. Expanding the definition in the same way with the roles of $X$ and $Y$ exchanged gives
$$I(X;Y) = H(Y) - H(Y|X).$$
Then $I(X;Y) = I(Y;X)$ follows directly, since the definition is unchanged when $X$ and $Y$ are swapped. So mutual information is symmetric: $X$ says as much about $Y$ as $Y$ says about $X$.
Now let’s ask an interesting question: How much does $X$ tell us about itself? In other words, what is $I(X;X)$? Using our first definition, and noting that the joint pmf of $(X,X)$ places probability $p(x)$ on the pair $(x,x)$ and zero elsewhere, we have
$$I(X;X) = \sum_{x} p(x)\log\frac{p(x)}{p(x)p(x)} = \sum_{x} p(x)\log\frac{1}{p(x)} = H(X).$$
We note that $I(X;X) = H(X)$: the mutual information of a random variable with itself is its entropy. For this reason, entropy is sometimes referred to as self-information.
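These identities are easy to check numerically; the sketch below reuses the same arbitrary illustrative pmf as above:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a pmf given as a flat array."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_xy = np.array([[0.30, 0.20],
                 [0.10, 0.40]])        # arbitrary illustrative joint pmf
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

H_X, H_Y, H_XY = entropy(p_x), entropy(p_y), entropy(p_xy.ravel())
H_X_given_Y = H_XY - H_Y               # chain rule: H(X,Y) = H(Y) + H(X|Y)
H_Y_given_X = H_XY - H_X

print(H_X - H_X_given_Y)               # I(X;Y) = H(X) - H(X|Y)
print(H_Y - H_Y_given_X)               # same value: I(X;Y) = H(Y) - H(Y|X)

# I(X;X): the joint pmf of the pair (X, X) is concentrated on the diagonal.
p_xx = np.diag(p_x)
H_XX = entropy(p_xx.ravel())
print(H_X - (H_XX - H_X), H_X)         # I(X;X) = H(X) - H(X|X) equals H(X)
```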
For any two random variables $X$ and $Y$,
$$I(X;Y) \ge 0,$$
with equality if and only if $X$ and $Y$ are independent.
Proof:
- We know that $I(X;Y) = D\big(p(x,y)\,\|\,p(x)p(y)\big)$.
- See property 3 of Relative Entropy: $D(p\,\|\,q) \ge 0$ for all distributions $p$ and $q$, with equality holding iff $p = q$.
- So $I(X;Y) \ge 0$, with equality if and only if $p(x,y) = p(x)p(y)$ for all $(x,y)$ (i.e., $X$ and $Y$ are independent).
Corollary: $H(X|Y) \le H(X)$, with equality iff $X$ and $Y$ are independent; that is, conditioning reduces entropy on average. This follows from $I(X;Y) = H(X) - H(X|Y) \ge 0$.
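A small sketch of the theorem and corollary (again with arbitrary illustrative pmfs): mutual information is strictly positive for a dependent pair, exactly zero when the joint pmf factors into its marginals, and conditioning does not increase entropy:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(p_xy):
    """I(X;Y) in bits from a joint pmf matrix (rows: x, columns: y)."""
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x * p_y)[mask]))

# Dependent pair: I(X;Y) > 0.
p_dep = np.array([[0.30, 0.20],
                  [0.10, 0.40]])
print(mutual_information(p_dep))                   # strictly positive

# Independent pair: the joint is the outer product of the marginals, I(X;Y) = 0.
p_ind = np.outer([0.5, 0.5], [0.4, 0.6])
print(mutual_information(p_ind))                   # 0.0 (up to rounding)

# Corollary check: H(X|Y) <= H(X) for the dependent pair.
H_X = entropy(p_dep.sum(axis=1))
H_X_given_Y = entropy(p_dep.ravel()) - entropy(p_dep.sum(axis=0))
print(H_X_given_Y, "<=", H_X)
```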
Conditional Mutual Information

(Figure: Venn diagram of information-theoretic measures for three variables $X$, $Y$, and $Z$, represented by the lower left, lower right, and upper circles, respectively. The conditional mutual informations $I(X;Y|Z)$, $I(Y;Z|X)$, and $I(X;Z|Y)$ are represented by the yellow, cyan, and magenta regions, respectively.)
The conditional mutual information, denoted by $I(X;Y|Z)$, is defined as
$$I(X;Y|Z) = H(X|Z) - H(X|Y,Z) = \sum_{x,y,z} p(x,y,z)\log\frac{p(x,y|z)}{p(x|z)\,p(y|z)}.$$
One way to picture it: $I(X;Y|Z)$ is the amount of information $Y$ provides about $X$ when $Z$ is already known.
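A minimal computational sketch of this definition; the three-variable pmf `p` below is an arbitrary illustrative assumption:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Arbitrary illustrative joint pmf p(x, y, z) on {0,1}^3; axes are (x, y, z).
p = np.array([[[0.10, 0.05], [0.05, 0.10]],
              [[0.20, 0.10], [0.05, 0.35]]])

# I(X;Y|Z) = H(X|Z) - H(X|Y,Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
H_xyz = entropy(p.ravel())
H_xz = entropy(p.sum(axis=1).ravel())    # marginal of (X, Z)
H_yz = entropy(p.sum(axis=0).ravel())    # marginal of (Y, Z)
H_z = entropy(p.sum(axis=(0, 1)))        # marginal of Z

print(H_xz + H_yz - H_xyz - H_z)         # I(X;Y|Z), non-negative
```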
Joint Mutual Information
Lemma: Defining the joint mutual information between $X$ and the pair $(Y,Z)$ as
$$I(X;Y,Z) = \sum_{x,y,z} p(x,y,z)\log\frac{p(x,y,z)}{p(x)\,p(y,z)} = H(X) - H(X|Y,Z),$$
we have
$$I(X;Y,Z) = I(X;Y) + I(X;Z|Y) = I(X;Z) + I(X;Y|Z).$$
One way to picture it: treat the pair $(Y,Z)$ as a single random variable; the joint mutual information is then just the ordinary mutual information between $X$ and this combined variable.
Proof: Without loss of generality, we only prove the first equality:
$$I(X;Y,Z) = H(X) - H(X|Y,Z) = \big(H(X) - H(X|Y)\big) + \big(H(X|Y) - H(X|Y,Z)\big) = I(X;Y) + I(X;Z|Y).$$
The above lemma can be read as follows: the information that $(Y,Z)$ jointly provides about $X$ equals the information that $Y$ provides about $X$ plus the additional information that $Z$ provides about $X$ when $Y$ is already known.
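The two decompositions can be verified numerically with the same arbitrary three-variable pmf used above:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p = np.array([[[0.10, 0.05], [0.05, 0.10]],
              [[0.20, 0.10], [0.05, 0.35]]])   # axes: (x, y, z), arbitrary numbers

def H(*keep):
    """Joint entropy of the variables whose axes are listed in `keep` (0=X, 1=Y, 2=Z)."""
    drop = tuple(a for a in range(p.ndim) if a not in keep)
    marginal = p.sum(axis=drop) if drop else p
    return entropy(marginal.ravel())

I_X_YZ = H(0) + H(1, 2) - H(0, 1, 2)                     # I(X; Y,Z)
I_X_Y = H(0) + H(1) - H(0, 1)                            # I(X; Y)
I_X_Z_given_Y = H(0, 1) + H(1, 2) - H(0, 1, 2) - H(1)    # I(X; Z | Y)
I_X_Z = H(0) + H(2) - H(0, 2)                            # I(X; Z)
I_X_Y_given_Z = H(0, 2) + H(1, 2) - H(0, 1, 2) - H(2)    # I(X; Y | Z)

# All three expressions agree, as the lemma states.
print(I_X_YZ, I_X_Y + I_X_Z_given_Y, I_X_Z + I_X_Y_given_Z)
```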
Properties of Entropy and Mutual Information for Multiple Random Variables
Chain rule for joint entropy
Theorem: Let $X_1, X_2, \ldots, X_n$ be random variables drawn according to the joint pmf $p(x_1, x_2, \ldots, x_n)$. Then
$$H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1).$$
For example, for three random variables $X$, $Y$, and $Z$,
$$H(X,Y,Z) = H(X) + H(Y|X) + H(Z|X,Y).$$
Proof:
From the chain rule for 2 r.v.,
$$H(X_1, \ldots, X_n) = H(X_1, \ldots, X_{n-1}) + H(X_n \mid X_1, \ldots, X_{n-1}).$$
Once again, applying the chain rule for 2 r.v. to the first term of the right-hand side of this equation, we have
$$H(X_1, \ldots, X_{n-1}) = H(X_1, \ldots, X_{n-2}) + H(X_{n-1} \mid X_1, \ldots, X_{n-2}).$$
The desired result can then be obtained by repeatedly applying the chain rule for 2 r.v.
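As a concrete check of the three-variable example, the sketch below (reusing the arbitrary pmf from the earlier sections) computes $H(Y|X)$ and $H(Z|X,Y)$ directly from their definitions and confirms that the decomposition reproduces $H(X,Y,Z)$:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p = np.array([[[0.10, 0.05], [0.05, 0.10]],
              [[0.20, 0.10], [0.05, 0.35]]])   # arbitrary pmf p(x, y, z)

p_x = p.sum(axis=(1, 2))                       # p(x)
p_xy = p.sum(axis=2)                           # p(x, y)

# H(Y|X) = sum_x p(x) H(Y | X = x), computed from the conditional rows.
H_Y_given_X = sum(p_x[i] * entropy(p_xy[i] / p_x[i]) for i in range(2))

# H(Z|X,Y) = sum_{x,y} p(x,y) H(Z | X = x, Y = y).
p_z_given_xy = p / p.sum(axis=2, keepdims=True)
H_Z_given_XY = sum(p_xy[i, j] * entropy(p_z_given_xy[i, j])
                   for i in range(2) for j in range(2))

print(entropy(p_x) + H_Y_given_X + H_Z_given_XY)   # H(X) + H(Y|X) + H(Z|X,Y)
print(entropy(p.ravel()))                          # equals H(X, Y, Z)
```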
Chain rule for conditional entropy
Theorem:
$$H(X_1, X_2, \ldots, X_n \mid Y) = \sum_{i=1}^{n} H(X_i \mid X_1, \ldots, X_{i-1}, Y).$$
Proof:
The theorem can be proved similarly to Chain Rule for Entropy(2 Variables): if every entropy in the proof of the chain rule for joint entropy is additionally conditioned on $Y$, the same telescoping argument gives the result.
Chain rule for mutual information
Theorem:
$$I(X_1, X_2, \ldots, X_n; Y) = \sum_{i=1}^{n} I(X_i; Y \mid X_1, \ldots, X_{i-1}).$$
Proof:
This can be proved by first expressing mutual information in terms of entropy and conditional entropy, and then applying the chain rules for entropy and conditional entropy.
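Written out, the two chain rules combine as follows:
$$\begin{aligned}
I(X_1, \ldots, X_n; Y) &= H(X_1, \ldots, X_n) - H(X_1, \ldots, X_n \mid Y) \\
&= \sum_{i=1}^{n} H(X_i \mid X_1, \ldots, X_{i-1}) - \sum_{i=1}^{n} H(X_i \mid X_1, \ldots, X_{i-1}, Y) \\
&= \sum_{i=1}^{n} I(X_i; Y \mid X_1, \ldots, X_{i-1}).
\end{aligned}$$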
Independence bound on entropy
Theorem:
$$H(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^{n} H(X_i).$$
Equality holds iff all the $X_i$ are independent of one another.
Proof:
By applying the chain rule for entropy,
$$H(X_1, \ldots, X_n) = \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1) \le \sum_{i=1}^{n} H(X_i),$$
where the inequality holds because conditioning cannot increase entropy.
Equality holds iff each conditional entropy is equal to its associated entropy, that is, iff $X_i$ is independent of $(X_1, \ldots, X_{i-1})$ for every $i$, which is equivalent to the $X_i$ being mutually independent.
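A quick numerical illustration of the bound; both pmfs below are arbitrary choices, with the independent case built as an outer product so that equality is visible:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Dependent case: arbitrary joint pmf p(x, y, z); the bound is strict here.
p = np.array([[[0.10, 0.05], [0.05, 0.10]],
              [[0.20, 0.10], [0.05, 0.35]]])
sum_marginals = (entropy(p.sum(axis=(1, 2)))
                 + entropy(p.sum(axis=(0, 2)))
                 + entropy(p.sum(axis=(0, 1))))
print(entropy(p.ravel()), "<=", sum_marginals)

# Independent case: joint pmf is the product of the marginals; equality holds.
q = np.einsum('i,j,k->ijk', [0.5, 0.5], [0.4, 0.6], [0.3, 0.7])
sum_marginals_q = (entropy(q.sum(axis=(1, 2)))
                   + entropy(q.sum(axis=(0, 2)))
                   + entropy(q.sum(axis=(0, 1))))
print(entropy(q.ravel()), "==", sum_marginals_q)
```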
Bound on mutual information
Theorem: If $(X_i, Y_i)$, $i = 1, \ldots, n$, satisfy the conditional independence assumption
$$p(y_1, \ldots, y_n \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} p(y_i \mid x_i),$$
then
$$I(X_1, \ldots, X_n; Y_1, \ldots, Y_n) \le \sum_{i=1}^{n} I(X_i; Y_i).$$
Proof:
From the independence bound on entropy, we have
$$H(Y_1, \ldots, Y_n) \le \sum_{i=1}^{n} H(Y_i).$$
By the conditional independence assumption, we have
$$H(Y_1, \ldots, Y_n \mid X_1, \ldots, X_n) = \sum_{i=1}^{n} H(Y_i \mid X_i).$$
Hence,
$$I(X_1, \ldots, X_n; Y_1, \ldots, Y_n) = H(Y_1, \ldots, Y_n) - H(Y_1, \ldots, Y_n \mid X_1, \ldots, X_n) \le \sum_{i=1}^{n} H(Y_i) - \sum_{i=1}^{n} H(Y_i \mid X_i) = \sum_{i=1}^{n} I(X_i; Y_i).$$
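A sketch of the bound for $n = 2$; everything here is an illustrative assumption (a correlated input pmf `p_x12` and a memoryless binary symmetric channel `W` with crossover probability 0.1):

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(p_xy):
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x * p_y)[mask]))

# Correlated input pair (X1, X2) and a memoryless binary symmetric channel W.
p_x12 = np.array([[0.40, 0.10],
                  [0.10, 0.40]])                 # arbitrary correlated input pmf
W = np.array([[0.9, 0.1],
              [0.1, 0.9]])                       # p(y|x) for each channel use

# Full joint p(x1, x2, y1, y2) under p(y1, y2 | x1, x2) = W(y1|x1) W(y2|x2).
joint = np.einsum('ab,ac,bd->abcd', p_x12, W, W)   # axes: x1, x2, y1, y2

# Left-hand side: I(X1, X2; Y1, Y2), computed via entropies.
H_Y = entropy(joint.sum(axis=(0, 1)).ravel())
H_Y_given_X = entropy(joint.ravel()) - entropy(p_x12.ravel())
lhs = H_Y - H_Y_given_X

# Right-hand side: I(X1; Y1) + I(X2; Y2) from the pairwise marginals.
rhs = (mutual_information(joint.sum(axis=(1, 3)))    # joint pmf of (X1, Y1)
       + mutual_information(joint.sum(axis=(0, 2)))) # joint pmf of (X2, Y2)

print(lhs, "<=", rhs)
```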