Relative Entropy (or KL divergence)
Sources:
- Elements of Information Theory
- An Introduction to Single-User Information Theory
Definition
The relative entropy (or Kullback-Leibler divergence, KL divergence) between two probability mass functions $p(x)$ and $q(x)$ defined on the same alphabet $\mathcal{X}$ is

$$D(p \| q) = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)}.$$
In the above definition, we use the convention that $0 \log \frac{0}{0} = 0$ and the convention (based on continuity arguments) that $0 \log \frac{0}{q} = 0$ and $p \log \frac{p}{0} = \infty$. Thus, if there is any symbol $x \in \mathcal{X}$ such that $p(x) > 0$ and $q(x) = 0$, then $D(p \| q) = \infty$.
You may see relative entropy written as $D(p \| q)$, $D_{\mathrm{KL}}(p \| q)$, or $\mathrm{KL}(p \| q)$. These notations are interchangeable.
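As a concrete illustration of the definition and the conventions above, here is a minimal sketch in plain Python (the helper name `kl_divergence` and the coin example are chosen for illustration, not taken from the sources) that computes $D(p \| q)$ for finite distributions represented as dictionaries:

```python
import math

def kl_divergence(p, q, base=2):
    """Relative entropy D(p || q) for pmfs given as dicts {symbol: probability}.

    Follows the conventions above: 0 * log(0/q) = 0, and any symbol with
    p(x) > 0 but q(x) = 0 makes the divergence infinite.
    """
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue  # 0 * log(0/q) = 0 by convention
        qx = q.get(x, 0.0)
        if qx == 0:
            return math.inf  # p(x) > 0 but q(x) = 0  =>  D(p || q) = infinity
        total += px * math.log(px / qx, base)
    return total

# Example: a fair coin p versus a biased coin q (base-2 logs, so results are in bits).
p = {"heads": 0.5, "tails": 0.5}
q = {"heads": 0.9, "tails": 0.1}
print(kl_divergence(p, q))  # ~0.737 bits
print(kl_divergence(q, p))  # ~0.531 bits (already hints at the asymmetry discussed below)
```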
Properties of Relative Entropy
1. In general, relative entropy is asymmetric: $D(p \| q) \neq D(q \| p)$. It also does not satisfy the triangle inequality. Therefore, it is not a metric.
2. $D(p \| p) = 0$.
3. $D(p \| q) \ge 0$ for all distributions $p$ and $q$, with equality holding iff $p = q$.
4. $D(p(y|x) \| q(y|x)) \ge 0$, with equality if and only if $p(y|x) = q(y|x)$ for all $y$ and $x$ such that $p(x) > 0$.
Property (3) is proved using Jensen’s inequality.
Property (4) is proved using property (3).
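As a quick numerical sanity check of properties (2) and (3) (a check, not a proof), the following sketch draws random distributions with NumPy and confirms that the divergence is zero on identical distributions and nonnegative otherwise; the helper `kl` is defined locally for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(p, q):
    # D(p || q) in bits for strictly positive pmfs given as NumPy arrays.
    return float(np.sum(p * np.log2(p / q)))

# Property (2): D(p || p) = 0, and property (3): D(p || q) >= 0.
for _ in range(1000):
    p = rng.dirichlet(np.ones(5))  # random pmf on a 5-symbol alphabet
    q = rng.dirichlet(np.ones(5))
    assert abs(kl(p, p)) < 1e-12
    assert kl(p, q) >= 0.0

print("properties (2) and (3) hold on all sampled distributions")
```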
Relative Entropy is Not Symmetric
In the following problem and solution, we give a counterexample showing that relative entropy is not symmetric.
Relative entropy is not symmetric. Let the random variable $X$ take values in the alphabet $\{a, b, c\}$, and let $p(x)$ and $q(x)$ be two probability mass functions on this alphabet:

Symbol | $p(x)$ | $q(x)$
---|---|---
a | 1/2 | 1/3
b | 1/4 | 1/3
c | 1/4 | 1/3

Calculate $D(p \| q)$ and $D(q \| p)$.

Solution (with base-2 logarithms):

$$D(p \| q) = \frac{1}{2}\log\frac{1/2}{1/3} + \frac{1}{4}\log\frac{1/4}{1/3} + \frac{1}{4}\log\frac{1/4}{1/3} = \log 3 - \frac{3}{2} \approx 0.085 \text{ bits},$$

$$D(q \| p) = \frac{1}{3}\log\frac{1/3}{1/2} + \frac{1}{3}\log\frac{1/3}{1/4} + \frac{1}{3}\log\frac{1/3}{1/4} = \frac{5}{3} - \log 3 \approx 0.082 \text{ bits},$$

so $D(p \| q) \neq D(q \| p)$.
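The same numbers can be reproduced with a short Python sketch (the helper name `kl` is illustrative; all entries here are strictly positive, so the zero conventions are not needed):

```python
import math

def kl(p, q, base=2):
    # D(p || q) for pmfs given as dicts over the same alphabet (all entries > 0 here).
    return sum(px * math.log(px / q[x], base) for x, px in p.items())

p = {"a": 1/2, "b": 1/4, "c": 1/4}
q = {"a": 1/3, "b": 1/3, "c": 1/3}

print(kl(p, q))  # log2(3) - 3/2 ≈ 0.0850 bits
print(kl(q, p))  # 5/3 - log2(3) ≈ 0.0817 bits
```

The two values are close but not equal, which is exactly the claimed asymmetry.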
Conditional Relative Entropy
We define a conditional version of the relative entropy.
Definition: For joint probability mass functions $p(x, y)$ and $q(x, y)$, the conditional relative entropy $D(p(y|x) \| q(y|x))$ is the average of the relative entropies between the conditional probability mass functions $p(y|x)$ and $q(y|x)$, averaged over the probability mass function $p(x)$:

$$D(p(y|x) \| q(y|x)) = \sum_{x} p(x) \sum_{y} p(y|x) \log \frac{p(y|x)}{q(y|x)}.$$

The notation for conditional relative entropy is not explicit since it omits mention of the distribution $p(x)$ of the conditioning random variable.
The relative entropy between two joint distributions on a pair of random variables can be expanded as the sum of a relative entropy and a conditional relative entropy (the chain rule for relative entropy):

$$D(p(x, y) \| q(x, y)) = D(p(x) \| q(x)) + D(p(y|x) \| q(y|x)).$$
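The chain rule can be checked numerically on a small joint distribution. The sketch below uses NumPy and two arbitrarily chosen $2 \times 3$ joint pmfs purely for illustration:

```python
import numpy as np

def kl(p, q):
    # D(p || q) in bits; p and q are arrays of matching shape with positive entries.
    return float(np.sum(p * np.log2(p / q)))

# Arbitrary joint pmfs p(x, y) and q(x, y) on a 2 x 3 alphabet.
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])
q_xy = np.array([[0.15, 0.15, 0.20],
                 [0.10, 0.30, 0.10]])

p_x, q_x = p_xy.sum(axis=1), q_xy.sum(axis=1)   # marginals p(x), q(x)
p_y_x = p_xy / p_x[:, None]                     # conditionals p(y|x)
q_y_x = q_xy / q_x[:, None]                     # conditionals q(y|x)

# Conditional relative entropy: average over p(x) of D(p(y|x) || q(y|x)).
cond = float(np.sum(p_x * np.sum(p_y_x * np.log2(p_y_x / q_y_x), axis=1)))

lhs = kl(p_xy, q_xy)        # D(p(x, y) || q(x, y))
rhs = kl(p_x, q_x) + cond   # D(p(x) || q(x)) + D(p(y|x) || q(y|x))
print(lhs, rhs)             # the two sides agree up to floating-point error
```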
For multivariate Gaussian distributions
Suppose $p = \mathcal{N}(\mu_1, \Sigma_1)$ and $q = \mathcal{N}(\mu_2, \Sigma_2)$ are two $k$-dimensional multivariate Gaussian distributions.
Their relative entropy (or KL divergence) is:

$$D(p \| q) = \frac{1}{2}\left[ \operatorname{tr}\!\left(\Sigma_2^{-1} \Sigma_1\right) + (\mu_2 - \mu_1)^{\top} \Sigma_2^{-1} (\mu_2 - \mu_1) - k + \ln\frac{\det \Sigma_2}{\det \Sigma_1} \right],$$

where the natural logarithm is used, so the result is in nats.
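The closed form can be cross-checked against the definition $D(p \| q) = E_p[\log p(X) - \log q(X)]$. Below is a minimal NumPy sketch with illustrative 2-dimensional parameters (the helper names `gaussian_kl` and `gaussian_log_pdf` are hypothetical, and the Monte Carlo estimate is only an approximate check):

```python
import numpy as np

def gaussian_kl(mu1, sigma1, mu2, sigma2):
    """Closed-form D(p || q) in nats for p = N(mu1, sigma1), q = N(mu2, sigma2)."""
    k = mu1.shape[0]
    sigma2_inv = np.linalg.inv(sigma2)
    diff = mu2 - mu1
    return 0.5 * (np.trace(sigma2_inv @ sigma1)
                  + diff @ sigma2_inv @ diff
                  - k
                  + np.log(np.linalg.det(sigma2) / np.linalg.det(sigma1)))

def gaussian_log_pdf(x, mu, sigma):
    """Log density of N(mu, sigma) evaluated at each row of x."""
    k = mu.shape[0]
    diff = x - mu
    quad = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(sigma), diff)
    return -0.5 * (quad + k * np.log(2 * np.pi) + np.log(np.linalg.det(sigma)))

# Illustrative 2-D parameters.
mu1, sigma1 = np.array([0.0, 0.0]), np.array([[1.0, 0.2], [0.2, 1.0]])
mu2, sigma2 = np.array([1.0, -1.0]), np.array([[2.0, 0.0], [0.0, 0.5]])

print(gaussian_kl(mu1, sigma1, mu2, sigma2))  # exact value in nats

# Monte Carlo estimate of E_p[log p(X) - log q(X)]; should be close to the exact value.
rng = np.random.default_rng(0)
x = rng.multivariate_normal(mu1, sigma1, size=200_000)
print(np.mean(gaussian_log_pdf(x, mu1, sigma1) - gaussian_log_pdf(x, mu2, sigma2)))
```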