//NOTE: This article is not finished yet and contains many errors. I am always ready to edit it.
Sources:
- Thomas M. Cover & Joy A. Thomas. (2006). Chapter 8. Differential Entropy. Elements of Information Theory (2nd ed., pp. 243-255). Wiley-Interscience.
- Fady Alajaji & Po-Ning Chen. (2018). Chapter 5. Differential Entropy and Gaussian Channels. An Introduction to Single-User Information Theory (1st ed., pp. 165-218). Springer.
Differential Entropy
Definition Let $X$ be a random variable with cumulative distribution function $F(x) = \Pr(X \le x)$. If $F(x)$ is continuous, the random variable $X$ is said to be continuous. Let $f(x) = F'(x)$ when the derivative is defined. If $\int_{-\infty}^{\infty} f(x)\,dx = 1$, then $f(x)$ is called the probability density function for $X$. The set where $f(x) > 0$ is called the support set of $X$.
Definition The differential entropy $h(X)$ of a continuous random variable $X$ with density $f(x)$ is defined as
$$h(X) = -\int_S f(x) \log f(x)\, dx,$$
where $S$ is the support set of the random variable. As in the discrete case, the differential entropy depends only on the probability density of the random variable, and therefore it is sometimes written as $h(f)$ rather than $h(X)$.
Example 8.1.1 (Uniform distribution) Consider a random variable distributed uniformly from $0$ to $a$, so that its density is $1/a$ from $0$ to $a$ and $0$ elsewhere. Then its differential entropy is
$$h(X) = -\int_0^a \frac{1}{a} \log \frac{1}{a}\, dx = \log a.$$
Note: For $a < 1$ we have $\log a < 0$, and the differential entropy is negative. Hence, unlike discrete entropy, differential entropy can be negative. However, $2^{h(X)} = 2^{\log a} = a$ is the volume of the support set, which is always nonnegative, as we expect.
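As a quick numerical sanity check of Example 8.1.1 (my own sketch, not part of the sources), the snippet below integrates $-f \log_2 f$ over the support for an arbitrary choice of $a$ and compares it with $\log_2 a$; with $a = 0.5$ both come out to about $-1$ bit.

```python
# Numerical check of Example 8.1.1: h(X) = log a for X ~ Uniform(0, a).
# The value a = 0.5 is an arbitrary illustration choice.
import numpy as np

a = 0.5                                     # support length; a < 1 gives negative entropy
xs = np.linspace(1e-9, a, 100_000)
f = np.full_like(xs, 1.0 / a)               # uniform density on (0, a)
h_numeric = -np.trapz(f * np.log2(f), xs)   # differential entropy in bits
print(h_numeric, np.log2(a))                # both approximately -1 bit
```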
Example: Normal distribution
Example 8.1.2 (Normal distribution) Let $X \sim \phi(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-x^2/2\sigma^2}$. Then, calculating the differential entropy in nats, we obtain
$$h(\phi) = -\int \phi \ln \phi\, dx = \int \phi(x)\left[\frac{x^2}{2\sigma^2} + \ln\sqrt{2\pi\sigma^2}\right] dx = \frac{E X^2}{2\sigma^2} + \frac{1}{2}\ln 2\pi\sigma^2 = \frac{1}{2} + \frac{1}{2}\ln 2\pi\sigma^2 = \frac{1}{2}\ln 2\pi e \sigma^2 \ \text{nats}.$$
Changing the base of the logarithm, we have
$$h(\phi) = \frac{1}{2}\log 2\pi e \sigma^2 \ \text{bits}.$$
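The same value can be checked by Monte Carlo, using $h(X) = -E[\ln \phi(X)]$. The sketch below (with an arbitrary $\sigma$) is only an illustration, not part of the sources.

```python
# Monte Carlo check of Example 8.1.2: h(X) = (1/2) ln(2*pi*e*sigma^2) nats for X ~ N(0, sigma^2).
# sigma and the sample size are arbitrary illustration choices.
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0
x = rng.normal(0.0, sigma, size=1_000_000)
log_phi = -x**2 / (2 * sigma**2) - 0.5 * np.log(2 * np.pi * sigma**2)   # ln phi(x)
h_mc = -log_phi.mean()                                                  # estimate of -E[ln phi(X)]
h_exact = 0.5 * np.log(2 * np.pi * np.e * sigma**2)
print(h_mc, h_exact)                                                    # both ~2.11 nats
```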
Joint and Conditional Differential Entropy
As in the discrete case, we can extend the definition of differential entropy of a single random variable to several random variables.
Definition The differential entropy of a set $X_1, X_2, \ldots, X_n$ of random variables with density $f(x_1, x_2, \ldots, x_n)$ is defined as
$$h(X_1, X_2, \ldots, X_n) = -\int f(x^n) \log f(x^n)\, dx^n.$$
Definition If $X, Y$ have a joint density function $f(x, y)$, we can define the conditional differential entropy $h(X|Y)$ as
$$h(X|Y) = -\int f(x, y) \log f(x|y)\, dx\, dy.$$
Since in general $f(x|y) = f(x, y)/f(y)$, we can also write
$$h(X|Y) = h(X, Y) - h(Y).$$
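For jointly Gaussian $X, Y$ this identity can be verified in closed form using the Gaussian entropy formula of Theorem 8.4.1 below; the covariance entries in this sketch are arbitrary illustration values, not from the sources.

```python
# Check of h(X|Y) = h(X, Y) - h(Y) for a bivariate Gaussian pair (X, Y),
# using the closed-form Gaussian entropies (Theorem 8.4.1). Entropies are in nats.
import numpy as np

K = np.array([[2.0, 0.8],
              [0.8, 1.0]])                  # covariance matrix of (X, Y), illustration values
h_joint = 0.5 * np.log((2 * np.pi * np.e) ** 2 * np.linalg.det(K))
h_Y = 0.5 * np.log(2 * np.pi * np.e * K[1, 1])
# For a Gaussian pair, X | Y = y is Gaussian with variance Kxx - Kxy^2 / Kyy:
var_X_given_Y = K[0, 0] - K[0, 1] ** 2 / K[1, 1]
h_X_given_Y = 0.5 * np.log(2 * np.pi * np.e * var_X_given_Y)
print(h_joint - h_Y, h_X_given_Y)           # the two values agree
```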
But we must be careful if any of the differential entropies are infinite. The next entropy evaluation is used frequently in what follows.
Entropy of a multivariate normal distribution
Theorem 8.4.1 (Entropy of a multivariate normal distribution) Let $X_1, X_2, \ldots, X_n$ have a multivariate normal distribution with mean $\mu$ and covariance matrix $K$. Then
$$h(X_1, X_2, \ldots, X_n) = h(\mathcal{N}(\mu, K)) = \frac{1}{2} \log (2\pi e)^n |K| \ \text{bits},$$
where $|K|$ denotes the determinant of $K$.
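As a numerical illustration (my own, not from the sources), the sketch below evaluates the formula of Theorem 8.4.1 in nats for an arbitrary positive-definite $K$ and compares it with a Monte Carlo estimate of $-E[\ln f(X)]$.

```python
# Numerical illustration of Theorem 8.4.1: h(N(0, K)) = (1/2) ln((2*pi*e)^n |K|) nats.
# The covariance K below is an arbitrary positive-definite example.
import numpy as np

rng = np.random.default_rng(1)
K = np.array([[1.0, 0.5, 0.2],
              [0.5, 2.0, 0.3],
              [0.2, 0.3, 1.5]])
n = K.shape[0]
h_formula = 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(K))

# Monte Carlo estimate of h = -E[ln f(X)] with X ~ N(0, K).
x = rng.multivariate_normal(np.zeros(n), K, size=200_000)
Kinv = np.linalg.inv(K)
log_f = (-0.5 * np.einsum('ij,jk,ik->i', x, Kinv, x)
         - 0.5 * np.log((2 * np.pi) ** n * np.linalg.det(K)))
print(h_formula, -log_f.mean())             # the two values agree to ~2 decimals
```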
Properties of Differential Entropy
Theorem 8.6.1 $D(f\|g) \ge 0$, with equality iff $f = g$ almost everywhere (a.e.), where $D(f\|g) = \int f \log (f/g)$ denotes the relative entropy between the densities $f$ and $g$. Proof: Let $S$ be the support set of $f$. Then
$$-D(f\|g) = \int_S f \log \frac{g}{f} \le \log \int_S f\, \frac{g}{f} \quad \text{(by Jensen's inequality)} = \log \int_S g \le \log 1 = 0.$$
We have equality iff we have equality in Jensen's inequality, which occurs iff $f = g$ a.e.
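A small numerical illustration of this nonnegativity (my own, with two arbitrary Gaussian densities):

```python
# Numerical illustration of Theorem 8.6.1: D(f || g) >= 0, with equality iff f = g a.e.
# The two Gaussian densities are arbitrary illustration choices.
import numpy as np

def gauss(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

xs = np.linspace(-20, 20, 200_001)
f = gauss(xs, 0.0, 1.0)
g = gauss(xs, 1.0, 2.0)
D_fg = np.trapz(f * np.log(f / g), xs)      # strictly positive (~0.44 nats)
D_ff = np.trapz(f * np.log(f / f), xs)      # zero, since the two densities coincide
print(D_fg, D_ff)
```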
Corollary $I(X; Y) \ge 0$, with equality iff $X$ and $Y$ are independent, where $I(X; Y) = h(X) - h(X|Y)$ is the mutual information between $X$ and $Y$. Corollary $h(X|Y) \le h(X)$, with equality iff $X$ and $Y$ are independent.
Theorem 8.6.2 (Chain rule for differential entropy)
$$h(X_1, X_2, \ldots, X_n) = \sum_{i=1}^n h(X_i \mid X_1, X_2, \ldots, X_{i-1}).$$
Proof: Follows directly from the definitions.
Corollary
$$h(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^n h(X_i),$$
with equality iff $X_1, X_2, \ldots, X_n$ are independent. Proof: Follows directly from Theorem 8.6.2 and the corollary to Theorem 8.6.1.
Application (Hadamard's inequality) If we let $X \sim \mathcal{N}(0, K)$ be a multivariate normal random variable, calculating the entropies in the above inequality gives us
$$|K| \le \prod_{i=1}^n K_{ii},$$
which is Hadamard's inequality. A number of determinant inequalities can be derived in this fashion from information-theoretic inequalities (see Chapter 17 of Cover & Thomas).
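A quick check of Hadamard's inequality for a randomly generated covariance matrix (an illustration of my own, not from the sources):

```python
# Check of Hadamard's inequality |K| <= prod_i K_ii for a random covariance matrix.
# K is built as A A^T to guarantee it is positive semidefinite.
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 5))
K = A @ A.T
print(np.linalg.det(K), np.prod(np.diag(K)))   # det(K) <= product of diagonal entries
```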
Theorem 8.6.3
$$h(X + c) = h(X).$$
Translation does not change the differential entropy. Proof: Follows directly from the definition of differential entropy.
Theorem 8.6.4
$$h(aX) = h(X) + \log |a|.$$
Proof: Let $Y = aX$. Then $f_Y(y) = \frac{1}{|a|} f_X\!\left(\frac{y}{a}\right)$, and
$$h(aX) = -\int f_Y(y) \log f_Y(y)\, dy = -\int \frac{1}{|a|} f_X\!\left(\frac{y}{a}\right) \log\left(\frac{1}{|a|} f_X\!\left(\frac{y}{a}\right)\right) dy = h(X) + \log |a|$$
after a change of variables in the integral. Similarly, we can prove the following corollary for vector-valued random variables.
Corollary
$$h(AX) = h(X) + \log |\det(A)|.$$
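For a Gaussian vector the corollary can be verified in closed form, since $AX$ is again Gaussian with covariance $AKA^T$; the matrices in the sketch below are arbitrary illustration values.

```python
# Check of h(AX) = h(X) + log|det(A)| for a Gaussian vector X,
# using the closed-form Gaussian entropy (Theorem 8.4.1). Entropies are in nats.
import numpy as np

def h_gauss(K):
    n = K.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(K))

K = np.array([[1.0, 0.3],
              [0.3, 2.0]])                  # covariance of X
A = np.array([[2.0, 1.0],
              [0.5, 3.0]])                  # invertible linear map
h_X = h_gauss(K)
h_AX = h_gauss(A @ K @ A.T)                 # AX is Gaussian with covariance A K A^T
print(h_AX, h_X + np.log(abs(np.linalg.det(A))))   # the two values agree
```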
We now show that the multivariate normal distribution maximizes the entropy over all distributions with the same covariance.
Theorem 8.6.5 Let the random vector $X \in \mathbb{R}^n$ have zero mean and covariance $K = E\, X X^T$ (i.e., $K_{ij} = E\, X_i X_j$, $1 \le i, j \le n$). Then
$$h(X) \le \frac{1}{2} \log (2\pi e)^n |K|,$$
with equality iff $X \sim \mathcal{N}(0, K)$.
Proof: Let $g(x)$ be any density satisfying $\int g(x)\, x_i x_j\, dx = K_{ij}$ for all $i, j$. Let $\phi_K$ be the density of a $\mathcal{N}(0, K)$ vector, i.e., the multivariate normal density (Eq. (8.35) in Cover & Thomas) with $\mu = 0$. Note that $\log \phi_K(x)$ is a quadratic form in $x$ and $\int x_i x_j \phi_K(x)\, dx = K_{ij}$. Then
$$0 \le D(g \| \phi_K) = \int g \log \frac{g}{\phi_K} = -h(g) - \int g \log \phi_K = -h(g) - \int \phi_K \log \phi_K = -h(g) + h(\phi_K),$$
where the substitution $\int g \log \phi_K = \int \phi_K \log \phi_K$ follows from the fact that $g$ and $\phi_K$ yield the same moments of the quadratic form $\log \phi_K(x)$. Hence $h(g) \le h(\phi_K)$.
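In one dimension the theorem says that, among all densities with variance $\sigma^2$, the Gaussian has the largest differential entropy. The sketch below (my own illustration with an arbitrary $\sigma$) compares a variance-matched uniform density against the Gaussian bound.

```python
# One-dimensional illustration of Theorem 8.6.5: h(X) <= (1/2) ln(2*pi*e*sigma^2)
# for any X with variance sigma^2. Here a uniform density with the same variance
# is compared against the Gaussian bound. sigma is an arbitrary illustration value.
import numpy as np

sigma = 1.5
h_gauss = 0.5 * np.log(2 * np.pi * np.e * sigma ** 2)
# Uniform on [-sqrt(3)*sigma, sqrt(3)*sigma] has variance sigma^2,
# and its entropy is the log of the support length (Example 8.1.1).
h_unif = np.log(2 * np.sqrt(3) * sigma)
print(h_unif, h_gauss)                      # h_unif < h_gauss
```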