Chain Rules for Entropy, Relative Entropy and Mutual Information

We now show that the entropy of a collection of random variables is the sum of the conditional entropies.

Note: The proofs in this post are still somewhat rough and will be refined later.

Ref: Cover & Thomas, Elements of Information Theory.

Chain Rule for Entropy

Let $X_1, X_2, \ldots, X_n$ be drawn according to $p(x_1, x_2, \ldots, x_n)$. Then

$$
H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1).
$$

Proof: Short

By repeated application of the two-variable expansion rule for entropies, we have

$$
\begin{aligned}
H(X_1, X_2) &= H(X_1) + H(X_2 \mid X_1), \\
H(X_1, X_2, X_3) &= H(X_1) + H(X_2, X_3 \mid X_1) \\
&= H(X_1) + H(X_2 \mid X_1) + H(X_3 \mid X_2, X_1), \\
&\;\;\vdots \\
H(X_1, X_2, \ldots, X_n) &= H(X_1) + H(X_2 \mid X_1) + \cdots + H(X_n \mid X_{n-1}, \ldots, X_1) \\
&= \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1).
\end{aligned}
$$
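
Before the longer derivation below, here is a quick numerical sanity check of the chain rule for $n = 3$, written as a minimal NumPy sketch (not from the book); the alphabet sizes, random joint pmf, and the helper `H` are arbitrary illustrative choices, and each conditional entropy is computed as a difference of joint entropies, $H(X_i \mid X_{i-1}, \ldots, X_1) = H(X_1, \ldots, X_i) - H(X_1, \ldots, X_{i-1})$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random joint pmf p(x1, x2, x3) on a 2 x 3 x 4 alphabet (axes: X1, X2, X3).
p = rng.random((2, 3, 4))
p /= p.sum()

def H(pmf):
    """Shannon entropy (in bits) of a pmf given as an array that sums to 1."""
    pmf = pmf[pmf > 0]
    return -(pmf * np.log2(pmf)).sum()

lhs = H(p)                                       # H(X1, X2, X3)

rhs = H(p.sum(axis=(1, 2)))                      # H(X1)
rhs += H(p.sum(axis=2)) - H(p.sum(axis=(1, 2)))  # H(X2 | X1) = H(X1, X2) - H(X1)
rhs += H(p) - H(p.sum(axis=2))                   # H(X3 | X2, X1) = H(X1, X2, X3) - H(X1, X2)

print(lhs, rhs)                                  # the two values agree
assert np.isclose(lhs, rhs)
```

Writing each conditional entropy as a difference of joint entropies keeps the check short and avoids handling zero-probability conditioning events explicitly.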

Proof: Long

We write $p(x_1, \ldots, x_n) = \prod_{i=1}^{n} p(x_i \mid x_{i-1}, \ldots, x_1)$ and evaluate

$$
\begin{aligned}
H(X_1, X_2, \ldots, X_n) &= -\sum_{x_1, x_2, \ldots, x_n} p(x_1, x_2, \ldots, x_n) \log p(x_1, x_2, \ldots, x_n) \\
&= -\sum_{x_1, x_2, \ldots, x_n} p(x_1, x_2, \ldots, x_n) \log \prod_{i=1}^{n} p(x_i \mid x_{i-1}, \ldots, x_1) \\
&= -\sum_{x_1, x_2, \ldots, x_n} \sum_{i=1}^{n} p(x_1, x_2, \ldots, x_n) \log p(x_i \mid x_{i-1}, \ldots, x_1) \\
&= -\sum_{i=1}^{n} \sum_{x_1, x_2, \ldots, x_n} p(x_1, x_2, \ldots, x_n) \log p(x_i \mid x_{i-1}, \ldots, x_1) \\
&= -\sum_{i=1}^{n} \sum_{x_1, x_2, \ldots, x_i} p(x_1, x_2, \ldots, x_i) \log p(x_i \mid x_{i-1}, \ldots, x_1) \\
&= \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1).
\end{aligned}
$$

We now define the conditional mutual information as the reduction in the uncertainty of $X$ due to knowledge of $Y$ when $Z$ is given:

$$
I(X; Y \mid Z) = H(X \mid Z) - H(X \mid Y, Z).
$$
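
To make this definition concrete, the following sketch (again an illustrative NumPy check, not from the book; the alphabet sizes and random pmf are arbitrary) computes $I(X; Y \mid Z)$ both as $H(X \mid Z) - H(X \mid Y, Z)$ and directly as the expected log-ratio $\sum_{x,y,z} p(x,y,z) \log \frac{p(x,y \mid z)}{p(x \mid z)\, p(y \mid z)}$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Random joint pmf p(x, y, z) on a 2 x 3 x 2 alphabet (axes: X, Y, Z).
p = rng.random((2, 3, 2))
p /= p.sum()

def H(pmf):
    """Shannon entropy (in bits) of a pmf given as an array that sums to 1."""
    pmf = pmf[pmf > 0]
    return -(pmf * np.log2(pmf)).sum()

# I(X; Y | Z) = H(X | Z) - H(X | Y, Z), with each conditional entropy
# written as a difference of joint entropies.
h_x_given_z = H(p.sum(axis=1)) - H(p.sum(axis=(0, 1)))   # H(X, Z) - H(Z)
h_x_given_yz = H(p) - H(p.sum(axis=0))                    # H(X, Y, Z) - H(Y, Z)
i_cond = h_x_given_z - h_x_given_yz

# Direct computation: I(X; Y | Z) = sum p(x,y,z) log [ p(x,y|z) / (p(x|z) p(y|z)) ].
pz = p.sum(axis=(0, 1))                                   # p(z)
p_xy_given_z = p / pz                                     # p(x, y | z)
p_x_given_z = p.sum(axis=1) / pz                          # p(x | z)
p_y_given_z = p.sum(axis=0) / pz                          # p(y | z)
ratio = p_xy_given_z / (p_x_given_z[:, None, :] * p_y_given_z[None, :, :])
i_direct = (p * np.log2(ratio)).sum()

print(i_cond, i_direct)                                   # the two values agree
assert np.isclose(i_cond, i_direct)
```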

Chain Rule for Mutual Information

Mutual information also satisfies a chain rule:

$$
I(X_1, X_2, \ldots, X_n; Y) = \sum_{i=1}^{n} I(X_i; Y \mid X_{i-1}, X_{i-2}, \ldots, X_1).
$$

Proof

$$
\begin{aligned}
I(X_1, X_2, \ldots, X_n; Y) &= H(X_1, X_2, \ldots, X_n) - H(X_1, X_2, \ldots, X_n \mid Y) \\
&= \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1) - \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1, Y) \\
&= \sum_{i=1}^{n} I(X_i; Y \mid X_1, X_2, \ldots, X_{i-1}).
\end{aligned}
$$
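
For the two-term case, $I(X_1, X_2; Y) = I(X_1; Y) + I(X_2; Y \mid X_1)$, the identity can again be checked numerically. The sketch below is illustrative only (the alphabet sizes and random pmf are arbitrary) and expresses every term through joint entropies.

```python
import numpy as np

rng = np.random.default_rng(1)

# Random joint pmf p(x1, x2, y) on a 2 x 3 x 2 alphabet (axes: X1, X2, Y).
p = rng.random((2, 3, 2))
p /= p.sum()

def H(pmf):
    """Shannon entropy (in bits) of a pmf given as an array that sums to 1."""
    pmf = pmf[pmf > 0]
    return -(pmf * np.log2(pmf)).sum()

# I(X1, X2; Y) = H(X1, X2) + H(Y) - H(X1, X2, Y)
lhs = H(p.sum(axis=2)) + H(p.sum(axis=(0, 1))) - H(p)

# I(X1; Y) = H(X1) + H(Y) - H(X1, Y)
i1 = H(p.sum(axis=(1, 2))) + H(p.sum(axis=(0, 1))) - H(p.sum(axis=1))

# I(X2; Y | X1) = H(X2 | X1) - H(X2 | X1, Y)
#              = [H(X1, X2) - H(X1)] - [H(X1, X2, Y) - H(X1, Y)]
i2 = (H(p.sum(axis=2)) - H(p.sum(axis=(1, 2)))) - (H(p) - H(p.sum(axis=1)))

print(lhs, i1 + i2)                                       # the two values agree
assert np.isclose(lhs, i1 + i2)
```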

Chain Rule for Relative Entropy

The chain rule for relative entropy is used in Section 4.4 of Elements of Information Theory to prove a version of the second law of thermodynamics.

$$
D(p(x, y) \,\|\, q(x, y)) = D(p(x) \,\|\, q(x)) + D(p(y \mid x) \,\|\, q(y \mid x)),
$$

where $D(p(y \mid x) \,\|\, q(y \mid x)) = \sum_{x} p(x) \sum_{y} p(y \mid x) \log \frac{p(y \mid x)}{q(y \mid x)}$ is the conditional relative entropy, averaged over $p(x)$.

Proof

$$
\begin{aligned}
D(p(x, y) \,\|\, q(x, y)) &= \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{q(x, y)} \\
&= \sum_{x} \sum_{y} p(x, y) \log \frac{p(x)\, p(y \mid x)}{q(x)\, q(y \mid x)} \\
&= \sum_{x} \sum_{y} p(x, y) \log \frac{p(x)}{q(x)} + \sum_{x} \sum_{y} p(x, y) \log \frac{p(y \mid x)}{q(y \mid x)} \\
&= D(p(x) \,\|\, q(x)) + D(p(y \mid x) \,\|\, q(y \mid x)).
\end{aligned}
$$
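
A numerical check of the two-variable case is straightforward. In the sketch below (illustrative, not from the book; the alphabet and the random pmfs are arbitrary), $D(p(y \mid x) \,\|\, q(y \mid x))$ is computed as the conditional relative entropy averaged over $p(x)$, matching the convention above.

```python
import numpy as np

rng = np.random.default_rng(2)

def normalize(a):
    return a / a.sum()

# Two random joint pmfs p(x, y) and q(x, y) on a 3 x 4 alphabet (axes: X, Y).
p = normalize(rng.random((3, 4)))
q = normalize(rng.random((3, 4)))

def D(p, q):
    """Relative entropy D(p || q) in bits; assumes all entries are positive."""
    return (p * np.log2(p / q)).sum()

# Marginals over X and the conditionals p(y|x), q(y|x).
px, qx = p.sum(axis=1), q.sum(axis=1)
p_y_given_x = p / px[:, None]
q_y_given_x = q / qx[:, None]

lhs = D(p, q)                                             # D(p(x, y) || q(x, y))
# D(p(x) || q(x)) plus the conditional relative entropy averaged over p(x).
rhs = D(px, qx) + (p * np.log2(p_y_given_x / q_y_given_x)).sum()

print(lhs, rhs)                                           # the two values agree
assert np.isclose(lhs, rhs)
```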