Information Bottlenecks in Markov Chains

Sources:

  1. Thomas M. Cover & Joy A. Thomas. (2006). Chapter 8. Differential Entropy. Elements of Information Theory (2nd ed., pp. 243-255). Wiley-Interscience.
  2. Fady Alajaji & Po-Ning Chen. (2018). Chapter 5. Differential Entropy and Gaussian Channels. An Introduction to Single-User Information Theory (1st ed., pp. 165-218). Springer.

For discrete states

Suppose a (non-stationary) Markov chain starts in one of $n$ states, necks down to $k < n$ states, and then fans back to $m > k$ states. Thus $X_1 \to X_2 \to X_3$, i.e., $p(x_1, x_2, x_3) = p(x_1)\,p(x_2 \mid x_1)\,p(x_3 \mid x_2)$, for all $x_1 \in \{1, 2, \ldots, n\}$, $x_2 \in \{1, 2, \ldots, k\}$, $x_3 \in \{1, 2, \ldots, m\}$.

Questions:

  1. Show that the dependence of $X_1$ and $X_3$ is limited by the bottleneck by proving that $I(X_1; X_3) \le \log k$.

  2. Evaluate $I(X_1; X_3)$ for $k = 1$, and conclude that no dependence can survive such a bottleneck.

Solution:

Since $X_1 \to X_2 \to X_3$, the data processing inequality gives $I(X_1; X_3) \le I(X_1; X_2)$.
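
For completeness, one standard way to see the data processing inequality in this setting is via the chain rule for mutual information (a sketch of the textbook argument, not a new result):

$$I(X_1; X_2, X_3) = I(X_1; X_2) + I(X_1; X_3 \mid X_2) = I(X_1; X_3) + I(X_1; X_2 \mid X_3).$$

Because $X_1 \to X_2 \to X_3$, $X_1$ and $X_3$ are conditionally independent given $X_2$, so $I(X_1; X_3 \mid X_2) = 0$; combined with $I(X_1; X_2 \mid X_3) \ge 0$, this gives $I(X_1; X_3) \le I(X_1; X_2)$.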

By the definition of mutual information, $I(X_1; X_2) = H(X_2) - H(X_2 \mid X_1)$. Since conditional entropy is non-negative, we obtain $I(X_1; X_2) = H(X_2) - H(X_2 \mid X_1) \le H(X_2)$. Meanwhile, let $|\mathcal{X}_2|$ denote the number of elements in the range of $X_2$; by the theorem $H(X) \le \log |\mathcal{X}|$, we have $H(X_2) \le \log |\mathcal{X}_2| = \log k$. Finally, $I(X_1; X_3) \le I(X_1; X_2) = H(X_2) - H(X_2 \mid X_1) \le H(X_2) \le \log k$.
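
As a sanity check, the bound can be verified numerically. The following is a minimal sketch, not from the cited texts: the helper names `bottleneck_joint` and `mutual_information` and the sizes $n = 5$, $k = 2$, $m = 4$ are arbitrary choices for illustration. It draws random conditionals, forms the joint $p(x_1, x_2, x_3) = p(x_1)\,p(x_2 \mid x_1)\,p(x_3 \mid x_2)$, and checks $I(X_1; X_3) \le I(X_1; X_2) \le \log k$ (in nats).

```python
import numpy as np

def bottleneck_joint(n, k, m, rng):
    """Random joint p(x1, x2, x3) that factors as p(x1) p(x2|x1) p(x3|x2)."""
    p1 = rng.dirichlet(np.ones(n))              # p(x1), shape (n,)
    p2_1 = rng.dirichlet(np.ones(k), size=n)    # p(x2|x1), shape (n, k)
    p3_2 = rng.dirichlet(np.ones(m), size=k)    # p(x3|x2), shape (k, m)
    # joint[x1, x2, x3] = p(x1) * p(x2|x1) * p(x3|x2)
    return p1[:, None, None] * p2_1[:, :, None] * p3_2[None, :, :]

def mutual_information(pxy):
    """I(X;Y) in nats for a joint distribution given as a 2-D array."""
    px = pxy.sum(axis=1, keepdims=True)         # p(x), shape (|X|, 1)
    py = pxy.sum(axis=0, keepdims=True)         # p(y), shape (1, |Y|)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px * py)[mask])))

rng = np.random.default_rng(0)
n, k, m = 5, 2, 4                               # arbitrary sizes with k < n and m > k
joint = bottleneck_joint(n, k, m, rng)

i_12 = mutual_information(joint.sum(axis=2))    # marginalize out X3 -> p(x1, x2)
i_13 = mutual_information(joint.sum(axis=1))    # marginalize out X2 -> p(x1, x3)

print(f"I(X1;X3) = {i_13:.4f}, I(X1;X2) = {i_12:.4f}, log k = {np.log(k):.4f}")
assert i_13 <= i_12 + 1e-12 and i_12 <= np.log(k) + 1e-12
```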

  2. For $k = 1$, $I(X_1; X_3) \le \log 1 = 0$. Since mutual information is non-negative, $I(X_1; X_3) = 0$. Hence, for $k = 1$, $X_1$ and $X_3$ are independent, so no dependence can survive such a bottleneck (see the short check below).

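Reusing the helpers from the sketch above, the $k = 1$ case can be checked directly:

```python
# Reusing bottleneck_joint and mutual_information from the sketch above.
joint_k1 = bottleneck_joint(n=5, k=1, m=4, rng=np.random.default_rng(1))
print(mutual_information(joint_k1.sum(axis=1)))   # ~0.0: X1 and X3 are independent
```
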
For continuous states