Information Bottlenecks in Markov Chains

Sources:

  1. Thomas M. Cover & Joy A. Thomas. (2006). Chapter 8. Differential Entropy. Elements of Information Theory (2nd ed., pp. 243-255). Wiley-Interscience.
  2. Fady Alajaji & Po-Ning Chen. (2018). Chapter 5. Differential Entropy and Gaussian Channels. An Introduction to Single-User Information Theory (1st ed., pp. 165-218). Springer.

For discrete states

Suppose a (non-stationary) Markov chain starts in one of $n$ states, necks down to $k < n$ states, and then fans back to $m > k$ states. Thus $X_1 \to X_2 \to X_3$, i.e., $p(x_1, x_2, x_3) = p(x_1)\,p(x_2 \mid x_1)\,p(x_3 \mid x_2)$, for all $x_1 \in \{1, 2, \ldots, n\}$, $x_2 \in \{1, 2, \ldots, k\}$, $x_3 \in \{1, 2, \ldots, m\}$.

Questions:

  1. Show that the dependence of $X_1$ and $X_3$ is limited by the bottleneck by proving that $I(X_1; X_3) \le \log k$.

  2. Evaluate $I(X_1; X_3)$ for $k = 1$, and conclude that no dependence can survive such a bottleneck.

Solution:

Since $X_1 \to X_2 \to X_3$, the data processing inequality gives $I(X_1; X_3) \le I(X_1; X_2)$.
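
For completeness, one standard way to see the data processing inequality in this setting is via the chain rule for mutual information (a sketch of the textbook argument, not a new result):

$$I(X_1; X_2, X_3) = I(X_1; X_2) + I(X_1; X_3 \mid X_2) = I(X_1; X_3) + I(X_1; X_2 \mid X_3).$$

Because $X_1 \to X_2 \to X_3$, $X_1$ and $X_3$ are conditionally independent given $X_2$, so $I(X_1; X_3 \mid X_2) = 0$; combined with $I(X_1; X_2 \mid X_3) \ge 0$, this gives $I(X_1; X_3) \le I(X_1; X_2)$.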

By the definition of mutual information, $I(X_1; X_2) = H(X_2) - H(X_2 \mid X_1)$. Since conditional entropy is non-negative, we obtain $I(X_1; X_2) = H(X_2) - H(X_2 \mid X_1) \le H(X_2)$. Meanwhile, let $|\mathcal{X}_2|$ denote the number of elements in the range of $X_2$; by the theorem $H(X) \le \log |\mathcal{X}|$, we have $H(X_2) \le \log |\mathcal{X}_2| = \log k$. Finally, $I(X_1; X_3) \le I(X_1; X_2) = H(X_2) - H(X_2 \mid X_1) \le H(X_2) \le \log k$.
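
As a sanity check, the bound can be verified numerically. The following is a minimal sketch, not from the cited texts: the helper names `bottleneck_joint` and `mutual_information` and the sizes $n = 5$, $k = 2$, $m = 4$ are arbitrary choices for illustration. It draws random conditionals, forms the joint $p(x_1, x_2, x_3) = p(x_1)\,p(x_2 \mid x_1)\,p(x_3 \mid x_2)$, and checks $I(X_1; X_3) \le I(X_1; X_2) \le \log k$ (in nats).

```python
import numpy as np

def bottleneck_joint(n, k, m, rng):
    """Random joint p(x1, x2, x3) that factors as p(x1) p(x2|x1) p(x3|x2)."""
    p1 = rng.dirichlet(np.ones(n))              # p(x1), shape (n,)
    p2_1 = rng.dirichlet(np.ones(k), size=n)    # p(x2|x1), shape (n, k)
    p3_2 = rng.dirichlet(np.ones(m), size=k)    # p(x3|x2), shape (k, m)
    # joint[x1, x2, x3] = p(x1) * p(x2|x1) * p(x3|x2)
    return p1[:, None, None] * p2_1[:, :, None] * p3_2[None, :, :]

def mutual_information(pxy):
    """I(X;Y) in nats for a joint distribution given as a 2-D array."""
    px = pxy.sum(axis=1, keepdims=True)         # p(x), shape (|X|, 1)
    py = pxy.sum(axis=0, keepdims=True)         # p(y), shape (1, |Y|)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px * py)[mask])))

rng = np.random.default_rng(0)
n, k, m = 5, 2, 4                               # arbitrary sizes with k < n and m > k
joint = bottleneck_joint(n, k, m, rng)

i_12 = mutual_information(joint.sum(axis=2))    # marginalize out X3 -> p(x1, x2)
i_13 = mutual_information(joint.sum(axis=1))    # marginalize out X2 -> p(x1, x3)

print(f"I(X1;X3) = {i_13:.4f}, I(X1;X2) = {i_12:.4f}, log k = {np.log(k):.4f}")
assert i_13 <= i_12 + 1e-12 and i_12 <= np.log(k) + 1e-12
```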

  2. For $k = 1$, $I(X_1; X_3) \le \log 1 = 0$. Since mutual information is non-negative, $I(X_1; X_3) = 0$. Hence, for $k = 1$, $X_1$ and $X_3$ are independent, so no dependence can survive such a bottleneck (see the short check below).

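Reusing the helpers from the sketch above, the $k = 1$ case can be checked directly:

```python
# Reusing bottleneck_joint and mutual_information from the sketch above.
joint_k1 = bottleneck_joint(n=5, k=1, m=4, rng=np.random.default_rng(1))
print(mutual_information(joint_k1.sum(axis=1)))   # ~0.0: X1 and X3 are independent
```
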
For continuous states