Discrete-Time Markov Chains
Sources:
- Joseph K. Blitzstein & Jessica Hwang. (2019). Markov chains. Introduction to Probability (2nd ed., pp. 497-533). CRC Press.
Notation
Symbol | Type | Description |
---|---|---|
\(\{X_t\}_{t \geq 0}\) | Random process | A discrete-time Markov chain, where \(X_t\) represents the state at time \(t\) |
\(\mathcal{S} = \{1, 2, \ldots, N\}\) | Set | Finite state space of the Markov chain, with \(N\) total states |
\(N\) | \(\in \mathbb{N}\) | Total number of states in the Markov chain’s state space \(\mathcal{S}\) |
\(\mathbf{P}\) | \(\mathbb{R}^{N \times N}\) | One-step transition matrix, where \(\mathbf{P} = \left(p_{i,j}\right)_{i,j=1}^N\) |
\(p_{i,j}\) | \(\in [0,1]\) | Transition probability \(\mathrm{P}\left[X_{t+1}=j \mid X_t=i\right]\), the probability of moving from state \(i\) to \(j\) in one step |
\(i, j\) | \(\in \{1, 2, \ldots, N\}\) | Indices representing states in the Markov chain's state space \(\mathcal{S}\) |
\(\mathbf{P}^{(m)}\) | \(\mathbb{R}^{N \times N}\) | \(m\)-step transition matrix, where \(p_{i,j}^{(m)} = \mathrm{P}\left[X_{t+m}=j \mid X_t=i\right]\) |
\(\vec{p}^{(t)}\) | \(\mathbb{R}^{1 \times N}\) | Probability distribution of the chain at time \(t\), where row vector \(\vec{p}^{(t)} = (p_1^{(t)}, p_2^{(t)}, \ldots, p_N^{(t)})\) |
\(\vec{\pi}\) | \(\mathbb{R}^{1 \times N}\) | Stationary distribution of the Markov chain, satisfying \(\vec{\pi} \mathbf{P} = \vec{\pi}\) |
Abbreviations
Abbreviation | Expansion | Description |
---|---|---|
DTMC | Discrete-Time Markov Chain | Refers to a Markov chain in discrete time steps |
LOTP | Law of Total Probability | A theorem stating that probabilities can be broken down based on partitions of the sample space |
r.v. | Random Variable | A variable whose possible values are numerical outcomes of a random phenomenon |
C-K Equations | Chapman-Kolmogorov Equations | Equations that provide relationships for \(m\)-step transition probabilities in a Markov chain |
Discrete-Time Markov Chains
Consider the random process \(\{X_t,\ t=0,1,2,\ldots\}\). This process is called a Markov chain if it satisfies the Markov property, meaning:
\[ \mathrm{P}\left[X_{t+1}=x_{t+1} \mid X_0=x_0, \ldots, X_t=x_t\right]=\mathrm{P}\left[X_{t+1}=x_{t+1} \mid X_t=x_t\right] \]
for all \(t \geq 1\) and states \(x_0, \ldots, x_{t+1}\). In a finite-state Markov chain, the values taken by \(X_0, X_1, \ldots\) form a finite set \(\mathcal{S}=\{1,2, \ldots, N\}\), called the state space. If \(|\mathcal{S}|=N\), then we have an \(N\)-state Markov chain. Here, \(X_t\) represents the state at time \(t\), and \(X_0\) is the initial state.
A Markov chain is called time-homogeneous if the transition probability \(\mathrm{P}\left[X_{t+1}=j \mid X_t=i\right]\) is independent of \(t\). Unless stated otherwise, we assume this time-homogeneity.
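As a concrete illustration, here is a minimal simulation sketch (assuming NumPy; the 3-state matrix, seed, and helper name are our own choices for the example). Each step samples the next state from the row of the transition matrix indexed by the current state, which is exactly the Markov property for a time-homogeneous chain; the transition matrix itself is formalized in the next section.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical one-step transition matrix of a 3-state chain:
# rows index the current state, columns the next state.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.4, 0.4]])

def simulate(P, x0, T):
    """Sample a path X_0, ..., X_T of a time-homogeneous DTMC started at x0."""
    path = [x0]
    for _ in range(T):
        # The next state depends only on the current state (Markov property).
        path.append(int(rng.choice(len(P), p=P[path[-1]])))
    return path

print(simulate(P, x0=0, T=10))
```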
Transition probability matrices
One-step transition probability matrix
Define \(p_{i, j}=\mathrm{P}\left[X_{t+1}=j \mid X_t=i\right]\). Then the one-step transition probability matrix \(\mathbf{P}\) is:
\[ \mathbf{P}=\left(\begin{array}{cccc} p_{1,1} & p_{1,2} & \cdots & p_{1, N} \\ p_{2,1} & p_{2,2} & \cdots & p_{2, N} \\ \vdots & \vdots & \ddots & \vdots \\ p_{N, 1} & p_{N, 2} & \cdots & p_{N, N} \end{array}\right) \]
where \(\mathbf{P}\) is a stochastic matrix: all entries are non-negative, and each row sums to 1. A state transition diagram represents \(\mathbf{P}\) visually as a directed graph, with an edge from state \(i\) to state \(j\) whenever \(p_{i, j}>0\).
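A quick sanity check that a candidate matrix is stochastic (a sketch; the helper name is ours, and the matrix is the illustrative one from above):

```python
import numpy as np

def is_stochastic(P, tol=1e-12):
    """True if all entries are non-negative and every row sums to 1."""
    P = np.asarray(P)
    return bool(np.all(P >= 0) and np.allclose(P.sum(axis=1), 1.0, atol=tol))

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.4, 0.4]])
assert is_stochastic(P)
```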
\(m\)-step transition probability matrix
The probability of transitioning from state \(i\) to state \(j\) in \(m\) steps, denoted \(p_{i, j}^{(m)}\), is: \[ p_{i, j}^{(m)}=\mathrm{P}\left[X_{t+m}=j \mid X_t=i\right] \]
The \(m\)-step transition matrix \(\mathbf{P}^{(m)}\) is defined as:
\[ \mathbf{P}^{(m)}=\left[p_{i, j}^{(m)}\right]_{i, j \in\{1,2, \ldots, N\}} \]
where \(\mathbf{P} \equiv \mathbf{P}^{(1)}\).
The Chapman-Kolmogorov Equations
The Chapman-Kolmogorov equations provide a way to compute \(m\)-step transitions: \[ \begin{equation} \label{eq_5_5} p_{i, j}^{(m+1)} =\sum_{l=1}^N p_{l, j} p_{i, l}^{(m)} \end{equation} \] or equivalently, in matrix form
\[ \begin{equation} \label{eq_5_6} \mathbf{P}^{(m+1)} = \mathbf{P}^{(m)} \mathbf{P} \end{equation} \]
By induction, \(\eqref{eq_5_6}\) gives \(\mathbf{P}^{(m)} = \mathbf{P}^m\), the \(m\)-th power of the one-step matrix. Hence, for \(m, n \geq 1\), \[ \begin{equation} \label{eq_5_7} \mathbf{P}^{(m+n)}=\mathbf{P}^{(m)} \mathbf{P}^{(n)}, \end{equation} \] which is the matrix form of the Chapman-Kolmogorov equations.
Proof
We only need to prove \(\eqref{eq_5_5}\), since \(\eqref{eq_5_6}\) is its matrix form and \(\eqref{eq_5_7}\) follows by iterating \(\eqref{eq_5_6}\). Observe that
\[ \begin{aligned} p_{i, j}^{(m+1)} & =\mathrm{P}\left[X_{t+m+1}=j \mid X_t=i\right] \\ & \stackrel{(a)}{=}\sum_{l=1}^N \mathrm{P}\left[X_{t+m+1}=j, X_{t+m}=l \mid X_t=i\right] \\ & \stackrel{(b)}{=}\sum_{l=1}^N \mathrm{P}\left[X_{t+m+1}=j \mid X_{t+m}=l, X_t=i\right] \mathrm{P}\left[X_{t+m}=l \mid X_t=i\right] \\ & \stackrel{(c)}{=}\sum_{l=1}^N \mathrm{P}\left[X_{t+m+1}=j \mid X_{t+m}=l\right] \mathrm{P}\left[X_{t+m}=l \mid X_t=i\right] \\ & =\sum_{l=1}^N p_{l, j}\, p_{i, l}^{(m)} \end{aligned} \]
where \((a)\) follows from the Law of Total Probability, \((b)\) from the definition of conditional probability, and \((c)\) from the Markov property.
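Numerically, \(\eqref{eq_5_7}\) says that matrix powers of \(\mathbf{P}\) factor over step counts. A short check with the illustrative matrix from earlier (the values of \(m\) and \(n\) are arbitrary):

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.4, 0.4]])

m, n = 3, 4
# Chapman-Kolmogorov: P^(m+n) equals the product of P^m and P^n.
lhs = np.linalg.matrix_power(P, m + n)
rhs = np.linalg.matrix_power(P, m) @ np.linalg.matrix_power(P, n)
assert np.allclose(lhs, rhs)
```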
Probability distribution of a Markov chain
Let \(\vec{p}^{(t)}=\left(p_1^{(t)}, p_2^{(t)}, \ldots, p_N^{(t)}\right)\) be the probability distribution of a Markov chain at time \(t\), where \(p_i^{(t)}=\mathrm{P}\left[X_t=i\right]\). In particular, the components of \(\vec{p}^{(t)}\) are nonnegative and sum to 1. The distribution evolves as:
\[ \vec{p}^{(t+1)}=\vec{p}^{(t)} \mathbf{P} \]
so that at time \(t\):
\[ \vec{p}^{(t)}=\vec{p}^{(0)} \mathbf{P}^t \]
Here, \(\vec{p}^{(0)}\) is the initial distribution.
Proof:
We consider first the \(j^{\text{th}}\) component of \(\vec{p}^{(t+1)}\). By the Law of Total Probability:
\[ p_j^{(t+1)}=\mathrm{P}\left[X_{t+1}=j\right]=\sum_{i=1}^N \mathrm{P}\left[X_t=i\right] \mathrm{P}\left[X_{t+1}=j \mid X_t=i\right]=\sum_{i=1}^N p_i^{(t)} p_{i, j} = \vec{p}^{(t)} \mathbf{P}_{\cdot, j}, \] where \(\mathbf{P}_{\cdot, j}\) denotes the \(j\)-th column of \(\mathbf{P}\).
Thus, by considering all values of \(j=1,2, \ldots, N\), we get
\[ \vec{p}^{(t+1)}=\vec{p}^{(t)} \mathbf{P} \]
It follows that at each step, the distribution gets multiplied by \(\mathbf{P}\). Going from time 0 to time \(t\) corresponds to \(t\) steps and, therefore, \[ \vec{p}^{(t)}=\vec{p}^{(0)} \mathbf{P}^t \]
The vector \(\vec{p}^{(0)}\) is called the initial distribution of the Markov chain. If we know this initial distribution and the transition matrix, we can compute the distribution at any time \(t \geq 1\).
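For example (again with the illustrative matrix), the distribution at time \(t\) can be obtained either by repeated one-step updates or in closed form as \(\vec{p}^{(0)} \mathbf{P}^t\):

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.4, 0.4]])

p0 = np.array([1.0, 0.0, 0.0])  # start deterministically in state 1

# Repeated one-step updates p^(t+1) = p^(t) P ...
p = p0.copy()
for _ in range(5):
    p = p @ P

# ... agree with the closed form p^(t) = p^(0) P^t.
assert np.allclose(p, p0 @ np.linalg.matrix_power(P, 5))
print(p)
```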
Classification of states
In this section we introduce terminology for describing the various characteristics of a Markov chain. The states of a Markov chain can be classified as recurrent or transient, depending on whether they are visited over and over again in the long run or are eventually abandoned. States can also be classified according to their period, which is a positive integer summarizing the amount of time that can elapse between successive visits to a state. These characteristics matter because they determine the long-run behavior of the chain, which we study below through stationary distributions.
Recurrent and transient states
State \(i\) of a Markov chain is recurrent if starting from \(i\), the probability is 1 that the chain will eventually return to \(i\). Otherwise, the state is transient, which means that if the chain starts from \(i\), there is a positive probability of never returning to \(i\).
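Recurrence is an asymptotic property, so it cannot be verified by simulation alone; still, a Monte Carlo sketch that estimates the probability of returning within a finite horizon (the horizon, trial count, and matrix are illustrative choices of ours) can build intuition:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.4, 0.4]])

def return_within(P, i, horizon, trials=10_000):
    """Estimate the probability that a chain started at i revisits i
    within `horizon` steps."""
    hits = 0
    for _ in range(trials):
        x = i
        for _ in range(horizon):
            x = rng.choice(len(P), p=P[x])
            if x == i:
                hits += 1
                break
    return hits / trials

# This chain is irreducible with a finite state space, so every state is
# recurrent and the estimate approaches 1 as the horizon grows.
print(return_within(P, i=0, horizon=50))
```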
Irreducible and reducible chains
A Markov chain with transition matrix \(\mathbf{P}\) is irreducible if for any two states \(i\) and \(j\), it is possible to go from \(i\) to \(j\) in a finite number of steps (with positive probability). That is, for any states \(i, j\) there is some positive integer \(n\) such that the \((i, j)\) entry of \(\mathbf{P}^n\) is positive. A Markov chain that is not irreducible is called reducible.
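One way to test irreducibility of a finite chain (a sketch, not from the source): state \(j\) is reachable from \(i\) within \(N-1\) steps exactly when the \((i, j)\) entry of \((\mathbf{I}+\mathbf{P})^{N-1}\) is positive, so the chain is irreducible exactly when that matrix has no zero entries.

```python
import numpy as np

def is_irreducible(P):
    """Reachability test: irreducible iff (I + P)^(N-1) is entrywise positive,
    i.e. every state can reach every other within N-1 steps."""
    N = len(P)
    R = np.linalg.matrix_power(np.eye(N) + np.asarray(P), N - 1)
    return bool(np.all(R > 0))

# Reducible example: state 2 (index 1) is absorbing, so state 1 is unreachable.
Q = np.array([[0.5, 0.5],
              [0.0, 1.0]])
print(is_irreducible(Q))  # False
```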
Proposition: In an irreducible Markov chain with a finite state space, all states are recurrent.
Proof. It is clear that at least one state must be recurrent; if all states were transient, the chain would eventually leave every state forever and have nowhere to go! So assume without loss of generality that state 1 is recurrent, and consider any other state \(i\). By irreducibility, there is some positive integer \(n\) such that \(p_{1, i}^{(n)}>0\). Thus, every time the chain is at state 1, there is a positive probability that after \(n\) more steps it will be at state \(i\).
Since the chain visits state 1 infinitely often, we know the chain will eventually reach state \(i\) from state 1 ; think of each visit to state 1 as starting a trial, where "success" is defined as reaching state \(i\) in at most \(n\) steps. From state \(i\), the chain will return to state 1 because state 1 is recurrent, and by the same logic, it will eventually reach state \(i\) again. By induction, the chain will visit state \(i\) infinitely often. Since \(i\) was arbitrary, we conclude that all states are recurrent.
Periodic and aperiodic chains
The period of a state \(i\) in a Markov chain is defined as \[ d_i=\operatorname{gcd}\left\{m \geq 1: p_{i, i}^{(m)}>0\right\} \] In words, \(d_i\) is the greatest common divisor of the lengths of all positive-probability paths that start at state \(i\) and return to state \(i\). If \(p_{i, i}^{(m)}=0\) for all \(m \geq 1\), we define \(d_i=\infty\).
A state is called aperiodic if its period equals 1, and periodic otherwise. The chain itself is called aperiodic if all its states are aperiodic, and periodic otherwise.
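A brute-force sketch for the period (it inspects return times only up to a finite horizon, a cutoff we chose for illustration rather than part of the definition):

```python
import math
import numpy as np

def period(P, i, horizon=100):
    """gcd of all m <= horizon with (P^m)[i, i] > 0; math.inf if there are none."""
    returns = []
    Pm = np.eye(len(P))
    for m in range(1, horizon + 1):
        Pm = Pm @ P
        if Pm[i, i] > 0:
            returns.append(m)
    return math.gcd(*returns) if returns else math.inf

# A deterministic 2-cycle returns to each state only at even times: period 2.
C = np.array([[0.0, 1.0],
              [1.0, 0.0]])
print(period(C, 0))  # 2
```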
A necessary condition for the existence of a limiting distribution
If a Markov chain has a limiting distribution (defined below), then it has a unique stationary distribution, and the limiting distribution equals this stationary distribution.
Stationary distributions
A distribution \(\vec{\pi}=\left(\pi_1, \pi_2, \ldots, \pi_N\right)\) is a stationary distribution if:
\[ \vec{\pi} \mathbf{P}=\vec{\pi} \]
In particular, if the initial distribution \(\vec{p}^{(0)}=\vec{\pi}\), then \(\vec{p}^{(t)}=\vec{\pi}\) for all \(t \geq 1\). In words, if the initial distribution is stationary, then the distribution does not change with time.
Existence and uniqueness
Every finite-state Markov chain has at least one stationary distribution.
The proof is omitted.
For an irreducible chain, the stationary distribution is unique, and it assigns positive probability to every state.
The proof is omitted.
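Numerically, a stationary distribution can be computed as a normalized left eigenvector of \(\mathbf{P}\) for eigenvalue 1, equivalently a right eigenvector of \(\mathbf{P}^\top\) (a sketch with the illustrative matrix from earlier):

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.4, 0.4]])

# pi P = pi means pi is a left eigenvector of P with eigenvalue 1,
# i.e. a right eigenvector of P transpose.
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1.0))])
pi = pi / pi.sum()  # normalize to a probability vector

assert np.allclose(pi @ P, pi)
print(pi)
```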
Convergence
Let \(X_0, X_1, \ldots\) be an irreducible, aperiodic Markov chain with stationary distribution \(\vec{\pi}\) and transition matrix \(\mathbf{P}\). Then \(\mathrm{P}\left[X_n=i\right]\) converges to \(\pi_i\) as \(n \rightarrow \infty\). In terms of the transition matrix, \(\mathbf{P}^n\) converges to a matrix in which each row is \(\vec{\pi}\).
The proof is omitted.
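A quick numerical illustration with the same example chain: for moderately large \(n\), every row of \(\mathbf{P}^n\) is (approximately) \(\vec{\pi}\), so the chain forgets its starting state.

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.4, 0.4]])

Pn = np.linalg.matrix_power(P, 50)
print(Pn)
# All rows coincide (up to numerical error): P(X_n = j | X_0 = i) no longer
# depends on i, and each row is the stationary distribution.
assert np.allclose(Pn, Pn[0])
```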
Limiting distributions
A probability distribution \(\vec{\pi}=\left(\pi_1, \pi_2, \ldots, \pi_N\right)\) is called a limiting distribution of the Markov chain if \(\vec{p}^{(t)} \rightarrow \vec{\pi}\) as \(t \rightarrow \infty\) regardless of the initial distribution \(\vec{p}^{(0)}\).
Reversibility
We have seen that the stationary distribution of a Markov chain is extremely useful for understanding its long-run behavior. Unfortunately, in general it may be computationally difficult to find the stationary distribution when the state space is large. This section addresses an important special case where working with eigenvalue equations for large matrices can be avoided.
Definition:
A chain with transition matrix \(\mathbf{P}\) is reversible with respect to a distribution \(\vec{\pi}=\left(\pi_1, \pi_2, \ldots, \pi_N\right)\) if:
\[ \pi_i p_{i, j}=\pi_j p_{j, i} \]
for all states \(i\) and \(j\). Here, \(\vec{\pi}\) is a probability distribution over the state space (with each \(\pi_i \geq 0\) and \(\sum_i \pi_i=1\) ) and is often the stationary distribution of the chain. When the chain is reversible with respect to \(\vec{\pi}\), and is started in this stationary distribution, it exhibits time-symmetric behavior, meaning the probability flow between any two states is the same in both directions.
Proposition: Suppose that \(\mathbf{P}=\left(p_{i, j}\right)\) is the transition matrix of a Markov chain that is reversible with respect to a nonnegative vector \(\vec{\pi}=\left(\pi_1, \ldots, \pi_N\right)\) whose components sum to 1. Then \(\vec{\pi}\) is a stationary distribution of the chain.
Proof. We have \[ \sum_i \pi_i p_{i, j} \stackrel{(a)}{=} \sum_i \pi_j p_{j, i}=\pi_j \sum_i p_{j, i} \stackrel{(b)}{=} \pi_j \]
where \((a)\) uses reversibility and \((b)\) holds because each row of \(\mathbf{P}\) sums to 1. So \(\vec{\pi}\) is a stationary distribution.
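As a classic example, random walk on a connected undirected graph is reversible with respect to \(\pi_i\) proportional to the degree of \(i\). A sketch checking the detailed-balance condition on a small path graph (the graph and variable names are our choices):

```python
import numpy as np

# Adjacency matrix of the undirected path graph 1 - 2 - 3.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

deg = A.sum(axis=1)
P = A / deg[:, None]   # random walk: jump to a uniformly random neighbor
pi = deg / deg.sum()   # candidate distribution, pi_i proportional to deg(i)

# Detailed balance pi_i p_{i,j} = pi_j p_{j,i}: the flow matrix is symmetric...
F = pi[:, None] * P
assert np.allclose(F, F.T)
# ...so by the proposition, pi is a stationary distribution.
assert np.allclose(pi @ P, pi)
```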