Mean Estimation Algorithms
Sources:
- Shiyu Zhao. Chapter 5: Monte Carlo Methods. Mathematical Foundations of Reinforcement Learning
- Shiyu Zhao. Chapter 6: Stochastic Approximation. Mathematical Foundations of Reinforcement Learning
Mean estimation: compute \(\mathbb{E}[X]\) using \(\left\{x_k\right\}\) \[ w_{k+1}=w_k-\frac{1}{k}\left(w_k-x_k\right) \]
Mean estimation problem
Consider a random variable \(X\) that can take values from a finite set of real numbers denoted as \(\mathcal{X}\). Suppose that our task is to calculate the mean or expected value¹ of \(X\), i.e., \(\mathbb{E}[X]\).
Two approaches can be used to calculate \(\mathbb{E}[X]\).
Model-based case
The first approach is model-based. Here, the model refers to the probability distribution of \(X\). If the model is known, then the mean can be directly calculated based on the definition of the expected value: \[ \begin{equation} \label{eq_1} \mathbb{E}[X]=\sum_{x \in \mathcal{X}} p(x) x \end{equation} \]
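As a quick sketch, the model-based computation is just the weighted sum above. The Python snippet below uses a made-up distribution over \(\mathcal{X}=\{1, 2, 5\}\) purely for illustration; the values and probabilities are not from the source.

```python
# Model-based mean estimation: the probability distribution p(x) is known.
# The values and probabilities below are made-up, illustrative numbers.
values = [1.0, 2.0, 5.0]   # the finite set X
probs = [0.2, 0.5, 0.3]    # p(x) for each x in X

expected_value = sum(p * x for p, x in zip(probs, values))
print(expected_value)      # 0.2*1 + 0.5*2 + 0.3*5 = 2.7
```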
Model-free case
The second approach is model-free. When the probability distribution (i.e., the model) of \(X\) is unknown, suppose that we have some samples \(\left\{x_1, x_2, \ldots, x_n\right\}\) of \(X\). Then, the mean can be approximated as \[ \begin{equation} \label{eq_2} \mathbb{E}[X] \approx \bar{x} \triangleq \frac{1}{n} \sum_{j=1}^n x_j . \end{equation} \]
When \(n\) is small, this approximation may not be accurate. However, as \(n\) increases, the approximation becomes increasingly accurate. When \(n \rightarrow \infty\), we have \(\bar{x} \rightarrow \mathbb{E}[X]\).
This is guaranteed by the law of large numbers (L.L.N.): the average of a large number of samples is close to the expected value.
This is called the Monte Carlo method.
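As a sketch of the model-free approach, the snippet below draws i.i.d. samples with NumPy (the sampling distribution is the same illustrative one assumed above) and simply averages them; the estimator itself never uses the probabilities.

```python
import numpy as np

# Model-free (Monte Carlo) mean estimation: only samples {x_1, ..., x_n} are used;
# the estimator never touches p(x). The sampling distribution is illustrative.
rng = np.random.default_rng(0)
values = np.array([1.0, 2.0, 5.0])
probs = np.array([0.2, 0.5, 0.3])

for n in (10, 100, 10_000):
    samples = rng.choice(values, size=n, p=probs)   # {x_1, ..., x_n}
    print(n, samples.mean())                        # approaches E[X] = 2.7 as n grows
```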
Law of large numbers
For a random variable \(X\), suppose that \(\left\{x_i\right\}_{i=1}^n\) are some i.i.d. samples. Let \(\bar{x}=\) \(\frac{1}{n} \sum_{i=1}^n x_i\) be the average of the samples. Then, \[ \begin{aligned} \mathbb{E}[\bar{x}] & =\mathbb{E}[X], \\ \operatorname{var}[\bar{x}] & =\frac{1}{n} \operatorname{var}[X] . \end{aligned} \]
The above two equations indicate that \(\bar{x}\) is an unbiased estimate of \(\mathbb{E}[X]\), and its variance decreases to zero as \(n\) increases to infinity.
The proof is given below.
First, \(\mathbb{E}[\bar{x}]=\mathbb{E}\left[\sum_{i=1}^n x_i / n\right]=\sum_{i=1}^n \mathbb{E}\left[x_i\right] / n=\mathbb{E}[X]\), where the last equality is due to the fact that the samples are identically distributed (that is, \(\mathbb{E}\left[x_i\right]=\mathbb{E}[X]\)).
Second, \[ \operatorname{var}(\bar{x})=\operatorname{var}\left[\sum_{i=1}^n x_i / n\right]=\sum_{i=1}^n \operatorname{var}\left[x_i\right] / n^2=(n \cdot \operatorname{var}[X]) / n^2= \operatorname{var}[X] / n , \] where the second equality is due to the fact that the samples are independent, and the third equality is a result of the samples being identically distributed (that is, \(\operatorname{var}\left[x_i\right]=\operatorname{var}[X]\)).
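The two identities can also be checked numerically. The sketch below repeatedly draws \(n\) samples from the same illustrative distribution and compares the empirical mean and variance of \(\bar{x}\) with \(\mathbb{E}[X]\) and \(\operatorname{var}[X]/n\); the distribution, \(n\), and the number of repetitions are all assumed for illustration.

```python
import numpy as np

# Numerical check of E[x_bar] = E[X] and var[x_bar] = var[X] / n.
# The distribution, n, and the number of repetitions are illustrative choices.
rng = np.random.default_rng(1)
values = np.array([1.0, 2.0, 5.0])
probs = np.array([0.2, 0.5, 0.3])
true_mean = float((probs * values).sum())
true_var = float((probs * (values - true_mean) ** 2).sum())

n, repeats = 50, 100_000
sample_means = rng.choice(values, size=(repeats, n), p=probs).mean(axis=1)
print(sample_means.mean(), true_mean)      # nearly equal: x_bar is unbiased
print(sample_means.var(), true_var / n)    # nearly equal: variance shrinks as 1/n
```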
Iterative mean estimation
There are two methods to compute equation \(\eqref{eq_2}\). The first is a non-incremental method: we collect all the samples first and then calculate the average. If the number of samples is large, we may have to wait for a long time until all of the samples have been collected.
Alternatively, we can use the second method, called iterative mean estimation, which is incremental. Specifically, define \[ w_{k+1} \triangleq \frac{1}{k} \sum_{i=1}^k x_i, \quad k=1,2, \ldots \] and hence \[ w_k=\frac{1}{k-1} \sum_{i=1}^{k-1} x_i, \quad k=2,3, \ldots \]
By its definition, \(w_{k+1}\) is the average of the first \(k\) samples, so \(w_{k+1} \approx \mathbb{E}[X]\) and \(w_{k+1} \rightarrow \mathbb{E}[X]\) as \(k \rightarrow \infty\). Meanwhile, \(w_{k+1}\) can be expressed in terms of \(w_k\) as \[ w_{k+1}=\frac{1}{k} \sum_{i=1}^k x_i=\frac{1}{k}\left(\sum_{i=1}^{k-1} x_i+x_k\right)=\frac{1}{k}\left((k-1) w_k+x_k\right)=w_k-\frac{1}{k}\left(w_k-x_k\right) . \] Therefore, we obtain the following incremental algorithm: \[ \begin{equation} \label{eq_3} \textcolor{red}{w_{k+1} = w_k - \frac{1}{k} \left(w_k - x_k\right)} . \end{equation} \]
This algorithm can be used to calculate the mean \(\bar{x}\) in an incremental manner. It can be verified that \[ \begin{aligned} w_1 & =x_1, \nonumber \\ w_2 & =w_1-\frac{1}{1}\left(w_1-x_1\right)=x_1, \\ w_3 & =w_2-\frac{1}{2}\left(w_2-x_2\right)=x_1-\frac{1}{2}\left(x_1-x_2\right)=\frac{1}{2}\left(x_1+x_2\right), \\ w_4 & =w_3-\frac{1}{3}\left(w_3-x_3\right)=\frac{1}{3}\left(x_1+x_2+x_3\right), \\ & \vdots \\ \textcolor{red} {w_{k+1}} & \textcolor{red}{ =\frac{1}{k} \sum_{i=1}^k x_i} \end{aligned} \]
The advantage of \(\eqref{eq_3}\) is that the average can be immediately calculated every time we receive a sample. This average can be used to approximate \(\bar{x}\) and hence \(\mathbb{E}[X]\).
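A minimal sketch of the incremental algorithm \(\eqref{eq_3}\), again with an assumed illustrative distribution: the running value \(w\) is updated as each sample arrives and ends up equal to the batch average.

```python
import numpy as np

# Incremental mean estimation w_{k+1} = w_k - (1/k)(w_k - x_k), compared
# with the batch average over all samples. The distribution is illustrative.
rng = np.random.default_rng(2)
values = np.array([1.0, 2.0, 5.0])
probs = np.array([0.2, 0.5, 0.3])
samples = rng.choice(values, size=1_000, p=probs)

w = samples[0]                       # w_1 = x_1
for k, x_k in enumerate(samples, start=1):
    w = w - (1.0 / k) * (w - x_k)    # update as soon as the k-th sample arrives
print(w, samples.mean())             # equal up to floating-point error
```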
Furthermore, consider an algorithm with a more general expression: \[ w_{k+1}=w_k-\alpha_k\left(w_k-x_k\right) . \]
It is the same as \(\eqref{eq_3}\) except that the coefficient \(1 / k\) is replaced by \(\alpha_k>0\). Since the expression of \(\alpha_k\) is not given, we are not able to obtain an explicit expression of \(w_k\) as above. However, we will show in the next article that, if \(\left\{\alpha_k\right\}\) satisfies some mild conditions, \(w_k \rightarrow \mathbb{E}[X]\) as \(k \rightarrow \infty\). This illustrates that iterative mean estimation is a special form of the Robbins-Monro (RM) algorithm.
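As a sketch of this more general update, the snippet below uses the assumed, illustrative step size \(\alpha_k = 1/k^{0.75}\), which satisfies the standard Robbins-Monro conditions \(\sum_k \alpha_k = \infty\) and \(\sum_k \alpha_k^2 < \infty\) (presumably among the mild conditions referred to above); it is not a choice prescribed by the source.

```python
import numpy as np

# The same update with a general step size alpha_k = 1/k**0.75 (an assumed,
# illustrative choice). The estimate still approaches E[X] = 2.7.
rng = np.random.default_rng(3)
values = np.array([1.0, 2.0, 5.0])
probs = np.array([0.2, 0.5, 0.3])
samples = rng.choice(values, size=200_000, p=probs)

w = 0.0                                  # arbitrary initial guess
for k, x_k in enumerate(samples, start=1):
    alpha_k = 1.0 / k ** 0.75
    w = w - alpha_k * (w - x_k)
print(w)                                 # close to E[X] = 2.7
```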
¹ In this RL series, we use the terms expected value, mean, and average interchangeably.