Mean Estimation Algorithms

Sources:

  1. Shiyu Zhao. Chapter 5: Monte Carlo Methods. Mathematical Foundations of Reinforcement Learning
  2. Shiyu Zhao. Chapter 6: Stochastic Approximation. Mathematical Foundations of Reinforcement Learning

Mean estimation: compute \(\mathbb{E}[X]\) using \(\left\{x_k\right\}\) \[ w_{k+1}=w_k-\frac{1}{k}\left(w_k-x_k\right) \]

Mean estimation problem

Consider a random variable \(X\) that can take values from a finite set of real numbers denoted as \(\mathcal{X}\). Suppose that our task is to calculate the mean or expected value1 of \(X\), i.e., \(\mathbb{E}[X]\).

Two approaches can be used to calculate \(\mathbb{E}[X]\).

Model-based case

The first approach is model-based. Here, the model refers to the probability distribution of \(X\). If the model is known, then the mean can be directly calculated based on the definition of the expected value: \[ \begin{equation} \label{eq_1} \mathbb{E}[X]=\sum_{x \in \mathcal{X}} p(x) x \end{equation} \]
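For concreteness, here is a minimal Python sketch of the model-based computation in \(\eqref{eq_1}\); the finite distribution \(p(x)\) below is a made-up example, not taken from the source.

```python
# A sketch of the model-based computation: when the distribution p(x) is
# known, E[X] follows directly from the definition in equation (1).
# The finite distribution below is a made-up example.
dist = {1.0: 0.2, 2.0: 0.5, 4.0: 0.3}  # hypothetical p(x) over X = {1, 2, 4}

expected_value = sum(p * x for x, p in dist.items())
print(expected_value)  # 0.2*1 + 0.5*2 + 0.3*4 = 2.4
```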

Model-free case

The second approach is model-free. When the probability distribution (i.e., the model) of \(X\) is unknown, suppose that we have some samples \(\left\{x_1, x_2, \ldots, x_n\right\}\) of \(X\). Then, the mean can be approximated as \[ \begin{equation} \label{eq_2} \mathbb{E}[X] \approx \bar{x} \triangleq \frac{1}{n} \sum_{j=1}^n x_j . \end{equation} \]

When \(n\) is small, this approximation may not be accurate. However, as \(n\) increases, the approximation becomes increasingly accurate. When \(n \rightarrow \infty\), we have \(\bar{x} \rightarrow \mathbb{E}[X]\).

This is guaranteed by the law of large numbers (L.L.N.): the average of a large number of samples is close to the expected value.

Estimating the mean by the sample average in this way is called the Monte Carlo method.
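As a quick illustration of the model-free case, the following Python sketch draws i.i.d. samples and averages them as in \(\eqref{eq_2}\); the distribution is only used here to generate samples and is hidden from the estimator.

```python
import random

# A sketch of the model-free (Monte Carlo) estimate in equation (2):
# draw i.i.d. samples of X and average them. The distribution is only
# used to generate samples; the estimator itself never sees it.
values = [1.0, 2.0, 4.0]
probs = [0.2, 0.5, 0.3]  # hypothetical distribution of X, assumed unknown

def sample_mean(n: int) -> float:
    samples = random.choices(values, weights=probs, k=n)
    return sum(samples) / n

for n in (10, 100, 10_000):
    print(n, sample_mean(n))  # approaches the true mean 2.4 as n grows
```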

Law of large numbers

For a random variable \(X\), suppose that \(\left\{x_i\right\}_{i=1}^n\) are some i.i.d. samples. Let \(\bar{x}=\) \(\frac{1}{n} \sum_{i=1}^n x_i\) be the average of the samples. Then, \[ \begin{aligned} \mathbb{E}[\bar{x}] & =\mathbb{E}[X], \\ \operatorname{var}[\bar{x}] & =\frac{1}{n} \operatorname{var}[X] . \end{aligned} \]

The above two equations indicate that \(\bar{x}\) is an unbiased estimate of \(\mathbb{E}[X]\), and its variance decreases to zero as \(n\) increases to infinity.

The proof is given below.

First, \(\mathbb{E}[\bar{x}]=\mathbb{E}\left[\sum_{i=1}^n x_i / n\right]=\sum_{i=1}^n \mathbb{E}\left[x_i\right] / n=\mathbb{E}[X]\), where the last equality is due to the fact that the samples are identically distributed (that is, \(\mathbb{E}\left[x_i\right]=\mathbb{E}[X]\) ).

Second, \(\operatorname{var}(\bar{x})=\operatorname{var}\left[\sum_{i=1}^n x_i / n\right]=\sum_{i=1}^n \operatorname{var}\left[x_i\right] / n^2=(n \cdot \operatorname{var}[X]) / n^2=\) \(\operatorname{var}[X] / n\), where the second equality is due to the fact that the samples are independent, and the third equality is a result of the samples being identically distributed (that is, \(\operatorname{var}\left[x_i\right]=\operatorname{var}[X]\) ).
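These two properties can also be checked empirically. The sketch below (using a fair six-sided die as an arbitrary example distribution) repeatedly computes sample averages and compares their empirical mean and variance with \(\mathbb{E}[X]\) and \(\operatorname{var}[X]/n\).

```python
import random
import statistics

# An empirical check of the two properties: the sample average is an
# unbiased estimate of E[X], and its variance shrinks like var[X]/n.
# A fair six-sided die is used as an arbitrary example distribution.
values = range(1, 7)
true_var = statistics.pvariance(values)  # var[X] = 35/12, E[X] = 3.5

def average_of_n_samples(n: int) -> float:
    return sum(random.choice(values) for _ in range(n)) / n

for n in (1, 10, 100):
    runs = [average_of_n_samples(n) for _ in range(10_000)]
    # empirical mean of x_bar stays near 3.5; its variance tracks var[X]/n
    print(n, statistics.mean(runs), statistics.variance(runs), true_var / n)
```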

Iterative mean estimation

There are two methods to compute equation \(\eqref{eq_2}\). The first is the non-incremental method: we collect all the samples first and then calculate the average. If the number of samples is large, we may have to wait a long time until all of the samples have been collected.

Alternatively, we can use the second method, called iterative mean estimation, which is incremental. Specifically, define \[ w_{k+1} \triangleq \frac{1}{k} \sum_{i=1}^k x_i, \quad k=1,2, \ldots \] and hence \[ w_k=\frac{1}{k-1} \sum_{i=1}^{k-1} x_i, \quad k=2,3, \ldots \]

By definition, \(w_{k+1}\) is the average of the first \(k\) samples, so \(w_{k+1} \rightarrow \mathbb{E}[X]\) as \(k \rightarrow \infty\) (and \(w_{n+1}=\bar{x}\) once all \(n\) samples are used). Meanwhile, \(w_{k+1}\) can be expressed in terms of \(w_k\) as \[ w_{k+1}=\frac{1}{k} \sum_{i=1}^k x_i=\frac{1}{k}\left(\sum_{i=1}^{k-1} x_i+x_k\right)=\frac{1}{k}\left((k-1) w_k+x_k\right)=w_k-\frac{1}{k}\left(w_k-x_k\right) . \] Therefore, we obtain the following incremental algorithm: \[ \begin{equation} \label{eq_3} \textcolor{red}{w_{k+1} = w_k - \frac{1}{k} \left(w_k - x_k\right)} . \end{equation} \]

This algorithm can be used to calculate the mean \(\bar{x}\) in an incremental manner. It can be verified that \[ \begin{align} w_1 & =x_1, \nonumber \\ w_2 & =w_1-\frac{1}{1}\left(w_1-x_1\right)=x_1, \nonumber \\ w_3 & =w_2-\frac{1}{2}\left(w_2-x_2\right)=x_1-\frac{1}{2}\left(x_1-x_2\right)=\frac{1}{2}\left(x_1+x_2\right), \nonumber \\ w_4 & =w_3-\frac{1}{3}\left(w_3-x_3\right)=\frac{1}{3}\left(x_1+x_2+x_3\right), \nonumber\\ & \vdots \nonumber \\ \textcolor{red} {w_{k+1}} & \textcolor{red}{ =\frac{1}{k} \sum_{i=1}^k x_i} \label{eq_3_1} . \end{align} \]

The advantage of \(\eqref{eq_3}\) is that the average can be immediately calculated every time we receive a sample. This average can be used to approximate \(\bar{x}\) and hence \(\mathbb{E}[X]\).
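A minimal Python sketch of the incremental update \(\eqref{eq_3}\) is given below; the sample list is arbitrary, and the running estimate is compared with the batch average at every step.

```python
# A sketch of the incremental update in equation (3): the estimate is
# refreshed as soon as each sample x_k arrives and, by construction,
# equals the running average of the samples seen so far.
def incremental_mean(samples):
    w = 0.0
    for k, x_k in enumerate(samples, start=1):
        w = w - (1.0 / k) * (w - x_k)  # w_{k+1} = w_k - (1/k)(w_k - x_k)
        yield w

data = [2.0, 4.0, 1.0, 3.0]  # arbitrary samples for illustration
for k, w in enumerate(incremental_mean(data), start=1):
    print(k, w, sum(data[:k]) / k)  # matches the batch average at every step
```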

Furthermore, consider an algorithm with a more general expression: \[ w_{k+1}=w_k-\alpha_k\left(w_k-x_k\right) . \]

It is the same as \(\eqref{eq_3}\) except that the coefficient \(1 / k\) is replaced by \(\alpha_k>0\). Since the expression of \(\alpha_k\) is not given, we cannot obtain an explicit expression for \(w_k\) as in \(\eqref{eq_3_1}\). However, we will show in the next article that, if \(\left\{\alpha_k\right\}\) satisfies some mild conditions, \(w_k \rightarrow \mathbb{E}[X]\) as \(k \rightarrow \infty\). This illustrates that iterative mean estimation is a special case of the Robbins–Monro (RM) algorithm.
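To make the comparison concrete, the sketch below runs the generalized update with \(\alpha_k = 1/k\) and with a small constant step size; the Gaussian samples and the step-size choices are illustrative assumptions, not part of the original algorithm.

```python
import random

# A sketch of the generalized update w_{k+1} = w_k - alpha_k (w_k - x_k).
# With alpha_k = 1/k it reduces to the incremental mean above; a small
# constant step size is shown for comparison. The Gaussian samples and
# step-size choices are illustrative assumptions only.
def run(step_size, samples, w0=0.0):
    w = w0
    for k, x_k in enumerate(samples, start=1):
        w = w - step_size(k) * (w - x_k)
    return w

samples = [random.gauss(2.4, 1.0) for _ in range(10_000)]
print(run(lambda k: 1.0 / k, samples))  # converges to E[X] = 2.4
print(run(lambda k: 0.01, samples))     # stays near 2.4 but remains noisy
```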


  1. In this RL series, we use the terms expected value, mean, and average interchangeably.↩︎