Mean Estimation Algorithms

Sources:

  1. Shiyu Zhao. Chapter 5: Monte Carlo Methods. Mathematical Foundations of Reinforcement Learning
  2. Shiyu Zhao. Chapter 6: Stochastic Approximation. Mathematical Foundations of Reinforcement Learning

Mean estimation: compute $\mathbb{E}[X]$ from samples $\{x_k\}$ via $w_{k+1} = w_k - \frac{1}{k}(w_k - x_k)$.

Mean estimation problem

Consider a random variable $X$ that can take values from a finite set of real numbers denoted as $\mathcal{X}$. Suppose that our task is to calculate the mean or expected value¹ of $X$, i.e., $\mathbb{E}[X]$.

Two approaches can be used to calculate $\mathbb{E}[X]$.

Model-based case

The first approach is model-based. Here, the model refers to the probability distribution of $X$. If the model is known, then the mean can be directly calculated based on the definition of the expected value:

$$\mathbb{E}[X] = \sum_{x \in \mathcal{X}} p(x)\,x. \tag{1}$$
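As a minimal sketch of equation (1) in Python (the support and probabilities below are made-up values, assumed only for illustration):

```python
# Model-based mean: when the distribution p(x) is known, equation (1)
# is a single weighted sum over the support of X.
support = [1.0, 2.0, 3.0]   # hypothetical finite set of values of X
probs = [0.2, 0.5, 0.3]     # p(x) for each x; must sum to 1

mean = sum(p * x for p, x in zip(probs, support))
print(mean)  # 0.2*1 + 0.5*2 + 0.3*3 = 2.1
```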

Model-free case

The second approach is model-free. When the probability distribution (i.e., the model) of $X$ is unknown, suppose that we have some samples $\{x_1, x_2, \dots, x_n\}$ of $X$. Then, the mean can be approximated as

$$\mathbb{E}[X] \approx \bar{x} \doteq \frac{1}{n}\sum_{j=1}^{n} x_j. \tag{2}$$

When $n$ is small, this approximation may not be accurate. However, as $n$ increases, the approximation becomes increasingly accurate. As $n \to \infty$, we have $\bar{x} \to \mathbb{E}[X]$.

This is guaranteed by the law of large numbers (L.L.N.): the average of a large number of samples is close to the expected value.

This is called the Monte Carlo method.
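A small sketch of the Monte Carlo approach: the same hypothetical distribution as above is used only to generate samples, while the estimator in equation (2) never touches $p(x)$.

```python
import random

random.seed(0)

# Model-free (Monte Carlo) mean: approximate E[X] by the sample average.
support = [1.0, 2.0, 3.0]
probs = [0.2, 0.5, 0.3]  # used only to simulate draws of X

for n in [10, 100, 10_000]:
    samples = random.choices(support, weights=probs, k=n)
    x_bar = sum(samples) / n  # equation (2)
    print(n, x_bar)           # approaches the true mean 2.1 as n grows
```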

Law of large numbers

For a random variable $X$, suppose that $\{x_i\}_{i=1}^{n}$ are i.i.d. samples. Let $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ be the average of the samples. Then,

$$\mathbb{E}[\bar{x}] = \mathbb{E}[X], \qquad \operatorname{var}[\bar{x}] = \frac{1}{n}\operatorname{var}[X].$$

The above two equations indicate that $\bar{x}$ is an unbiased estimate of $\mathbb{E}[X]$, and that its variance decreases to zero as $n$ increases to infinity.

The proof is given below.

First, $\mathbb{E}[\bar{x}] = \mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n} x_i\right] = \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[x_i] = \mathbb{E}[X]$, where the last equality is due to the fact that the samples are identically distributed (that is, $\mathbb{E}[x_i] = \mathbb{E}[X]$).

Second, $\operatorname{var}[\bar{x}] = \operatorname{var}\left[\frac{1}{n}\sum_{i=1}^{n} x_i\right] = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{var}[x_i] = \frac{n\operatorname{var}[X]}{n^2} = \frac{\operatorname{var}[X]}{n}$, where the second equality is due to the fact that the samples are independent, and the third equality is a result of the samples being identically distributed (that is, $\operatorname{var}[x_i] = \operatorname{var}[X]$).
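A quick numerical check of these two properties, reusing the hypothetical distribution from above (the batch count of 2000 is an arbitrary choice): the mean of $\bar{x}$ over many batches stays near $\mathbb{E}[X]$, while its variance shrinks roughly as $1/n$.

```python
import random
import statistics

random.seed(1)

# Empirical check: x_bar is unbiased, and var[x_bar] ~ var[X] / n.
# We estimate var[x_bar] by recomputing x_bar over many independent
# batches of n samples each.
support = [1.0, 2.0, 3.0]
probs = [0.2, 0.5, 0.3]

def sample_mean(n):
    return sum(random.choices(support, weights=probs, k=n)) / n

for n in [10, 100, 1000]:
    means = [sample_mean(n) for _ in range(2000)]
    print(n, statistics.mean(means), statistics.variance(means))
    # mean stays near 2.1; variance drops by ~10x as n grows by 10x
```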

Iterative mean estimation

There are two methods to compute equation (2). The first is the non-incremental method: we collect all the samples first and then calculate the average. If the number of samples is large, we may have to wait a long time until all of the samples are collected.

Alternatively, we can take the second method, called iterative mean estimation, which is incremental. Specifically, define

$$w_{k+1} \doteq \frac{1}{k}\sum_{i=1}^{k} x_i, \quad k = 1, 2, \dots$$

and hence

$$w_k = \frac{1}{k-1}\sum_{i=1}^{k-1} x_i, \quad k = 2, 3, \dots$$

By its definition, we know $w_{k+1} = \bar{x}$ (the average of the first $k$ samples), and hence $w_{k+1} \to \mathbb{E}[X]$ as $k \to \infty$. Meanwhile, $w_{k+1}$ can be expressed in terms of $w_k$ as

$$w_{k+1} = \frac{1}{k}\sum_{i=1}^{k} x_i = \frac{1}{k}\left(\sum_{i=1}^{k-1} x_i + x_k\right) = \frac{1}{k}\bigl((k-1)w_k + x_k\bigr) = w_k - \frac{1}{k}(w_k - x_k).$$

Therefore, we obtain the following incremental algorithm:

$$w_{k+1} = w_k - \frac{1}{k}(w_k - x_k). \tag{3}$$

This algorithm can be used to calculate the mean $\bar{x}$ in an incremental manner. It can be verified that

$$\begin{aligned}
w_1 &= x_1,\\
w_2 &= w_1 - \frac{1}{1}(w_1 - x_1) = x_1,\\
w_3 &= w_2 - \frac{1}{2}(w_2 - x_2) = x_1 - \frac{1}{2}(x_1 - x_2) = \frac{1}{2}(x_1 + x_2),\\
w_4 &= w_3 - \frac{1}{3}(w_3 - x_3) = \frac{1}{3}(x_1 + x_2 + x_3),\\
&\;\;\vdots\\
w_{k+1} &= \frac{1}{k}\sum_{i=1}^{k} x_i.
\end{aligned}$$

The advantage of (3) is that the average can be updated immediately every time we receive a new sample. This running average can be used to approximate $\bar{x}$ and hence $\mathbb{E}[X]$.
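A minimal sketch of algorithm (3) on a made-up data stream, with a final assertion that the running estimate matches the batch average:

```python
# Incremental mean estimation, equation (3):
#   w_{k+1} = w_k - (1/k)(w_k - x_k)
# Each incoming sample updates the estimate immediately; the full
# sample list never needs to be stored.
samples = [3.0, 1.0, 2.0, 2.0, 3.0]  # hypothetical data stream

w = 0.0  # w_1; its value is irrelevant, since the k=1 update yields w_2 = x_1
for k, x_k in enumerate(samples, start=1):
    w = w - (1.0 / k) * (w - x_k)
    print(k, w)  # after k samples, w equals the average of the first k

assert abs(w - sum(samples) / len(samples)) < 1e-12
```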

Furthermore, consider an algorithm with a more general expression:

$$w_{k+1} = w_k - \alpha_k (w_k - x_k).$$

It is the same as (3) except that the coefficient $1/k$ is replaced by $\alpha_k > 0$. Since the expression of $\alpha_k$ is not given, we are not able to obtain the explicit expression of $w_k$ derived above. However, we will show in the next article that, if $\{\alpha_k\}$ satisfies some mild conditions, $w_k \to \mathbb{E}[X]$ as $k \to \infty$. This illustrates that iterative mean estimation is a special form of the Robbins-Monro (RM) algorithm.
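As a sketch of the general update with one assumed step size, $\alpha_k = 1/k^{0.7}$, which satisfies the classical Robbins-Monro conditions $\sum_k \alpha_k = \infty$ and $\sum_k \alpha_k^2 < \infty$ (the mild conditions detailed in the next article):

```python
import random

random.seed(2)

# General update: w_{k+1} = w_k - alpha_k (w_k - x_k).
# alpha_k = 1/k reproduces equation (3) exactly; alpha_k = 1/k**0.7 is
# one alternative choice, picked here only for illustration.
support = [1.0, 2.0, 3.0]
probs = [0.2, 0.5, 0.3]

w = 0.0
for k in range(1, 100_001):
    x_k = random.choices(support, weights=probs)[0]
    alpha_k = 1.0 / k**0.7
    w = w - alpha_k * (w - x_k)

print(w)  # close to the true mean 2.1
```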


  1. In this RL series, we use the terms expected value, mean, and average interchangeably.