Sources:
- Shiyu Zhao. Chapter 5: Monte Carlo Methods. Mathematical Foundations of Reinforcement Learning
- Shiyu Zhao. Chapter 6: Stochastic Approximation. Mathematical Foundations of Reinforcement Learning
Mean estimation: compute $\mathbb{E}[X]$ using samples of $X$.
Mean estimation problem
Consider a random variable $X$ that can take values from a finite set of real numbers denoted as $\mathcal{X}$. Suppose that our task is to calculate the mean or expected value of $X$, i.e., $\mathbb{E}[X]$.
Two approaches can be used to calculate $\mathbb{E}[X]$.
Model-based case
The first approach is model-based. Here, the model refers to the probability distribution of $X$. If the model $p(x)$ is known, then the mean can be directly calculated based on the definition of the expected value:
$$\mathbb{E}[X] = \sum_{x \in \mathcal{X}} p(x)\, x.$$
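As a quick illustration, the snippet below computes this weighted sum for a small made-up distribution; the values and probabilities are assumptions for the example only.

```python
# Model-based mean estimation: the distribution p(x) is assumed to be known.
# The values and probabilities below are made up purely for illustration.
values = [1.0, 2.0, 5.0]   # the finite set of possible values of X
probs  = [0.2, 0.5, 0.3]   # p(x) for each value; the probabilities sum to 1

# E[X] = sum over x of p(x) * x
expected_value = sum(p * x for p, x in zip(probs, values))
print(expected_value)      # 0.2*1.0 + 0.5*2.0 + 0.3*5.0 = 2.7
```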
Model-free case
The second approach is model-free. When the probability distribution (i.e., the model) of $X$ is unknown, suppose that we have some samples $\{x_1, x_2, \ldots, x_n\}$ of $X$. Then, the mean can be approximated as
$$\mathbb{E}[X] \approx \bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j.$$
When $n$ is small, this approximation may not be accurate. However, as $n$ increases, the approximation becomes increasingly accurate. When $n \to \infty$, we have $\bar{x} \to \mathbb{E}[X]$.
This is guaranteed by the law of large numbers (L.L.N.): the average of a large number of samples is close to the expected value.
This is called the Monte Carlo method.
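A minimal sketch of this model-free estimate is given below. The sampler is hypothetical (in reinforcement learning the samples would come from interacting with the environment) and reuses the toy distribution above only to generate data; the estimator itself sees nothing but the samples.

```python
import random

values = [1.0, 2.0, 5.0]
probs  = [0.2, 0.5, 0.3]   # hidden from the estimator; used only to draw samples

def draw_sample():
    """Hypothetical sampler standing in for an unknown distribution."""
    return random.choices(values, weights=probs, k=1)[0]

# Monte Carlo estimate: average n samples; the estimate approaches E[X] = 2.7 as n grows.
for n in (10, 1_000, 100_000):
    samples = [draw_sample() for _ in range(n)]
    print(n, sum(samples) / n)
```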
Law of large numbers
For a random variable $X$, suppose that $\{x_i\}_{i=1}^{n}$ are some i.i.d. samples. Let $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ be the average of the samples. Then,
$$\mathbb{E}[\bar{x}] = \mathbb{E}[X],$$
$$\operatorname{var}(\bar{x}) = \frac{1}{n}\operatorname{var}(X).$$
The above two equations indicate that $\bar{x}$ is an unbiased estimate of $\mathbb{E}[X]$, and its variance decreases to zero as $n$ increases to infinity.
The proof is given below.
First, $\mathbb{E}[\bar{x}] = \mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n} x_i\right] = \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[x_i] = \mathbb{E}[X]$, where the last equality is due to the fact that the samples are identically distributed (that is, $\mathbb{E}[x_i] = \mathbb{E}[X]$).
Second, $\operatorname{var}(\bar{x}) = \operatorname{var}\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{var}(x_i) = \frac{1}{n^2}\cdot n\operatorname{var}(X) = \frac{1}{n}\operatorname{var}(X)$, where the second equality is due to the fact that the samples are independent, and the third equality is a result of the samples being identically distributed (that is, $\operatorname{var}(x_i) = \operatorname{var}(X)$).
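The two statements can also be checked numerically. The sketch below uses a fair six-sided die as an assumed example distribution ($\mathbb{E}[X] = 3.5$, $\operatorname{var}(X) = 35/12 \approx 2.92$) and estimates the mean and variance of $\bar{x}$ over many repeated trials.

```python
import random
import statistics

def sample_mean(n):
    """Average of n i.i.d. rolls of a fair die (an assumed example distribution)."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

num_trials = 20_000
for n in (1, 10, 100):
    means = [sample_mean(n) for _ in range(num_trials)]
    # The empirical mean of x_bar stays near E[X] = 3.5 (unbiasedness),
    # while its empirical variance shrinks roughly like var(X)/n = 2.92/n.
    print(n, round(statistics.mean(means), 3), round(statistics.variance(means), 3))
```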
Iterative mean estimation
There are two methods to compute the sample average $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. The first method is non-incremental: we have to collect all the samples first and then calculate the average. If the number of samples is large, we may have to wait a long time until all of the samples are collected.
Alternatively, we can take the second method. It is called iterative mean estimation and is incremental. Specifically, suppose that
$$w_{k+1} = \frac{1}{k}\sum_{i=1}^{k} x_i, \qquad k = 1, 2, \ldots,$$
and hence,
$$w_{k} = \frac{1}{k-1}\sum_{i=1}^{k-1} x_i, \qquad k = 2, 3, \ldots$$
By its definition, we know $w_2 = x_1$ and $w_3 = \frac{1}{2}(x_1 + x_2)$. Meanwhile, $w_{k+1}$ can be expressed in terms of $w_k$ as
$$w_{k+1} = \frac{1}{k}\sum_{i=1}^{k} x_i = \frac{1}{k}\left(\sum_{i=1}^{k-1} x_i + x_k\right) = \frac{1}{k}\Big((k-1)w_k + x_k\Big) = w_k - \frac{1}{k}(w_k - x_k).$$
Therefore, we obtain the following incremental algorithm:
$$w_{k+1} = w_k - \frac{1}{k}(w_k - x_k).$$
This algorithm can be used to calculate the mean $\bar{x}$ in an incremental manner. It can be verified that, for any initial value $w_1$,
$$w_2 = w_1 - \frac{1}{1}(w_1 - x_1) = x_1,$$
$$w_3 = w_2 - \frac{1}{2}(w_2 - x_2) = \frac{1}{2}(x_1 + x_2),$$
$$w_4 = w_3 - \frac{1}{3}(w_3 - x_3) = \frac{1}{3}(x_1 + x_2 + x_3),$$
and in general $w_{k+1} = \frac{1}{k}\sum_{i=1}^{k} x_i$.
The advantage of this incremental algorithm is that the average can be immediately updated every time we receive a sample. This average can be used to approximate $\bar{x}$ and hence $\mathbb{E}[X]$.
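The following sketch implements the incremental update on an assumed stream of Gaussian samples and confirms that, after processing all samples, it matches the batch average computed at once.

```python
import random

samples = [random.gauss(0.0, 1.0) for _ in range(1_000)]  # hypothetical sample stream

w = 0.0  # initial w_1; it is overwritten at k = 1 since w <- w - 1*(w - x_1) = x_1
for k, x_k in enumerate(samples, start=1):
    w = w - (1.0 / k) * (w - x_k)   # w_{k+1} = w_k - (1/k)(w_k - x_k)

batch_average = sum(samples) / len(samples)
print(w, batch_average)  # the two values agree up to floating-point error
```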
Furthermore, consider an algorithm with a more general expression:
$$w_{k+1} = w_k - \alpha_k (w_k - x_k).$$
It is the same as the incremental algorithm above except that the coefficient $\frac{1}{k}$ is replaced by $\alpha_k$. Since the expression of $\alpha_k$ is not given, we are not able to obtain the explicit expression of $w_k$ as before. However, we will show in the next article that, if $\{\alpha_k\}$ satisfies some mild conditions, $w_k \to \mathbb{E}[X]$ as $k \to \infty$. This illustrates that iterative mean estimation is a special form of the Robbins-Monro (RM) algorithm.
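As a preview, the sketch below runs the general update with two step-size choices: $\alpha_k = 1/k$, which reproduces the sample average exactly, and an assumed alternative $\alpha_k = 1/k^{0.7}$, which also satisfies the usual conditions ($\sum_k \alpha_k = \infty$, $\sum_k \alpha_k^2 < \infty$) and still converges to $\mathbb{E}[X]$.

```python
import random

def estimate_mean(samples, alpha):
    """Run w_{k+1} = w_k - alpha(k) * (w_k - x_k) over a stream of samples."""
    w = 0.0
    for k, x_k in enumerate(samples, start=1):
        w = w - alpha(k) * (w - x_k)
    return w

# Hypothetical stream with true mean E[X] = 3.0.
samples = [random.gauss(3.0, 1.0) for _ in range(100_000)]

print(estimate_mean(samples, lambda k: 1.0 / k))         # exactly the sample average
print(estimate_mean(samples, lambda k: 1.0 / k ** 0.7))  # also close to E[X] = 3.0
```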