Mean Estimation Algorithms

Sources:

  1. Shiyu Zhao. Chapter 5: Monte Carlo Methods. Mathematical Foundations of Reinforcement Learning
  2. Shiyu Zhao. Chapter 6: Stochastic Approximation. Mathematical Foundations of Reinforcement Learning

Mean estimation: compute $\mathbb{E}[X]$ from samples $\{x_k\}$ via $w_{k+1} = w_k - \frac{1}{k}(w_k - x_k)$.

Mean estimation problem

Consider a random variable $X$ that can take values from a finite set of real numbers denoted as $\mathcal{X}$. Suppose that our task is to calculate the mean or expected value¹ of $X$, i.e., $\mathbb{E}[X]$.

Two approaches can be used to calculate $\mathbb{E}[X]$.

Model-based case

The first approach is model-based. Here, the model refers to the probability distribution of $X$. If the model is known, then the mean can be directly calculated based on the definition of the expected value:

$$\mathbb{E}[X] = \sum_{x \in \mathcal{X}} p(x)\,x. \tag{1}$$
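As a minimal sketch of equation (1) in Python (the support and probabilities below are made-up values, assumed only for illustration):

```python
# Model-based mean: when the distribution p(x) is known, equation (1)
# is a single weighted sum over the support of X.
support = [1.0, 2.0, 3.0]   # hypothetical finite set of values of X
probs = [0.2, 0.5, 0.3]     # p(x) for each x; must sum to 1

mean = sum(p * x for p, x in zip(probs, support))
print(mean)  # 0.2*1 + 0.5*2 + 0.3*3 = 2.1
```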

Model-free case

The second approach is model-free. When the probability distribution (i.e., the model) of $X$ is unknown, suppose that we have some samples $\{x_1, x_2, \dots, x_n\}$ of $X$. Then, the mean can be approximated as

$$\mathbb{E}[X] \approx \bar{x} \doteq \frac{1}{n}\sum_{j=1}^{n} x_j. \tag{2}$$

When $n$ is small, this approximation may not be accurate. However, as $n$ increases, the approximation becomes increasingly accurate. As $n \to \infty$, we have $\bar{x} \to \mathbb{E}[X]$.

This is guaranteed by the law of large numbers (L.L.N.): the average of a large number of samples is close to the expected value.

This is called the Monte Carlo method.
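A small sketch of the Monte Carlo approach: the same hypothetical distribution as above is used only to generate samples, while the estimator in equation (2) never touches $p(x)$.

```python
import random

random.seed(0)

# Model-free (Monte Carlo) mean: approximate E[X] by the sample average.
support = [1.0, 2.0, 3.0]
probs = [0.2, 0.5, 0.3]  # used only to simulate draws of X

for n in [10, 100, 10_000]:
    samples = random.choices(support, weights=probs, k=n)
    x_bar = sum(samples) / n  # equation (2)
    print(n, x_bar)           # approaches the true mean 2.1 as n grows
```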

Law of large numbers

For a random variable $X$, suppose that $\{x_i\}_{i=1}^{n}$ are i.i.d. samples. Let $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ be the average of the samples. Then,

$$\mathbb{E}[\bar{x}] = \mathbb{E}[X], \qquad \operatorname{var}[\bar{x}] = \frac{1}{n}\operatorname{var}[X].$$

The above two equations indicate that $\bar{x}$ is an unbiased estimate of $\mathbb{E}[X]$, and that its variance decreases to zero as $n$ increases to infinity.

The proof is given below.

First, $\mathbb{E}[\bar{x}] = \mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n} x_i\right] = \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[x_i] = \mathbb{E}[X]$, where the last equality is due to the fact that the samples are identically distributed (that is, $\mathbb{E}[x_i] = \mathbb{E}[X]$).

Second, $\operatorname{var}[\bar{x}] = \operatorname{var}\left[\frac{1}{n}\sum_{i=1}^{n} x_i\right] = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{var}[x_i] = \frac{n\operatorname{var}[X]}{n^2} = \frac{\operatorname{var}[X]}{n}$, where the second equality is due to the fact that the samples are independent, and the third equality is a result of the samples being identically distributed (that is, $\operatorname{var}[x_i] = \operatorname{var}[X]$).
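A quick numerical check of these two properties, reusing the hypothetical distribution from above (the batch count of 2000 is an arbitrary choice): the mean of $\bar{x}$ over many batches stays near $\mathbb{E}[X]$, while its variance shrinks roughly as $1/n$.

```python
import random
import statistics

random.seed(1)

# Empirical check: x_bar is unbiased, and var[x_bar] ~ var[X] / n.
# We estimate var[x_bar] by recomputing x_bar over many independent
# batches of n samples each.
support = [1.0, 2.0, 3.0]
probs = [0.2, 0.5, 0.3]

def sample_mean(n):
    return sum(random.choices(support, weights=probs, k=n)) / n

for n in [10, 100, 1000]:
    means = [sample_mean(n) for _ in range(2000)]
    print(n, statistics.mean(means), statistics.variance(means))
    # mean stays near 2.1; variance drops by ~10x as n grows by 10x
```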

Iterative mean estimation

There are two methods to compute equation (2). The first is the non-incremental method: we collect all the samples first and then calculate the average. If the number of samples is large, we may have to wait a long time until all of the samples are collected.

Alternatively, we can take the second method, called iterative mean estimation, which is incremental. Specifically, define

$$w_{k+1} \doteq \frac{1}{k}\sum_{i=1}^{k} x_i, \quad k = 1, 2, \dots$$

and hence

$$w_k = \frac{1}{k-1}\sum_{i=1}^{k-1} x_i, \quad k = 2, 3, \dots$$

By its definition, we know $w_{k+1} = \bar{x}$ (the average of the first $k$ samples), and hence $w_{k+1} \to \mathbb{E}[X]$ as $k \to \infty$. Meanwhile, $w_{k+1}$ can be expressed in terms of $w_k$ as

$$w_{k+1} = \frac{1}{k}\sum_{i=1}^{k} x_i = \frac{1}{k}\left(\sum_{i=1}^{k-1} x_i + x_k\right) = \frac{1}{k}\bigl((k-1)w_k + x_k\bigr) = w_k - \frac{1}{k}(w_k - x_k).$$

Therefore, we obtain the following incremental algorithm:

$$w_{k+1} = w_k - \frac{1}{k}(w_k - x_k). \tag{3}$$

This algorithm can be used to calculate the mean $\bar{x}$ in an incremental manner. It can be verified that

$$\begin{aligned}
w_1 &= x_1,\\
w_2 &= w_1 - \frac{1}{1}(w_1 - x_1) = x_1,\\
w_3 &= w_2 - \frac{1}{2}(w_2 - x_2) = x_1 - \frac{1}{2}(x_1 - x_2) = \frac{1}{2}(x_1 + x_2),\\
w_4 &= w_3 - \frac{1}{3}(w_3 - x_3) = \frac{1}{3}(x_1 + x_2 + x_3),\\
&\;\;\vdots\\
w_{k+1} &= \frac{1}{k}\sum_{i=1}^{k} x_i.
\end{aligned}$$

The advantage of (3) is that the average can be updated immediately every time we receive a new sample. This running average can be used to approximate $\bar{x}$ and hence $\mathbb{E}[X]$.
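A minimal sketch of algorithm (3) on a made-up data stream, with a final assertion that the running estimate matches the batch average:

```python
# Incremental mean estimation, equation (3):
#   w_{k+1} = w_k - (1/k)(w_k - x_k)
# Each incoming sample updates the estimate immediately; the full
# sample list never needs to be stored.
samples = [3.0, 1.0, 2.0, 2.0, 3.0]  # hypothetical data stream

w = 0.0  # w_1; its value is irrelevant, since the k=1 update yields w_2 = x_1
for k, x_k in enumerate(samples, start=1):
    w = w - (1.0 / k) * (w - x_k)
    print(k, w)  # after k samples, w equals the average of the first k

assert abs(w - sum(samples) / len(samples)) < 1e-12
```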

Furthermore, consider an algorithm with a more general expression:

$$w_{k+1} = w_k - \alpha_k (w_k - x_k).$$

It is the same as (3) except that the coefficient $1/k$ is replaced by $\alpha_k > 0$. Since the expression of $\alpha_k$ is not given, we are not able to obtain the explicit expression of $w_k$ derived above. However, we will show in the next article that, if $\{\alpha_k\}$ satisfies some mild conditions, $w_k \to \mathbb{E}[X]$ as $k \to \infty$. This illustrates that iterative mean estimation is a special form of the Robbins-Monro (RM) algorithm.
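As a sketch of the general update with one assumed step size, $\alpha_k = 1/k^{0.7}$, which satisfies the classical Robbins-Monro conditions $\sum_k \alpha_k = \infty$ and $\sum_k \alpha_k^2 < \infty$ (the mild conditions detailed in the next article):

```python
import random

random.seed(2)

# General update: w_{k+1} = w_k - alpha_k (w_k - x_k).
# alpha_k = 1/k reproduces equation (3) exactly; alpha_k = 1/k**0.7 is
# one alternative choice, picked here only for illustration.
support = [1.0, 2.0, 3.0]
probs = [0.2, 0.5, 0.3]

w = 0.0
for k in range(1, 100_001):
    x_k = random.choices(support, weights=probs)[0]
    alpha_k = 1.0 / k**0.7
    w = w - alpha_k * (w - x_k)

print(w)  # close to the true mean 2.1
```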


  1. In this RL series, we use the terms expected value, mean, and average interchangeably.