Multivariate Gaussian Distributions

Sources:

  1. The Multivariate Gaussian Distribution

Notation

The notation in this article is the same as in Univariate Gaussian Distributions. For the multivariate case, we add the following rules:

  1. The multivariate normal distribution of a $k$-dimensional random vector $X = [X_1, \ldots, X_k]^T$ can be written in the following notation: $X \sim \mathcal{N}(\mu, \Sigma)$, or, to make it explicit that $X$ is $k$-dimensional, $X \sim \mathcal{N}_k(\mu, \Sigma)$, where $\mu$ and $\Sigma$ are the expectation and variance of $X$. Since $X$ is a random vector, $\Sigma$ is a variance-covariance matrix (or simply covariance matrix).

  2. The PDF[^1] $f_X(x)$ is often denoted as $p_X(x)$, $f_X(x; \mu, \Sigma)$ or $p_X(x; \mu, \Sigma)$, where $X = [X_1, \ldots, X_k]^T$. We sometimes omit the subscript $X$.

  3. We use an underline to emphasize certain symbols; for instance, $\underline{X}$ emphasizes $X$.

Multivariate Gaussian distributions

Figure 1

Figure 1: The figure on the left shows a univariate Gaussian density for a single variable $X$. The figure on the right shows a multivariate Gaussian density over two variables $X_1$ and $X_2$.

The multivariate normal distribution of a $k$-dimensional random vector $X = [X_1, \ldots, X_k]^T$ can be written as $X \sim \mathcal{N}(\mu, \Sigma)$, or, to make it explicit that $X$ is $k$-dimensional, $X \sim \mathcal{N}_k(\mu, \Sigma)$. The inverse of the covariance matrix is called the precision matrix, denoted by $Q = \Sigma^{-1}$.

The PDF is:

$$f_X(x; \mu, \Sigma) = \frac{1}{(2\pi)^{k/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$

where exp denotes the exponential function.

Note:

  • $|\Sigma| = \det(\Sigma)$ is the determinant of the covariance matrix.
  • $\Sigma^{-1}$ is the inverse of the covariance matrix.
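As a sanity check on the formula, the sketch below (assuming NumPy and SciPy are available; the helper name `gaussian_pdf` is ours, not standard) evaluates the density directly and compares it against SciPy's reference implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(x, mu, Sigma):
    """Evaluate the multivariate Gaussian PDF at x via the formula above."""
    k = len(mu)
    diff = x - mu
    norm_const = 1.0 / ((2 * np.pi) ** (k / 2) * np.linalg.det(Sigma) ** 0.5)
    return norm_const * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

mu = np.array([3.0, 2.0])
Sigma = np.array([[25.0, 0.0],
                  [0.0, 9.0]])
x = np.array([4.0, 1.0])

print(gaussian_pdf(x, mu, Sigma))             # direct formula
print(multivariate_normal(mu, Sigma).pdf(x))  # SciPy reference implementation
```

Both lines print the same value; for a well-conditioned $\Sigma$ the explicit inverse is fine for illustration, though in practice one would solve a linear system instead of forming $\Sigma^{-1}$.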

Isocontours

One way to understand a multivariate Gaussian conceptually is to understand the shape of its isocontours. For a function $f : \mathbb{R}^2 \to \mathbb{R}$, an isocontour is a set of the form $\{x \in \mathbb{R}^2 : f(x) = c\}$ for some $c \in \mathbb{R}$.

Shape of isocontours

What do the isocontours of a multivariate Gaussian look like? As before, let's consider the case where $k = 2$ and $\Sigma$ is diagonal, i.e., all the random variables $X_i$ in $X$ are independent (see my post).

Let's take the example of

$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \qquad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} \qquad \Sigma = \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}$$

The PDF is

$$f(x; \mu, \Sigma) = \frac{1}{2\pi\sigma_1\sigma_2} \exp\left(-\frac{1}{2\sigma_1^2}(x_1 - \mu_1)^2 - \frac{1}{2\sigma_2^2}(x_2 - \mu_2)^2\right).$$

Now, let's consider the level set consisting of all points where $f(x; \mu, \Sigma) = c$ for some constant $c \in \mathbb{R}$. In particular, consider the set of all $x_1, x_2 \in \mathbb{R}$ such that

$$
\begin{aligned}
c &= \frac{1}{2\pi\sigma_1\sigma_2} \exp\left(-\frac{1}{2\sigma_1^2}(x_1 - \mu_1)^2 - \frac{1}{2\sigma_2^2}(x_2 - \mu_2)^2\right) \\
2\pi c \sigma_1 \sigma_2 &= \exp\left(-\frac{1}{2\sigma_1^2}(x_1 - \mu_1)^2 - \frac{1}{2\sigma_2^2}(x_2 - \mu_2)^2\right) \\
\log(2\pi c \sigma_1 \sigma_2) &= -\frac{1}{2\sigma_1^2}(x_1 - \mu_1)^2 - \frac{1}{2\sigma_2^2}(x_2 - \mu_2)^2 \\
\log\left(\frac{1}{2\pi c \sigma_1 \sigma_2}\right) &= \frac{1}{2\sigma_1^2}(x_1 - \mu_1)^2 + \frac{1}{2\sigma_2^2}(x_2 - \mu_2)^2 \\
1 &= \frac{(x_1 - \mu_1)^2}{2\sigma_1^2 \log\left(\frac{1}{2\pi c \sigma_1 \sigma_2}\right)} + \frac{(x_2 - \mu_2)^2}{2\sigma_2^2 \log\left(\frac{1}{2\pi c \sigma_1 \sigma_2}\right)}.
\end{aligned}
$$

Defining

$$r_1 = \sqrt{2\sigma_1^2 \log\left(\frac{1}{2\pi c \sigma_1 \sigma_2}\right)} \qquad r_2 = \sqrt{2\sigma_2^2 \log\left(\frac{1}{2\pi c \sigma_1 \sigma_2}\right)},$$

it follows that

$$1 = \left(\frac{x_1 - \mu_1}{r_1}\right)^2 + \left(\frac{x_2 - \mu_2}{r_2}\right)^2. \tag{1}$$
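As a quick numerical check of Equation (1), the sketch below (assuming NumPy; the concrete values of $\sigma_1$, $\sigma_2$, and $c$ are arbitrary choices) parameterizes the ellipse and confirms that the density is constant along it:

```python
import numpy as np

mu1, mu2 = 3.0, 2.0
s1, s2 = 5.0, 3.0            # standard deviations sigma_1 and sigma_2
peak = 1.0 / (2 * np.pi * s1 * s2)
c = 0.5 * peak               # any level strictly below the peak works

# Axis half-lengths r1, r2 from their definitions
r1 = np.sqrt(2 * s1**2 * np.log(1 / (2 * np.pi * c * s1 * s2)))
r2 = np.sqrt(2 * s2**2 * np.log(1 / (2 * np.pi * c * s1 * s2)))

# Parameterize the ellipse and evaluate the density along it
t = np.linspace(0, 2 * np.pi, 100)
x1 = mu1 + r1 * np.cos(t)
x2 = mu2 + r2 * np.sin(t)
f = peak * np.exp(-(x1 - mu1)**2 / (2 * s1**2) - (x2 - mu2)**2 / (2 * s2**2))
print(np.allclose(f, c))     # prints True: the whole ellipse sits at height c
```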

Equation (1) is the equation of an axis-aligned ellipse with center $(\mu_1, \mu_2)$, where the $x_1$ axis has length $2r_1$ and the $x_2$ axis has length $2r_2$.

Length of axes

Figure 2

To get a better understanding of how the shape of the level curves varies as a function of the variances of the multivariate Gaussian distribution, suppose that we are interested in the values of $r_1$ and $r_2$ at which $c$ is equal to a fraction $1/e$ of the peak height of the Gaussian density.

Figure 2: The figure on the left shows a heatmap indicating values of the density function for an axis-aligned multivariate Gaussian with mean $\mu = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$ and diagonal covariance matrix $\Sigma = \begin{bmatrix} 25 & 0 \\ 0 & 9 \end{bmatrix}$. Notice that the Gaussian is centered at $(3, 2)$, and that the isocontours are all elliptically shaped with major/minor axis lengths in a 5:3 ratio. The figure on the right shows a heatmap indicating values of the density function for a non-axis-aligned multivariate Gaussian with mean $\mu = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$ and covariance matrix $\Sigma = \begin{bmatrix} 10 & 5 \\ 5 & 5 \end{bmatrix}$. Here, the ellipses are again centered at $(3, 2)$, but now the major and minor axes have been rotated via a linear transformation, because the covariance matrix isn't diagonal.
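For a non-diagonal covariance like the one in the right-hand figure, the rotated axis directions and the variances along them can be read off from the eigendecomposition of $\Sigma$. A sketch assuming NumPy (the matrix is the one from the figure):

```python
import numpy as np

# Covariance matrix of the right-hand figure (non-diagonal, so the
# isocontour axes are rotated away from the coordinate axes)
Sigma = np.array([[10.0, 5.0],
                  [5.0, 5.0]])

# Eigenvectors give the axis directions of the ellipses; eigenvalues
# give the variances along those directions
eigvals, eigvecs = np.linalg.eigh(Sigma)
print(eigvals)   # variances along the minor and major axes
print(eigvecs)   # columns are the rotated axis directions

# Angle of the major axis relative to the x1-axis
major = eigvecs[:, np.argmax(eigvals)]
print(np.degrees(np.arctan2(major[1], major[0])))
```

The eigenvalues sum to the trace of $\Sigma$ (15) and multiply to its determinant (25), which is a handy consistency check.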

First, observe that the maximum of the density $f(x; \mu, \Sigma)$ occurs where $x_1 = \mu_1$ and $x_2 = \mu_2$. Substituting these values, we see that the peak height of the Gaussian density is $\frac{1}{2\pi\sigma_1\sigma_2}$. Second, we substitute $c = \frac{1}{e}\left(\frac{1}{2\pi\sigma_1\sigma_2}\right)$ into the equations for $r_1$ and $r_2$ to obtain

$$
\begin{aligned}
r_1 &= \sqrt{2\sigma_1^2 \log\left(\frac{1}{2\pi\sigma_1\sigma_2 \cdot \frac{1}{e}\left(\frac{1}{2\pi\sigma_1\sigma_2}\right)}\right)} = \sigma_1\sqrt{2} \\
r_2 &= \sqrt{2\sigma_2^2 \log\left(\frac{1}{2\pi\sigma_1\sigma_2 \cdot \frac{1}{e}\left(\frac{1}{2\pi\sigma_1\sigma_2}\right)}\right)} = \sigma_2\sqrt{2}.
\end{aligned}
$$

From this, it follows that the axis length needed to reach a fraction $1/e$ of the peak height of the Gaussian density in the $i$-th dimension grows in proportion to the standard deviation $\sigma_i$. Intuitively, this again makes sense: the smaller the variance of some random variable $x_i$, the more "tightly" peaked the Gaussian distribution in that dimension, and hence the smaller the radius $r_i$.
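This relationship is easy to verify numerically. A minimal sketch assuming NumPy ($\sigma_1 = 5$ and $\sigma_2 = 3$ are arbitrary choices):

```python
import numpy as np

s1, s2 = 5.0, 3.0              # arbitrary standard deviations
peak = 1.0 / (2 * np.pi * s1 * s2)
c = peak / np.e                # level set at 1/e of the peak height

# Plug c into the definitions of r1 and r2: the log term reduces to log(e) = 1
r1 = np.sqrt(2 * s1**2 * np.log(1 / (2 * np.pi * c * s1 * s2)))
r2 = np.sqrt(2 * s2**2 * np.log(1 / (2 * np.pi * c * s1 * s2)))

print(r1, s1 * np.sqrt(2))     # both equal sigma_1 * sqrt(2)
print(r2, s2 * np.sqrt(2))     # both equal sigma_2 * sqrt(2)
```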

Linear transformation interpretation

Theorem: Let $X \sim \mathcal{N}(\mu, \Sigma)$ for some $\mu \in \mathbb{R}^k$ and $\Sigma \in \mathbb{S}_{++}^k$. Then there exists a matrix $B \in \mathbb{R}^{k \times k}$ such that if we define $Z = B^{-1}(X - \mu)$, then $Z \sim \mathcal{N}(0, I)$.

Proof:

  1. As noted before, if $Z \sim \mathcal{N}(0, I)$, then it can be thought of as a collection of $k$ independent standard normal random variables (i.e., $Z_i \sim \mathcal{N}(0, 1)$).
  2. Furthermore, if $Z = B^{-1}(X - \mu)$, then $X = BZ + \mu$ follows from simple algebra.
  3. Consequently, the theorem states that any random variable $X$ with a multivariate Gaussian distribution can be interpreted as the result of applying a linear transformation ($X = BZ + \mu$) to some collection of $k$ independent standard normal random variables ($Z$).
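The theorem can be illustrated empirically. One valid choice of $B$ is the Cholesky factor of $\Sigma$ (since then $BB^T = \Sigma$); the whitened samples $Z = B^{-1}(X - \mu)$ should then have roughly zero mean and identity covariance. A sketch assuming NumPy (the concrete $\mu$ and $\Sigma$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([3.0, 2.0])
Sigma = np.array([[10.0, 5.0],
                  [5.0, 5.0]])

# One valid choice of B: the Cholesky factor, which satisfies B @ B.T == Sigma
B = np.linalg.cholesky(Sigma)

# Draw samples X ~ N(mu, Sigma) and whiten them: Z = B^{-1} (X - mu)
X = rng.multivariate_normal(mu, Sigma, size=200_000)
Z = np.linalg.solve(B, (X - mu).T).T

print(Z.mean(axis=0))   # approximately [0, 0]
print(np.cov(Z.T))      # approximately the 2x2 identity matrix
```

Running the transformation in the other direction, `B @ Z + mu`, turns standard normal draws back into samples from $\mathcal{N}(\mu, \Sigma)$, which is exactly the interpretation in step 3.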

[^1]: Probability density function