Multivariate Gaussian Distributions

Sources:

  1. The Multivariate Gaussian Distribution

Notation

The notation in this article is the same as in Univariate Gaussian Distributions. For the multivariate case, we add the following rules:

  1. The multivariate normal distribution of a $k$-dimensional random vector $X = [X_1, \ldots, X_k]^T$ can be written in the following notation: $X \sim \mathcal{N}(\mu, \Sigma)$, or, to make it explicit that $X$ is $k$-dimensional, $X \sim \mathcal{N}_k(\mu, \Sigma)$, where $\mu$ and $\Sigma$ are the expectation and variance of $X$. Since $X$ is a random vector, $\Sigma$ is a variance-covariance matrix (or simply covariance matrix).

  2. The PDF[^1] $f_X(x)$ is often denoted as $p_X(x)$, $f_X(x; \mu, \Sigma)$ or $p_X(x; \mu, \Sigma)$, where $X = [X_1, \ldots, X_k]^T$. We sometimes omit the subscript $X$.

  3. We use an underline to emphasize certain symbols; for instance, $\underline{X}$ emphasizes $X$.

Multivariate Gaussian distributions

Figure 1

Figure 1: The figure on the left shows a univariate Gaussian density for a single variable $X$. The figure on the right shows a multivariate Gaussian density over two variables $X_1$ and $X_2$.

The multivariate normal distribution of a $k$-dimensional random vector $X = [X_1, \ldots, X_k]^T$ can be written as $X \sim \mathcal{N}(\mu, \Sigma)$, or, to make it explicit that $X$ is $k$-dimensional, $X \sim \mathcal{N}_k(\mu, \Sigma)$. The inverse of the covariance matrix is called the precision matrix, denoted by $Q = \Sigma^{-1}$.

The PDF is:

$$f_X(x; \mu, \Sigma) = \frac{1}{(2\pi)^{k/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$

where exp denotes the exponential function.

Note:

  • $|\Sigma| = \det(\Sigma)$ is the determinant of the covariance matrix.
  • $\Sigma^{-1}$ is the inverse of the covariance matrix.
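As a sanity check on the formula, the sketch below (assuming NumPy and SciPy are available; the helper name `gaussian_pdf` is ours, not standard) evaluates the density directly and compares it against SciPy's reference implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(x, mu, Sigma):
    """Evaluate the multivariate Gaussian PDF at x via the formula above."""
    k = len(mu)
    diff = x - mu
    norm_const = 1.0 / ((2 * np.pi) ** (k / 2) * np.linalg.det(Sigma) ** 0.5)
    return norm_const * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

mu = np.array([3.0, 2.0])
Sigma = np.array([[25.0, 0.0],
                  [0.0, 9.0]])
x = np.array([4.0, 1.0])

print(gaussian_pdf(x, mu, Sigma))             # direct formula
print(multivariate_normal(mu, Sigma).pdf(x))  # SciPy reference implementation
```

Both lines print the same value; for a well-conditioned $\Sigma$ the explicit inverse is fine for illustration, though in practice one would solve a linear system instead of forming $\Sigma^{-1}$.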

Isocontours

One way to understand a multivariate Gaussian conceptually is to understand the shape of its isocontours. For a function $f : \mathbb{R}^2 \to \mathbb{R}$, an isocontour is a set of the form $\{x \in \mathbb{R}^2 : f(x) = c\}$ for some $c \in \mathbb{R}$.

Shape of isocontours

What do the isocontours of a multivariate Gaussian look like? As before, let's consider the case where $k = 2$ and $\Sigma$ is diagonal, i.e., all the random variables $X_i$ in $X$ are independent (see my post).

Let's take the example of

$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \qquad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} \qquad \Sigma = \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}$$

The PDF is

$$f(x; \mu, \Sigma) = \frac{1}{2\pi\sigma_1\sigma_2} \exp\left(-\frac{1}{2\sigma_1^2}(x_1 - \mu_1)^2 - \frac{1}{2\sigma_2^2}(x_2 - \mu_2)^2\right).$$

Now, let's consider the level set consisting of all points where $f(x; \mu, \Sigma) = c$ for some constant $c \in \mathbb{R}$. In particular, consider the set of all $x_1, x_2 \in \mathbb{R}$ such that

$$
\begin{aligned}
c &= \frac{1}{2\pi\sigma_1\sigma_2} \exp\left(-\frac{1}{2\sigma_1^2}(x_1 - \mu_1)^2 - \frac{1}{2\sigma_2^2}(x_2 - \mu_2)^2\right) \\
2\pi c \sigma_1 \sigma_2 &= \exp\left(-\frac{1}{2\sigma_1^2}(x_1 - \mu_1)^2 - \frac{1}{2\sigma_2^2}(x_2 - \mu_2)^2\right) \\
\log(2\pi c \sigma_1 \sigma_2) &= -\frac{1}{2\sigma_1^2}(x_1 - \mu_1)^2 - \frac{1}{2\sigma_2^2}(x_2 - \mu_2)^2 \\
\log\left(\frac{1}{2\pi c \sigma_1 \sigma_2}\right) &= \frac{1}{2\sigma_1^2}(x_1 - \mu_1)^2 + \frac{1}{2\sigma_2^2}(x_2 - \mu_2)^2 \\
1 &= \frac{(x_1 - \mu_1)^2}{2\sigma_1^2 \log\left(\frac{1}{2\pi c \sigma_1 \sigma_2}\right)} + \frac{(x_2 - \mu_2)^2}{2\sigma_2^2 \log\left(\frac{1}{2\pi c \sigma_1 \sigma_2}\right)}.
\end{aligned}
$$

Defining

$$r_1 = \sqrt{2\sigma_1^2 \log\left(\frac{1}{2\pi c \sigma_1 \sigma_2}\right)} \qquad r_2 = \sqrt{2\sigma_2^2 \log\left(\frac{1}{2\pi c \sigma_1 \sigma_2}\right)},$$

it follows that

$$1 = \left(\frac{x_1 - \mu_1}{r_1}\right)^2 + \left(\frac{x_2 - \mu_2}{r_2}\right)^2. \tag{1}$$
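As a quick numerical check of Equation (1), the sketch below (assuming NumPy; the concrete values of $\sigma_1$, $\sigma_2$, and $c$ are arbitrary choices) parameterizes the ellipse and confirms that the density is constant along it:

```python
import numpy as np

mu1, mu2 = 3.0, 2.0
s1, s2 = 5.0, 3.0            # standard deviations sigma_1 and sigma_2
peak = 1.0 / (2 * np.pi * s1 * s2)
c = 0.5 * peak               # any level strictly below the peak works

# Axis half-lengths r1, r2 from their definitions
r1 = np.sqrt(2 * s1**2 * np.log(1 / (2 * np.pi * c * s1 * s2)))
r2 = np.sqrt(2 * s2**2 * np.log(1 / (2 * np.pi * c * s1 * s2)))

# Parameterize the ellipse and evaluate the density along it
t = np.linspace(0, 2 * np.pi, 100)
x1 = mu1 + r1 * np.cos(t)
x2 = mu2 + r2 * np.sin(t)
f = peak * np.exp(-(x1 - mu1)**2 / (2 * s1**2) - (x2 - mu2)**2 / (2 * s2**2))
print(np.allclose(f, c))     # prints True: the whole ellipse sits at height c
```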

Equation (1) is the equation of an axis-aligned ellipse with center $(\mu_1, \mu_2)$, where the $x_1$ axis has length $2r_1$ and the $x_2$ axis has length $2r_2$.

Length of axes

Figure 2

To get a better understanding of how the shape of the level curves varies as a function of the variances of the multivariate Gaussian distribution, suppose that we are interested in the values of $r_1$ and $r_2$ at which $c$ is equal to a fraction $1/e$ of the peak height of the Gaussian density.

Figure 2: The figure on the left shows a heatmap indicating values of the density function for an axis-aligned multivariate Gaussian with mean $\mu = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$ and diagonal covariance matrix $\Sigma = \begin{bmatrix} 25 & 0 \\ 0 & 9 \end{bmatrix}$. Notice that the Gaussian is centered at $(3, 2)$, and that the isocontours are all elliptically shaped with major/minor axis lengths in a 5:3 ratio. The figure on the right shows a heatmap indicating values of the density function for a non-axis-aligned multivariate Gaussian with mean $\mu = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$ and covariance matrix $\Sigma = \begin{bmatrix} 10 & 5 \\ 5 & 5 \end{bmatrix}$. Here, the ellipses are again centered at $(3, 2)$, but now the major and minor axes have been rotated via a linear transformation, because the covariance matrix isn't diagonal.
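For a non-diagonal covariance like the one in the right-hand figure, the rotated axis directions and the variances along them can be read off from the eigendecomposition of $\Sigma$. A sketch assuming NumPy (the matrix is the one from the figure):

```python
import numpy as np

# Covariance matrix of the right-hand figure (non-diagonal, so the
# isocontour axes are rotated away from the coordinate axes)
Sigma = np.array([[10.0, 5.0],
                  [5.0, 5.0]])

# Eigenvectors give the axis directions of the ellipses; eigenvalues
# give the variances along those directions
eigvals, eigvecs = np.linalg.eigh(Sigma)
print(eigvals)   # variances along the minor and major axes
print(eigvecs)   # columns are the rotated axis directions

# Angle of the major axis relative to the x1-axis
major = eigvecs[:, np.argmax(eigvals)]
print(np.degrees(np.arctan2(major[1], major[0])))
```

The eigenvalues sum to the trace of $\Sigma$ (15) and multiply to its determinant (25), which is a handy consistency check.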

First, observe that the maximum of the density $f(x; \mu, \Sigma)$ occurs where $x_1 = \mu_1$ and $x_2 = \mu_2$. Substituting these values, we see that the peak height of the Gaussian density is $\frac{1}{2\pi\sigma_1\sigma_2}$. Second, we substitute $c = \frac{1}{e}\left(\frac{1}{2\pi\sigma_1\sigma_2}\right)$ into the equations for $r_1$ and $r_2$ to obtain

$$
\begin{aligned}
r_1 &= \sqrt{2\sigma_1^2 \log\left(\frac{1}{2\pi\sigma_1\sigma_2 \cdot \frac{1}{e}\left(\frac{1}{2\pi\sigma_1\sigma_2}\right)}\right)} = \sigma_1\sqrt{2} \\
r_2 &= \sqrt{2\sigma_2^2 \log\left(\frac{1}{2\pi\sigma_1\sigma_2 \cdot \frac{1}{e}\left(\frac{1}{2\pi\sigma_1\sigma_2}\right)}\right)} = \sigma_2\sqrt{2}.
\end{aligned}
$$

From this, it follows that the axis length needed to reach a fraction $1/e$ of the peak height of the Gaussian density in the $i$-th dimension grows in proportion to the standard deviation $\sigma_i$. Intuitively, this again makes sense: the smaller the variance of some random variable $x_i$, the more "tightly" peaked the Gaussian distribution in that dimension, and hence the smaller the radius $r_i$.
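This relationship is easy to verify numerically. A minimal sketch assuming NumPy ($\sigma_1 = 5$ and $\sigma_2 = 3$ are arbitrary choices):

```python
import numpy as np

s1, s2 = 5.0, 3.0              # arbitrary standard deviations
peak = 1.0 / (2 * np.pi * s1 * s2)
c = peak / np.e                # level set at 1/e of the peak height

# Plug c into the definitions of r1 and r2: the log term reduces to log(e) = 1
r1 = np.sqrt(2 * s1**2 * np.log(1 / (2 * np.pi * c * s1 * s2)))
r2 = np.sqrt(2 * s2**2 * np.log(1 / (2 * np.pi * c * s1 * s2)))

print(r1, s1 * np.sqrt(2))     # both equal sigma_1 * sqrt(2)
print(r2, s2 * np.sqrt(2))     # both equal sigma_2 * sqrt(2)
```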

Linear transformation interpretation

Theorem: Let $X \sim \mathcal{N}(\mu, \Sigma)$ for some $\mu \in \mathbb{R}^k$ and $\Sigma \in \mathbb{S}_{++}^k$. Then there exists a matrix $B \in \mathbb{R}^{k \times k}$ such that if we define $Z = B^{-1}(X - \mu)$, then $Z \sim \mathcal{N}(0, I)$.

Proof:

  1. As noted before, if $Z \sim \mathcal{N}(0, I)$, then it can be thought of as a collection of $k$ independent standard normal random variables (i.e., $Z_i \sim \mathcal{N}(0, 1)$).
  2. Furthermore, if $Z = B^{-1}(X - \mu)$, then $X = BZ + \mu$ follows from simple algebra.
  3. Consequently, the theorem states that any random variable $X$ with a multivariate Gaussian distribution can be interpreted as the result of applying a linear transformation ($X = BZ + \mu$) to some collection of $k$ independent standard normal random variables ($Z$).
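The theorem can be illustrated empirically. One valid choice of $B$ is the Cholesky factor of $\Sigma$ (since then $BB^T = \Sigma$); the whitened samples $Z = B^{-1}(X - \mu)$ should then have roughly zero mean and identity covariance. A sketch assuming NumPy (the concrete $\mu$ and $\Sigma$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([3.0, 2.0])
Sigma = np.array([[10.0, 5.0],
                  [5.0, 5.0]])

# One valid choice of B: the Cholesky factor, which satisfies B @ B.T == Sigma
B = np.linalg.cholesky(Sigma)

# Draw samples X ~ N(mu, Sigma) and whiten them: Z = B^{-1} (X - mu)
X = rng.multivariate_normal(mu, Sigma, size=200_000)
Z = np.linalg.solve(B, (X - mu).T).T

print(Z.mean(axis=0))   # approximately [0, 0]
print(np.cov(Z.T))      # approximately the 2x2 identity matrix
```

Running the transformation in the other direction, `B @ Z + mu`, turns standard normal draws back into samples from $\mathcal{N}(\mu, \Sigma)$, which is exactly the interpretation in step 3.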

[^1]: Probability density function