Random Vectors

Source:

  • Random Vectors from the textbook Introduction to Probability, Statistics, and Random Processes by Hossein Pishro-Nik.
  • Random Vectors and the Variance–Covariance Matrix

Notation

  • The expected value of a random vector \(\vec{X}\) is often denoted by \(\mathrm{E}(\vec{X})\), \(\mathrm{E}[\vec{X}]\), or \(\mathrm{E} \vec{X}\), with the expectation operator stylized as \(\mathrm{E}\) or \(E\); it is also written \(\vec{\mu}_{\vec{X}}\) or simply \(\vec{\mu}\).
  • The variance of a random vector \(\vec{X}\) is typically written \(\operatorname{Var}(\vec{X})\), or sometimes \(\operatorname{Cov}(\vec{X})\). Since the variance of a random vector is a variance-covariance matrix, it is also denoted \(\Sigma_{\vec{X}}\) or simply \(\Sigma\); the element in the i-th row and j-th column is \(\Sigma_{ij}\).
  • The covariance of two random vectors \(\vec{X}\) and \(\vec{Y}\) is typically written \(\operatorname{Cov}(\vec{X}, \vec{Y})\). Since it is itself a matrix (the cross-covariance matrix), it is also denoted \(\Sigma_{(\vec{X}, \vec{Y})}\).

Random vectors

Definition: A random vector \(\vec{X}\) is a vector \[ \vec{X}=\left[\begin{array}{c} X_1 \\ X_2 \\ \vdots \\ X_p \end{array}\right] \] of jointly distributed random variables \(X_1, \ldots, X_p\). As is customary in linear algebra, we will write vectors as column matrices whenever convenient.

Expected value

Definition: The expectation \(E \vec{X}\) of a random vector \(\vec{X}=\left[X_1, X_2, \ldots, X_p\right]^T\) is given by \[ \mathrm{E} [\vec{X}] =\left[\begin{array}{c} \mathrm{E} [X_1] \\ \mathrm{E} [X_2] \\ \vdots \\ \mathrm{E} [X_p] \end{array}\right] \] It's also denoted as \(\vec{\mu}_{\vec{X}}\) or \(\vec{\mu}\).

Linearity of expectation

Recall that expectation is a linear operator on random variables; this linearity carries over to random vectors.

The linearity properties of the expectation can be expressed compactly by stating that for any constant \(k \times p\) matrix \(A\) and any constant \(1 \times j\) matrix \(B\), \[ \mathrm{E}[A \vec{X}]=A \mathrm{E}[\vec{X}] \quad \text { and } \quad \mathrm{E}[\vec{X} B]=\mathrm{E} [\vec{X}] B \]
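As a quick numerical sanity check, the sketch below estimates \(\mathrm{E}[\vec{X}]\) and \(\mathrm{E}[A\vec{X}]\) from the same sample and confirms \(\mathrm{E}[A\vec{X}] = A\,\mathrm{E}[\vec{X}]\); the sizes, distribution, and seed are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: X is a p-dimensional random vector, A is a constant k x p matrix.
p, k, n = 3, 2, 200_000

# Draw n samples of X (rows are samples) from an arbitrary distribution.
X = rng.normal(loc=[1.0, -2.0, 0.5], scale=[1.0, 2.0, 0.5], size=(n, p))
A = rng.normal(size=(k, p))

EX = X.mean(axis=0)              # empirical E[X]
E_AX = (X @ A.T).mean(axis=0)    # empirical E[A X]

# The two sides agree (up to floating-point error) because averaging is itself linear.
print(np.allclose(E_AX, A @ EX))
```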

Variance

The variance of a random vector \(\vec{X}\) is represented as a matrix, called the variance-covariance matrix (or simply the covariance matrix) \[ \operatorname{Var}(\vec{X}) = \operatorname{Cov}(\vec{X}, \vec{X})=\mathrm{E}\left[(\vec{X}-\mathrm{E}[\vec{X}])(\vec{X}-\mathrm{E}[\vec{X}])^T\right] . \] It's also denoted as \(\Sigma_{\vec{X}}\), \(\Sigma\), or \(\operatorname{Cov}(\vec{X})\).

Expectation --> Variance

One important property is the shortcut formula \[ \begin{aligned} \operatorname{Var}(\vec{X}) \triangleq \operatorname{Cov}(\vec{X}, \vec{X}) &= \mathrm{E} \left[ (\vec{X} - \mathrm{E}[\vec{X}]) (\vec{X} - \mathrm{E}[\vec{X}])^T \right] \\ &= \mathrm{E}[\vec{X} \vec{X}^T] - \mathrm{E}[\vec{X}] \mathrm{E}[\vec{X}]^T \end{aligned} . \] The proof follows from the analogous identity for the covariance \(\operatorname{Cov}(\vec{X}, \vec{Y})\) derived below, by setting \(\vec{Y} = \vec{X}\).
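A minimal numpy sketch of this identity, with an arbitrary illustrative distribution: both the defining formula and the shortcut formula are computed from the same empirical expectations and agree up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Samples of a random vector X (rows are samples); the distribution is arbitrary.
X = rng.multivariate_normal(mean=[0.0, 1.0, -1.0],
                            cov=[[2.0, 0.5, 0.0],
                                 [0.5, 1.0, 0.3],
                                 [0.0, 0.3, 0.5]],
                            size=n)
EX = X.mean(axis=0)

# Definition: E[(X - EX)(X - EX)^T], estimated by averaging outer products.
centered = X - EX
cov_def = centered.T @ centered / n

# Shortcut: E[X X^T] - E[X] E[X]^T, using the same empirical expectations.
cov_shortcut = (X.T @ X) / n - np.outer(EX, EX)

print(np.allclose(cov_def, cov_shortcut))
```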

The variance–covariance matrix

The variance-covariance matrix of a random vector \(\vec{X}\) is \[ \operatorname{Cov}(\vec{X})=\left[\begin{array}{cccc} \operatorname{Var}\left(X_1\right) & \operatorname{Cov}\left(X_1, X_2\right) & \cdots & \operatorname{Cov}\left(X_1, X_p\right) \\ \operatorname{Cov}\left(X_2, X_1\right) & \operatorname{Var}\left(X_2\right) & \cdots & \operatorname{Cov}\left(X_2, X_p\right) \\ \vdots & \vdots & \ddots & \vdots \\ \operatorname{Cov}\left(X_p, X_1\right) & \operatorname{Cov}\left(X_p, X_2\right) & \cdots & \operatorname{Var}\left(X_p\right) \end{array}\right] \] Thus, \(\operatorname{Cov}(\vec{X})\) is a symmetric matrix, since \(\operatorname{Cov}(X, Y)=\operatorname{Cov}(Y, X)\).
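The following sketch (with an arbitrary made-up sample) assembles this matrix entry by entry from the pairwise covariances, then checks that it is symmetric and matches numpy's built-in estimator.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50_000, 3
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))  # correlated components

# Build the matrix entry by entry: Var(X_i) on the diagonal, Cov(X_i, X_j) off it.
Sigma = np.empty((p, p))
for i in range(p):
    for j in range(p):
        Sigma[i, j] = np.mean((X[:, i] - X[:, i].mean()) * (X[:, j] - X[:, j].mean()))

# Symmetry: Cov(X_i, X_j) = Cov(X_j, X_i).
print(np.allclose(Sigma, Sigma.T))

# Same matrix via numpy's estimator (bias=True matches the 1/n convention used above).
print(np.allclose(Sigma, np.cov(X, rowvar=False, bias=True)))
```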

Covariance

For two jointly distributed real-valued random vectors \(\vec{X}\) and \(\vec{Y}\), the covariance is represented as a matrix, called the cross-covariance matrix (often simply the covariance matrix) \[ \operatorname{Cov}(\vec{X}, \vec{Y})=\mathrm{E}\left[(\vec{X}-\mathrm{E}[\vec{X}])(\vec{Y}-\mathrm{E}[\vec{Y}])^T\right] \] It's also denoted as \(\Sigma_{(\vec{X}, \vec{Y})}\).

Expectation --> Covariance

\[ \begin{aligned} \operatorname{Cov}(\vec{X}, \vec{Y}) &= \mathrm{E} \left[ (\vec{X} - \mathrm{E}[\vec{X}]) (\vec{Y} - \mathrm{E}[\vec{Y}])^T \right] \\ &= \mathrm{E} \left[ \vec{X} \vec{Y}^T - \vec{X} \mathrm{E}[\vec{Y}]^T - \mathrm{E}[\vec{X}] \vec{Y}^T + \mathrm{E}[\vec{X}] \mathrm{E}[\vec{Y}]^T \right] \\ &= \mathrm{E}[\vec{X} \vec{Y}^T] - \mathrm{E}[\vec{X} \mathrm{E}[\vec{Y}]^T] - \mathrm{E}[\mathrm{E}[\vec{X}] \vec{Y}^T] + \mathrm{E}[\mathrm{E}[\vec{X}] \mathrm{E}[\vec{Y}]^T] \\ &= \mathrm{E}[\vec{X} \vec{Y}^T] - \mathrm{E}[\vec{X}] \mathrm{E}[\vec{Y}]^T - \mathrm{E}[\vec{X}] \mathrm{E}[\vec{Y}]^T + \mathrm{E}[\vec{X}] \mathrm{E}[\vec{Y}]^T \\ &= \mathrm{E}[\vec{X} \vec{Y}^T] - \mathrm{E}[\vec{X}] \mathrm{E}[\vec{Y}]^T \end{aligned} \]
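As an illustration of the cross-covariance case, the sketch below uses two made-up, correlated random vectors of different dimensions; note that \(\operatorname{Cov}(\vec{X}, \vec{Y})\) is then a rectangular matrix and not symmetric in general.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# X is 3-dimensional; Y is 2-dimensional and partly built from X, so they are correlated.
X = rng.normal(size=(n, 3))
Y = np.column_stack([X[:, 0] + rng.normal(size=n),
                     2.0 * X[:, 2] + rng.normal(size=n)])

EX, EY = X.mean(axis=0), Y.mean(axis=0)

# Cov(X, Y) = E[X Y^T] - E[X] E[Y]^T, estimated from the sample: a 3 x 2 matrix.
cov_xy = (X.T @ Y) / n - np.outer(EX, EY)
print(cov_xy.shape)   # (3, 2)
print(np.round(cov_xy, 2))
```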

Linear combinations of random variables

Consider random variables \(X_1, \ldots, X_p\). We want to find the expectation and variance of a new random variable \(L\left(X_1, \ldots, X_p\right)\) obtained as a linear combination of \(X_1, \ldots, X_p\); that is, \[ L\left(X_1, \ldots, X_p\right)=\sum_{i=1}^p a_i X_i . \]

Using vector-matrix notation we can write this in a compact way: \[ L(\vec{X})=\vec{a}^T \vec{X}, \] where \(\vec{a}^T=\left[a_1, \ldots, a_p\right]\). Then we get: \[ E[L(\vec{X})]=E\left[\vec{a}^T \vec{X}\right]=\vec{a}^T E \vec{X}, \] and \[ \begin{aligned} \operatorname{Var}[L(\vec{X})] &= \mathrm{E}[L(\vec{X}) L(\vec{X})^T] - \mathrm{E}[L(\vec{X})] \mathrm{E}[L(\vec{X})]^T \\ & =E\left[\vec{a}^T \vec{X} \vec{X}^T \vec{a}\right]-E\left(\vec{a}^T \vec{X}\right)\left[E\left(\vec{a}^T \vec{X}\right)\right]^T \\ & =\vec{a}^T E\left[\vec{X} \vec{X}^T\right] \vec{a}-\vec{a}^T E \vec{X}(E \vec{X})^T \vec{a} \\ & =\vec{a}^T\left(E\left[\vec{X} \vec{X}^T\right]-E \vec{X}(E \vec{X})^T\right) \vec{a} \\ & =\vec{a}^T \operatorname{Cov}(\vec{X}) \vec{a} \end{aligned} \]

Thus, knowing \(E \vec{X}\) and \(\operatorname{Cov}(\vec{X})\), we can easily find the expectation and variance of any linear combination of \(X_1, \ldots, X_p\).
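A short numerical illustration (the values of \(\vec{\mu}\), \(\Sigma\), and \(\vec{a}\) are arbitrary): the mean and variance of \(L(\vec{X})\) predicted from \(\mathrm{E}\vec{X}\) and \(\operatorname{Cov}(\vec{X})\) alone match the values estimated directly from samples of \(L\).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

mu = np.array([1.0, 0.0, -2.0])
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 2.0, 0.5],
                  [0.0, 0.5, 0.8]])
X = rng.multivariate_normal(mu, Sigma, size=n)
a = np.array([2.0, -1.0, 0.5])    # coefficients of the linear combination

L = X @ a                         # samples of L(X) = a^T X

print(a @ mu, L.mean())           # E[L] = a^T E[X] vs. sample mean of L
print(a @ Sigma @ a, L.var())     # Var[L] = a^T Cov(X) a vs. sample variance of L
```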

Corollary: \(\Sigma\) is positive semi-definite

Corollary: If \(\Sigma\) is the covariance matrix of a random vector, then for any constant vector \(\vec{a}\) we have \[ \vec{a}^T \Sigma \vec{a} \geq 0 . \]

That is, \(\Sigma\) satisfies the property of being a positive semi-definite (PSD) matrix.

Proof: By the previous section, \(\vec{a}^T \Sigma \vec{a}\) is the variance of the random variable \(L(\vec{X}) = \vec{a}^T \vec{X}\), and a variance is always non-negative.

This suggests the question: Given a symmetric, positive semi-definite matrix, is it the covariance matrix of some random vector? The answer is yes.

Sketch: take any factorization \(\Sigma = B B^T\) (for example the Cholesky factorization, which exists because \(\Sigma\) is symmetric positive semi-definite), and let \(\vec{Z}\) be a random vector whose components are uncorrelated with mean 0 and variance 1, so that \(\operatorname{Cov}(\vec{Z}) = I\). Then, by the linear-transform rule of the next section (with \(A^T = B\)), the random vector \(\vec{X} = B\vec{Z}\) has \(\operatorname{Cov}(\vec{X}) = B I B^T = \Sigma\).
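A sketch of this construction in numpy, assuming an arbitrary positive definite target matrix so that the Cholesky factorization applies directly: the sample covariance of \(\vec{X} = B\vec{Z}\) approaches the target \(\Sigma\).

```python
import numpy as np

rng = np.random.default_rng(5)

# An arbitrary symmetric positive definite target matrix, built as M M^T.
M = rng.normal(size=(3, 3))
Sigma_target = M @ M.T

# Factor Sigma = B B^T via Cholesky.
B = np.linalg.cholesky(Sigma_target)

# Z has uncorrelated components with mean 0 and variance 1, so Cov(Z) = I.
n = 500_000
Z = rng.normal(size=(n, 3))
X = Z @ B.T                          # samples of X = B Z (rows are samples)

# The sample covariance of X approaches Sigma_target as n grows.
print(np.round(np.cov(X, rowvar=False), 2))
print(np.round(Sigma_target, 2))
```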

Linear transform of a random vector

Consider a random vector \(\vec{X}\) with covariance matrix \(\Sigma\). Then, for any \(k\) dimensional constant vector \(\vec{c}\) and any \(p \times k\)-matrix \(A\), the \(k\) dimensional random vector \(\vec{c}+A^T \vec{X}\) has mean \(\vec{c}+A^T E \vec{X}\) and has covariance matrix \[ \operatorname{Cov}\left(\vec{c}+A^T \vec{X}\right)=A^T \Sigma A \text {. } \]

The proof is quite simple:

Let \(\vec{Y} = \vec{c}+A^T \vec{X}\). By the linearity of the expectation operator, its expectation is \[ \mathrm{E}[\vec{Y}] = \mathrm{E} [ \vec{c}+A^T \vec{X} ] = \vec{c}+ A^T \mathrm{E} [\vec{X}] . \] Thus, \[ \vec{Y} - \mathrm{E}[\vec{Y}] = (\vec{c}+A^T \vec{X}) - (\vec{c}+ A^T \mathrm{E} [\vec{X}]) = A^T (\vec{X} - \mathrm{E} [\vec{X}]) . \] Therefore, \[ \begin{aligned} \operatorname{Cov}\left(\vec{c}+A^T \vec{X}\right) & = \operatorname{Cov}\left(\vec{Y}\right) \\ &= \mathrm{E}\left[(\vec{Y}-\mathrm{E}[\vec{Y}])(\vec{Y}-\mathrm{E}[\vec{Y}])^T\right] \\ &= \mathrm{E}\left[(A^T (\vec{X} - \mathrm{E} [\vec{X}]))(A^T (\vec{X} - \mathrm{E} [\vec{X}]))^T\right] \\ &= \mathrm{E}\left[A^T (\vec{X} - \mathrm{E} [\vec{X}]) (\vec{X} - \mathrm{E} [\vec{X}])^T A \right] \\ &= A^T \mathrm{E}\left[(\vec{X} - \mathrm{E} [\vec{X}]) (\vec{X} - \mathrm{E} [\vec{X}])^T \right] A \\ &= A^T \Sigma A \end{aligned} \] Remember that \(\Sigma \triangleq \operatorname{Cov}\left(\vec{X}\right)\).
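A Monte Carlo check of this rule, with an arbitrary \(\Sigma\), \(A\), and \(\vec{c}\): the empirical covariance of \(\vec{c} + A^T\vec{X}\) approaches \(A^T \Sigma A\), and the constant shift \(\vec{c}\) has no effect on it.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, k = 300_000, 3, 2

Sigma = np.array([[1.5, 0.4, 0.2],
                  [0.4, 1.0, 0.0],
                  [0.2, 0.0, 0.7]])
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

A = rng.normal(size=(p, k))          # constant p x k matrix
c = np.array([10.0, -5.0])           # constant k-dimensional shift

Y = c + X @ A                        # samples of Y = c + A^T X (rows are samples)

# Empirical Cov(Y) vs. the closed form A^T Sigma A; the shift c drops out.
print(np.round(np.cov(Y, rowvar=False), 2))
print(np.round(A.T @ Sigma @ A, 2))
```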

What if all elements are independent?

If \(X_1, X_2, \ldots, X_p\) are i.i.d. (independent and identically distributed) with common variance \(\sigma^2\), then \(\operatorname{Cov}\left(\left[X_1, X_2, \ldots, X_p\right]^T\right)\), or the covariance matrix \(\Sigma\), is a diagonal matrix with \(\sigma^2\) on the diagonal and zeros elsewhere: \[ \Sigma=\left[\begin{array}{cccc} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{array}\right]=\sigma^2 I_p \] where \(I_p\) is the \(p \times p\) identity matrix.

Proof:

  • The diagonal elements \(\Sigma_{i i}\) represent the variance of each \(X_i\) :

\[ \Sigma_{i i}=\operatorname{Var}\left(X_i\right)=\sigma^2 \quad \text { for all } i \]

  • The off-diagonal elements \(\Sigma_{i j}\) represent the covariance between different \(X_i\) and \(X_j\). Since \(X_i\) and \(X_j\) are independent, we have:

\[ \Sigma_{i j}=\operatorname{Cov}\left(X_i, X_j\right)=0 \quad \text { for } i \neq j \]
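A quick numerical illustration with i.i.d. components (an exponential distribution chosen arbitrarily): the sample covariance matrix is close to \(\sigma^2 I_p\).

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 500_000, 4
sigma = 1.5

# p i.i.d. components; an exponential with scale sigma has variance sigma^2.
X = rng.exponential(scale=sigma, size=(n, p))

# Sample covariance: variances near sigma^2 on the diagonal, near-zero off-diagonal.
print(np.round(np.cov(X, rowvar=False), 2))
print(sigma**2 * np.eye(p))
```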