Random Vectors
Source:
- Random Vectors from the textbook Introduction to Probability, Statistics, and Random Processes by Hossein Pishro-Nik.
- Random Vectors and the Variance–Covariance Matrix
Notation
- The expected value of a random vector \(\vec{X}\) is often denoted by \(\mathrm{E}(\vec{X})\), \(\mathrm{E}[\vec{X}]\), or \(\mathrm{E} \vec{X}\), with the operator written as \(\mathrm{E}\) or \(E\); it is also written symbolically as \(\vec{\mu}_{\vec{X}}\) or simply \(\vec{\mu}\).
- The variance of a random vector \(\vec{X}\) is typically denoted \(\operatorname{Var}(\vec{X})\), or sometimes \(\operatorname{Cov}(\vec{X})\). Since the variance takes the form of a variance-covariance matrix, it is also denoted \(\Sigma_{\vec{X}}\) or simply \(\Sigma\); the element in the i-th row and j-th column is \(\Sigma_{ij}\).
- The covariance of two random vectors \(\vec{X}\) and \(\vec{Y}\) is typically denoted \(\operatorname{Cov}(\vec{X}, \vec{Y})\). Since it is also a covariance matrix (the cross-covariance matrix), it is sometimes denoted \(\Sigma_{(\vec{X}, \vec{Y})}\).
Random vectors
Definition: A random vector \(\vec{X}\) is a vector \[ \vec{X}=\left[\begin{array}{c} X_1 \\ X_2 \\ \vdots \\ X_p \end{array}\right] \] of jointly distributed random variables \(X_1, \ldots, X_p\). As is customary in linear algebra, we will write vectors as column matrices whenever convenient.
Expected value
Definition: The expectation \(E \vec{X}\) of a random vector \(\vec{X}=\left[X_1, X_2, \ldots, X_p\right]^T\) is given by \[ \mathrm{E} [\vec{X}] =\left[\begin{array}{c} \mathrm{E} [X_1] \\ \mathrm{E} [X_2] \\ \vdots \\ \mathrm{E} [X_p] \end{array}\right] \] It's also denoted as \(\vec{\mu}_{\vec{X}}\) or \(\vec{\mu}\).
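As a quick numerical illustration, here is a minimal NumPy sketch: the componentwise sample mean over many simulated draws approximates \(\mathrm{E}[\vec{X}]\). The distribution, mean vector, and sample size below are arbitrary choices for illustration only.

```python
# Minimal sketch: estimate E[X] componentwise by Monte Carlo.
# The correlated Gaussian below is an arbitrary illustrative distribution.
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])                  # true E[X]
A = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.2, 0.3, 1.0]])
samples = mu + rng.standard_normal((100_000, 3)) @ A.T   # rows are draws of X

print(samples.mean(axis=0))                      # componentwise sample mean, close to mu
```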
Linearity of expectation
Recall that the expectation of random variables is a linear operator; this linearity also holds for random vectors.
The linearity properties of the expectation can be expressed compactly by stating that for any \(k \times p\)-matrix \(A\) and any \(1 \times j\)-matrix \(B\), \[ \mathrm{E}[A \vec{X}]=A \mathrm{E}[\vec{X}] \quad \text { and } \quad \mathrm{E}[\vec{X} B]=\mathrm{E} [\vec{X}] B \]
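A short sketch of the first identity, \(\mathrm{E}[A\vec{X}] = A\,\mathrm{E}[\vec{X}]\), where "expectation" is taken as the empirical mean over simulated draws (the matrix \(A\) and the distribution of \(\vec{X}\) are arbitrary assumptions); computed from the same draws, both sides agree up to floating-point error.

```python
# Sketch verifying E[A X] = A E[X] with empirical means over simulated draws.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50_000, 4))          # draws of a 4-dimensional X (one per row)
A = rng.standard_normal((3, 4))               # arbitrary 3x4 constant matrix

lhs = (X @ A.T).mean(axis=0)                  # E[A X], estimated from draws of A X
rhs = A @ X.mean(axis=0)                      # A E[X]
print(np.allclose(lhs, rhs))                  # True
```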
Variance
The variance of a random vector \(\vec{X}\) is represented as a matrix, called the variance-covariance matrix (or simply the covariance matrix) \[ \operatorname{Var}(\vec{X}) = \operatorname{Cov}(\vec{X}, \vec{X})=\mathrm{E}\left[(\vec{X}-\mathrm{E}[\vec{X}])(\vec{X}-\mathrm{E}[\vec{X}])^T\right] . \] It's also denoted as \(\Sigma_{\vec{X}}\), \(\Sigma\), or \(\operatorname{Cov}(\vec{X})\).
Expectation --> Variance
One important property is that \[ \begin{aligned} \operatorname{Var}(\vec{X}) \triangleq \operatorname{Cov}(\vec{X}, \vec{X}) &= \mathrm{E} \left[ (\vec{X} - \mathrm{E}[\vec{X}]) (\vec{X} - \mathrm{E}[\vec{X}])^T \right] \\ &= \mathrm{E}[\vec{X} \vec{X}^T] - \mathrm{E}[\vec{X}] \mathrm{E}[\vec{X}]^T \end{aligned} . \] The proof follows from the identity for the covariance operator \(\operatorname{Cov}(\vec{X}, \vec{Y})\) derived below, by taking \(\vec{Y} = \vec{X}\).
The variance–covariance matrix
The variance-covariance matrix of a random vector \(\vec{X}\) is \[ \operatorname{Cov}(\vec{X})=\left[\begin{array}{cccc} \operatorname{Var}\left(X_1\right) & \operatorname{Cov}\left(X_1, X_2\right) & \cdots & \operatorname{Cov}\left(X_1, X_p\right) \\ \operatorname{Cov}\left(X_2, X_1\right) & \operatorname{Var}\left(X_2\right) & \cdots & \operatorname{Cov}\left(X_2, X_p\right) \\ \vdots & \vdots & \ddots & \vdots \\ \operatorname{Cov}\left(X_p, X_1\right) & \operatorname{Cov}\left(X_p, X_2\right) & \cdots & \operatorname{Var}\left(X_p\right) \end{array}\right] \] Thus, \(\operatorname{Cov}(\vec{X})\) is a symmetric matrix, since \(\operatorname{Cov}(X_i, X_j)=\operatorname{Cov}(X_j, X_i)\).
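The sketch below assembles this matrix entrywise from sample variances and covariances and compares it with NumPy's np.cov (the underlying Gaussian distribution is an arbitrary illustrative choice; np.cov divides by \(n-1\) rather than \(n\), hence the small tolerance).

```python
# Sketch: build Cov(X) entrywise and check symmetry and agreement with np.cov.
import numpy as np

rng = np.random.default_rng(2)
X = rng.multivariate_normal(mean=[0, 0, 0],
                            cov=[[2.0, 0.6, 0.0],
                                 [0.6, 1.0, 0.3],
                                 [0.0, 0.3, 0.5]],
                            size=200_000)                  # rows are draws

p = X.shape[1]
S = np.empty((p, p))
for i in range(p):
    for j in range(p):
        S[i, j] = np.mean((X[:, i] - X[:, i].mean()) * (X[:, j] - X[:, j].mean()))

print(np.allclose(S, S.T))                                 # symmetric
print(np.allclose(S, np.cov(X, rowvar=False), atol=1e-3))  # matches np.cov up to n vs n-1
```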
Covariance
For two jointly distributed real-valued random vectors \(\vec{X}\) and \(\vec{Y}\), the covariance is represented as a matrix, called the cross-covariance matrix (often simply the covariance matrix) \[ \operatorname{Cov}(\vec{X}, \vec{Y})=\mathrm{E}\left[(\vec{X}-\mathrm{E}[\vec{X}])(\vec{Y}-\mathrm{E}[\vec{Y}])^T\right] . \] It is also denoted as \(\Sigma_{(\vec{X}, \vec{Y})}\).
Expectation --> Covariance
\[ \begin{aligned} \operatorname{Cov}(\vec{X}, \vec{Y}) &= \mathrm{E} \left[ (\vec{X} - \mathrm{E}[\vec{X}]) (\vec{Y} - \mathrm{E}[\vec{Y}])^T \right] \\ &= \mathrm{E} \left[ \vec{X} \vec{Y}^T - \vec{X} \mathrm{E}[\vec{Y}]^T - \mathrm{E}[\vec{X}] \vec{Y}^T + \mathrm{E}[\vec{X}] \mathrm{E}[\vec{Y}]^T \right] \\ &= \mathrm{E}[\vec{X} \vec{Y}^T] - \mathrm{E}[\vec{X} \mathrm{E}[\vec{Y}]^T] - \mathrm{E}[\mathrm{E}[\vec{X}] \vec{Y}^T] + \mathrm{E}[\mathrm{E}[\vec{X}] \mathrm{E}[\vec{Y}]^T] \\ &= \mathrm{E}[\vec{X} \vec{Y}^T] - \mathrm{E}[\vec{X}] \mathrm{E}[\vec{Y}]^T - \mathrm{E}[\vec{X}] \mathrm{E}[\vec{Y}]^T + \mathrm{E}[\vec{X}] \mathrm{E}[\vec{Y}]^T \\ &= \mathrm{E}[\vec{X} \vec{Y}^T] - \mathrm{E}[\vec{X}] \mathrm{E}[\vec{Y}]^T \end{aligned} \]
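A quick numerical sanity check of this identity on simulated draws (the way correlated \(\vec{X}\) and \(\vec{Y}\) are constructed below is an arbitrary illustration); computed from the same samples, both sides agree up to floating-point error.

```python
# Sketch checking Cov(X, Y) = E[X Y^T] - E[X] E[Y]^T on simulated draws.
import numpy as np

rng = np.random.default_rng(3)
Z = rng.standard_normal((100_000, 5))
X, Y = Z[:, :3], Z[:, 2:]                       # share one coordinate, so they correlate

EX, EY = X.mean(axis=0), Y.mean(axis=0)
lhs = ((X - EX).T @ (Y - EY)) / len(Z)          # E[(X - EX)(Y - EY)^T]
rhs = (X.T @ Y) / len(Z) - np.outer(EX, EY)     # E[X Y^T] - E[X] E[Y]^T
print(np.allclose(lhs, rhs))                    # True (same samples on both sides)
```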
Linear combinations of random variables
Consider random variables \(X_1, \ldots, X_p\). We want to find the expectation and variance of a new random variable \(L\left(X_1, \ldots, X_p\right)\) obtained as a linear combination of \(X_1, \ldots, X_p\); that is, \[ L\left(X_1, \ldots, X_p\right)=\sum_{i=1}^p a_i X_i . \]
Using vector-matrix notation we can write this in a compact way: \[ L(\vec{X})=\vec{a}^T \vec{X}, \] where \(\vec{a}^T=\left[a_1, \ldots, a_p\right]\). Then we get: \[ E[L(\vec{X})]=E\left[\vec{a}^T \vec{X}\right]=\vec{a}^T E \vec{X}, \] and \[ \begin{aligned} \operatorname{Var}[L(\vec{X})] &= \mathrm{E}[L(\vec{X}) L(\vec{X})^T] - \mathrm{E}[L(\vec{X})] \mathrm{E}[L(\vec{X})]^T \\ & =E\left[\vec{a}^T \vec{X} \vec{X}^T \vec{a}\right]-E\left(\vec{a}^T \vec{X}\right)\left[E\left(\vec{a}^T \vec{X}\right)\right]^T \\ & =\vec{a}^T E\left[\vec{X} \vec{X}^T\right] \vec{a}-\vec{a}^T E \vec{X}(E \vec{X})^T \vec{a} \\ & =\vec{a}^T\left(E\left[\vec{X} \vec{X}^T\right]-E \vec{X}(E \vec{X})^T\right) \vec{a} \\ & =\vec{a}^T \operatorname{Cov}(\vec{X}) \vec{a} \end{aligned} \]
Thus, knowing \(E \vec{X}\) and \(\operatorname{Cov}(\vec{X})\), we can easily find the expectation and variance of any linear combination of \(X_1, \ldots, X_p\).
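A minimal sketch of this result, assuming an arbitrary illustrative \(\Sigma\) and \(\vec{a}\): the sample variance of draws of \(\vec{a}^T\vec{X}\) should be close to \(\vec{a}^T \Sigma \vec{a}\).

```python
# Sketch: the variance of the linear combination a^T X equals a^T Cov(X) a.
import numpy as np

rng = np.random.default_rng(4)
Sigma = np.array([[2.0, 0.5, 0.2],
                  [0.5, 1.0, 0.1],
                  [0.2, 0.1, 0.8]])
a = np.array([1.0, -2.0, 0.5])

X = rng.multivariate_normal(np.zeros(3), Sigma, size=500_000)
L = X @ a                                   # draws of a^T X

print(L.var())                              # sample variance of the combination
print(a @ Sigma @ a)                        # a^T Sigma a, should be close (here 4.2)
```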
Corollary: \(\Sigma\) is positive semi-definite
Corollary: If \(\Sigma\) is the covariance matrix of a random vector, then for any constant vector \(\vec{a}\) we have \[ \vec{a}^T \Sigma \vec{a} \geq 0 . \]
That is, \(\Sigma\) satisfies the property of being a positive semi-definite (PSD) matrix.
Proof: By the previous section, \(\vec{a}^T \Sigma \vec{a} = \operatorname{Var}(\vec{a}^T \vec{X})\), and a variance is always non-negative.
This suggests the question: Given a symmetric, positive semi-definite matrix, is it the covariance matrix of some random vector? The answer is yes.
Indeed, since \(\Sigma\) is symmetric positive semi-definite, it can be factored as \(\Sigma = A A^T\) (for example via the Cholesky or spectral decomposition). Let \(\vec{Z}\) be a random vector whose components are uncorrelated with unit variance, so that \(\operatorname{Cov}(\vec{Z}) = I\). By the linear-transform rule in the next section, \(\operatorname{Cov}(A \vec{Z}) = A \operatorname{Cov}(\vec{Z}) A^T = A A^T = \Sigma\), so \(A \vec{Z}\) is a random vector with covariance matrix \(\Sigma\).
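A minimal sketch of this construction, assuming a small positive-definite \(\Sigma\) so that NumPy's Cholesky factorization applies (for a merely semi-definite \(\Sigma\), a spectral decomposition could be used instead).

```python
# Sketch of the converse: factor a given symmetric PD matrix as Sigma = A A^T
# and set X = A Z with uncorrelated, unit-variance Z, so Cov(X) = A I A^T = Sigma.
import numpy as np

rng = np.random.default_rng(5)
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])               # symmetric positive definite example

A = np.linalg.cholesky(Sigma)                # Sigma = A A^T
Z = rng.standard_normal((300_000, 2))        # iid standard components, Cov(Z) = I
X = Z @ A.T                                  # rows are draws of X = A Z

print(np.cov(X, rowvar=False))               # close to Sigma
```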
Linear transform of a random vector
Consider a random vector \(\vec{X}\) with covariance matrix \(\Sigma\). Then, for any \(k\)-dimensional constant vector \(\vec{c}\) and any \(p \times k\) matrix \(A\), the \(k\)-dimensional random vector \(\vec{c}+A^T \vec{X}\) has mean \(\vec{c}+A^T E \vec{X}\) and covariance matrix \[ \operatorname{Cov}\left(\vec{c}+A^T \vec{X}\right)=A^T \Sigma A \text {. } \]
The proof is quite simple:
Let \(\vec{Y} = \vec{c}+A^T \vec{X}\). By the linearity of the expectation operator, its expectation is \[ \mathrm{E}[\vec{Y}] = \mathrm{E} [ \vec{c}+A^T \vec{X} ] = \vec{c}+ A^T \mathrm{E} [\vec{X}] . \] Thus, \[ \vec{Y} - \mathrm{E}[\vec{Y}] = (\vec{c}+A^T \vec{X}) - (\vec{c}+ A^T \mathrm{E} [\vec{X}]) = A^T (\vec{X} - \mathrm{E} [\vec{X}]) . \] Therefore, \[ \begin{aligned} \operatorname{Cov}\left(\vec{c}+A^T \vec{X}\right) & = \operatorname{Cov}\left(\vec{Y}\right) \\ &= \mathrm{E}\left[(\vec{Y}-\mathrm{E}[\vec{Y}])(\vec{Y}-\mathrm{E}[\vec{Y}])^T\right] \\ &= \mathrm{E}\left[(A^T (\vec{X} - \mathrm{E} [\vec{X}]))(A^T (\vec{X} - \mathrm{E} [\vec{X}]))^T\right] \\ &= \mathrm{E}\left[A^T (\vec{X} - \mathrm{E} [\vec{X}]) (\vec{X} - \mathrm{E} [\vec{X}])^T A \right] \\ &= A^T \mathrm{E}\left[(\vec{X} - \mathrm{E} [\vec{X}]) (\vec{X} - \mathrm{E} [\vec{X}])^T \right] A \\ &= A^T \Sigma A \end{aligned} \] Remember that \(\Sigma \triangleq \operatorname{Cov}\left(\vec{X}\right)\).
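A numerical sketch of the same result (the choices of \(\Sigma\), \(A\), and \(\vec{c}\) below are arbitrary illustrations): the sample covariance of draws of \(\vec{c} + A^T\vec{X}\) should be close to \(A^T \Sigma A\), and the constant shift \(\vec{c}\) should have no effect.

```python
# Sketch: Cov(c + A^T X) = A^T Sigma A; the shift c drops out of the covariance.
import numpy as np

rng = np.random.default_rng(6)
Sigma = np.array([[1.5, 0.4, 0.0],
                  [0.4, 1.0, 0.2],
                  [0.0, 0.2, 0.7]])          # p x p, with p = 3
A = rng.standard_normal((3, 2))              # p x k, with k = 2
c = np.array([10.0, -5.0])                   # k-dimensional constant shift

X = rng.multivariate_normal(np.zeros(3), Sigma, size=400_000)
Y = c + X @ A                                # rows are draws of c + A^T X

print(np.cov(Y, rowvar=False))               # close to A^T Sigma A
print(A.T @ Sigma @ A)
```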
What if all elements are independent?
If \(X_1, X_2, \ldots, X_p\) are i.i.d. (independent and identically distributed) with common variance \(\sigma^2\), then \(\operatorname{Cov}\left(\left[X_1, X_2, \ldots, X_p\right]^T\right)\), or the covariance matrix \(\Sigma\), is a diagonal matrix with \(\sigma^2\) on the diagonal and zeros elsewhere: \[ \Sigma=\left[\begin{array}{cccc} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{array}\right]=\sigma^2 I_p \] where \(I_p\) is the \(p \times p\) identity matrix.
Proof:
- The diagonal elements \(\Sigma_{i i}\) represent the variance of each \(X_i\) :
\[ \Sigma_{i i}=\operatorname{Var}\left(X_i\right)=\sigma^2 \quad \text { for all } i \]
- The off-diagonal elements \(\Sigma_{i j}\) represent the covariance between different \(X_i\) and \(X_j\). Since \(X_i\) and \(X_j\) are independent, we have:
\[ \Sigma_{i j}=\operatorname{Cov}\left(X_i, X_j\right)=0 \quad \text { for } i \neq j \]
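A quick sketch of the i.i.d. case (standard normal components scaled by an arbitrary \(\sigma\)): the sample covariance matrix should be approximately \(\sigma^2 I_p\).

```python
# Sketch: for iid components with variance sigma^2, Cov(X) is about sigma^2 * I_p.
import numpy as np

rng = np.random.default_rng(7)
p, sigma = 4, 3.0
X = sigma * rng.standard_normal((500_000, p))    # iid N(0, sigma^2) components

print(np.round(np.cov(X, rowvar=False), 2))      # approximately sigma^2 * I_p (here 9 * I_4)
```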