Conventions for Notation in Probability Theory and Statistics

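Textbooks on statistics (including probability theory, information theory, and stochastic processes) often use different notational conventions. This note collects those conventions and explains them.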

Letters

  • Samples are denoted as lower-case italicized Roman letters, such as \(x, y, z\).
  • Sample Spaces (or alphabets) are denoted by upper-case calligraphic fonts, such as \(\mathcal X, \mathcal Y, \mathcal Z\).
  • Random variables are denoted by upper-case italicized Roman letters, such as \(X, Y, Z\).
    • Random variables usually take letters from the end of the alphabet (for example \(X, Y, Z\)); \(A\), \(B\), \(C\) are rarely used.

Stochastic process

  • A stochastic process \(X^n\) or \(\{X_i\}^n\) is an indexed sequence of \(n\) random variables \(X_i\): \[ X^n = (X_1, X_2, \ldots, X_n) \]

    • In general, there can be an arbitrary dependence among the random variables.
    • \(\operatorname{Pr}\left\{\left(X_1, X_2, \ldots, X_n\right)=\left(x_1, x_2, \ldots, x_n\right)\right\}=p\left(x_1, x_2, \ldots, x_n\right).\)
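
As a minimal sketch of the joint notation \(p(x_1, x_2, \ldots, x_n)\), the Python snippet below evaluates the joint PMF of one particular dependent process, a two-state Markov chain. The chain, its numbers, and the names `initial`, `transition`, and `joint_pmf` are all hypothetical and chosen only for illustration; a general stochastic process need not factor this way.

```python
from itertools import product

# Hypothetical two-state Markov chain on the alphabet {0, 1} (illustrative numbers).
initial = {0: 0.5, 1: 0.5}            # p(x_1)
transition = {0: {0: 0.9, 1: 0.1},    # p(x_i | x_{i-1})
              1: {0: 0.2, 1: 0.8}}

def joint_pmf(xs):
    """Return p(x_1, ..., x_n) = p(x_1) * prod_i p(x_i | x_{i-1}) for this chain."""
    prob = initial[xs[0]]
    for prev, cur in zip(xs, xs[1:]):
        prob *= transition[prev][cur]
    return prob

# Summing the joint PMF over every length-n sequence should give 1.
n = 3
total = sum(joint_pmf(xs) for xs in product([0, 1], repeat=n))
```

Here `joint_pmf((0, 1, 1))` plays the role of \(\operatorname{Pr}\{(X_1, X_2, X_3) = (0, 1, 1)\}\).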

Probability function

  • \(\mathbb P\): the probability measure, also denoted as \(p, \text{Pr}, Pr, \text{P}, P\).

    • Similarly, an upper-case letter and its \mathbb version are interchangeable; for example, the expectation of a random variable \(X\) is written \(E(X) \triangleq \mathbb E(X)\).
  • The argument \(E\) of the probability measure \(\mathbb P(E)\) is an event, \(E \in \mathcal F\). In the probability space \(\left(\mathbb{R}, \mathcal B, \mu_X\right)\), \(B\) is an event (\(B \in \mathcal B\)), hence the notation \(\mu_X(B)\). Likewise, if the random variables \(X, Y\) take values in \(\mathcal X, \mathcal Y\), then \(\mathcal X, \mathcal Y\) act as the new event spaces, and their elements \(x \in \mathcal X, y \in \mathcal Y\) are also events, hence \(\mathbb P(x), \mathbb P(y)\).

  • Of course, events are always defined through statements; for example, the statement \(X=x\) refers to the event \(x\), so \(\mathbb P(X=x) \triangleq \mathbb P(x)\).

    • \(p(x)\) is shorthand for \(\mathbb P(x), p(X = x), p_X(x), \mu_X(x), \mu_X(B)\).
    • \(p(x|y)\) is shorthand for \(p(X = x|Y = y)\).
  • Sometimes the parentheses around the argument are written as square brackets: \(p(X=x) \triangleq p[X=x]\).
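
To make the shorthand concrete, here is a small Python sketch that computes \(p(x)\) and \(p(x|y)\) from a joint table. The table `joint` and the helper names `p_x`, `p_y`, and `p_x_given_y` are hypothetical; the only fact relied on is the standard definition \(p(x|y) = p(x, y) / p(y)\).

```python
# Hypothetical joint PMF p(x, y) with X in {"a", "b"} and Y in {0, 1}.
joint = {("a", 0): 0.1, ("a", 1): 0.3,
         ("b", 0): 0.2, ("b", 1): 0.4}

def p_x(x):
    """Marginal p(x), i.e. P(X = x), obtained by summing the joint over y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def p_y(y):
    """Marginal p(y), i.e. P(Y = y), obtained by summing the joint over x."""
    return sum(p for (_, yi), p in joint.items() if yi == y)

def p_x_given_y(x, y):
    """Conditional p(x|y), i.e. P(X = x | Y = y) = p(x, y) / p(y)."""
    return joint[(x, y)] / p_y(y)
```

For example, `p_x_given_y("a", 1)` corresponds to \(p(X = a \mid Y = 1)\), which the shorthand writes as \(p(a|1)\).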

Entropy

  • Given a probability distribution \(p\) and a random variable \(X \sim p\), the entropy \(H(X)\) can also be written as \(H(p)\).

    Therefore, every random variable \(X, Y, Z, \ldots\) that follows the distribution \(p\) has entropy \(H(p)\). This is unambiguous because random variables with the same probability distribution have the same entropy.

  • Also, given i.i.d. r.v.s \(X_1, X_2, \ldots \sim X\), where \(X\) stands for the common PMF of the \(X_i\), we can use \(H(X)\) to denote the entropy of any \(X_i\).

  • \(\mathbb{E}_{X \sim p}(X)\): the expectation of \(X\), where the subscript indicates that \(X \sim p\).
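
The following Python sketch illustrates why \(H(X)\) and \(H(p)\) can be used interchangeably: the entropy is computed from the PMF alone, as the expectation \(\mathbb{E}_{X \sim p}(-\log p(X))\). The PMF `p`, the function names, and the choice of base-2 logarithm (entropy in bits) are assumptions of this example, not conventions stated above.

```python
import math

# Hypothetical PMF p over the alphabet {1, 2, 3}; all probabilities are positive.
p = {1: 0.5, 2: 0.25, 3: 0.25}

def expectation(pmf, f=lambda x: x):
    """E_{X ~ pmf}[f(X)] = sum over x of pmf(x) * f(x)."""
    return sum(prob * f(x) for x, prob in pmf.items())

def entropy(pmf):
    """H(pmf) = E_{X ~ pmf}[-log2 pmf(X)], in bits (base 2 is an assumption here)."""
    return expectation(pmf, lambda x: -math.log2(pmf[x]))

mean = expectation(p)  # E_{X ~ p}(X) = 1.75
h = entropy(p)         # H(p) = 1.5 bits
```

Because `entropy` takes only the PMF as input, any two random variables distributed according to `p` necessarily receive the same value.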

Abbreviations

  • w.p.: with probability.
  • r.v.: random variable.
  • w.r.t.: with respect to.