Optimization-Methods-Lecture2

Outline:

  • Norms
  • Analysis
  • Functions
  • Derivatives
  • Linear Algebra

Norms

Inner product

  • Inner product on Rn \[ <x,y>=x^Ty=\sum\limits_{i=1}^{n}x_iy_i \in R^n \]

  • Euclidean norm, or \(l_2\) - norm \[ ||x||_2=(x^Tx)^{1/2}=(x_1^2+\dots+x_n^2)^{1/2}, x \in R^n \]

  • Cauchy-Schwartz inequality

\[ |x^Ty| \leq ||x||_2||y||_2,x,y \in R^n \]

  • Angle between nonzero vectors \(x,y \in R^n\) \[ \angle (x,y) = cos^{-1}(\frac {x^Ty}{||x||_2||y||_2}),x,y \in R^n,\angle (x,y) \in (0,\pi \]

  • Inner product on $R^{mn}, X,Y R^{mn} $​ \[ <X,Y>=\tr (X^TY)=\sum\limits_{i=1}^{m}\sum\limits_{j=1}^{n}X_{ij}Y_{ij} \] Here \(\tr()\) denotes trace of a matrix

  • Frobenius norm of a matrix \(x \in R^{m \times n}\)​ 和向量的二范数对应 \[ ||X||_F = (\tr(X^TX))^{1/2}=(\sum\limits_{i=1}^{m}\sum\limits_{j=1}^{n}X_{ij}^2)^{1/2} \]

  • Inner product on \(S^n\) \[ <X,Y>=tr(XY)=\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n}X_{ij}Y_{ij} = \sum\limits_{i=1}^{n}X_{ii}Y_{ii} + 2 \sum\limits_{i<j}X_{ij}Y_{ij} \] Here \(S^n\)​ denotes symmetrical matrices on \(R^{n \times n}\)


Norms

  • A function \(f: R^n \rarr R\) with dom \(f = R^n\)is called a norm if

    • \(f\) is nonnegative: \(f(x) \geq 0\) for all \(x \in R^n\)
    • \(f\) is definite( 正定的 ): \(f(x)=0\) only if \(x=0\)
    • \(f\)​ is homogeneous( 同质的 ): \(f(tx)=|t|f(x)\) , for all \(x \in R^n\)​ and \(t \in R\)
    • \(f\) satisfies the triangle inequality: \(f(x+y) \leq f(x) + f(y)\) , for all \(x,y \in R^n\)
  • Distance

    • Between vectors \(x\) and \(y\) as the length of their difference, i.e., \[ \mathrm {dist}(x,y) = ||x-y|| \]

    没有加下标, 表示抽象的范数


  • Unit ball

    • The set of all vectors with norm less than or equal to one, \[ \Beta = {x \in R^n \mid ||x|| \leq 1} \] is called the unit ball of the norm \(||\cdot||\) ( 单位球不唯一, 还需要指定一个范数 )

    • The unit ball satisfies the following properties:

      • \(\Beta\) is symmetric about the origin, i.e., \(x \in \Beta\) if and only if $ -x $
      • \(\Beta\) is convex
      • \(\Beta\) is closed, bounded, and has nonempty interior
    • Conversely, if \(C \subseteq R^n\)​​​ is any set satisfying these three conditions, then it is the unit ball of a norm: \[ ||x|| = (\sup\{t \geq 0 | tx \in C\})^{-1} \]


  • Spme common norms on \(R^n\)

    • Sum-absolute-value, or \(l_1\) - norm \[ ||x||_1 = |x|_1 + \dots + |x|_n, x \in R^n \]

    • Chebyshev or \(l_\infty\) - norm \[ ||x||_{\infty} = \max\{|x_1|, \dots, |x_n|\} \]

    • \(l_p\) - norm, \(p \geq 1\) \[ ||x||_p = (|x_1|^p + \dots + |x_n|^p)^{1/p} \]

    • For \(P \in S_{++}^n\) ( 对称的 \(n \times n\) 的正定矩阵 ), \(P\) - quadratic norm is \[ ||x||_P = (x^TPx)^{1/2}=||P^{1/2}x||_2 \]


  • Some common norms on \(R^{m \times n}\)

    • Sum-absolute-value norm 对应向量的一范数

      \(||X||_\mathrm{sav} = \sum\limits_{i=1}^m\sum\limits_{j=1}^n |X_{ij}|\)

    • Maximum-absolute-value norm 对应向量的无穷范数 \[ ||X||_{\mathrm{mav}} = \max\{ |X_{ij}| \mid i=1, \dots,m,j=1,\dots,n \} \]


  • Equivalence of norms

    • Suppose that \(||\cdot||_a\) and \(||\cdot||_b\) are norms on \(R^n\) , there exist positive constants \(\alpha\) and \(\beta\) , for all \(x \in R^n\) \[ \alpha ||x||_a \leq ||x||_b \leq ||x||_a \]

    • if \(||\cdot||\) is any norm on \(R^n\) , then there exists a quadratic norm \(||\cdot||_P\) for which \[ ||\cdot||_P \leq ||x|| \leq \sqrt{n}{ ||x||_P } \]

    holds for all \(x\)


  • Operator norms 算子范数

    • Suppose \(||\cdot||_a\) and \(||\cdot||_b\) are norms on \(R^m\) and \(R^n\) , respectively. Operator norm of \(X \in R^{m \times n}\) induced by \(||\cdot||_a\) and \(||\cdot||_b\)​​​ is \[ ||X||_{a,b}=\sup\{||Xu||_a \mid ||u|| \leq 1\} \]

    • When \(||\cdot||_a\) and \(||\cdot||_b\) are Euclidean norms, the operator norm of \(X\) is its maximum singular value ( 最大奇异值 ) , and is denoted \(||X||_2\) \[ ||X||_2 = \sigma_{\max}(X) = ( \lambda_{\max}{ X^TX } )^1/2 \]

    Spectral norm( 谱范数 ) or \(l_2\)​ - norm

    • The norm induced by the \(l_{\infty}\)​​ - norm on \(R^m\)​ and \(R^n\)​​ , denoted \(||X||_\infty\)​ is the max-row-sum norm, \[ ||X||_\infty = \sup\{ ||Xu||_\infty \mid ||u||_\infty \leq 1 \} = \max_{i=1,\dots,m}\sum\limits_{j=1}^{n}|X_{ij}| \]

    • The norm induced by the \(l_1\)​​​ - norm on \(R^m\)​​​ and \(R^n\)​​​ , denoted \(||X||_1\)​​​​ is the max-column-sum norm, \[ ||X||_1 = \max_{j=1,\dots,n}\sum\limits_{i=1}^{m}|X_{ij}| \]

  • Dual norm 对偶范数

    • Let \[||\cdot||\] be a norm \(R^n\)

    • The associated dual norm, denoted \(||\cdot||_*\) , is defined as \[ ||z||_* = \sup \{ z^Tx \mid ||x|| \leq 1 \} \]

    • We have the inequality \[ z^Tx \leq ||x|| ||z||_* \\ \because z^T \frac{x}{||x||} \leq \sup \{ z^Tx \mid ||x|| \leq 1 \} = ||z||_* \\ \therefore z^Tx=z^T \frac {x}{||x||}\cdot||x|| \leq ||z||_*||x|| \]

    • The dual of Euclidean norm 二范数互为对偶 \[ \sup \{z^Tx \mid ||x||_2 \leq 1 \} = ||z||_2 \]

    • The dual norm of the \(l_{\infty}\) norm 无穷范数的对偶是一范数, 反之亦然 \[ \sup \{z^Tx \mid ||x||_{\infty} \leq 1 \} = ||z||_1 \]

    • The dual norm of the dual norm 对偶范数的对偶范数是其本身 \[ ||\cdot||_{*_{*}} = ||\cdot|| \]

    • The dual norm of \(l_p\) - norm is the \(l_q\) - norm such that \[ \frac {1}{p} + \frac {1}{q} = 1 \]

    • The dual of the \(l_2\) - norm on \(R^{m \times n}\)​ is the nuclear norm( 核范数 ) \[ ||Z||_{2*} = \sup \{\tr(Z^TX) \mid ||Z|| \leq 1\} \\ = \sigma_1(Z) + \dots + \sigma_r(Z) = \tr[ (Z^TZ)^{1/2} ] \]

Analysis

  • Interior and Open Set

    • An element \(x \in C \subseteq R^n\) is called an interior point of \(C\) if there exists an \(\varepsilon > 0\) for which \[ \{y \mid ||y-x||_2 \leq \varepsilon\} \subseteq C \] i.e., there exists a ball centered at \[x\] that lies entirely in \(C\)​ ( 可以用任意范数,只要找到一个范数满足即可 )

    • The set of all points interior to \(C\) is called the interior of \(C\) and is denoted \(\mathrm{int}C\)

    • A set \(C\) is open if $C=C $

  • Closed Set and Boundary

    • A set \(C \subseteq R^n\) is closed if its complement is open \[ R^n \setminus C = \{ x \in R^n \mid x \notin C \} \]

    • The closure of a set \(C\) is defined as \[ \mathrm{cl \ } {C} = R^n \setminus \mathrm{int \ }(R^n \setminus C) \]

    • The boundary of the set \(C\) is defined as \[ \mathrm {bd \ C} = \mathrm { cl \ } C \setminus \mathrm {int \ } C \]

      • \(C\) is closed if it contains its boundary. It is open if it contains no boundary points

  • Supremun and infimum

    • The least upper bound or supremum of the set \(C\) is denoted \(\sup {C}\)

    • The greatest upper bound or infimum of the set \(C\)​ is denoted \(\inf {C}\)

    \[ \inf C = -(\sup (-C)) \]

Functions

  • Notation \[ f: A \rarr B \]

    • \(\mathrm {dom} f \subseteq A\)
  • Continuity

    • A function $f:R^n R^m $​ is continuous at \(𝑥 \in \mathrm{dom} \ f\)​ if for all \(\varepsilon > 0\)​ there exists a \(\delta\)​ such that    𝑓
    • is continuous if it is continuous at every point
  • Closed functions

    • A function $f:R^n R \(​ is closed if, for each\)R$, the sublevel set \[ \{x \in \mathrm{dom}f \mid f(x) \leq \alpha\} \] is closed. This is equivalent to \(\mathrm{epi}f=\{ (x,t) \in R^{n+1} \mid x \in \mathrm{dom}f, f(x) \leq t \}\) is closed

Derivatives

  • Def

    • Suppose $f:R^n R $​ and \(x \in \mathrm {int} \ \mathrm {dom}\ f\)​​. The function \(f\)​ is differentiable( 可微的 ) at \(x\)​​ if there exists a matrix \(Df(x) \in R^{m \times n}\)​​ that satisfies \[ \lim\limits_{z \in \mathrm{dom}f, z \neq x, z\rarr x} \frac {||f(z)-f(x)-Df(x)(z-x)||_2}{||z-x||_2}=0 \] in which case we refer to \(Df(x)\) derivative (or Jacobian) of \(f\) at \(x\)
  • \(f\) is differentiable if \(\mathrm{dom}f\) is open, and it is differentiable at every point


  • Def

    • The affine function( 仿射函数 ) of \(z\)​ given by \[ f(x) + Df(x)(z-x) \] is called the first-order approximation of \(f\)​ at (or near) \(x\) \[ Df(x)_{ij}= \frac{\partial f_i(x)}{\partial x_j} , i=1,\dots,m,j=1,\dots,n \]

Gradient

  • When f is real-valued (i.e., $ f:R^n R $) the derivative \(Df(x)\) is a \(1 \times n\)​ matrix (it is a row vector). Its transpose is called the gradient of the function:

    \[ \nabla f(x)=Df(x)^T \] which is a column vector (in \(R^n\)). Its components are the partial derivatives of 𝑓: \[ \nabla f(x)_i=\frac{ \partial f(x) }{\partial x_i}, i=1,\dots,n \]

  • The first-order approximation of \(f\)​​​​ at a point $x $ can be expressed as ( the affine function of \(z\)​ ) \[ f(x) + \nabla f(x)^T(z-x) \]

常见凸函数

$$

$$

常见凹函数

  • \(log\)

随机变量的期望

概率密度函数pdf概率质量函数pmf

高斯分布/正态分布

\[ x ~ N(\miu, \sigma^2) \]

混合高斯分布

GMM

样本X的每个维度\(X_i\)服从一个高斯分布, 假设各个维度间独立同分布

那么我们有混合高斯分布\(Z = {Z_1, Z_2, \dots, Z_n}\), 每次从中属性中选择一个维度,计算其高斯分布, 最后计算出所有高斯分布,线性组合起来

两步法:

推荐阅读

《Convex optimization》