Negative Log-Likelihood as a Loss Function

TL;DR:

- For categorical outcomes (e.g., classification), the negative log-likelihood corresponds to the cross-entropy loss.
- For continuous outcomes (e.g., regression), assuming a Gaussian distribution, the negative log-likelihood corresponds to the Mean Squared Error (MSE) loss.

Notation

| Symbol | Type | Explanation |
| --- | --- | --- |
| \(p(x \mid \theta)\) | Function | Likelihood of data \(x\) under model parameters \(\theta\). In VQ-VAE, \(\theta\) corresponds to \(z\), the quantized latent variable. |
| \(x\) | \(\in \mathbb{R}^{H \times W \times C}\) | Observed data or input |
| \(\theta\) | \(\in \mathbb{R}^d\) | Parameters of the model. In VQ-VAE, \(\theta\) is often represented by the quantized latent variable \(z\). |
| \(z\) | \(\in \mathbb{R}^d\) | Latent representation in the model, serving as the effective model parameters \(\theta\) in visually generative models such as VQ-VAE. |
| \(f\) | Function | Decoder function in visually generative models such as VQ-VAE; \(f(z) = \hat{x}\). |
| \(\hat{x}\) | \(\in \mathbb{R}^{H \times W \times C}\) | Reconstructed image or output, equal to \(f(z)\) |
| \(\mathcal{N}(f(z), I)\) | Distribution | Assumed distribution of \(x\) with mean \(f(z)\) and identity covariance \(I\) |
| \(y\) | \(\in \{1, \dots, K\}\) | True label in classification |
| \(\hat{p}_y\) | \(\in [0, 1]\) | Model's predicted probability for the true class \(y\) |
| \(K\) | \(\in \mathbb{N}\) | Number of classes in multi-class classification |
| \(y_k\) | \(\in \{0, 1\}\) | One-hot encoded true label for class \(k\) |
| \(\Vert\cdot\Vert_2^2\) | Function | Squared L2 norm. For a vector \(v = (v_1, v_2, \ldots, v_n)\), the squared L2 norm is \(\Vert v\Vert_2^2 = v_1^2 + v_2^2 + \cdots + v_n^2\) |

Likelihood Function

The likelihood function \(p(x \mid \theta)\) represents the probability of the observed data \(x\) under the model with parameters \(\theta\).
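As a concrete illustration (not part of the original derivation), here is a minimal sketch, assuming NumPy and SciPy are available, that evaluates the likelihood and log-likelihood of a few i.i.d. observations under a simple univariate Gaussian model; all values are arbitrary:

```python
# Minimal sketch: likelihood p(x | theta) under a univariate Gaussian model.
# The data and parameter values here are illustrative assumptions.
import numpy as np
from scipy.stats import norm

x = np.array([0.9, 1.1, 1.0])   # observed data x
mu, sigma = 1.0, 0.5            # model parameters theta = (mu, sigma)

# Per-observation likelihoods; the joint likelihood assumes i.i.d. data.
likelihoods = norm.pdf(x, loc=mu, scale=sigma)
joint_likelihood = likelihoods.prod()
log_likelihood = np.log(likelihoods).sum()

print(joint_likelihood, log_likelihood)
```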

Negative Log-Likelihood and MSE


If the conditional distribution of \(x\) given \(z\) is Gaussian with identity covariance, \(x \mid z \sim \mathcal{N}\left(f(z), I\right)\), the log-likelihood is: \[ \log p(x \mid z) = -\frac{1}{2}\|x-f(z)\|_2^2 - \frac{n}{2}\log(2\pi) \] where \(\hat{x}=f(z)\) is the reconstructed image, representing the mean of the distribution. Up to the additive constant and the factor \(\frac{1}{2}\), maximizing the log-likelihood means minimizing \(\|x-\hat{x}\|_2^2\).
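To make this concrete, the small NumPy check below (an illustrative sketch; random tensors stand in for \(x\) and \(\hat{x} = f(z)\)) verifies that the Gaussian log-density with identity covariance matches the closed form above:

```python
# Sketch: the Gaussian log-likelihood with identity covariance is
# -1/2 ||x - x_hat||^2 plus a constant that does not depend on z.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4, 3))       # stand-in for an observed image x
x_hat = rng.normal(size=(4, 4, 3))   # stand-in for the reconstruction f(z)

n = x.size
# With identity covariance, the joint log-density factorizes per pixel.
log_lik = norm.logpdf(x, loc=x_hat, scale=1.0).sum()

sq_l2 = np.sum((x - x_hat) ** 2)     # ||x - x_hat||_2^2
const = -0.5 * n * np.log(2 * np.pi)

assert np.isclose(log_lik, -0.5 * sq_l2 + const)
```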

The MSE is simply the squared \(L_2\) norm divided by the dimension \(n = H \times W \times C\), which is constant: \[ \operatorname{MSE}=\frac{1}{n}\|x-\hat{x}\|_2^2 \] Since the negative log-likelihood and the MSE differ only by positive constants, minimizing the negative log-likelihood is equivalent to minimizing the MSE.
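As a sanity check, the toy snippet below (illustrative values only) shows that the two losses rank a set of candidate reconstructions identically, so they share the same minimizer:

```python
# Sketch: up to positive constants, NLL and MSE order candidate
# reconstructions the same way, so minimizing one minimizes the other.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=12)
candidates = [x + rng.normal(scale=s, size=12) for s in (0.1, 0.5, 1.0)]

n = x.size
mse = [np.mean((x - c) ** 2) for c in candidates]
nll = [0.5 * np.sum((x - c) ** 2) + 0.5 * n * np.log(2 * np.pi)
       for c in candidates]

# Both losses pick the same best candidate.
assert np.argmin(mse) == np.argmin(nll)
```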

Negative Log-Likelihood and Cross-Entropy

For multi-class classification, the log-likelihood for a single observation is: \[ \log p(y \mid \theta, x)=\log \hat{p}_y \]

Here, \(\hat{p}_y\) is the predicted probability of the true class \(y\).

The cross-entropy loss for a single observation compares the one-hot label with the predicted class probabilities: \[ \text{Cross-Entropy} = -\sum_{k=1}^{K} y_k \log \hat{p}_k = -\log \hat{p}_y \] where the sum collapses because \(y_k = 1\) only for the true class \(k = y\).

Thus, the cross-entropy loss is exactly the negative log-likelihood of the true class: \[ \text{Cross-Entropy} = -\log \hat{p}_y = -\log p(y \mid \theta, x) \]
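The identity can be checked directly; the snippet below (with an arbitrary predicted distribution \(\hat{p}\) and label \(y\)) computes the one-hot cross-entropy and the negative log-likelihood of the true class and confirms they match:

```python
# Sketch: for a one-hot label, cross-entropy reduces to -log p_hat_y.
import numpy as np

K = 4
p_hat = np.array([0.1, 0.6, 0.2, 0.1])   # predicted class probabilities
y = 1                                     # index of the true class

y_onehot = np.eye(K)[y]                   # one-hot labels y_k

cross_entropy = -np.sum(y_onehot * np.log(p_hat))   # -sum_k y_k log p_hat_k
nll = -np.log(p_hat[y])                              # -log p_hat_y

assert np.isclose(cross_entropy, nll)
```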