Negative Log-Likelihood as a Loss Function

TL;DR:

- For categorical outcomes (e.g., classification), the negative log-likelihood corresponds to the cross-entropy loss.
- For continuous outcomes (e.g., regression), assuming a Gaussian distribution, the negative log-likelihood corresponds to the Mean Squared Error (MSE) loss.

Notation

| Symbol | Type | Explanation |
| --- | --- | --- |
| $p(x \mid \theta)$ | Function | Likelihood of data $x$ under model parameters $\theta$. In VQ-VAE, $\theta$ corresponds to $z$, the quantized latent variable. |
| $x$ | $\mathbb{R}^{H \times W \times C}$ | Observed data or input |
| $\theta$ | $\mathbb{R}^{d}$ | Parameters of the model. In VQ-VAE, $\theta$ is often represented by the quantized latent variable $z$. |
| $z$ | $\mathbb{R}^{d}$ | Latent representation in the model, serving as the effective model parameters $\theta$ in visually generative models such as VQ-VAE. |
| $f$ | Function | Decoder function in visually generative models such as VQ-VAE; $f(z) = \hat{x}$. |
| $\hat{x}$ | $\mathbb{R}^{H \times W \times C}$ | Reconstructed image or output, equal to $f(z)$ |
| $\mathcal{N}(f(z), I)$ | Distribution | Assumed distribution of $x$ around $f(z)$ with identity covariance $I$ |
| $y$ | $\{1, \dots, K\}$ | Actual label in classification |
| $\hat{p}_y$ | $[0, 1]$ | Model's predicted probability for the true class $y$ |
| $K$ | $\mathbb{N}$ | Number of classes in multi-class classification |
| $y_k$ | $\{0, 1\}$ | One-hot encoded true label for class $k$ |
| $\lVert \cdot \rVert_2^2$ | Function | Squared L2 norm. For a vector $v = (v_1, v_2, \dots, v_n)$, the squared L2 norm is $\lVert v \rVert_2^2 = v_1^2 + v_2^2 + \dots + v_n^2$ |

Likelihood Function

The likelihood function $p(x \mid \theta)$ represents the probability of the observed data $x$ under the model with parameters $\theta$.
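
As a minimal sketch (not from the source), the snippet below compares the likelihood of a small, made-up dataset under two candidate parameter values for a unit-variance Gaussian model; the data, the parameter values, and the `likelihood` helper are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical observed data: four measurements clustered around 2.
x = np.array([1.9, 2.1, 2.0, 1.8])

def likelihood(x, theta):
    # p(x | theta): product of per-sample densities under N(theta, 1).
    return np.prod(norm.pdf(x, loc=theta, scale=1.0))

print(likelihood(x, theta=2.0))  # relatively high: the data is plausible under theta = 2
print(likelihood(x, theta=5.0))  # far smaller: the data is implausible under theta = 5
```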

Negative Log-Likelihood and MSE

->Source

If the conditional distribution of $x$ given $z$ follows a Gaussian distribution $p(x \mid z) \sim \mathcal{N}(f(z), I)$, the log-likelihood is:

$$
\log p(x \mid z) \propto -\frac{1}{2} \lVert x - f(z) \rVert_2^2 \propto -\lVert x - \hat{x} \rVert_2^2
$$

where $\hat{x} = f(z)$ is the reconstructed image, representing the mean of the distribution.
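
As a numerical sanity check (a sketch with invented data, not the source's code), the exact log-density of $\mathcal{N}(f(z), I)$ differs from $-\tfrac{1}{2} \lVert x - \hat{x} \rVert_2^2$ only by the additive constant $-\tfrac{n}{2} \log 2\pi$:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
n = 12                                 # stands in for H * W * C after flattening
x_hat = rng.normal(size=n)             # hypothetical decoder output f(z)
x = x_hat + 0.1 * rng.normal(size=n)   # observed data near the mean

# Exact log-density of N(f(z), I) evaluated at x.
exact = multivariate_normal(mean=x_hat, cov=np.eye(n)).logpdf(x)

# -1/2 * ||x - x_hat||^2 plus the additive constant -(n/2) * log(2*pi).
quadratic = -0.5 * np.sum((x - x_hat) ** 2) - 0.5 * n * np.log(2 * np.pi)

print(np.isclose(exact, quadratic))  # True: the log-likelihood is the negative squared error up to a constant
```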

MSE is simply the squared L2 norm divided by the dimension $n = H \times W \times C$, which is constant:

$$
\mathrm{MSE} = \frac{1}{n} \lVert x - \hat{x} \rVert_2^2
$$

So the negative log-likelihood can be thought of as MSE: the two differ only by a positive scale factor and an additive constant, and are therefore minimized by the same $\hat{x}$.
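
The sketch below (hypothetical data and illustrative helper names) shows the Gaussian negative log-likelihood and the MSE moving in lockstep as the reconstruction improves, consistent with $\mathrm{NLL} = \tfrac{n}{2} \mathrm{MSE} + \text{const}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 12
x = rng.normal(size=n)   # hypothetical "ground truth" image, flattened

def gaussian_nll(x, x_hat):
    # Negative log-likelihood under N(x_hat, I), including the constant term.
    return 0.5 * np.sum((x - x_hat) ** 2) + 0.5 * n * np.log(2 * np.pi)

def mse(x, x_hat):
    return np.mean((x - x_hat) ** 2)

# NLL = (n / 2) * MSE + constant, so both losses rank reconstructions identically.
for offset in [0.5, 0.1, 0.0]:
    x_hat = x + offset
    print(f"offset={offset:.1f}  NLL={gaussian_nll(x, x_hat):8.3f}  MSE={mse(x, x_hat):.4f}")
```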

Negative Log-Likelihood and Cross-Entropy

For multi-class classification, the log-likelihood for a single observation is:

$$
\log p(y \mid \theta, x) = \log \hat{p}_y
$$

Here, p^y is the predicted probability of the true class y.

The cross-entropy loss for a single observation is

$$
\text{Cross-Entropy} = -\sum_{k=1}^{K} y_k \log \hat{p}_k = -\log \hat{p}_y,
$$

since the one-hot label satisfies $y_k = 1$ only for $k = y$ and $y_k = 0$ otherwise.

Thus, the cross-entropy loss is exactly the negative log-likelihood of the true class:

$$
\text{Cross-Entropy} = -\log \hat{p}_y = -\log p(y \mid \theta, x)
$$
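
A minimal numeric check (the predicted probabilities and label below are invented for illustration) confirms that the one-hot cross-entropy equals the negative log-probability of the true class:

```python
import numpy as np

# Invented predicted probabilities over K = 4 classes, and true label y = 2.
p_hat = np.array([0.1, 0.2, 0.6, 0.1])
y = 2
y_one_hot = np.eye(len(p_hat))[y]   # one-hot encoding of y

# General cross-entropy: -sum_k y_k * log(p_hat_k).
cross_entropy = -np.sum(y_one_hot * np.log(p_hat))

# Negative log-likelihood of the true class: -log(p_hat_y).
nll = -np.log(p_hat[y])

print(np.isclose(cross_entropy, nll))  # True: identical for one-hot labels
```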