Variance-Invariance-Covariance Regularization
Sources:
- VICReg 2022 paper
- VC Reg, a follow-up paper to VICReg

Image source: https://arxiv.org/pdf/2306.13292
Self-supervised learning methods aim to learn meaningful representations without relying on labels. VICReg (Variance-Invariance-Covariance Regularization) is one such method, which learns representations by optimizing three key objectives: maintaining variance, reducing covariance, and ensuring invariance between augmented views of the same input.
In this article, we focus solely on the core idea of VICReg—the design of its loss function—excluding discussions about network architectures and implementation details.
Notation
Symbol | Type | Description |
---|---|---|
$x$ | Vector | The original input |
$x'$, $x''$ | Vectors | Two augmented versions of the original input $x$ |
$f_\theta$ | Function | Neural network parameterized by $\theta$ |
$z'$, $z''$ | Vectors | Representations of $x'$, $x''$: $z' = f_\theta(x')$, $z'' = f_\theta(x'')$ |
$Z'$, $Z'' \in \mathbb{R}^{n \times d}$ | Matrices | Batch embeddings for augmented inputs, where $n$ is the batch size and $d$ is the embedding dimension |
$Z_{\cdot j}$ | Vector | The $j$-th dimension (column) of a batch embedding matrix $Z$ |
$C(Z)$ | Matrix | Variance-covariance matrix of $Z$ |
$\mathrm{Var}(Z_{\cdot j})$ | Scalar | Variance of the $j$-th dimension of $Z$ |
$\gamma$ | Scalar | Threshold for variance regularization (e.g., $\gamma = 1$) |
$\mathcal{L}$ | Function | Overall VICReg loss function |
$\mathcal{L}_{\mathrm{var}}$, $\mathcal{L}_{\mathrm{cov}}$, $\mathcal{L}_{\mathrm{inv}}$ | Functions | Variance loss, covariance loss, and invariance loss, respectively |
$\mu$, $\nu$, $\lambda$ | Scalars | Hyperparameters controlling the weight of the variance, covariance, and invariance terms |
Abbreviations
Abbreviation | Description |
---|---|
VICReg | Variance-Invariance-Covariance Regularization |
Cov | Covariance |
Var | Variance |
NN | Neural network |
Problem setting
We consider a batch of data
$$X = \{x_1, x_2, \dots, x_n\},$$
where $n$ is the batch size. Each input $x_i$ is turned into two augmented views $x'_i$ and $x''_i$, which are mapped by the neural network into embeddings
$$z'_i = f_\theta(x'_i), \qquad z''_i = f_\theta(x''_i),$$
where $z'_i, z''_i \in \mathbb{R}^d$. Stacking the embeddings row-wise gives the batch embedding matrices $Z', Z'' \in \mathbb{R}^{n \times d}$.
The variance-covariance matrix of a batch embedding matrix $Z$ (either $Z'$ or $Z''$) is
$$C(Z) = \frac{1}{n-1} \sum_{i=1}^{n} (z_i - \bar{z})(z_i - \bar{z})^\top, \qquad \bar{z} = \frac{1}{n} \sum_{i=1}^{n} z_i.$$
Expanding this:
$$C(Z) = \begin{pmatrix}
\mathrm{Var}(Z_{\cdot 1}) & \mathrm{Cov}(Z_{\cdot 1}, Z_{\cdot 2}) & \cdots & \mathrm{Cov}(Z_{\cdot 1}, Z_{\cdot d}) \\
\mathrm{Cov}(Z_{\cdot 2}, Z_{\cdot 1}) & \mathrm{Var}(Z_{\cdot 2}) & \cdots & \mathrm{Cov}(Z_{\cdot 2}, Z_{\cdot d}) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}(Z_{\cdot d}, Z_{\cdot 1}) & \mathrm{Cov}(Z_{\cdot d}, Z_{\cdot 2}) & \cdots & \mathrm{Var}(Z_{\cdot d})
\end{pmatrix},$$
so the diagonal entries are the per-dimension variances and the off-diagonal entries are the covariances between pairs of dimensions.
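To make these quantities concrete, here is a minimal sketch of the variance-covariance computation, assuming PyTorch; the function name `covariance_matrix` and the batch shape are illustrative choices, not from the paper.

```python
import torch

def covariance_matrix(Z: torch.Tensor) -> torch.Tensor:
    """Variance-covariance matrix C(Z) of a batch of embeddings Z with shape (n, d)."""
    n, d = Z.shape
    Z_centered = Z - Z.mean(dim=0, keepdim=True)   # subtract the per-dimension mean
    return (Z_centered.T @ Z_centered) / (n - 1)   # (d, d) matrix

# Usage: the diagonal holds per-dimension variances, off-diagonals hold covariances.
Z = torch.randn(256, 128)        # n = 256 embeddings of dimension d = 128
C = covariance_matrix(Z)
variances = torch.diagonal(C)    # Var(Z_{.j}) for each dimension j
```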
VICReg loss
VICReg optimizes three goals:
- High Variance: Encourage $\mathrm{Var}(Z_{\cdot j}) \geq \gamma$ for every dimension $j$ to prevent collapse, where all embeddings become identical. For example, if all embeddings are mapped to the same vector $c \in \mathbb{R}^d$, i.e., $z_i = c$ for all $i$, then each dimension (e.g., $Z_{\cdot 1} = (c_1, \dots, c_1)^\top$) has no variation, resulting in $\mathrm{Var}(Z_{\cdot j}) = 0$ for all $j$ (see the small sketch after this list). To prevent this, VICReg introduces the variance loss¹:
  $$\mathcal{L}_{\mathrm{var}}(Z) = \frac{1}{d} \sum_{j=1}^{d} \max\bigl(0, \gamma - \mathrm{Var}(Z_{\cdot j})\bigr).$$
- Low Covariance: Minimize the off-diagonal elements of $C(Z)$. The covariance loss:
  $$\mathcal{L}_{\mathrm{cov}}(Z) = \frac{1}{d} \sum_{j \neq k} C(Z)_{jk}^{2}.$$
  This reduces redundancy by minimizing overlap between dimensions.
- Invariance: Ensure embeddings $z'_i$ and $z''_i$ of the same input are similar. The invariance loss:
  $$\mathcal{L}_{\mathrm{inv}}(Z', Z'') = \frac{1}{n} \sum_{i=1}^{n} \lVert z'_i - z''_i \rVert_2^{2}.$$
  This loss term is where contrastive learning resides: pushing two positive embeddings closer.
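To illustrate the collapse scenario from the first bullet, here is a tiny sketch, again assuming PyTorch, with made-up values: a batch of identical embeddings has zero variance in every dimension.

```python
import torch

# A "collapsed" batch: every embedding equals the same vector c.
c = torch.tensor([0.5, -1.2, 3.0])
Z_collapsed = c.repeat(8, 1)     # n = 8 identical embeddings, d = 3

# Var(Z_{.j}) = 0 for every dimension j, which the variance loss penalizes.
print(Z_collapsed.var(dim=0, unbiased=True))   # tensor([0., 0., 0.])
```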
The overall VICReg loss:
$$\mathcal{L} = \mu \bigl[\mathcal{L}_{\mathrm{var}}(Z') + \mathcal{L}_{\mathrm{var}}(Z'')\bigr] + \nu \bigl[\mathcal{L}_{\mathrm{cov}}(Z') + \mathcal{L}_{\mathrm{cov}}(Z'')\bigr] + \lambda \, \mathcal{L}_{\mathrm{inv}}(Z', Z'').$$
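As a concrete illustration of the full objective, below is a minimal sketch, assuming PyTorch. It uses the simplified variance term from this article (plain $\mathrm{Var}$ rather than the paper's $\sqrt{\mathrm{Var} + \epsilon}$); the function names are illustrative, and the default weights follow the values reported in the VICReg paper ($\lambda = \mu = 25$, $\nu = 1$), but this is a sketch, not a reference implementation.

```python
import torch

def variance_loss(Z: torch.Tensor, gamma: float = 1.0) -> torch.Tensor:
    """Hinge on per-dimension variance: penalize dimensions with Var < gamma."""
    var = Z.var(dim=0, unbiased=True)              # Var(Z_{.j}) for each dimension j
    return torch.clamp(gamma - var, min=0.0).mean()

def covariance_loss(Z: torch.Tensor) -> torch.Tensor:
    """Sum of squared off-diagonal entries of C(Z), scaled by 1/d."""
    n, d = Z.shape
    Zc = Z - Z.mean(dim=0, keepdim=True)
    C = (Zc.T @ Zc) / (n - 1)                      # variance-covariance matrix
    off_diag = C - torch.diag(torch.diagonal(C))   # zero out the diagonal
    return off_diag.pow(2).sum() / d

def invariance_loss(Z1: torch.Tensor, Z2: torch.Tensor) -> torch.Tensor:
    """Mean squared distance between paired embeddings of the two views."""
    return (Z1 - Z2).pow(2).sum(dim=1).mean()

def vicreg_loss(Z1, Z2, mu=25.0, nu=1.0, lam=25.0, gamma=1.0):
    """Overall VICReg loss: weighted sum of variance, covariance, and invariance terms."""
    return (
        mu * (variance_loss(Z1, gamma) + variance_loss(Z2, gamma))
        + nu * (covariance_loss(Z1) + covariance_loss(Z2))
        + lam * invariance_loss(Z1, Z2)
    )

# Usage with random embeddings standing in for f_theta(x') and f_theta(x''):
Z1, Z2 = torch.randn(256, 128), torch.randn(256, 128)
print(vicreg_loss(Z1, Z2))
```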
My comment
The invariance term is contrastive learning; the other two terms are heuristic tricks.
¹ In the original paper, $\mathrm{Var}(Z_{\cdot j})$ is replaced by $\sqrt{\mathrm{Var}(Z_{\cdot j}) + \epsilon}$, where $\epsilon$ is a small constant for numerical stability. This is an engineering choice and does not affect the core idea. In this article, we use the simpler version for clarity and intuitiveness.