Self-attention Mechanism
Sources:
- Transformer from scratch
- Attention Is All You Need (Vaswani et al., 2017)
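The scaled dot-product attention defined in the cited paper computes Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, where the queries Q, keys K, and values V are linear projections of the same input sequence. Below is a minimal single-head sketch in NumPy; the function names, shapes, and toy data are illustrative assumptions, not code taken from either source.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    X:              (seq_len, d_model) input token embeddings
    W_q, W_k, W_v:  (d_model, d_k) projection matrices
    Returns:        (seq_len, d_k) attended outputs
    """
    Q = X @ W_q  # queries
    K = X @ W_k  # keys
    V = X @ W_v  # values
    d_k = Q.shape[-1]
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len) similarity scores
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V

# Toy usage with arbitrary sizes: 4 tokens, d_model = 8, d_k = 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8)
```

In practice the paper stacks several such heads (multi-head attention) and adds masking, but the softmax-weighted mixing of value vectors shown here is the core of the mechanism.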