Cross Entropy Loss
Sources:
- Andrej Karpathy's video Building makemore Part 4: Becoming a Backprop Ninja.
Cross entropy
The cross entropy loss for a single sample is

$$L = -\sum_{i=1}^{C} y_i \log p_i,$$

where $C$ is the number of classes, $y_i$ is the target probability of class $i$, and $p_i$ is the predicted probability of class $i$, i.e., the softmax of the logits $l_1, \dots, l_C$:

$$p_i = \frac{e^{l_i}}{\sum_{k=1}^{C} e^{l_k}}.$$

For classification tasks where there is only one true class $t$ for a sample, i.e., $y_t = 1$ and $y_i = 0$ for all $i \neq t$, the loss simplifies to

$$L = -\log p_t.$$

Note that, in this case, only the predicted probability of the true class contributes to the loss.
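As a quick sanity check, here is a minimal PyTorch sketch (PyTorch fits the source video; the variable names are my own) that computes this loss by hand and compares it against the built-in `F.cross_entropy`, which operates directly on logits:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 5)            # batch of 4 samples, 5 classes
targets = torch.tensor([1, 0, 3, 2])  # index of the true class per sample

# Manual cross entropy: softmax, then mean negative log-prob of the true class.
probs = F.softmax(logits, dim=1)
loss_manual = -probs[torch.arange(4), targets].log().mean()

# The built-in version takes raw logits (and is more numerically stable).
loss_builtin = F.cross_entropy(logits, targets)

print(loss_manual.item(), loss_builtin.item())  # the two values should match
```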
Derivation of the cross-entropy gradient
Question: What is the gradient (more precisely, the derivative) of the cross-entropy loss w.r.t. a logit $l_j$?
To get the gradient, we need to differentiate the loss $L$ w.r.t. the logit $l_j$. Using the expression for $L$ above:

$$\frac{\partial L}{\partial l_j} = -\sum_i y_i \frac{\partial \log p_i}{\partial l_j} = -\sum_i \frac{y_i}{p_i} \frac{\partial p_i}{\partial l_j}.$$

Given $p_i = \frac{e^{l_i}}{\sum_k e^{l_k}}$, the derivative $\frac{\partial p_i}{\partial l_j}$ splits into two cases, depending on whether $i = j$.
When $i = j$, the derivative of the softmax function w.r.t. $l_j$ is:

$$\frac{\partial p_i}{\partial l_j} = \frac{e^{l_i} \sum_k e^{l_k} - e^{l_i} e^{l_j}}{\left(\sum_k e^{l_k}\right)^2} = p_i (1 - p_i).$$

When $i \neq j$, the derivative of the softmax function w.r.t. $l_j$ is:

$$\frac{\partial p_i}{\partial l_j} = \frac{-e^{l_i} e^{l_j}}{\left(\sum_k e^{l_k}\right)^2} = -p_i p_j.$$
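Together, the two cases say that the Jacobian of the softmax is $\operatorname{diag}(p) - p p^\top$. A small sketch (my own check, not from the video) that verifies this against autograd:

```python
import torch
from torch.autograd.functional import jacobian

logits = torch.randn(5, dtype=torch.float64)
p = torch.softmax(logits, dim=0)

# Analytic Jacobian from the two cases above:
# dp_i/dl_j = p_i * (1 - p_i) if i == j, and -p_i * p_j if i != j,
# which is exactly diag(p) - outer(p, p).
analytic = torch.diag(p) - torch.outer(p, p)

# Jacobian of the softmax computed by autograd.
auto = jacobian(lambda l: torch.softmax(l, dim=0), logits)

print(torch.allclose(analytic, auto))  # True
```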
Putting it all together, the gradient of the loss w.r.t. $l_j$ is

$$\frac{\partial L}{\partial l_j} = -\frac{y_j}{p_j}\, p_j (1 - p_j) - \sum_{i \neq j} \frac{y_i}{p_i} (-p_i p_j) = -y_j + y_j p_j + \sum_{i \neq j} y_i p_j = p_j \sum_i y_i - y_j = p_j - y_j,$$

using $\sum_i y_i = 1$. For the single-true-class case:

If $j = t$ (for the true class): $\frac{\partial L}{\partial l_j} = p_j - 1$.

If $j \neq t$ (for the other classes): $\frac{\partial L}{\partial l_j} = p_j$.
To conclude:

$$\frac{\partial L}{\partial l_j} = p_j - y_j,$$

i.e., the gradient of the cross-entropy loss w.r.t. the logits is simply the softmax output minus the one-hot target.
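This is exactly the shortcut used in the Backprop Ninja video when backpropagating through cross entropy by hand (there the batch loss is a mean, hence an extra division by the batch size). A minimal sketch (the check itself is my own) verifying the result against autograd:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, C = 4, 5
logits = torch.randn(n, C, dtype=torch.float64, requires_grad=True)
targets = torch.tensor([1, 0, 3, 2])

# Backprop through PyTorch's cross entropy (mean over the batch).
loss = F.cross_entropy(logits, targets)
loss.backward()

# Manual gradient: softmax minus one-hot, divided by n because the
# loss averages over the batch.
with torch.no_grad():
    dlogits = F.softmax(logits, dim=1)
    dlogits[torch.arange(n), targets] -= 1.0
    dlogits /= n

print(torch.allclose(logits.grad, dlogits))  # True
```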