Batch Normalization
Sources:
- Andrej Karpathy's video "Building makemore Part 4: Becoming a Backprop Ninja"
- Paper: "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift" (Ioffe & Szegedy, 2015)
- PyTorch: BatchNorm1d
Batch Normalization
Given samples $x_1, \dots, x_m$ in a mini-batch $B$,

$$
y_i = \mathrm{BN}_{\gamma, \beta}(x_i)
$$

is the output of the batch normalization layer. $\hat{x}_i$ is the normalized input, $\epsilon$ is a small constant added to avoid division by zero, and $\gamma$ and $\beta$ are parameters learned during training for each feature, representing the scale and shift applied after normalization, respectively.
The detailed process is

$$
\mu_B = \frac{1}{m} \sum_{i=1}^{m} x_i
$$

$$
\sigma_B^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_B)^2
$$

$$
\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}
$$

$$
y_i = \gamma \hat{x}_i + \beta
$$
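The forward pass can be written out directly and checked against the BatchNorm1d layer cited above, which in training mode normalizes with exactly these batch statistics. A minimal sketch; the batch size, feature count, and seed are arbitrary, and the comparison relies on the layer's default affine initialization ($\gamma = 1$, $\beta = 0$):

```python
import torch

torch.manual_seed(0)
m, d = 32, 10                    # batch size, number of features (arbitrary)
x = torch.randn(m, d)
gamma = torch.ones(d)            # scale, learned per feature
beta = torch.zeros(d)            # shift, learned per feature
eps = 1e-5

# Forward pass, following the equations above
mu = x.mean(dim=0)                           # mu_B
var = ((x - mu) ** 2).mean(dim=0)            # sigma_B^2 (biased, 1/m)
xhat = (x - mu) / torch.sqrt(var + eps)      # normalized input
y = gamma * xhat + beta                      # scale and shift

# BatchNorm1d in training mode normalizes with the same batch statistics
bn = torch.nn.BatchNorm1d(d, eps=eps)
print(torch.allclose(y, bn(x), atol=1e-6))   # True
```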
Derivation of the batch norm backward pass
We have:

$$
\frac{\partial L}{\partial \hat{x}_i} = \frac{\partial L}{\partial y_i} \cdot \gamma
$$

Since $\mu_B$ and $\sigma_B^2$ are functions of the entire mini-batch, every $\hat{x}_j$ depends on $x_i$, and the chain rule must sum over the whole batch:

$$
\frac{\partial L}{\partial x_i} = \sum_{j=1}^{m} \frac{\partial L}{\partial \hat{x}_j} \cdot \frac{\partial \hat{x}_j}{\partial x_i}
$$

Recall that

$$
\hat{x}_j = (x_j - \mu_B)(\sigma_B^2 + \epsilon)^{-1/2}
$$

so $\frac{\partial \hat{x}_j}{\partial x_i}$ requires the derivatives of both $\mu_B$ and $\sigma_B^2$ with respect to $x_i$.
Therefore, we compute the derivative of each. For the former:

$$
\frac{\partial \mu_B}{\partial x_i} = \frac{\partial}{\partial x_i} \left[ \frac{1}{m} \sum_{k=1}^{m} x_k \right] = \frac{1}{m}
$$
For the latter:

$$
\begin{aligned}
\frac{\partial \sigma_B^2}{\partial x_i} &= \frac{\partial}{\partial x_i} \left[ \frac{1}{m} \sum_{k=1}^{m} (x_k - \mu_B)^2 \right] \\
&= \frac{2}{m} \sum_{k=1}^{m} (x_k - \mu_B) \left( \delta_{ik} - \frac{1}{m} \right) \\
&= \frac{2 (x_i - \mu_B)}{m} - \frac{2}{m^2} \sum_{k=1}^{m} (x_k - \mu_B) \\
&= \frac{2 (x_i - \mu_B)}{m}
\end{aligned}
$$

where $\delta_{ik}$ is the Kronecker delta ($1$ if $i = k$, else $0$).
The transition from the 3rd line to the 4th line is because

$$
\sum_{k=1}^{m} (x_k - \mu_B) = \sum_{k=1}^{m} x_k - m \mu_B = 0
$$
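Both per-element derivatives are easy to verify against autograd. A small sketch in double precision (sizes arbitrary):

```python
import torch

torch.manual_seed(0)
m = 8
x = torch.randn(m, dtype=torch.double, requires_grad=True)

# d(mu_B)/dx_i = 1/m for every i
x.mean().backward()
print(torch.allclose(x.grad, torch.full((m,), 1.0 / m, dtype=torch.double)))  # True

# d(sigma_B^2)/dx_i = 2 (x_i - mu_B) / m
x.grad = None
var = ((x - x.mean()) ** 2).mean()           # biased variance (1/m)
var.backward()
analytic = 2 * (x.detach() - x.detach().mean()) / m
print(torch.allclose(x.grad, analytic))      # True
```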
As a result, substituting back:

$$
\begin{aligned}
\frac{\partial \hat{x}_j}{\partial x_i} &= \left( \delta_{ij} - \frac{1}{m} \right) (\sigma_B^2 + \epsilon)^{-1/2} - \frac{(x_j - \mu_B)(x_i - \mu_B)}{m} (\sigma_B^2 + \epsilon)^{-3/2} \\
&= \frac{1}{\sqrt{\sigma_B^2 + \epsilon}} \left( \delta_{ij} - \frac{1}{m} - \frac{\hat{x}_i \hat{x}_j}{m} \right)
\end{aligned}
$$

and plugging this into the chain rule gives the full backward pass through the normalization:

$$
\frac{\partial L}{\partial x_i} = \frac{\gamma}{m \sqrt{\sigma_B^2 + \epsilon}} \left( m \frac{\partial L}{\partial y_i} - \sum_{j=1}^{m} \frac{\partial L}{\partial y_j} - \hat{x}_i \sum_{j=1}^{m} \frac{\partial L}{\partial y_j} \hat{x}_j \right)
$$
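This closed form is what a hand-written batch norm backward pass implements (the exercise in the makemore Part 4 video). A sketch that checks it against autograd, assuming an arbitrary scalar loss $L = \sum_i y_i^2$ so that $\partial L / \partial y_i = 2 y_i$:

```python
import torch

torch.manual_seed(0)
m, d = 32, 10
x = torch.randn(m, d, dtype=torch.double, requires_grad=True)
gamma = torch.randn(d, dtype=torch.double)
beta = torch.randn(d, dtype=torch.double)
eps = 1e-5

# Forward pass, matching the equations above
mu = x.mean(dim=0)
var = ((x - mu) ** 2).mean(dim=0)            # biased variance (1/m)
xhat = (x - mu) / torch.sqrt(var + eps)
y = gamma * xhat + beta

loss = (y ** 2).sum()                        # arbitrary scalar loss
loss.backward()
dy = 2 * y.detach()                          # dL/dy for this particular loss

# Closed-form gradient derived above
std = torch.sqrt(var.detach() + eps)
dx = gamma / (m * std) * (
    m * dy
    - dy.sum(dim=0)
    - xhat.detach() * (dy * xhat.detach()).sum(dim=0)
)
print(torch.allclose(dx, x.grad))            # True
```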