ResNet
Architecture
The popular ResNet architecture consists of three main components:
- An initial convolutional layer that downsamples the input by a factor of 2.
- A max pooling layer that further downsamples the input by a factor of 2.
- Groups (or stages) of ResNet blocks, where all blocks within each group have the same output shape. From the 2nd stage onwards, the first block of each stage applies downsampling by a factor of 2.
Note: Notations such as [3,3,3] are used to represent the ResNet block structure. [3,3,3] indicates that there are 3 stages of 3 blocks each, with downsampling occurring in the first block of the 2nd and 3rd stages, that is, at the fourth and seventh blocks. The visualization below shows the ResNet with [3,3,3] blocks on CIFAR-10.
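As a rough illustration of this notation, the sketch below computes which blocks downsample for a given list of per-stage block counts. The function name `downsampling_block_indices` is hypothetical, and block indices are 1-based to match the text.

```python
def downsampling_block_indices(blocks_per_stage):
    """Return the 1-based indices of the blocks that downsample.

    Assumes the first block of every stage except the first one
    applies the stride-2 downsampling, as described above.
    """
    indices = []
    first_block = 1  # 1-based index of the first block in the current stage
    for stage, num_blocks in enumerate(blocks_per_stage):
        if stage > 0:
            indices.append(first_block)
        first_block += num_blocks
    return indices


print(downsampling_block_indices([3, 3, 3]))  # [4, 7]
```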
The initial convolutional layer
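The initial convolutional layer of ResNet has kernel size 7, stride 2, and padding 3. Therefore, for an input of height (or width) $H$, the output size is $\lfloor (H + 2 \cdot 3 - 7) / 2 \rfloor + 1 = \lfloor (H - 1) / 2 \rfloor + 1$, which equals $H / 2$ when $H$ is even.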
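The max pooling layer
The initial convolution is followed by a max pooling layer with kernel size (also called window size) 3, stride 2, and padding 1. Therefore, for a feature map of height (or width) $H$ entering the pooling layer, the output size is $\lfloor (H + 2 \cdot 1 - 3) / 2 \rfloor + 1 = \lfloor (H - 1) / 2 \rfloor + 1$, which again equals $H / 2$ when $H$ is even.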
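The effect of these two layers on the spatial size can be checked with the standard output-size formula for convolution and pooling layers. This is a minimal sketch: the helper name `conv_output_size` is hypothetical, and the 224-pixel input is an assumed example (a typical ImageNet crop size), not something stated above.

```python
def conv_output_size(size, kernel_size, stride, padding):
    """Spatial output size of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1


input_size = 224  # assumed example input height/width

# Initial convolution: kernel size 7, stride 2, padding 3.
after_conv = conv_output_size(input_size, kernel_size=7, stride=2, padding=3)

# Max pooling: kernel size 3, stride 2, padding 1.
after_pool = conv_output_size(after_conv, kernel_size=3, stride=2, padding=1)

print(after_conv, after_pool)  # 112 56
```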
Stacked ResNet blocks
The ResNet with [3,3,3] blocks on CIFAR-10 is visualized below.
In practice, implementations of all ResNet variants, including ResNet18, ResNet50, and ResNet101, use 4 stages.
# Source: https://github.com/matthias-wright/flaxmodels/blob/600ce8a6b6bf2926ccfc948e7c1ff35edc330d5b/flaxmodels/resnet/resnet.py#L17-L21
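For reference, the per-stage block counts commonly used for these variants (following the original ResNet paper) can be written as a small lookup table. This is an illustrative sketch, not a copy of the linked file; the name `STAGE_SIZES` is hypothetical.

```python
# Number of residual blocks in each of the 4 stages, per variant.
# The dictionary name is an assumption made for this example.
STAGE_SIZES = {
    "resnet18": [2, 2, 2, 2],
    "resnet50": [3, 4, 6, 3],
    "resnet101": [3, 4, 23, 3],
}

for name, stages in STAGE_SIZES.items():
    # Every variant has 4 stages; only the block counts differ.
    print(name, "stages:", len(stages), "blocks:", sum(stages))
```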
Example: ResNet 50
Initial Convolution and Max Pooling:
- The input image first goes through a $7 \times 7$ convolution with a stride of 2, which reduces the spatial dimensions by a factor of 2.
- This is followed by a $3 \times 3$ max pooling layer with a stride of 2, which further reduces the spatial dimensions by a factor of 2.
After these initial layers, the spatial dimensions are reduced by a factor of 4.
- ResNet50 consists of 4 stages of convolutional blocks, with each stage containing multiple residual blocks.
- Downsampling occurs at the beginning of each stage (except the first stage) using a convolution with a stride of 2.
Let's break down each stage:
- Stage 1: No downsampling (input dimensions remain the same).
- Stage 2: Downsampling by a factor of 2.
- Stage 3: Downsampling by a factor of 2.
- Stage 4: Downsampling by a factor of 2.
To calculate the overall downsampling ratio of ResNet50:
- Initial downsampling: $2 \times 2 = 4$ (due to the initial convolution and max pooling)
- Downsampling across the stages: $2 \times 2 \times 2 = 8$ (due to downsampling at the beginning of stages 2, 3, and 4)
Overall downsampling ratio: $4 \times 8 = 32$
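The same arithmetic can be written as a short sketch. The function name `overall_downsampling` and the 224-pixel input are assumptions made for illustration.

```python
def overall_downsampling(num_stages, stem_factor=4, stage_factor=2):
    """Total spatial downsampling of a ResNet-style network.

    stem_factor covers the initial convolution and max pooling (2 * 2).
    Every stage except the first downsamples by stage_factor.
    """
    return stem_factor * stage_factor ** (num_stages - 1)


ratio = overall_downsampling(num_stages=4)
print(ratio)         # 32
print(224 // ratio)  # 7, final feature map size for an assumed 224x224 input
```

Under the same assumptions, the [3,3,3] layout with 3 stages would give `overall_downsampling(3)`, that is 16, if it used the same stem.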