Convolutional Neural Networks
TL;DR: In a convolution layer, a small kernel slides over the input tensor; at each position, the kernel is multiplied elementwise with the window of input values it covers and the products are summed to give one output element. Padding adds zeros around the boundary so border pixels are not lost, and the stride controls how many pixels the window moves at each step.
The cross-correlation operation
Here is one example. In the figure below, we have:
- Input: A two-dimensional tensor with a height of 3 and width of 3.
- Kernel: A two-dimensional tensor with a height of 2 and width of 2.
- Output: A two-dimensional tensor with a height of 2 and width of 2.
When computing the cross-correlation, we start with the convolution window at the upper-left corner of the input tensor and slide it across the input from left to right and top to bottom. At each position, the input elements covered by the window are multiplied elementwise with the kernel and summed, yielding a single output value.
The calculation is:

0×0 + 1×1 + 3×2 + 4×3 = 19,
1×0 + 2×1 + 4×2 + 5×3 = 25,
3×0 + 4×1 + 6×2 + 7×3 = 37,
4×0 + 5×1 + 7×2 + 8×3 = 43.
Pseudocode:
```python
import torch

def corr2d(X, K):
    """Compute the 2D cross-correlation of input X with kernel K."""
    h, w = K.shape
    # The output shrinks by (kernel size - 1) along each dimension.
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            # Elementwise product of the current window with the kernel, summed to a scalar.
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y
```
In code, the example from the figure is:
```python
x = torch.tensor([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
# Kernel values inferred from the printed output below.
k = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
corr2d(x, k)
```
The output is:
```
tensor([[19., 25.],
        [37., 43.]])
```
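As a cross-check (not part of the original example), PyTorch's built-in `conv2d` also computes cross-correlation rather than a flipped convolution, so it reproduces the same values once the tensors are reshaped to the (batch, channel, height, width) layout it expects:

```python
import torch.nn.functional as F

# conv2d expects 4D tensors: (batch, channels, height, width).
F.conv2d(x.reshape(1, 1, 3, 3), k.reshape(1, 1, 2, 2))
# tensor([[[[19., 25.],
#           [37., 43.]]]])
```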
Padding
One tricky issue when applying convolutional layers is that we tend to lose pixels on the perimeter of our image. Consider Fig. 2, which depicts pixel utilization as a function of the convolution kernel size and the position within the image. The pixels in the corners are hardly used at all.
Since we typically use small kernels, any single convolution might only lose a few pixels (an n_h × n_w input cross-correlated with a k_h × k_w kernel yields only an (n_h − k_h + 1) × (n_w − k_w + 1) output), but this adds up as we apply many successive convolutional layers. One straightforward solution to this problem is to add extra pixels of filler around the boundary of our input image, thus increasing the effective size of the image. Typically, we set the values of the extra pixels to zero.
In Fig. 3, we pad the 3×3 input with one row or column of zeros on every side, increasing its size to 5×5; the corresponding output then grows to a 4×4 matrix.
Fig. 3 Two-dimensional cross-correlation with padding.
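To make the effect concrete, here is a minimal sketch (reusing the corr2d function and tensors defined above) that zero-pads the 3×3 input by one pixel on every side before applying the 2×2 kernel, so the output grows from 2×2 to 4×4:

```python
import torch.nn.functional as F

# Pad one pixel of zeros on the left, right, top, and bottom: 3x3 -> 5x5.
x_padded = F.pad(x, (1, 1, 1, 1), mode="constant", value=0.0)

# Cross-correlating the 5x5 padded input with the 2x2 kernel yields a 4x4 output.
y_padded = corr2d(x_padded, k)
print(x_padded.shape, y_padded.shape)  # torch.Size([5, 5]) torch.Size([4, 4])
```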
Stride
The stride is the number of pixels by which the convolution window shifts over the input matrix at each step. With a stride of 1 we visit every position; with larger strides the window skips positions, which shrinks the output.
Fig. 4 Cross-correlation with strides of 3 and 2 for height and width, respectively.
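As a rough sketch (the helper name corr2d_stride and its arguments are illustrative, not from the original), a strided cross-correlation simply moves the window by sh rows and sw columns per step instead of one pixel at a time:

```python
import torch

def corr2d_stride(X, K, sh=1, sw=1):
    """2D cross-correlation where the window moves sh rows / sw columns per step."""
    h, w = K.shape
    out_h = (X.shape[0] - h) // sh + 1
    out_w = (X.shape[1] - w) // sw + 1
    Y = torch.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            Y[i, j] = (X[i * sh:i * sh + h, j * sw:j * sw + w] * K).sum()
    return Y

# With strides of 3 (height) and 2 (width), the 3x3 example input and 2x2 kernel
# admit only the top-left window position, so the output is a single element.
print(corr2d_stride(x, k, sh=3, sw=2))  # tensor([[19.]])
```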