Class Activation Mapping (CAM) Methods
Sources:
Code: Jax Feature Attribution Methods
Notation
Suppose we have a convolutional neural network (CNN) that takes an image as input and outputs a scalar target.
| Symbol | Type | Explanation |
|---|---|---|
| $L$ | $\mathbb{Z}^+$ | Number of convolutional layers, or number of feature maps, in a CNN |
| $l$ | $\{1, \dots, L\}$ | Index of the convolutional layer, or feature map, in a CNN |
| $H_l, W_l, C_l$ | $\mathbb{Z}^+$ | Height, width, and number of channels of the $l$-th feature map |
| $A^{(l)}$ | $\mathbb{R}^{H_l \times W_l \times C_l}$ | The $l$-th feature map |
| $\{A^{(l)}\}_{l=1}^{L}$ | set | Set of convolutional layers, or feature maps, in a CNN |
| $i, j, c$ | $\mathbb{Z}^+$ | Integer indices for height, width, and channel |
| $A^{(l)}_{i,j,c}$ | $\mathbb{R}$ | The activation value of the $(i, j)$-th position of channel $c$ of the $l$-th feature map |
| $\bar{A}^{(l)}_c$ | $\mathbb{R}$ | The spatial average of the $c$-th channel of the $l$-th feature map |
| $A_{i,j,c}$ | $\mathbb{R}$ | The activation value of $A := A^{(L)}$, the last feature map (layer index dropped for brevity) |
| $K$ | $\mathbb{Z}^+$ | The number of classes in the CNN prediction |
| $k$ | $\{1, \dots, K\}$ | The integer index for a class |
| $w^k_c$ | $\mathbb{R}$ | The CAM weight corresponding to channel $c$ and class $k$ |
| $y^k$ | $\mathbb{R}$ | The class score for class $k$, i.e., the input to the softmax |
| $p^k$ | $[0, 1]$ | The output of the CNN, i.e., the output of the softmax, for class $k$ |
| $\alpha^k_c$ | $\mathbb{R}$ | The Grad-CAM weight corresponding to channel $c$ and class $k$ |
CAM
Source: Grad CAM explanation by CampusAI
Suppose we want to perform a classification task with an input image and an output, such as the probability of each class ($p^k$ for $k = 1, \dots, K$).
Forward pass
The forward pass of the CNN follows these steps:
1. Forward Pass: The input image is passed through the CNN model to obtain the feature map $A$ from the last convolutional layer.
2. Global Average Pooling: For each channel $c$ of $A$, compute the spatial average $\bar{A}_c = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} A_{i,j,c}$, where $\bar{A}$ is a vector with the shape $(C,)$.
3. Score Computation: For a given class $k$, the input to the softmax, $y^k$, is computed as $y^k = \sum_{c=1}^{C} w^k_c \bar{A}_c$, where $w^k_c$ is the scalar weight corresponding to $\bar{A}_c$ for class $k$. Essentially, $w^k_c$ indicates the importance of channel $c$ for class $k$.
4. Softmax Output: Finally, the output of the softmax is given by $p^k = \frac{\exp(y^k)}{\sum_{k'=1}^{K} \exp(y^{k'})}$.
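To make the steps concrete, here is a minimal JAX sketch of this classification head. The names `cam_forward`, `feature_map`, and `w` are hypothetical stand-ins for the last feature map $A$ and the weight matrix $(w^k_c)$; the convolutional backbone that produces $A$ is omitted:

```python
import jax

def cam_forward(feature_map, w):
    """CAM-style classification head: GAP -> linear -> softmax.

    feature_map: (H, W, C) activations A from the last convolutional layer.
    w:           (C, K) weight matrix; w[c, k] plays the role of w^k_c.
    """
    # Step 2 -- Global Average Pooling: spatial average per channel, shape (C,)
    a_bar = feature_map.mean(axis=(0, 1))
    # Step 3 -- Score computation: y^k = sum_c w^k_c * A_bar_c, shape (K,)
    y = a_bar @ w
    # Step 4 -- Softmax output: p^k = exp(y^k) / sum_k' exp(y^k')
    return jax.nn.softmax(y)

# Illustrative shapes: a 7x7x512 feature map and K = 10 classes.
key_a, key_w = jax.random.split(jax.random.PRNGKey(0))
p = cam_forward(jax.random.normal(key_a, (7, 7, 512)),
                jax.random.normal(key_w, (512, 10)))
```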
Generating CAM
We define the class activation map for class $k$ as the weighted sum of the channels of the last feature map:

$$\text{CAM}^k_{i,j} = \sum_{c=1}^{C} w^k_c A_{i,j,c}$$

The resulting $H \times W$ map is then upsampled to the input image size to highlight the regions most relevant to class $k$.
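In JAX, the map itself is a single contraction over the channel axis. The sketch below reuses the hypothetical `feature_map` and `w` from above and omits the final upsampling step:

```python
def class_activation_map(feature_map, w, k):
    """CAM for class k: CAM^k_{i,j} = sum_c w^k_c * A_{i,j,c}.

    feature_map: (H, W, C) activations A from the last convolutional layer.
    w:           (C, K) CAM weight matrix; k is the class index.
    Returns an (H, W) heat map; upsampling to the input size is omitted.
    """
    # Contract the channel axis against the class-k weight vector.
    return feature_map @ w[:, k]  # (H, W, C) @ (C,) -> (H, W)
```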
Grad-CAM
Source: Grad CAM explanation by CampusAI
CAM relies on a specific CNN architecture that includes a Global Average Pooling (GAP) layer and one Fully Connected layer before the softmax layer.
Grad-CAM extends the original CAM method, making it applicable to a broader range of CNN architectures. The Grad-CAM weights and map are defined as:

$$\alpha^k_c = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} \frac{\partial y^k}{\partial A_{i,j,c}}, \qquad \text{Grad-CAM}^k_{i,j} = \mathrm{ReLU}\left( \sum_{c=1}^{C} \alpha^k_c A_{i,j,c} \right)$$
For CNN architectures like those required for CAM, i.e., CNNs with a GAP layer and a Fully Connected layer before softmax, the weights used in Grad-CAM are equivalent to those in CAM. Here is the proof:
The score $y^k$ for such an architecture is

$$y^k = \sum_{c=1}^{C} w^k_c \bar{A}_c = \frac{1}{HW} \sum_{c=1}^{C} w^k_c \sum_{i=1}^{H} \sum_{j=1}^{W} A_{i,j,c}$$

Computing the partial derivative and substituting it into the definition of the Grad-CAM weights:

$$\frac{\partial y^k}{\partial A_{i,j,c}} = \frac{w^k_c}{HW} \quad \Longrightarrow \quad \alpha^k_c = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} \frac{w^k_c}{HW} = \frac{w^k_c}{HW}$$

Up to the proportionality constant $\frac{1}{HW}$, the Grad-CAM weights $\alpha^k_c$ are therefore equal to the CAM weights $w^k_c$.
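This equivalence is easy to check numerically. The sketch below builds a random GAP + fully connected head (all names and shapes are illustrative) and confirms that the gradient-derived weights match $w^k_c / (HW)$:

```python
import jax
import jax.numpy as jnp

H, W, C, K = 4, 4, 8, 3
key_a, key_w = jax.random.split(jax.random.PRNGKey(0))
A = jax.random.normal(key_a, (H, W, C))
w = jax.random.normal(key_w, (C, K))

def score(a, k):
    # y^k for the GAP + fully connected head: sum_c w^k_c * A_bar_c
    return (a.mean(axis=(0, 1)) @ w)[k]

k = 1
grads = jax.grad(score)(A, k)    # dy^k / dA_{i,j,c}, shape (H, W, C)
alpha = grads.mean(axis=(0, 1))  # Grad-CAM weights alpha^k_c
# The gradients recover the CAM weights up to the 1/(HW) constant.
assert jnp.allclose(alpha, w[:, k] / (H * W), atol=1e-6)
```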
In principle, the Grad-CAM method can be applied to any convolutional layer; when several layers are used, the resulting Grad-CAM maps can be averaged into a single map, as sketched below.
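One way such averaging could look in JAX (an illustrative sketch under my own assumptions, not a procedure from the paper) is to resize each per-layer map to a common resolution before taking the mean:

```python
import jax.numpy as jnp
from jax.image import resize

def averaged_grad_cam(layer_maps, out_hw=(224, 224)):
    """Average per-layer Grad-CAM maps at a common resolution.

    layer_maps: list of (H_l, W_l) Grad-CAM maps, one per chosen layer.
    out_hw:     target spatial resolution, e.g., the input image size.
    """
    # Bilinearly resize each map so they can be averaged elementwise.
    resized = [resize(m, out_hw, method="bilinear") for m in layer_maps]
    return jnp.stack(resized).mean(axis=0)
```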
However, in the original Grad-CAM (and Guided Grad-CAM) paper from 2016, this method was only applied to the last convolutional layer. The authors note:
"Although our technique is fairly general in that it can be used to explain activations in any layer of a deep network, in this work, we focus on explaining output layer decisions only."