Class Activation Mapping (CAM) Methods
Sources:
Code: Jax Feature Attribution Methods
Notation
Suppose we have a convolutional neural network (CNN) that takes an image as input and outputs a scalar target (e.g., a class score).
Symbol | Type | Explanation |
---|---|---|
$L$ | $\mathbb{N}$ | Number of convolutional layers, or number of feature maps, in a CNN |
$l$ | $\{1, \dots, L\}$ | Index of the convolutional layer, or feature map, in a CNN |
$H_l, W_l, C_l$ | $\mathbb{N}$ | Height, width, and number of channels of the $l$-th feature map |
$A^{(l)}$ | $\mathbb{R}^{H_l \times W_l \times C_l}$ | The $l$-th feature map |
$\mathcal{A}$ | Set | Set of convolutional layers, or feature maps, in a CNN |
$i, j, k$ | $\mathbb{N}$ | Integer indices for height, width, and channel |
$A^{(l)}_{ijk}$ | $\mathbb{R}$ | The activation value of the $(i, j, k)$-th element of $A^{(l)}$ |
$F_k$ | $\mathbb{R}$ | The spatial average of the $k$-th channel of the last feature map $A$ |
$A_{ijk}$ | $\mathbb{R}$ | The activation value of $A$, the last feature map (layer index omitted) |
$N$ | $\mathbb{N}$ | The number of classes in the CNN prediction |
$c$ | $\{1, \dots, N\}$ | The integer index for a class |
$w_k^c$ | $\mathbb{R}$ | The CAM weight corresponding to $F_k$ for class $c$ |
$S_c$ | $\mathbb{R}$ | The class score for class $c$, i.e., the input to the softmax |
$P_c$ | $[0, 1]$ | The output of the CNN, i.e., the output of the softmax for class $c$ |
$\alpha_k^c$ | $\mathbb{R}$ | The Grad-CAM weight corresponding to the $k$-th channel of $A$ for class $c$ |
CAM
Source: Grad CAM explanation by CampusAI
Suppose we want to perform a classification task that maps an input image to an output such as the probability of each class ($P_c$ for $c = 1, \dots, N$).
Forward pass
The forward pass of the CNN follows these steps (a JAX sketch appears after the list):

1. **Forward Pass**: The input image is passed through the CNN model to obtain the feature map $A \in \mathbb{R}^{H \times W \times C}$ from the last convolutional layer.
2. **Global Average Pooling**: For each channel $k$ of $A$, compute the spatial average
   $$F_k = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} A_{ijk},$$
   where $F$ is a vector with the shape $(C,)$.
3. **Score Computation**: For a given class $c$, the input to the softmax, $S_c$, is computed as
   $$S_c = \sum_{k=1}^{C} w_k^c F_k,$$
   where $w_k^c$ is the scalar weight corresponding to $F_k$ for class $c$. Essentially, $w_k^c$ indicates the importance of $F_k$ for class $c$.
4. **Softmax Output**: Finally, the output of the softmax is given by:
   $$P_c = \frac{\exp(S_c)}{\sum_{c'=1}^{N} \exp(S_{c'})}.$$
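Below is a minimal JAX sketch of these four steps. The feature extractor is abstracted away: `cam_head`, the toy shapes, and the random weights are all illustrative, not taken from the sourced code.

```python
import jax
import jax.numpy as jnp

def cam_head(feature_map, w):
    """feature_map: (H, W, C) activations A of the last conv layer.
    w: (C, N) weights of the fully connected layer.
    Returns the class scores S and the softmax probabilities P."""
    # Step 2 -- Global Average Pooling: F_k = (1/HW) * sum_{i,j} A_{ijk}
    f = feature_map.mean(axis=(0, 1))        # shape (C,)
    # Step 3 -- Score computation: S_c = sum_k w_k^c F_k
    s = f @ w                                # shape (N,)
    # Step 4 -- Softmax output: P_c = exp(S_c) / sum_{c'} exp(S_{c'})
    return s, jax.nn.softmax(s)

# Toy usage (step 1, the CNN forward pass, is stubbed out by sampling
# a random feature map A).
key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
A = jax.random.normal(k1, (7, 7, 512))       # H = W = 7, C = 512
w = jax.random.normal(k2, (512, 10)) * 0.01  # N = 10 classes
scores, probs = cam_head(A, w)
```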
Generating CAM
We define the class activation map for class $c$ as
$$M_c(i, j) = \sum_{k=1}^{C} w_k^c A_{ijk}.$$
Since $S_c = \sum_k w_k^c F_k = \frac{1}{HW} \sum_{i,j} M_c(i, j)$, the value $M_c(i, j)$ directly indicates the importance of the activation at spatial location $(i, j)$ for predicting class $c$.
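Assuming the same toy `A` and `w` as in the forward-pass sketch, generating the map for one class is a single weighted sum over channels; in practice the result is then upsampled to the input resolution for visualization.

```python
import jax.numpy as jnp

def cam_map(feature_map, w, c):
    """feature_map: (H, W, C); w: (C, N). Returns the (H, W) CAM for
    class c: M_c(i, j) = sum_k w_k^c A_{ijk}."""
    return jnp.einsum('ijk,k->ij', feature_map, w[:, c])

M = cam_map(A, w, c=3)   # (7, 7) heatmap for class 3
```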
Grad-CAM
Source: Grad CAM explanation by CampusAI
CAM relies on a specific CNN architecture that includes a Global Average Pooling (GAP) layer and one Fully Connected layer before the softmax layer.
Grad-CAM extends the original CAM method, making it applicable to a broader range of CNN architectures. The Grad-CAM weights and the Grad-CAM map are defined as:
$$\alpha_k^c = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} \frac{\partial S_c}{\partial A_{ijk}}, \qquad L^c_{\text{Grad-CAM}} = \operatorname{ReLU}\!\left( \sum_{k} \alpha_k^c A_{::k} \right),$$
where $A_{::k}$ denotes the $k$-th channel of the last feature map $A$, and the $\operatorname{ReLU}$ keeps only the features with a positive influence on class $c$.
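A minimal JAX sketch of this definition, reusing the toy `cam_head`, `A`, and `w` from above: `jax.grad` supplies the partial derivatives, and the spatial mean implements the $\frac{1}{HW}\sum_{i,j}$ factor.

```python
import jax
import jax.numpy as jnp

def grad_cam(score_fn, feature_map):
    """score_fn: maps an (H, W, C) feature map to the scalar score S_c.
    Returns the (H, W) Grad-CAM heatmap."""
    # alpha_k^c = (1/HW) * sum_{i,j} dS_c / dA_{ijk}
    grads = jax.grad(score_fn)(feature_map)   # (H, W, C)
    alpha = grads.mean(axis=(0, 1))           # (C,)
    # L^c = ReLU(sum_k alpha_k^c * A_{::k})
    return jax.nn.relu(jnp.einsum('ijk,k->ij', feature_map, alpha))

# Score for class c = 3 as a function of the feature map alone
heatmap = grad_cam(lambda a: cam_head(a, w)[0][3], A)
```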
For CNN architectures like those required for CAM, i.e., CNNs with a GAP layer and a Fully Connected layer before softmax, the weights used in Grad-CAM are equivalent to those in CAM. Here is the proof:
The score $S_c$ for such an architecture is
$$S_c = \sum_{k} w_k^c F_k = \sum_{k} w_k^c \left( \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} A_{ijk} \right).$$
Computing the partial derivative with respect to a single activation:
$$\frac{\partial S_c}{\partial A_{ijk}} = \frac{w_k^c}{HW}.$$
Substituting into the definition of the Grad-CAM weights:
$$\alpha_k^c = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} \frac{\partial S_c}{\partial A_{ijk}} = \frac{1}{HW} \cdot HW \cdot \frac{w_k^c}{HW} = \frac{w_k^c}{HW}.$$
Up to the proportionality constant $\frac{1}{HW}$, which rescales every channel equally and so leaves the heatmap's relative values unchanged, we have $\alpha_k^c \propto w_k^c$: the Grad-CAM weights coincide with the CAM weights.
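This equality can be sanity-checked numerically with the toy sketches above (illustrative, not from the sourced code): for the GAP + FC head, the Grad-CAM weights should equal $w_k^c / (HW)$ exactly.

```python
import jax
import jax.numpy as jnp

c = 3
grads = jax.grad(lambda a: cam_head(a, w)[0][c])(A)   # dS_c / dA
alpha = grads.mean(axis=(0, 1))                       # Grad-CAM weights
H, W, _ = A.shape
assert jnp.allclose(alpha, w[:, c] / (H * W), atol=1e-6)
```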
Meanwhile, the Grad-CAM method can, in principle, be applied to any convolutional layer, as long as an average is taken across the resulting multiple Grad-CAM maps.
However, in the original Grad-CAM (and Guided Grad-CAM) paper (Selvaraju et al., 2017), the method was only applied to the last convolutional layer. The authors note:
"Although our technique is fairly general in that it can be used to explain activations in any layer of a deep network, in this work, we focus on explaining output layer decisions only."