We propose a technique for producing 'visual explanations' for decisions from a large class
of Convolutional Neural Network (CNN)-based models, making them more transparent and
explainable. Our approach, Gradient-weighted Class Activation Mapping (Grad-CAM), uses
the gradients of any target concept (say 'dog' in a classification network or a sequence of
words in a captioning network) flowing into the final convolutional layer to produce a coarse
localization map highlighting the important regions in the image for predicting the concept …
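The core computation can be sketched in a few lines. In the usual Grad-CAM formulation, each feature map A^k of the final convolutional layer gets an importance weight alpha_k^c = (1/Z) * sum_ij dy^c/dA^k_ij, the spatially averaged gradient of the target score y^c. The coarse map is then ReLU(sum_k alpha_k^c A^k). The NumPy sketch below assumes the activations and their gradients have already been obtained from a framework's automatic differentiation. The function name `grad_cam` and the toy numbers are hypothetical and are included only for illustration.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Coarse Grad-CAM map for one image and one target concept (sketch).

    activations: feature maps A of the final conv layer, shape (K, H, W).
    gradients:   dy^c / dA for the target score y^c, same shape (K, H, W);
                 assumed to be computed elsewhere (e.g. by autodiff).
    Returns a non-negative (H, W) map at the conv layer's resolution.
    """
    if activations.shape != gradients.shape:
        raise ValueError("activations and gradients must have the same shape")
    # Importance weight per channel: global-average-pool the gradients over H, W.
    alphas = gradients.mean(axis=(1, 2))             # shape (K,)
    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(alphas, activations, axes=1)  # shape (H, W)
    # ReLU: keep only regions that increase the target score.
    return np.maximum(cam, 0.0)


# Toy check (hypothetical numbers): K=2 channels on a 2x2 grid, with the score
# y = 3*mean(A[0]) - 1*mean(A[1]), whose gradient is known in closed form.
A = np.zeros((2, 2, 2))
A[0, 0, 0] = 1.0  # channel 0 fires in the top-left cell
A[1, 1, 1] = 1.0  # channel 1 fires in the bottom-right cell
dY_dA = np.stack([np.full((2, 2), 3 / 4), np.full((2, 2), -1 / 4)])
heatmap = grad_cam(A, dY_dA)
print(heatmap)  # expected: 0.75 in the top-left cell, 0 elsewhere
```

The map has the spatial resolution of the final convolutional layer, which is why it is coarse. For display it is typically upsampled to the input image size. In the toy score, the channel that lowers y (channel 1) receives a negative weight, and the ReLU removes its region from the map.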