Project author: yizt

Project description:
A PyTorch implementation of Grad-CAM and Grad-CAM++ that can visualize the Class Activation Map (CAM) of any classification network, including custom ones; CAM generation is also implemented for two object-detection networks, Faster R-CNN and RetinaNet. Feel free to try it, follow the project, and report issues...
Language: Python
Repository: git://github.com/yizt/Grad-CAM.pytorch.git


Grad-CAM.pytorch

A PyTorch implementation of Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization

and Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks

  1. Dependencies
  2. Usage
  3. Examples
     3.1 Single object
     3.2 Multiple objects
  4. Summary
  5. Object detection: Faster R-CNN
     5.1 Installing detectron2
     5.2 Testing
     5.3 Grad-CAM results
     5.4 Summary
  6. Object detection: RetinaNet
     6.1 Installing detectron2
     6.2 Testing
     6.3 Grad-CAM results
     6.4 Summary
  7. Object detection: FCOS
     7.1 Installing AdelaiDet
     7.2 Testing
     7.3 Grad-CAM results
     7.4 Summary

Grad-CAM overall architecture

Similarities and differences between Grad-CAM++ and Grad-CAM
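In short, the two methods differ only in how the channel weights w_k^c are computed; everything downstream is identical. For reference, the formulas as published in the two papers (A^k is the k-th channel of the chosen feature map, y^c the score of class c, Z the number of spatial positions):

    % Grad-CAM: weights are globally average-pooled gradients
    w_k^c = \frac{1}{Z}\sum_{i}\sum_{j}\frac{\partial y^c}{\partial A_{ij}^k}

    % Grad-CAM++: only positive gradients, re-weighted per position
    w_k^c = \sum_{i}\sum_{j}\alpha_{ij}^{kc}\,
            \mathrm{ReLU}\!\left(\frac{\partial y^c}{\partial A_{ij}^k}\right),
    \quad
    \alpha_{ij}^{kc} =
        \frac{\frac{\partial^2 y^c}{(\partial A_{ij}^k)^2}}
             {2\frac{\partial^2 y^c}{(\partial A_{ij}^k)^2}
              + \sum_{a,b} A_{ab}^k \frac{\partial^3 y^c}{(\partial A_{ij}^k)^3}}

    % Both then build the CAM the same way
    L^c = \mathrm{ReLU}\Bigl(\sum_k w_k^c A^k\Bigr)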

Dependencies

  1. python 3.6.x
  2. pytorch 1.0.1+
  3. torchvision 0.2.2
  4. opencv-python
  5. matplotlib
  6. scikit-image
  7. numpy

Usage

    python main.py --image-path examples/pic1.jpg \
                   --network densenet121 \
                   --weight-path /opt/pretrained_model/densenet121-a639ec97.pth

Parameters

  • image-path: path of the image to visualize (optional; default ./examples/pic1.jpg)
  • network: network name (optional; default resnet50)
  • weight-path: path to the network's pretrained weights (optional; by default the matching pretrained weights are downloaded from the official PyTorch site)
  • layer-name: name of the layer used for Grad-CAM (optional; default is the last convolutional layer)
  • class-id: class id used for the backward pass in Grad-CAM and Guided Back Propagation (optional; default is the class predicted by the network)
  • output-dir: directory where the visualization results are saved (optional; default results)

A minimal sketch of the underlying mechanism follows this list.
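The sketch below shows, in a self-contained way, how Grad-CAM captures the activations and gradients of the layer named by layer-name via PyTorch hooks. It illustrates the idea rather than the project's exact code; the choice of resnet50/layer4 and the random input are stand-in assumptions.

    import torch
    import torch.nn.functional as F
    from torchvision import models

    model = models.resnet50()  # weights omitted here; in practice load --weight-path
    model.eval()

    activations, gradients = {}, {}

    def save_activation(module, inputs, output):
        activations["value"] = output
        # capture the gradient w.r.t. this activation during the backward pass
        output.register_hook(lambda grad: gradients.update(value=grad))

    # hook the layer named by --layer-name (here: the last conv block)
    handle = model.layer4.register_forward_hook(save_activation)

    x = torch.randn(1, 3, 224, 224)          # stand-in for a preprocessed image
    logits = model(x)
    class_id = logits.argmax(dim=1).item()   # default --class-id: predicted class
    model.zero_grad()
    logits[0, class_id].backward()

    # Grad-CAM: weight each channel by its spatially averaged gradient, ReLU the sum
    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)         # (1, C, 1, 1)
    cam = F.relu((weights * activations["value"]).sum(dim=1)).detach()  # (1, H, W)
    cam = F.interpolate(cam.unsqueeze(1), size=x.shape[2:], mode="bilinear",
                        align_corners=False)[0, 0]
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)            # [0, 1] heatmap

    handle.remove()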

Examples

Single object

Original image

Results

[Result images: one row per network (vgg16, vgg19, resnet50, resnet101, densenet121, inception_v3, mobilenet_v2, shufflenet_v2); columns: HeatMap, Grad-CAM, HeatMap++, Grad-CAM++, Guided backpropagation, Guided Grad-CAM. See the repository for the images.]

Multiple objects

For images with multiple objects, Grad-CAM++ covers the objects more completely than Grad-CAM; this is the main advantage of Grad-CAM++.

Original image

Results

[Result images: one row per network (vgg16, vgg19, resnet50, resnet101, densenet121, inception_v3, mobilenet_v2, shufflenet_v2); columns: HeatMap, Grad-CAM, HeatMap++, Grad-CAM++, Guided backpropagation, Guided Grad-CAM. See the repository for the images.]

Summary

  • The Grad-CAM maps of the VGG models do not cover the whole object, while ResNet and DenseNet cover it more completely, DenseNet in particular; this suggests that, in terms of generalization and robustness, densenet > resnet > vgg.
  • Grad-CAM++ covers objects more completely than Grad-CAM, especially when a class has multiple instances: Grad-CAM may cover only some of them, whereas Grad-CAM++ covers essentially all of them. This mainly matters for VGG, though; for a network like DenseNet, plain Grad-CAM already covers essentially all objects.
  • The Grad-CAM coverage of MobileNet V2 is also quite complete.
  • The Guided Backpropagation maps of Inception V3 and MobileNet V2 have very blurry contours, while those of ShuffleNet V2 are fairly sharp.

Object detection: Faster R-CNN

A user, SHAOSIHAN, asked how to use Grad-CAM for object detection. Neither the Grad-CAM nor the Grad-CAM++ paper mentions generating CAMs for object detection. I see two main reasons:

a) Detection differs from classification. A classification network has a single classification loss and essentially the same structure everywhere (the last layer has one neuron per class), and the final prediction is a single distribution of class scores. In detection the output is not a single value, and networks such as Faster R-CNN, CornerNet, CenterNet, and FCOS are modeled differently, so their outputs mean different things; there can be no single, unified way of generating Grad-CAM maps for detection.

b) With respect to localization, classification is weakly supervised: a CAM reveals the spatial positions the network mainly attends to when predicting, i.e. "where it looks", which has real analytical value. Detection, by contrast, is strongly supervised: the predicted box itself already indicates "where it looks".

Here we take the Faster R-CNN network in detectron2 as an example. The main idea is to take the predicted box with the highest score, backpropagate that box's classification score onto the feature map of the proposal that produced the box, and generate the CAM from that feature map; a minimal sketch of this step follows.
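The following self-contained sketch illustrates that step with stand-in tensors; grad_cam_from_score and the toy shapes are illustrative assumptions, not the project's code. The box score simply plays the role that the class logit plays in classification:

    import torch
    import torch.nn.functional as F

    def grad_cam_from_score(feature_map: torch.Tensor, score: torch.Tensor) -> torch.Tensor:
        """Grad-CAM heatmap for one detection.

        feature_map: (C, H, W) activations of the proposal's feature map,
                     part of the graph that produced `score`.
        score:       scalar classification score of the chosen box.
        """
        grads = torch.autograd.grad(score, feature_map, retain_graph=True)[0]  # (C, H, W)
        weights = grads.mean(dim=(1, 2), keepdim=True)    # one weight per channel
        cam = F.relu((weights * feature_map).sum(dim=0))  # (H, W)
        return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

    # Toy check; a real run would use the proposal's RoI feature map and the
    # top box score exposed by the modified detectron2 model below.
    fm = torch.randn(256, 14, 14, requires_grad=True)
    top_score = fm.mean() * 2.0  # stand-in for the highest box score
    print(grad_cam_from_score(fm, top_score).shape)  # torch.Size([14, 14])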

Installing detectron2

a) Download

    git clone https://github.com/facebookresearch/detectron2.git

b) Modify the fast_rcnn_inference_single_image function in detectron2/modeling/roi_heads/fast_rcnn.py. The main change is to add an index that records which proposal box produced each high-scoring prediction. The modified fast_rcnn_inference_single_image looks like this:

    def fast_rcnn_inference_single_image(
        boxes, scores, image_shape, score_thresh, nms_thresh, topk_per_image
    ):
        """
        Single-image inference. Return bounding-box detection results by thresholding
        on scores and applying non-maximum suppression (NMS).

        Args:
            Same as `fast_rcnn_inference`, but with boxes, scores, and image shapes
            per image.

        Returns:
            Same as `fast_rcnn_inference`, but for only one image.
        """
        valid_mask = torch.isfinite(boxes).all(dim=1) & torch.isfinite(scores).all(dim=1)
        # added: one index per proposal, tracking which proposal each score came from
        indices = torch.arange(start=0, end=scores.shape[0], dtype=int)
        indices = indices.expand((scores.shape[1], scores.shape[0])).T
        if not valid_mask.all():
            boxes = boxes[valid_mask]
            scores = scores[valid_mask]
            indices = indices[valid_mask]

        scores = scores[:, :-1]
        indices = indices[:, :-1]
        num_bbox_reg_classes = boxes.shape[1] // 4
        # Convert to Boxes to use the `clip` function ...
        boxes = Boxes(boxes.reshape(-1, 4))
        boxes.clip(image_shape)
        boxes = boxes.tensor.view(-1, num_bbox_reg_classes, 4)  # R x C x 4

        # Filter results based on detection scores
        filter_mask = scores > score_thresh  # R x K
        # R' x 2. First column contains indices of the R predictions;
        # Second column contains indices of classes.
        filter_inds = filter_mask.nonzero()
        if num_bbox_reg_classes == 1:
            boxes = boxes[filter_inds[:, 0], 0]
        else:
            boxes = boxes[filter_mask]
        scores = scores[filter_mask]
        indices = indices[filter_mask]

        # Apply per-class NMS
        keep = batched_nms(boxes, scores, filter_inds[:, 1], nms_thresh)
        if topk_per_image >= 0:
            keep = keep[:topk_per_image]
        boxes, scores, filter_inds = boxes[keep], scores[keep], filter_inds[keep]
        indices = indices[keep]

        result = Instances(image_shape)
        result.pred_boxes = Boxes(boxes)
        result.scores = scores
        result.pred_classes = filter_inds[:, 1]
        result.indices = indices  # added: proposal index of each kept box
        return result, filter_inds[:, 0]

c) Install. If you run into problems, refer to the detectron2 documentation; installation differs across operating systems.

    cd detectron2
    pip install -e .

Testing

a) Download the pretrained model

    wget https://dl.fbaipublicfiles.com/detectron2/PascalVOC-Detection/faster_rcnn_R_50_C4/142202221/model_final_b1acc2.pkl

b) Test Grad-CAM image generation

Run the following command from the root of this project:

    export KMP_DUPLICATE_LIB_OK=TRUE
    python detection/demo.py --config-file detection/faster_rcnn_R_50_C4.yaml \
        --input ./examples/pic1.jpg \
        --opts MODEL.WEIGHTS /Users/yizuotian/pretrained_model/model_final_b1acc2.pkl MODEL.DEVICE cpu

Grad-CAM results

[Result table: columns Original image, Detected box, Grad-CAM HeatMap, Grad-CAM++ HeatMap, Predicted class; one row each for Dog, Aeroplane, Person, and Horse. See the repository for the images.]

Summary

For object detection, Grad-CAM++ does not perform better than Grad-CAM. A likely reason is that a predicted box already contains a single object; Grad-CAM++ mainly outperforms Grad-CAM when multiple objects are present.

Object detection: RetinaNet

After Grad-CAM for Faster R-CNN was done, two users, abhigoku10 and wangzyon, asked how to implement Grad-CAM for RetinaNet. RetinaNet's network structure differs from Faster R-CNN's, so CAM generation differs somewhat as well. The detailed procedure follows.

Installing detectron2

a) Download

    git clone https://github.com/facebookresearch/detectron2.git

b) Modify the inference_single_image function in detectron2/modeling/meta_arch/retinanet.py. The main change is to add a feature-level index that records which feature-map level produced each high-scoring box. The modified inference_single_image looks like this:

    def inference_single_image(self, box_cls, box_delta, anchors, image_size):
        """
        Single-image inference. Return bounding-box detection results by thresholding
        on scores and applying non-maximum suppression (NMS).

        Arguments:
            box_cls (list[Tensor]): list of #feature levels. Each entry contains
                tensor of size (H x W x A, K)
            box_delta (list[Tensor]): Same shape as 'box_cls' except that K becomes 4.
            anchors (list[Boxes]): list of #feature levels. Each entry contains
                a Boxes object, which contains all the anchors for that
                image in that feature level.
            image_size (tuple(H, W)): a tuple of the image height and width.

        Returns:
            Same as `inference`, but for only one image.
        """
        boxes_all = []
        scores_all = []
        class_idxs_all = []
        feature_level_all = []  # added: which feature level produced each box

        # Iterate over every feature level
        for i, (box_cls_i, box_reg_i, anchors_i) in enumerate(zip(box_cls, box_delta, anchors)):
            # (HxWxAxK,)
            box_cls_i = box_cls_i.flatten().sigmoid_()

            # Keep top k top scoring indices only.
            num_topk = min(self.topk_candidates, box_reg_i.size(0))
            # torch.sort is actually faster than .topk (at least on GPUs)
            predicted_prob, topk_idxs = box_cls_i.sort(descending=True)
            predicted_prob = predicted_prob[:num_topk]
            topk_idxs = topk_idxs[:num_topk]

            # filter out the proposals with low confidence score
            keep_idxs = predicted_prob > self.score_threshold
            predicted_prob = predicted_prob[keep_idxs]
            topk_idxs = topk_idxs[keep_idxs]

            anchor_idxs = topk_idxs // self.num_classes
            classes_idxs = topk_idxs % self.num_classes

            box_reg_i = box_reg_i[anchor_idxs]
            anchors_i = anchors_i[anchor_idxs]
            # predict boxes
            predicted_boxes = self.box2box_transform.apply_deltas(box_reg_i, anchors_i.tensor)

            boxes_all.append(predicted_boxes)
            scores_all.append(predicted_prob)
            class_idxs_all.append(classes_idxs)
            feature_level_all.append(torch.ones_like(classes_idxs) * i)  # added

        boxes_all, scores_all, class_idxs_all, feature_level_all = [
            cat(x) for x in [boxes_all, scores_all, class_idxs_all, feature_level_all]
        ]
        keep = batched_nms(boxes_all, scores_all, class_idxs_all, self.nms_threshold)
        keep = keep[: self.max_detections_per_image]

        result = Instances(image_size)
        result.pred_boxes = Boxes(boxes_all[keep])
        result.scores = scores_all[keep]
        result.pred_classes = class_idxs_all[keep]
        result.feature_levels = feature_level_all[keep]  # added
        return result
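With feature_levels stored on the result, the Grad-CAM code can pick the FPN level that actually produced the top-scoring box and backpropagate to that level only. A toy illustration of the selection logic (plain tensors standing in for the result.scores and result.feature_levels fields above):

    import torch

    # stand-ins for result.scores and result.feature_levels after NMS
    scores = torch.tensor([0.91, 0.87, 0.55])
    feature_levels = torch.tensor([1, 0, 2])  # e.g. P4, P3, P5

    top = scores.argmax()
    level = feature_levels[top].item()
    print(f"backpropagate score {scores[top]:.2f} to feature level {level}")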

c) Add a predict function to detectron2/modeling/meta_arch/retinanet.py, as follows:

    def predict(self, batched_inputs):
        """
        Inference-only forward pass used for Grad-CAM generation.

        Args:
            batched_inputs: a list, batched outputs of :class:`DatasetMapper`.
                Each item in the list contains the inputs for one image.
                For now, each item in the list is a dict that contains:

                * image: Tensor, image in (C, H, W) format.
                * instances: Instances

                Other information that's included in the original dicts, such as:

                * "height", "width" (int): the output resolution of the model,
                  used in inference. See :meth:`postprocess` for details.

        Returns:
            list[dict]: one dict per image, each holding the post-processed
            detections under the "instances" key.
        """
        images = self.preprocess_image(batched_inputs)
        features = self.backbone(images.tensor)
        features = [features[f] for f in self.in_features]
        box_cls, box_delta = self.head(features)
        anchors = self.anchor_generator(features)
        results = self.inference(box_cls, box_delta, anchors, images.image_sizes)
        processed_results = []
        for results_per_image, input_per_image, image_size in zip(
            results, batched_inputs, images.image_sizes
        ):
            height = input_per_image.get("height", image_size[0])
            width = input_per_image.get("width", image_size[1])
            r = detector_postprocess(results_per_image, height, width)
            processed_results.append({"instances": r})
        return processed_results

d) Install. If you run into problems, refer to the detectron2 documentation; installation differs across operating systems.

    cd detectron2
    pip install -e .

Testing

a) Download the pretrained model

    wget https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/retinanet_R_50_FPN_3x/137849486/model_final_4cafe0.pkl

b) Test Grad-CAM image generation

Run the following command from the root of this project:

    export KMP_DUPLICATE_LIB_OK=TRUE
    python detection/demo_retinanet.py --config-file detection/retinanet_R_50_FPN_3x.yaml \
        --input ./examples/pic1.jpg \
        --layer-name head.cls_subnet.0 \
        --opts MODEL.WEIGHTS /Users/yizuotian/pretrained_model/model_final_4cafe0.pkl MODEL.DEVICE cpu

Grad-CAM results

[Result table: one column per test image (Image 1 to Image 4); rows: original image, predicted boxes, Grad-CAM for head.cls_subnet.0 through head.cls_subnet.7, and Grad-CAM++ for head.cls_subnet.0 through head.cls_subnet.7. See the repository for the images.]

Note: Grad-CAM maps are generated for the eight layers head.cls_subnet.0 through head.cls_subnet.7, which correspond to the feature maps of the four convolutions of the RetinaNet classification subnet and to the feature maps after their ReLU activations. Layer names like these can be listed as shown below.
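Valid values for --layer-name can be discovered by printing the module names of the loaded model; this works for any PyTorch nn.Module, including the detectron2 model (the torchvision model below is only a stand-in to keep the snippet self-contained):

    import torchvision.models as models

    model = models.resnet50()  # any nn.Module works the same way
    for name, _ in model.named_modules():
        print(name)  # every non-empty printed name is a layer-name candidate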

Summary

a) None of RetinaNet's Grad-CAM maps look particularly good; the middle layers, head.cls_subnet.2 through head.cls_subnet.4, are somewhat better.

b) In my view, the reason the RetinaNet results are poor is that its final classifier is a convolutional layer with a 3x3 kernel, so when the score is backpropagated onto the last convolutional feature map, only a 3x3 patch of units receives gradient. In a classification network, or in the Faster R-CNN classifier, the final layer is fully connected and sees global information, so every unit of the last convolutional feature map receives gradient. (The toy check after point c illustrates this.)

c) As the gradient propagates back to shallower feature maps, the number of units with gradient gradually grows; but, as the Grad-CAM paper notes, shallower feature maps carry weaker semantic information, which is why the CAM for head.cls_subnet.0 looks very poor.
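The claim in b) is easy to verify with a self-contained toy example: backpropagating from a single output unit of a 3x3 convolution reaches exactly a 3x3 patch of the input feature map.

    import torch
    import torch.nn as nn

    # A single 3x3 conv standing in for RetinaNet's convolutional classifier.
    conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
    fmap = torch.randn(1, 1, 16, 16, requires_grad=True)
    out = conv(fmap)
    out[0, 0, 8, 8].backward()            # backprop from one spatial unit
    print((fmap.grad != 0).sum().item())  # -> 9: only a 3x3 patch has gradient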

Object detection: FCOS

After Grad-CAM for Faster R-CNN and RetinaNet was done, a user, linsy-ai, asked how to implement Grad-CAM for FCOS. FCOS is handled much like RetinaNet, since their overall network structures are similar; here we use the FCOS network from the AdelaiDet project. The detailed procedure follows.

Installing AdelaiDet

a) Download

    git clone https://github.com/aim-uofa/AdelaiDet.git

b) Install

    cd AdelaiDet
    python setup.py build develop

Note: 1. AdelaiDet depends on detectron2, so detectron2 must be installed first.

2. FCOS does not support CPU, only GPU; make sure to install and test in a GPU environment.

Testing

a) Download the pretrained model

    wget https://cloudstor.aarnet.edu.au/plus/s/glqFc13cCoEyHYy/download -O fcos_R_50_1x.pth

b) Test Grad-CAM image generation

Run the following command from the root of this project:

    export CUDA_DEVICE_ORDER="PCI_BUS_ID"
    export CUDA_VISIBLE_DEVICES="0"
    python AdelaiDet/demo_fcos.py --config-file AdelaiDet/R_50_1x.yaml \
        --input ./examples/pic1.jpg \
        --layer-name proposal_generator.fcos_head.cls_tower.8 \
        --opts MODEL.WEIGHTS /path/to/fcos_R_50_1x.pth MODEL.DEVICE cuda

Grad-CAM results

[Result table: one column per test image (Image 1 to Image 4); rows: original image, predicted boxes, Grad-CAM for cls_tower.0 through cls_tower.11, and Grad-CAM++ for cls_tower.0 through cls_tower.11. See the repository for the images.]

Note: Grad-CAM maps are generated for the twelve layers proposal_generator.fcos_head.cls_tower.0 through cls_tower.11, which correspond to the feature maps of the four convolutions of the FCOS classification tower, to the feature maps after group normalization, and to the feature maps after ReLU activation.

Summary

No written summary this time; judge the results from the images!