Representative Interpretations [Eng]

Lam et al. / Finding Representative Interpretations on Convolutional Neural Networks / ICCV 2021


Last updated 3 years ago


Click here to read this review in Korean.

1. Problem definition

Despite the success of deep learning models on various tasks, they lack the interpretability needed to understand the decision logic behind their predictions. In fields where decision-making is critical, such as process systems and healthcare, it is hard to use models that lack interpretability due to reliability issues. Sufficient interpretability is required to make deep learning models widely applicable.

In this paper, the authors propose a new framework to interpret the decision-making process of deep convolutional neural networks (CNNs), the basic architecture of many deep learning models. The goal is to develop representative interpretations of a trained CNN that reveal the common semantics contributing to many closely related predictions.

How can we find such representative interpretations of a trained CNN? Before reviewing the details, here is a summary of the paper.

  1. Consider a function that maps the feature map produced by the last convolutional layer to the logits that denote the final decision.

  2. Since this function is a piecewise linear function, it applies different decision logics for regions separated by linear boundaries.

  3. For each image, the authors propose to solve the optimization problem to construct a subset of linear boundaries that provides good representative interpretations.
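To see why step 2 holds, here is a minimal numerical sketch (with hypothetical random weights, not the paper's model) showing that a ReLU network is piecewise linear: within a fixed ReLU activation pattern, the network reduces to a single affine map, i.e., one decision logic per linear region.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny one-hidden-layer ReLU network with hypothetical weights.
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=4)
W2, b2 = rng.normal(size=(3, 4)), rng.normal(size=3)

def forward(x):
    h = np.maximum(W1 @ x + b1, 0.0)   # ReLU
    return W2 @ h + b2                 # logits

def activation_pattern(x):
    """The on/off pattern of the ReLU units; it identifies the linear region of x."""
    return tuple((W1 @ x + b1) > 0)

def effective_affine(x):
    """Within x's linear region, the network equals a single affine map A x + c."""
    mask = np.diag(((W1 @ x + b1) > 0).astype(float))
    A = W2 @ mask @ W1
    c = W2 @ mask @ b1 + b2
    return A, c

x = np.array([0.3, -0.1])
A, c = effective_affine(x)
# The full network and the region's affine map agree on x.
assert np.allclose(forward(x), A @ x + c)
print(activation_pattern(x))
```

Inputs whose activation patterns differ fall in regions separated by linear boundaries, and each region gets its own affine decision logic.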

[Opinion]

It is reasonable to choose a CNN with ReLU activation functions as the target to interpret, since this architecture has been sufficiently validated to perform well. Furthermore, the proposed method uses optimization rather than heuristics, so it can give trustworthy solutions.

2. Motivation

Related Work

There are various types of existing interpretation methods for CNNs.

  1. Conceptual interpretation methods

    • e.g., Automated Concept-based Explanation (ACE)

    • Identify a set of concepts that contribute to the predictions on a pre-defined group of conceptually similar images.

    • These methods require sophisticated customization on deep neural networks.

  2. Example-based methods

    • e.g., MMD-critic, Prototypes of Temporally Activated Patterns (PTAP)

    • Find exemplar images to interpret the decision of a deep neural network.

    • Prototype-based methods summarize the entire model using a small number of instances as prototypes.

    • The selection of prototypes takes little account of the decision process of the model.

Idea

In this paper, the goal is to provide representative interpretations in a general CNN model by considering decision boundaries.

  • Find the linear decision boundaries of the convex polytopes that encode the decision logic of a trained CNN.

  • This problem can be formulated as a co-clustering problem, which simultaneously finds one cluster for the set of similar images and another cluster for the set of linear boundaries that cover those images.

  • Convert the co-clustering problem into a submodular cost submodular cover (SCSC) problem to make the problem feasible.

3. Method

Setting

Consider image classification using a CNN with ReLU activation functions.

  • $\mathcal{X}$: the space of images

  • $C$: the number of classes

  • $F:\mathcal{X}\rightarrow\mathbb{R}^C$: a trained CNN, with $Class(x)=\arg\max_i F_i(x)$

  • $R\subseteq\mathcal{X}$: a set of reference images

  • $\psi(x)$: the feature map produced by the last convolutional layer of $F$

  • $\Omega=\{\psi(x)\;|\;x\in\mathcal{X}\}$: the space of feature maps

  • $G:\Omega\rightarrow\mathbb{R}^C$: the mapping from the feature map $\psi(x)$ to $Class(x)$

  • $\mathcal{P}$: the set of linear boundaries (hyperplanes) of $G$

Reference images denote unlabeled images that we want to interpret by this method.

Representative Interpretations

Before formulating the problem, we have to specify what we mean by a representative interpretation.

[Representative interpretation]

  • For an input image $x\in\mathcal{X}$, a representative interpretation on $x$ is an interpretation that reveals the common decision logic of $F$.

  • When analyzing the predictions of a trained DNN, it is a common approach to explain the decision logic using $G$, the function from the feature map of the last convolutional layer to the class of $x$.

[Linear boundaries]

  • The decision logic of $G$ can be characterized by a piecewise linear decision boundary that consists of connected pieces of decision hyperplanes. Denote the set of linear boundaries of $G$ by $\mathcal{P}$.

  • The linear boundaries in $\mathcal{P}$ partition the space of feature maps $\Omega$ into a large number of convex polytopes. Each convex polytope defines a decision region that predicts all the images contained in it to be the same class.

  • However, not all convex polytopes play an important role in distinguishing labels. Therefore, finding a good decision region that includes $x$ and is bounded by a subset of $\mathcal{P}$ provides a representative interpretation. That is, the goal is to find a good subset $P(x)\subseteq\mathcal{P}$.
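As an illustration of the polytope view (with hypothetical 2-D feature maps and hand-picked hyperplanes, not the paper's setup): a feature map lies inside the decision region bounded by a set of linear boundaries exactly when it falls on the same side of every selected boundary as $\psi(x)$.

```python
import numpy as np

# Hypothetical linear boundaries in feature space: row w_h of W and offset
# b_h define the hyperplane <w_h, psi> + b_h = 0.
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([-0.2, 0.1, -0.5])

def side_pattern(psi):
    """Which side of each hyperplane a feature map lies on."""
    return np.sign(W @ psi + b)

def covered(psi_x, psi_other, boundaries):
    """psi_other lies in the decision region defined by the selected
    boundaries iff it is on the same side as psi_x of every one of them."""
    s1, s2 = side_pattern(psi_x), side_pattern(psi_other)
    return all(s1[h] == s2[h] for h in boundaries)

psi_x = np.array([0.5, 0.3])
psi_y = np.array([0.6, 0.2])
print(covered(psi_x, psi_y, boundaries=[0, 1, 2]))  # True: same polytope
```

Selecting fewer boundaries yields a larger region that covers more images; the optimization below decides which boundaries to keep.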

[Goal]

For an input image $x$, find a representative interpretation that provides a good decision region $P(x)\subseteq\mathcal{P}$.

Finding Representative Interpretations

What is a 'good' representative interpretation? It requires two conditions:

  1. Maximize the representativeness of $P(x)$.

    → A decision region $P(x)$ has to cover a large number of reference images.

    → maximize $|P(x)\cap R|$

  2. Avoid covering images of different classes.

    → $|P(x)\cap D(x)|=0$, where $D(x)=\{x'\in R\;|\;Class(x')\neq Class(x)\}$

This can be formulated as the following optimization problem. The authors call it the co-clustering problem, since it simultaneously finds one cluster for the set of similar images and another cluster for the set of linear boundaries that cover those images.

[Co-clustering Problem]

$$\max_{P(x)\subseteq\mathcal{P}}|P(x)\cap R|\quad\mathsf{s.t.}\quad|P(x)\cap D(x)|=0$$

However, a set optimization problem such as the co-clustering problem is computationally complex to optimize. Therefore, in this paper, the authors:

  1. sample $\mathcal{Q}$ from $\mathcal{P}$ to reduce the size;

  2. define a submodular optimization problem to make the problem feasible.

What is Submodular Optimization?

  • A set optimization problem that finds the optimal subset from candidates is computationally complex, since the computational cost increases exponentially as the number of candidates increases.

  • When the objective function satisfies submodularity, the greedy algorithm achieves at least a constant fraction of the objective value obtained by the optimal solution.

  • Therefore, submodular optimization makes a set optimization problem feasible while guaranteeing sufficiently good performance.
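A toy demonstration of the diminishing-returns property, using the classic coverage function $f(S)=\left|\bigcup_{A\in S}A\right|$ (my own example, not taken from the paper): the marginal gain of adding the same element never grows as the base set grows.

```python
# Toy sets; f(S) counts the elements covered by the union of the sets in S.
sets = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6},
}

def f(S):
    covered = set()
    for name in S:
        covered |= sets[name]
    return len(covered)

small, big = {"a"}, {"a", "c"}              # small ⊆ big
gain_small = f(small | {"b"}) - f(small)    # marginal gain of adding "b" to small
gain_big = f(big | {"b"}) - f(big)          # marginal gain of adding "b" to big
assert gain_small >= gain_big               # diminishing returns (submodularity)
print(gain_small, gain_big)                 # → 1 0
```

It is exactly this property that lets a greedy algorithm come within a constant factor of the optimal subset.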

[Opinion]

Even though the authors randomly sample linear boundaries for $\mathcal{Q}$ to reduce complexity, we should verify whether important linear boundaries are omitted.

Submodular Cost Submodular Cover problem

[SCSC Problem]

$$\max_{P(x)\subseteq\mathcal{Q}}|P(x)\cap R|\quad\mathsf{s.t.}\quad|P(x)\cap D(x)|\leq\delta$$

  • Due to sampling, the images covered by the same convex polytope may not be predicted by $F$ as the same class → relax the constraint $|P(x)\cap D(x)|=0$ into $|P(x)\cap D(x)|\leq\delta$.

  • Finally, the SCSC problem can be solved by iteratively selecting a linear boundary through the following greedy algorithm.
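The paper gives the exact greedy algorithm; below is only a simplified sketch of the idea on synthetic data: starting from an empty boundary set, repeatedly add the candidate boundary from $\mathcal{Q}$ that removes the most different-class reference images per same-class image sacrificed, until at most $\delta$ different-class images remain in the region. The data, the scoring ratio, and the stopping rule here are all illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: feature maps of reference images, a query x with its
# class, and a sampled pool Q of candidate linear boundaries (w_h, b_h).
n_ref, dim, n_bound = 40, 5, 12
feats = rng.normal(size=(n_ref, dim))
labels = rng.integers(0, 2, size=n_ref)
psi_x, label_x = rng.normal(size=dim), 0
Wq = rng.normal(size=(n_bound, dim))
bq = rng.normal(size=n_bound)

def same_side(h, psi):
    return np.sign(Wq[h] @ psi + bq[h]) == np.sign(Wq[h] @ psi_x + bq[h])

def region_mask(P):
    """Reference images inside the region bounded by the boundaries in P."""
    mask = np.ones(n_ref, dtype=bool)
    for h in P:
        mask &= np.array([same_side(h, f) for f in feats])
    return mask

delta = 0  # allowed different-class images inside the region
P, remaining = [], list(range(n_bound))
while True:
    inside = region_mask(P)
    n_bad = np.sum(inside & (labels != label_x))
    if n_bad <= delta or not remaining:
        break

    def score(h):
        # Different-class images cut out per same-class image lost
        # (a simplified greedy ratio rule).
        m = region_mask(P + [h])
        removed_bad = n_bad - np.sum(m & (labels != label_x))
        lost_good = np.sum(inside & (labels == label_x)) - np.sum(m & (labels == label_x))
        return removed_bad / (1 + lost_good)

    best = max(remaining, key=score)
    P.append(best)
    remaining.remove(best)

print(len(P), np.sum(region_mask(P) & (labels == label_x)))
```

The returned boundary set $P(x)$ defines the decision region whose covered same-class images are then ranked as described next.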

Ranking Similar Images

Define a new semantic distance to evaluate the images $x'\in P(x)$.

[Semantic Distance]

$$Dist(x,x')=\sum_{\mathbf{h}\in P(x)}\Big\vert \langle \overrightarrow{W}_\mathbf{h},\psi(x)\rangle -\langle \overrightarrow{W}_\mathbf{h},\psi(x')\rangle \Big\vert$$

  • $\overrightarrow{W}_\mathbf{h}$ is the normal vector of the hyperplane of a linear boundary $\mathbf{h}\in P(x)$.

  • That is, it measures how far $x'$ is from $x$ in terms of the hyperplanes in $P(x)$. Unlike the Euclidean distance, it quantifies the distance between $x'$ and $x$ in terms of the decision region.

  • Rank the images covered by $P(x)$ by their semantic distance to $x$ in ascending order.

The figure describes the difference between the semantic distance and the Euclidean distance. Even though the Euclidean distances are the same, the semantic distance between $x_2$ and $x$ is larger than that of $x_1$ in terms of the decision region.
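A small sketch of how the semantic distance could be computed and used for ranking (hypothetical 2-D feature maps and normal vectors; the example is constructed so the semantic ranking disagrees with the Euclidean one, as in the figure):

```python
import numpy as np

# Hypothetical normal vectors of the hyperplanes in P(x), one per row.
W_P = np.array([[1.0, 0.0], [0.5, 0.5]])

def semantic_distance(psi_x, psi_other):
    """Sum over boundaries h of |<W_h, psi(x)> - <W_h, psi(x'))>|."""
    return np.sum(np.abs(W_P @ psi_x - W_P @ psi_other))

psi_x = np.array([1.0, 1.0])
cands = {"x1": np.array([1.5, 1.0]), "x2": np.array([1.0, 2.0])}

# Ascending semantic distance: x2 (0.5) comes before x1 (0.75)...
ranked = sorted(cands, key=lambda k: semantic_distance(psi_x, cands[k]))
# ...even though x1 is closer in Euclidean distance (0.5 vs. 1.0).
eucl = {k: np.linalg.norm(cands[k] - psi_x) for k in cands}
print(ranked, eucl)
```

The distance only counts displacement along the normals of the selected boundaries, so movement parallel to the decision region's faces costs nothing.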

4. Experiment & Result

Experimental setup

The authors compare the representative interpretation (RI) method with Automated Concept-based Explanation (ACE) and CAM-based methods (Grad-CAM, Grad-CAM++, Score-CAM).

  • Apply sampling with $|\mathcal{Q}|=50$.

  • These methods use channel weights to provide interpretability. Reuse the channel weights computed from the input image $x\in\mathcal{X}$ to generate the heat-map interpretation for a new image $x_{new}$, and compare the results across methods.

    • For RI, use the semantic distance to find the set of similar images $x_{new}$.

    • For the other methods, use the Euclidean distance in the space $\Omega$ to find the set of similar images $x_{new}$.

  • Dataset: Gender Classification (GC), ASIRRA, Retinal OCT Images (RO), FOOD datasets

  • Target model: VGG-19

Result

Case Study

  • This experiment evaluates whether each method provides a proper interpretation for similar images.

  • The first row shows the results retrieved by the RI method. Unlike the other methods, its heat maps indicate consistent semantics across the images.

  • The RI method successfully finds the interpretation for the input image, as well as a set of images sharing the same interpretation.

Quantitative Experiment

In this experiment, the authors quantitatively evaluate how well the computed interpretations can be reused to classify unseen data, using the following two measures:

[Average Drop (AD)]

$$\frac{1}{|S|}\sum_{e\in S}\frac{\max(0,Y_c(e)-Y_c(e'))}{Y_c(e)}$$

[Average Increase (AI)]

$$\frac{1}{|S|}\sum_{e\in S}\mathbb{1}_{Y_c(e)<Y_c(e')}$$

  • $S\subseteq\mathcal{X}$: a set of unseen images

  • $Y_c(e)$: the prediction score for class $c$ on an image $e\in S$

  • $e'$: a masked image produced by keeping the 20% most important pixels in $e$

When only the important pixels are kept, AD measures the average drop in the prediction score, and AI measures the fraction of samples whose prediction score increases. A small mean AD (mAD) and a large mean AI (mAI) indicate that the interpretation can be validly reused to identify important regions in the unseen images of $S$. In the figure, we can see that the RI method achieves the best performance in most cases.
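Both measures are straightforward to compute; a sketch with hypothetical prediction scores (the arrays here are made-up values, not the paper's results):

```python
import numpy as np

# Hypothetical prediction scores Y_c for class c on unseen images e
# (original) and their masked versions e' (top-20% pixels kept).
y_orig = np.array([0.9, 0.8, 0.6, 0.7])
y_mask = np.array([0.85, 0.9, 0.3, 0.7])

# Average Drop: mean relative drop in score, clipped at zero.
avg_drop = np.mean(np.maximum(0.0, y_orig - y_mask) / y_orig)

# Average Increase: fraction of images whose score went up after masking.
avg_incr = np.mean(y_orig < y_mask)

print(round(avg_drop, 4), avg_incr)
```

Averaging these over the similar-image sets of many inputs gives the mAD and mAI values reported in the paper.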

5. Conclusion

  • In this paper, a co-clustering problem is formulated to interpret the decision-making process of a CNN by considering its decision boundaries.

  • To solve the co-clustering problem, a greedy algorithm can be applied after converting it into an SCSC problem.

  • It is experimentally shown that the proposed representative interpretations reflect common semantics in unseen images.

Take home message

As deep neural networks are used in ever more fields, it becomes increasingly important to interpret the decision logic of DNNs. In this spirit, the idea of providing representative interpretations by considering decision boundaries is impressive, and I hope such studies are extended further.

Author / Reviewer information

Author

장원준 (Wonjoon Chang)

  • KAIST AI, Statistical Artificial Intelligence Lab.

  • one_jj@kaist.ac.kr

  • Research Topics: Explainable AI, Time series analysis.

  • https://github.com/onejoon

Reviewer

Reference & Additional materials

  1. Lam, P. C. H., Chu, L., Torgonskiy, M., Pei, J., Zhang, Y., & Wang, L. (2021). Finding representative interpretations on convolutional neural networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision.

  2. Ghorbani, A., Wexler, J., Zou, J., & Kim, B. (2019). Towards automatic concept-based explanations.

  3. Kim, B., Khanna, R., & Koyejo, O. O. (2016). Examples are not enough, learn to criticize! criticism for interpretability. Advances in neural information processing systems, 29.

  4. Cho, S., Chang, W., Lee, G., & Choi, J. (2021, August). Interpreting Internal Activation Patterns in Deep Temporal Neural Networks by Finding Prototypes. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining.

  5. Chu, L., Hu, X., Hu, J., Wang, L., & Pei, J. (2018, July). Exact and consistent interpretation for piecewise linear neural networks: A closed form solution. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.

Since $G$ is a piecewise linear function, it applies different decision logics for regions separated by linear boundaries. I recommend reading the paper to understand the details.

Submodularity requires the diminishing-returns property. You can check the details in the Wikipedia article on submodular set functions.

We can construct a set of linear boundaries $\mathcal{P}$ from the function $G$ by the method introduced in this paper. Then, sample a subset of linear boundaries $\mathcal{Q}$ from $\mathcal{P}$.

This formulation satisfies the conditions for submodular cost and submodular cover. You can check it in Appendix A of the paper.
