NeSyXIL [Kor]

Stammer et al. / Right for the Right Concept - Revising Neuro-Symbolic Concepts by Interacting with their Explanations / CVPR 2021

An English version of this article is available.

1. Problem definition

  • Most explanation methods in deep learning estimate the importance of the original input space for a model's prediction.

  • Such "visual" explanations do not give a semantic account of how the model reaches its decisions.

  • When a model has learned a wrong decision rule, as in a Clever Hans moment, it is hard to intervene in the model's behavior at the semantic level.

  • A methodology is therefore needed that can intervene in the model through "semantic" feedback, rather than feedback based on "visual" explanations such as Grad-CAM.

Clever Hans

  • https://simple.wikipedia.org/wiki/Clever_Hans

  • ML models sometimes learn the wrong features for a given task yet still achieve good performance. The paper refers to such moments as Clever Hans moments.

2. Motivation

Related work

  • Explainable AI (XAI)

    • In general, XAI methods present a model's explanations in a form humans can understand, and are used to verify the reasons behind a (black-box) model's decisions.

    • There are many XAI methods, but most of them produce and visualize explanations at the level of the input space.

      • [52, 2]: use backpropagation to produce visual explanations over the input space.

      • [28]: expresses explanations of a model's decisions in the form of prototypes.

    • Some studies do generate explanations in ways other than these visual methods, but none of them uses explanations as a means of intervening in the model.

  • Explanatory interactive learning (XIL)

    • XIL = XAI + Active Learning

    • During training, the user and the XAI method collaborate by interacting with the explanations the model gives for its decisions.

    • Through XIL, the user can ask why the model made a decision, correct the model when necessary, and provide improved feedback on its explanations.

  • Neuro-symbolic architectures

    • The neuro-symbolic field combines the strengths of symbolic and sub-symbolic models to overcome the weaknesses of each individual subsystem, and has received growing attention in recent years.

    • Neuro-symbolic architectures are broadly characterized by:

      • data-driven learning

      • sub-symbolic representations

      • symbolic reasoning systems

Motivating Example: Color-MNIST

  • To convey the core issue addressed in this work, the authors use the well-known ColorMNIST dataset. ColorMNIST is a toy dataset that adds color to the original MNIST dataset. In its training set, every digit is painted in one specific color, whereas in the test set the color attribute is shuffled or reversed. (A construction sketch follows at the end of this subsection.)

  • On ColorMNIST, a simple CNN model can reach 100% accuracy on the training set but only 23% on the test set. This shows that the model has learned to rely heavily on color rather than the digit itself to make accurate predictions.

  • Figure 2 visualizes the Grad-CAM result for a 0 that is predicted as a 9, i.e., a "visual explanation" of why the model predicted the digit 0 as a 9. We can see that the model focuses on the shape of the 0, but we cannot tell on what semantic grounds it predicted a 9.

  • Thus, although the visual explanation makes it clear that the model is focusing on the right object, the reason it predicts the wrong digit remains unclear without understanding the underlying training data distribution.

  • Importantly, even when the model is wrong, using XIL to revise its decisions based only on such visual explanations is a rather serious problem.
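
To make the confounding concrete, below is a minimal sketch (not the authors' code; `CLASS_COLORS`, `colorize`, and the class-to-color assignment are illustrative choices) of how a ColorMNIST-style split can be built: in the training split every digit class receives one fixed tint, while in the test split the tint is drawn independently of the label, which breaks the color shortcut.

```python
# Minimal sketch of a ColorMNIST-style confounded split (illustrative only).
import numpy as np

# One fixed RGB tint per digit class; the concrete colors are arbitrary choices.
CLASS_COLORS = np.array([
    [255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 0], [255, 0, 255],
    [0, 255, 255], [128, 0, 0], [0, 128, 0], [0, 0, 128], [128, 128, 0],
], dtype=np.float32)

def colorize(images, labels, confounded=True, seed=0):
    """images: (n, 28, 28) grayscale in [0, 255]; labels: (n,) digit classes."""
    rng = np.random.default_rng(seed)
    if confounded:                       # training split: color is a function of the label
        colors = CLASS_COLORS[labels]
    else:                                # test split: color is independent of the label
        colors = CLASS_COLORS[rng.integers(0, 10, size=len(images))]
    # Broadcast each tint over the grayscale intensities -> (n, 28, 28, 3) RGB images.
    rgb = images[..., None] / 255.0 * colors[:, None, None, :]
    return rgb.astype(np.uint8)
```

A plain CNN trained on the confounded split can exploit the tint alone, which is exactly the Clever Hans behavior described above.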

Idea

  • As the Color-MNIST example shows, the authors argue that visual-level explanations such as Grad-CAM alone make it hard to interpret a model's decisions, let alone intervene in them.

  • They therefore propose a neuro-symbolic method that allows the model to be revised at the semantic level.

  • This makes interventions such as "never base your decision on color" possible.

3. Method

  • Neuro-Symbolic Architecture (top part of Figure 3)

    • It consists of two modules: a concept embedding module and a reasoning module.

    • The concept module maps the input to a symbolic representation that humans can understand.

      • A single input image $x_i \in X$ is mapped by the concept module to $h(x_i) = \hat{z}_i$.

      • Here, $\hat{z}_i \in [0,1]^{N \times D}$ denotes one symbolic representation.

    • The reasoning module's role is to make the model's prediction based on this symbolic representation.

      • A single input $\hat{z}_i$ is mapped by the reasoning module to $g(\hat{z}_i) = \hat{y}_i$.

      • Here, $\hat{y}_i \in [0,1]^{N \times N_c}$ denotes the predicted output.

    • Here, $X := [x_1, \dots, x_N] \in \mathbb{R}^{N \times M}$, and $X$ is divided into subsets belonging to $N_c$ classes.

  • Retrieving Neuro-Symbolic Explanations (gray arrows in Figure 3)

    • Given the concept embedding module and the reasoning module, an explanation can be extracted for each of them.

    • Given a module $m(\cdot)$, its input $s$, and the model's output $o$, an explanation function can be written as $E(m(\cdot), o, s)$.

    • For the reasoning module, this gives $E^g(g(\cdot), \hat{y}_i, z_i) =: \hat{e}^g_i$.

      • Here, $\hat{e}^g_i$ is the explanation of the reasoning module given the final prediction $\hat{y}_i$.

      • In Figure 3, it corresponds to the gray output of the Semantic Explainer.

    • For the concept embedding module, this gives $E^h(h(\cdot), \hat{e}^g_i, x_i) =: \hat{e}^h_i$.

      • Here, $\hat{e}^h_i$ is the explanation of the concept embedding module given the reasoning module's explanation $\hat{e}^g_i$.

      • In Figure 3, it corresponds to the gray output of the Visual Explainer.

  • Neuro-Symbolic Concepts

    • Explanatory loss term

      • $L_{expl} = \lambda \sum_{i=1}^N r(A_i^v, \hat{e}^h_i) + (1-\lambda) \sum_{i=1}^N r(A_i^s, \hat{e}^g_i)$

      • $r(\cdot, \cdot)$: regularization function (e.g., RRR, HINT)

      • $A^v_i$: the "visual feedback", a binary image mask over the input space

      • $A^s_i$: the "semantic feedback", a binary mask over the symbolic space

  • Reasoning Module

    • When an image passes through the concept embedding module, the output is an unordered set.

    • To handle this kind of input, a Set Transformer is used to predict the output.

    • Conversely, to extract an explanation for the reasoning module, explanations of the Set Transformer must be produced with respect to the given symbolic representation.

    • For this, the gradient-based Integrated Gradients explanation method is used.

  • (Slot) Attention is All You Need (for object-based explanations)

    • Using the Slot Attention module, the attention $B_i$ over an input image $x_i$ can be obtained.

    • With $B_i$ and $\hat{e}^g_i$, $E^h(h(\cdot), \hat{e}^g_i, x_i)$ can be expressed.

(The gradient-based Integrated Gradients explanation, the Set Transformer, and Slot Attention are not covered in detail here; a rough sketch of the overall pipeline is given below.)
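
The sketch below is a rough, simplified illustration of the two-module pipeline and of how the two explanations are retrieved; it is not the authors' implementation. `ConceptEmbedder`, `SetReasoner`, `semantic_explanation`, and `visual_explanation` are hypothetical names introduced here: `ConceptEmbedder` stands in for the slot-attention-based concept module, `SetReasoner` is a permutation-invariant stand-in for the Set Transformer, and the reasoning-module explanation uses a plain gradient-times-input attribution rather than full Integrated Gradients.

```python
# Simplified sketch of the neuro-symbolic pipeline (assumptions noted above).
import torch
import torch.nn as nn

class ConceptEmbedder(nn.Module):
    """h: image -> z_hat in [0,1]^{N x D}, one D-dim symbolic vector per object slot."""
    def __init__(self, n_slots=10, dim=18, img_size=3 * 64 * 64):
        super().__init__()
        self.n_slots, self.dim = n_slots, dim
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(img_size, n_slots * dim))

    def forward(self, x):
        z = self.backbone(x).view(-1, self.n_slots, self.dim)
        return torch.sigmoid(z)                          # concept activations in [0, 1]

class SetReasoner(nn.Module):
    """g: z_hat -> y_hat; a permutation-invariant stand-in for the Set Transformer."""
    def __init__(self, dim=18, n_classes=3):
        super().__init__()
        self.phi, self.rho = nn.Linear(dim, 64), nn.Linear(64, n_classes)

    def forward(self, z):
        return self.rho(self.phi(z).relu().sum(dim=1))   # sum-pool over the object set

def semantic_explanation(g, z_hat, y_hat):
    """e^g: attribution of the predicted class w.r.t. the symbolic input (gradient x input)."""
    z = z_hat.detach().requires_grad_(True)
    score = g(z).gather(1, y_hat.argmax(dim=1, keepdim=True)).sum()
    (grad,) = torch.autograd.grad(score, z)
    return grad * z                                      # same N x D shape as z_hat

def visual_explanation(slot_attention_B, e_g):
    """e^h: project per-slot semantic relevance back to pixels via the slot attention maps."""
    slot_relevance = e_g.sum(dim=-1)                     # (batch, N) relevance per object slot
    return torch.einsum('bn,bnhw->bhw', slot_relevance, slot_attention_B)

# Usage sketch on random data.
h, g = ConceptEmbedder(), SetReasoner()
x = torch.rand(4, 3, 64, 64)
z_hat = h(x)                                             # symbolic representation
y_hat = g(z_hat)                                         # class prediction
e_g = semantic_explanation(g, z_hat, y_hat)              # explanation over symbols
B = torch.rand(4, 10, 64, 64).softmax(dim=1)             # placeholder slot attention masks
e_h = visual_explanation(B, e_g)                         # explanation over pixels
```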

4. Dataset & Experiment & Result

Dataset: CLEVR-Hans

  • Dataset

    • CLEVR-Hans3

      • It contains three class rules, listed below; two of them act as confounders.

      • Class rule 1: large (gray) cube and large cylinder. In the training/validation sets the images satisfy the attribute in parentheses (gray), but the test set also contains images without that attribute.

      • Class rule 2: small metal cube and small (metal) sphere. In the training/validation sets the images satisfy the attribute in parentheses (metal), but the test set also contains images without that attribute.

      • Class rule 3: large blue sphere and small yellow sphere (identical across the training/validation/test sets).

    • CLEVR-Hans7

      • Similar to CLEVR-Hans3, it contains seven class rules, four of which act as confounders.

Experimental setting

  • Dataset

    • CLEVR-Hans3

    • CLEVR-Hans7

  • Baselines

    • CNN (Default): a ResNet-based CNN

    • CNN (XIL): the ResNet-based CNN with XIL applied

      • The explanation method used for the CNN is Grad-CAM (as an example of the visual explanations discussed in the paper).

    • NeSy (Default): the Neuro-Symbolic architecture

    • NeSy (XIL): the Neuro-Symbolic architecture with XIL applied

      • The explanation method used for NeSy is the one proposed in the paper, which uses both visual and semantic explanations.

  • Training setup

    • Default: a standard image classification setup trained with cross-entropy loss

    • XIL: a setup that additionally uses the explanatory loss to intervene in the model's training (see the loss sketch below)

  • Evaluation metric

    • Classification accuracy
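
As a hedged illustration of the two training setups, the sketch below adds the explanatory loss from Section 3 on top of the cross-entropy term, using an RRR-style penalty $r(A, e) = \lVert A \odot e \rVert^2$ that penalizes attribution mass inside the regions the user marks as irrelevant. The function names (`rrr_penalty`, `explanatory_loss`, `total_loss`) and the specific choice of regularizer are illustrative, not the authors' code.

```python
# Sketch of the Default vs. XIL training objectives (illustrative assumptions above).
import torch
import torch.nn.functional as F

def rrr_penalty(A, e):
    """RRR-style 'right for the right reasons' term: squared attribution inside the mask A."""
    return (A * e).pow(2).sum()

def explanatory_loss(e_h, e_g, A_v, A_s, lam=0.5):
    """L_expl = lam * r(A^v, e^h) + (1 - lam) * r(A^s, e^g)."""
    return lam * rrr_penalty(A_v, e_h) + (1.0 - lam) * rrr_penalty(A_s, e_g)

def total_loss(logits, targets, e_h=None, e_g=None, A_v=None, A_s=None, use_xil=False):
    loss = F.cross_entropy(logits, targets)              # "Default" setup: task loss only
    if use_xil:                                          # "XIL" setup: add the explanatory loss
        loss = loss + explanatory_loss(e_h, e_g, A_v, A_s)
    return loss
```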

Result

Table 2: Experimental results on the CLEVR-Hans3 and CLEVR-Hans7 datasets

  • Observation 1: The CNN model exhibits a Clever Hans moment.

    • Evidence: In Table 2, CNN (Default)'s validation accuracy is nearly perfect, but its test accuracy is much lower.

  • Observation 2: XIL based on visual explanations fails on the CLEVR-Hans datasets.

    • Evidence: In Table 2, CNN (XIL)'s validation accuracy is nearly perfect, but its test accuracy remains low (even with XIL applied).

  • Observation 3: The Neuro-Symbolic model achieves higher test accuracy than the CNN model.

    • Evidence: In Table 2, NeSy (Default)'s test accuracy is considerably higher than CNN (Default)'s.

  • Observation 4: Before intervening with XIL, the Neuro-Symbolic model also exhibits a Clever Hans moment.

    • Evidence: In Table 2, NeSy (Default)'s test accuracy falls well short of its near-perfect validation accuracy.

  • Observation 5: With the Neuro-Symbolic model, XIL that also exploits semantic explanations largely resolves confounded problems such as the CLEVR-Hans datasets.

    • Evidence: In Table 2, NeSy (XIL)'s test accuracy comes closest to its validation accuracy.

    • => The method proposed by the authors is effective.

    • => Put differently, semantic explanations are effective against Clever Hans moments.

Figure 5: Qualitative results of the different models trained on CLEVR-Hans3

  • Figure 5 shows how the explanations of the two different models change between standard training (Default) and human intervention (XIL).

  • CNN (Default), CNN (XIL), and NeSy (Default) all fail to make the correct prediction; only NeSy (XIL) predicts correctly.

  • For the CNN model, we cannot interpret semantically why it made its decision.

  • For the NeSy model, we can see semantically what prediction the model is making.

5. Conclusion

  • The authors propose a method that intervenes in a neuro-symbolic scene representation space, allowing the model to be revised at the semantic level.

  • To demonstrate the method's effectiveness, they construct two new confounded datasets, CLEVR-Hans3 and CLEVR-Hans7.

  • They show that confounding factors that cannot be identified with "visual" explanations alone can be identified through "semantic" explanations.

  • As a result, feedback at this semantic level makes it possible to revise the model by focusing on the relevant semantics.

Take home message

Through the ColorMNIST motivating example, this paper shows why using symbolic representations matters when intervening in a model.

By being the first to bring a neuro-symbolic architecture into XIL, the authors show experimentally that it enables more effective model intervention than visual explanations such as Grad-CAM.

If deep learning can make good use of symbolic representations alongside sub-symbolic ones, it could be of great help in leveraging the way humans perceive and reason.

Author / Reviewer information

Author

Seongsu Bae (배성수)

  • KAIST AI

  • contact: seongsu@kaist.ac.kr

Reviewer

  1. Korean name (English name): Affiliation / Contact information

  2. Korean name (English name): Affiliation / Contact information

  3. ...

Reference & Additional materials

  • English version of this article: NeSyXIL [Eng]

  • Stammer et al. / Right for the Right Concept: Revising Neuro-Symbolic Concepts by Interacting with their Explanations / CVPR 2021