RCNN [Kor]

Girshick et al. / Rich feature hierarchies for accurate object detection and semantic segmentation / CVPR 2014


1. Problem definition

Progress in object detection had stalled for some time when CNNs rose to prominence at the 2012 ILSVRC (ImageNet Large Scale Visual Recognition Challenge). This paper shows, on the PASCAL VOC Challenge, that a CNN can deliver not only classification but also high-performing object detection.

2. Motivation

Region proposal๊ณผ CNN์„ ํ†ตํ•œ clssification์„ ๊ฒฐํ•ฉํ•˜์—ฌ object detection์— ๊ด€ํ•œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐ. ์ดํ›„ RCNN์„ ๋ฐ”ํƒ•์œผ๋กœ RCNN ๊ณ„์—ด (Fast RCNN, Faster RCNN, Mask RCNN ๋“ฑ) ๋…ผ๋ฌธ๋“ค์—์„œ ๊พธ์ค€ํžˆ ์„ฑ๋Šฅ๊ณผ ์†๋„๋ฅผ ํ–ฅ์ƒ

  1. Feed in an image.

  2. Extract up to 2000 regions and crop them from the image.

  3. Resize each cropped image to fit the CNN model (227x227 pixels).

  4. Run each image through a CNN model pre-trained on ImageNet.

  5. Use the features that the CNN produces for each cropped region to obtain classification results with an SVM.

  6. Perform bounding box regression with a regressor.

The steps above carry out two tasks: region proposal, which finds the region of each object, and classification, which labels the cropped images. Running these two tasks one after the other solves the object detection problem.

3. Method

object detection ์‹œ์Šคํ…œ์€ 3๊ฐ€์ง€์˜ ๋ชจ๋“ˆ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๋‹ค.

  1. Region proposal

    Selective search is used to split the image into regions. Similarity between regions is measured as a weighted sum of four components (color, texture, size, and fill), each normalized to [0, 1]. Among the initial regions, the most similar ones are selected and merged. The similarity between the merged region and the remaining regions is then recomputed. Repeating this process merges highly similar regions and separates the image into distinct regions.

  2. Pre-trained CNN (Convolutional Neural Network)

    The image crops produced by region proposal are resized to 227x227. Each fixed-size crop is then fed into the CNN for classification. The original AlexNet architecture is used as is. The only change made for object detection is that the layer that classified into 1000 classes is replaced with one sized for the detection classes (200 for ILSVRC detection, 20 for PASCAL VOC).

  3. SVM (Support Vector Machine)

    Features are extracted by the CNN. These features are then classified with a linear SVM.

  4. Bounding Box Regression

    The goal of bounding box regression is to learn to map P, the bounding box produced by region proposal, onto the ground truth bounding box.
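
The regression target in item 4 can be made concrete with a short sketch. It assumes the parameterization commonly used for RCNN-style regressors (a scale-invariant shift of the box centre and a log-space change of width and height); the review itself does not spell this out, and the function names below are hypothetical.

```python
import numpy as np


def to_center(box):
    """Convert (x1, y1, x2, y2) to (centre x, centre y, width, height)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return x1 + 0.5 * w, y1 + 0.5 * h, w, h


def regression_targets(P, G):
    """Targets t that the regressor should predict for proposal P and ground truth G."""
    px, py, pw, ph = to_center(P)
    gx, gy, gw, gh = to_center(G)
    return np.array([
        (gx - px) / pw,   # horizontal shift, relative to proposal width
        (gy - py) / ph,   # vertical shift, relative to proposal height
        np.log(gw / pw),  # log-scale change of width
        np.log(gh / ph),  # log-scale change of height
    ])


def apply_deltas(P, t):
    """Move proposal P by predicted deltas t and return the refined (x1, y1, x2, y2)."""
    px, py, pw, ph = to_center(P)
    gx, gy = px + pw * t[0], py + ph * t[1]
    gw, gh = pw * np.exp(t[2]), ph * np.exp(t[3])
    return (gx - 0.5 * gw, gy - 0.5 * gh, gx + 0.5 * gw, gy + 0.5 * gh)
```

Applying the exact targets to P recovers the ground truth box, so a regressor that predicts these targets well moves each proposal closer to the object it covers.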

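The region-merging loop described under Region proposal can be sketched as follows. This is a simplified illustration, not the selective search implementation: regions are assumed to be summarized by a pixel count, a bounding box, and L1-normalized color and texture histograms; the names `Region`, `similarity`, `merge`, and `hierarchical_grouping` are hypothetical; the size and fill terms follow the usual selective search definitions; and every pair of regions is compared, whereas a full implementation would only compare neighbouring regions.

```python
from dataclasses import dataclass
from itertools import combinations

import numpy as np


@dataclass
class Region:
    size: int            # number of pixels in the region
    box: tuple           # (x1, y1, x2, y2)
    color: np.ndarray    # L1-normalized color histogram
    texture: np.ndarray  # L1-normalized texture histogram


def union_box(a, b):
    return (min(a.box[0], b.box[0]), min(a.box[1], b.box[1]),
            max(a.box[2], b.box[2]), max(a.box[3], b.box[3]))


def similarity(a, b, image_area, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of four components, each in [0, 1]."""
    s_color = np.minimum(a.color, b.color).sum()      # histogram intersection
    s_texture = np.minimum(a.texture, b.texture).sum()
    s_size = 1.0 - (a.size + b.size) / image_area     # favours merging small regions
    x1, y1, x2, y2 = union_box(a, b)
    gap = (x2 - x1) * (y2 - y1) - a.size - b.size     # empty area inside the joint box
    s_fill = 1.0 - gap / image_area
    return float(np.dot(weights, [s_color, s_texture, s_size, s_fill]))


def merge(a, b):
    """Merge two regions; histograms are averaged, weighted by region size."""
    size = a.size + b.size
    return Region(size, union_box(a, b),
                  (a.size * a.color + b.size * b.color) / size,
                  (a.size * a.texture + b.size * b.texture) / size)


def hierarchical_grouping(regions, image_area):
    """Repeatedly merge the most similar pair and collect every box seen."""
    regions = list(regions)
    proposals = [r.box for r in regions]
    while len(regions) > 1:
        i, j = max(combinations(range(len(regions)), 2),
                   key=lambda p: similarity(regions[p[0]], regions[p[1]], image_area))
        merged = merge(regions[i], regions[j])
        # Drop the merged pair; similarities to the new region are recomputed next round.
        regions = [r for k, r in enumerate(regions) if k not in (i, j)] + [merged]
        proposals.append(merged.box)
    return proposals
```

The boxes of the initial regions and of every merged region together form the proposal set, which is how the procedure yields candidates at several scales.
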
4. Experiment & Result

TBD


5. Conclusion

RCNN์€ ๊ธฐ์กด PASCAL VOC 2012์˜ ๊ฐ€์žฅ ์ข‹์€ ๊ธฐ๋ก๋ณด๋‹ค 30%์˜ ์„ฑ๋Šฅ์ด ํ–ฅ์ƒ๋ฌ๋‹ค. 2๊ฐ€์ง€ ๊ด€์ ์—์„œ ์˜์˜๋ฅผ ๊ฐ€์ง„๋‹ค. ํ•˜๋‚˜๋Š” region proposal๊ณผ CNN์„ ํ™œ์šฉํ•œ Object detection ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•œ ๊ฒƒ์ด๊ณ , ๋‚˜๋จธ์ง€๋Š” ๋ฐ์ดํ„ฐ๊ฐ€ ๋ถ€์กฑํ•œ ์ƒํƒœ์—์„œ pre-train ๋œ ๊ฑฐ๋Œ€ CNN๊ณผ ํŠน์ • ๋ชฉ์ ์œผ๋กœ fine-tuneํ•˜์—ฌ ํšจ์œจ์„ฑ์„ ์ œ๊ณ ํ–ˆ๋‹ค๋Š” ๊ฒƒ์ด๋‹ค.

Author / Reviewer information

Author

  • 권문범 (NAVER)

  • https://github.com/MBKwon
