RCAN [Eng]

Yulun Zhang et al. / Image Super-Resolution Using Very Deep Residual Channel Attention Networks / ECCV 2018

To read this review in Korean, click here.

1. Problem definition

The Single Image Super-Resolution (SISR) technique aims to restore a high-resolution (HR) image from a low-resolution (LR) one while removing blur and various kinds of noise. The degradation from HR to LR is expressed by the following equation.

$$\mathbf{y} = (\mathbf{x} \otimes \mathbf{k})\downarrow_s + \mathbf{n}$$

where x denotes the HR image, y the observed LR image, k the blur kernel, ⊗ convolution, ↓s downsampling by scale factor s, and n additive noise; a sketch of this degradation model is given at the end of this section. Recently, CNN-based SR has been actively studied because CNNs work effectively on SR. However, CNN-based SR has the following two limitations:

  • Gradient vanishing [Note i] occurs as layers deepen, making training more difficult.

  • The representational power of each feature map is weakened because the low-frequency information contained in the LR image is treated equally across all channels.

To achieve the goals of SR while overcoming the two limitations above, this paper proposes the deep Residual Channel Attention Network (RCAN).

[Note i] Gradient Vanishing: as input values pass through an activation function, they are squashed into a narrow output range, so after the activation functions of several layers the initial input has almost no effect on the output. The gradients of the early layers' parameters with respect to the output therefore become vanishingly small, making learning practically impossible.
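
As a concrete illustration, here is a minimal NumPy sketch of the degradation model above, assuming a single-channel HR image with values in [0, 1]. The Gaussian kernel, scale factor, and noise level are illustrative choices of this sketch, and plain decimation stands in for the bicubic downsampler commonly used in practice.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade(x: np.ndarray, scale: int = 4, blur_sigma: float = 1.2,
            noise_std: float = 0.01) -> np.ndarray:
    """y = (x ⊗ k)↓s + n : blur the HR image, downsample, add noise."""
    blurred = gaussian_filter(x, sigma=blur_sigma)   # x ⊗ k (Gaussian blur kernel)
    down = blurred[::scale, ::scale]                 # ↓s (decimation stands in for bicubic)
    return down + np.random.normal(0.0, noise_std, down.shape)  # + n

# Usage: simulate an LR observation from a stand-in HR image.
x_hr = np.random.rand(192, 192)  # placeholder HR image in [0, 1]
y_lr = degrade(x_hr)             # 48x48 LR result
```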

2. Motivation

2.1. Related work

The works on deep CNNs and the attention technique that form the baselines of this paper are as follows.

1. CNN based SR

  • [SRCNN & FSRCNN]: SRCNN, the first technique applying CNN to SR, significantly improved performance compared to existing Non-CNN based SR techniques by constructing a 3-layer CNN. FSRCNN simplifies the network structure of SRCNN to increase inference and learning speed.

  • [VDSR & DRCN]: By stacking layers deeper than SRCNN (20 layers), the performance is greatly improved.

  • [SRResNet & SRGAN]: SRResNet was the first to introduce ResNet to SR. SRGAN adds a GAN to SRResNet to mitigate blur and produce photo-realistic SR, although it sometimes creates unintended artifacts.

  • [EDSR & MDSR]: By removing unnecessary modules from the conventional ResNet, speed is greatly increased. However, these networks still fall short of the very deep architectures that are key in image processing, perform unnecessary computations, and fail to represent diverse features because low-frequency information is treated equally across all channels.

2. Attention Method

Attention is a technique for concentrating processing resources on a specific part of interest in the input data, thereby increasing the processing performance for that part. Until now, attention has generally been used for high-level vision tasks such as object recognition and image classification, and has hardly been applied to low-level vision tasks such as image SR. In this paper, attention is applied to high-frequency regions of the LR image to enhance the high frequencies that constitute the high-resolution (HR) image.

2.2. Idea

The idea of the paper and its contribution can be summarized in the following three categories.

1. Residual Channel Attention Network (RCAN)

Through the Residual Channel Attention Network (RCAN), a more accurate SR image is obtained by stacking layers more deeply than existing CNN-based SR.

2. Residual in Residual (RIR)

By building deeper layers that remain trainable through Residual in Residual (RIR), and by bypassing the low-frequency information of the LR image via the long and short skip connections inside RIR, a more efficient neural network can be designed.

3. Channel Attention (CA)

By considering interdependencies between feature channels through Channel Attention (CA), adaptive feature rescaling is possible.

3. Residual Channel Attention Network (RCAN)

3.1. Network Architecture

The network structure of RCAN is mainly composed of four parts: i) shallow feature extraction, ii) RIR deep feature extraction, iii) an upscaling module, and iv) a reconstruction part. As in the existing EDSR technique, a single convolutional layer, a deconvolutional layer, and an L1 loss are used for i), iii), and iv), respectively; a sketch of the loss follows the equation below. The contributions inside ii), the RIR structure with CA and RCAB, are introduced in the next sections.

$$L(\Theta) = \frac{1}{N} \sum_{i=1}^{N} \left\| H_{RCAN}\left(I_{LR}^{i}\right) - I_{HR}^{i} \right\|_1$$
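
For clarity, here is a minimal PyTorch sketch of this loss; `model` stands in for the full network H_RCAN and is an assumption of the sketch (any nn.Module mapping LR batches to SR batches would do).

```python
import torch
import torch.nn as nn

# Mean absolute error, matching the ||·||_1 objective. Note that PyTorch's
# default 'mean' reduction averages over pixels as well as the N images,
# which differs from the per-image sum above only by a constant factor.
l1_loss = nn.L1Loss()

def training_step(model: nn.Module, lr_batch: torch.Tensor,
                  hr_batch: torch.Tensor) -> torch.Tensor:
    sr_batch = model(lr_batch)          # H_RCAN(I_LR^i)
    return l1_loss(sr_batch, hr_batch)  # (1/N) Σ_i ||H_RCAN(I_LR^i) − I_HR^i||_1
```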

3.2. Residual in Residual (RIR)

RIR consists of G residual groups (RG) and a long skip connection (LSC). Each RG in turn consists of B residual channel attention blocks (RCAB) and a short skip connection (SSC). This structure makes it possible to form more than 400 CNN layers. Since merely stacking RGs deeply has limitations in terms of performance, the LSC is introduced at the end of the RIR to stabilize the network. In addition, introducing LSC and SSC together lets unnecessary low-frequency information in the LR image be bypassed more efficiently; a condensed sketch follows.
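
The following condensed PyTorch sketch shows this nesting; it assumes an `RCAB` module as sketched in Section 3.3 below, and the channel width and counts G, B are illustrative choices (the paper stacks enough of them to exceed 400 layers).

```python
import torch.nn as nn

class ResidualGroup(nn.Module):
    """B RCABs plus a trailing conv, wrapped by a short skip connection (SSC)."""
    def __init__(self, channels: int, n_blocks: int):
        super().__init__()
        self.body = nn.Sequential(*[RCAB(channels) for _ in range(n_blocks)],
                                  nn.Conv2d(channels, channels, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)   # SSC

class RIR(nn.Module):
    """G residual groups plus a trailing conv, wrapped by a long skip connection (LSC)."""
    def __init__(self, channels: int, n_groups: int, n_blocks: int):
        super().__init__()
        self.body = nn.Sequential(*[ResidualGroup(channels, n_blocks)
                                    for _ in range(n_groups)],
                                  nn.Conv2d(channels, channels, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)   # LSC: low-frequency content bypasses the deep body
```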

3.3. Residual Channel Attention Block (RCAB)

In this paper, the Residual Channel Attention Block (RCAB) is proposed by merging Channel Attention (CA) into the Residual Block (RB). In particular, to overcome the fact that a CNN considers only local receptive fields and thus cannot use information outside the local region, CA encodes channel-wise spatial information using global average pooling.

Additionally, a gating mechanism [Note ii] is introduced to capture the correlation between channels. In general, the gating mechanism must be able to express nonlinearity between channels and must learn a non-mutually-exclusive relationship, so that multiple channel-wise features can be emphasized (as opposed to one-hot activation). To meet these criteria, sigmoid gating and ReLU were selected; a sketch of CA and RCAB follows Note ii.

[Note ii] Gating Mechanisms: Gating mechanisms were introduced to address the vanishing gradient problem and have proven to be crucial to the success of RNNs. This mechanism essentially smooths out the update. [Gu, Albert, et al. "Improving the gating mechanism of recurrent neural networks." International Conference on Machine Learning. PMLR, 2020.]
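
The following minimal PyTorch sketch puts these pieces together: CA squeezes spatial information with global average pooling, applies a ReLU bottleneck and a sigmoid gate, and rescales each channel; RCAB then inserts CA into a residual block. The reduction ratio r = 16 follows the paper; the 3×3 kernel sizes match common implementations.

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # global average pooling (squeeze)
            nn.Conv2d(channels, channels // reduction, 1),  # channel-downscaling 1x1 conv
            nn.ReLU(inplace=True),                          # nonlinearity between channels
            nn.Conv2d(channels // reduction, channels, 1),  # channel-upscaling 1x1 conv
            nn.Sigmoid(),                                   # non-mutually-exclusive gate in (0, 1)
        )
    def forward(self, x):
        return x * self.gate(x)    # adaptively rescale each channel

class RCAB(nn.Module):
    """Residual block with Channel Attention appended to its body."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), ChannelAttention(channels))
    def forward(self, x):
        return x + self.body(x)
```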

4. Experiment & Result

4.1. Experimental setup

1. Datasets and degradation models

800 images from the DIV2K dataset were used for training, and Set5, B100, Urban100, and Manga109 were used as test sets. Bicubic (BI) and blur-downscale (BD) degradation models were used.

2. Evaluation metrics

PSNR and SSIM were evaluated on the Y channel of the YCbCr color space [Note iii] of the processed images; a sketch of this protocol follows Note iii. In addition, comparison against the SR techniques ranked 1st to 5th in recognition error confirmed the performance advantage of RCAN.

[Note iii] YCbCr: YCbCr, Y′CbCr, or Y Pb/Cb Pr/Cr, also written as YCBCR or Y′CBCR, is a family of color spaces used as a part of the color image pipeline in video and digital photography systems. Y′ is the luma component and CB and CR are the blue-difference and red-difference chroma components. Y′ (with prime) is distinguished from Y, which is luminance, meaning that light intensity is nonlinearly encoded based on gamma-corrected RGB primaries. [Wikipedia]
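
As a sketch of this evaluation protocol, the following computes PSNR on the Y (luma) channel, assuming float RGB images in [0, 1] and the standard ITU-R BT.601 conversion coefficients; the exact border-cropping and rounding conventions of the paper's evaluation scripts are not reproduced here.

```python
import numpy as np

def rgb_to_y(img: np.ndarray) -> np.ndarray:
    """Luma (Y') of BT.601 YCbCr for float RGB images in [0, 1]."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return (16.0 + 65.481 * r + 128.553 * g + 24.966 * b) / 255.0

def psnr_y(sr: np.ndarray, hr: np.ndarray) -> float:
    """PSNR (dB) between SR and HR on the Y channel, with peak value 1.0."""
    mse = np.mean((rgb_to_y(sr) - rgb_to_y(hr)) ** 2)
    return 10.0 * np.log10(1.0 / mse)
```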

3. Training settings

Data augmentation such as rotation and vertical flipping was applied to the 800 DIV2K training images mentioned above, and 16 LR patches of size 48×48 were extracted as inputs for each training batch. ADAM was used as the optimizer; a configuration sketch follows.
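
A sketch of this setup, assuming LR images stored as NumPy arrays; the learning rate and the remaining ADAM hyperparameters are assumptions of this sketch, not values quoted in this review.

```python
import random
import numpy as np

def augment(patch: np.ndarray) -> np.ndarray:
    """Random rotation by a multiple of 90 degrees and random vertical flip."""
    patch = np.rot90(patch, k=random.randint(0, 3))
    if random.random() < 0.5:
        patch = np.flipud(patch)
    return patch.copy()

def random_patch(img: np.ndarray, size: int = 48) -> np.ndarray:
    """Crop a random size x size patch (48x48 LR patches, 16 per batch)."""
    i = random.randint(0, img.shape[0] - size)
    j = random.randint(0, img.shape[1] - size)
    return img[i:i + size, j:j + size]

# With a PyTorch model, the optimizer would be configured along the lines of:
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr is this sketch's assumption
```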

4.2. Result

1. Effects of RIR and CA

While the existing technique showed a performance of 37.45 dB, using RIR, including the long skip connection (LSC) and short skip connection (SSC), together with CA raised performance to 37.90 dB.

2. Model Size Analyses

Compared with other techniques (DRCN, FSRCNN, PSyCo, ENet-E), RCAN is the deepest network yet achieves the highest performance with a comparatively small number of parameters.

5. Conclusion

In this paper, RCAN is applied to obtain highly accurate SR images. In particular, the RIR structure together with LSC and SSC makes it possible to form very deep networks. RIR also lets the network focus on learning high-frequency information by bypassing the unnecessary low-frequency information of the LR image. Furthermore, channel-wise features are adaptively rescaled by introducing CA, which considers interdependencies between channels. The proposed technique was verified with the BI and BD degradation models, and it was confirmed that it also shows excellent performance in object recognition.

Take home message (오늘의 교훈)

By isolating the information in the region of interest within an image and applying attention to it, the weight given to that region during training can be increased.

Building the neural network deeper is more effective for improving performance than simply increasing the total number of parameters.

Author / Reviewer information

1. Author

한승호 (Seungho Han)

  • KAIST ME

  • Research Topics: Formation Control, Vehicle Autonomous Driving, Image Super Resolution

  • https://www.linkedin.com/in/seung-ho-han-8a54a4205/

2. Reviewer

  1. Korean name (English name): Affiliation / Contact information

  2. Korean name (English name): Affiliation / Contact information

  3. ...

Reference & Additional materials

  1. [Original Paper] Zhang, Yulun, et al. "Image super-resolution using very deep residual channel attention networks." Proceedings of the European conference on computer vision (ECCV). 2018.

  2. [Github] https://github.com/yulunzhang/RCAN

  3. [Github] https://github.com/dongheehand/RCAN-tf

  4. [Github] https://github.com/yjn870/RCAN-pytorch

  5. [Attention] https://wikidocs.net/22893

  6. [Dataset] Xu, Qianxiong, and Yu Zheng. "A Survey of Image Super Resolution Based on CNN." Cloud Computing, Smart Grid and Innovative Frontiers in Telecommunications. Springer, Cham, 2019. 184-199.

  7. [BSRGAN] Zhang, Kai, et al. "Designing a practical degradation model for deep blind image super-resolution." arXiv preprint arXiv:2103.14006 (2021).

  8. [Google's SR3] https://80.lv/articles/google-s-new-approach-to-image-super-resolution/

  9. [SRCNN] Dai, Yongpeng, et al. "SRCNN-based enhanced imaging for low frequency radar." 2018 Progress in Electromagnetics Research Symposium (PIERS-Toyama). IEEE, 2018.

  10. [FSRCNN] Zhang, Jian, and Detian Huang. "Image Super-Resolution Reconstruction Algorithm Based on FSRCNN and Residual Network." 2019 IEEE 4th International Conference on Image, Vision and Computing (ICIVC). IEEE, 2019.

  11. [VDSR] Hitawala, Saifuddin, et al. "Image super-resolution using VDSR-ResNeXt and SRCGAN." arXiv preprint arXiv:1810.05731 (2018).

  12. [SRResNet] Ledig, Christian, et al. "Photo-realistic single image super-resolution using a generative adversarial network." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.

  13. [SRGAN] Nagano, Yudai, and Yohei Kikuta. "SRGAN for super-resolving low-resolution food images." Proceedings of the Joint Workshop on Multimedia for Cooking and Eating Activities and Multimedia Assisted Dietary Management. 2018.
