SANforSISR [Kor]

Dai et al. / Second-Order Attention Network for Single Image Super-Resolution / CVPR 2019

1. Introduction

The introduction of Convolutional Neural Networks (CNNs) into Single Image Super-Resolution (SISR) brought large performance gains. Existing CNN-based SISR methods, however, concentrate on wider/deeper architecture design, which neglects the feature correlations between intermediate layers and so hinders the representational power of the CNN. To address this, the paper proposes the Second-order Attention Network (SAN).

The Second-Order Channel Attention (SOCA) module is a mechanism for learning feature correlations better than first-order approaches can. It uses second-order feature statistics to obtain more discriminative representations. These statistics are used to adaptively rescale the channel-wise features, which makes the network focus on the features that carry more important information and thereby improves its learning ability.
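The rescaling idea described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name, the random weights `w_down` and `w_up`, and the toy sizes are all hypothetical, and any normalization of the covariance matrix is left out for brevity.

```python
import numpy as np

def second_order_channel_attention(feat, w_down, w_up):
    """Rescale the channels of feat (C, H, W) using covariance-based statistics."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)                       # one row per channel
    x_centered = x - x.mean(axis=1, keepdims=True)
    cov = x_centered @ x_centered.T / (h * w)        # (C, C) channel covariance
    z = cov.mean(axis=1)                             # pool each row to one scalar per channel
    hidden = np.maximum(w_down @ z, 0.0)             # bottleneck followed by ReLU
    scale = 1.0 / (1.0 + np.exp(-(w_up @ hidden)))   # sigmoid gate, values in (0, 1)
    return feat * scale[:, None, None], scale

# Hypothetical toy setup: 64 channels, reduction ratio 16, random weights.
rng = np.random.default_rng(0)
channels, reduction = 64, 16
feat = rng.standard_normal((channels, 8, 8))
w_down = rng.standard_normal((channels // reduction, channels)) * 0.1
w_up = rng.standard_normal((channels, channels // reduction)) * 0.1
out, scale = second_order_channel_attention(feat, w_down, w_up)
```

Each channel ends up multiplied by a single gate value that depends on how that channel co-varies with all the others, rather than on its mean alone.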

The Non-locally Enhanced Residual Group (NLRG) structure contains Local-Source Residual Attention Groups (LSRAG) together with a non-local operation that gathers long-distance spatial contextual information. The LSRAGs, which learn abstract feature representations, gather abundant information from the Low-Resolution (LR) image while letting low-frequency information pass through.

CNN-based SR models

CNN-based methods have recently been widely used for SR because of their strength in nonlinear representation. They treat SR as an image-to-image problem and directly learn the mapping from LR to HR. These existing methods have mainly focused on designing deeper/wider networks.

Attention mechanism

Humans process visual information adaptively and tend to focus their gaze on important regions. Applying this principle to CNNs is where attention began.

SENet performs image classification by exploiting channel-wise relationships. It has been introduced into deep CNNs to improve SR performance, but SENet uses only first-order statistics. Because it ignores higher-order statistics, it has the drawback of reducing the discriminative ability of the network.
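A small toy example can make the first-order limitation concrete. In the sketch below (hypothetical helper names, hand-picked values), two feature maps have identical per-channel means, so a global-average-pooling descriptor cannot tell them apart, while their channel covariance matrices differ.

```python
import numpy as np

def first_order_descriptor(feat):
    """Global average pooling: one mean per channel."""
    return feat.reshape(feat.shape[0], -1).mean(axis=1)

def second_order_descriptor(feat):
    """Channel covariance: how channels vary together across positions."""
    x = feat.reshape(feat.shape[0], -1)
    x = x - x.mean(axis=1, keepdims=True)
    return x @ x.T / x.shape[1]

# Two toy feature maps of shape (2 channels, 1, 4) with the same channel means.
a = np.array([[[1.0, -1.0, 1.0, -1.0]], [[1.0, -1.0, 1.0, -1.0]]])   # channels move together
b = np.array([[[1.0, -1.0, 1.0, -1.0]], [[-1.0, 1.0, -1.0, 1.0]]])   # channels move oppositely

same_first = np.allclose(first_order_descriptor(a), first_order_descriptor(b))
same_second = np.allclose(second_order_descriptor(a), second_order_descriptor(b))
```

Here `same_first` is true and `same_second` is false: only the second-order descriptor separates the two inputs.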

3. Method

Second-order Attention Network (SAN)

Network Framework

[Figure: SAN network framework]

- Shallow feature extraction

This stage extracts shallow features using only a single convolution layer.

[Equation: shallow feature extraction]

- Non-locally enhanced residual group (NLRG) based deep feature extraction

This stage consists of a Share-source Residual Group (SSRG) placed between two Region-level Non-local (RL-NL) modules. The SSRG is made up of several (G) Local-Source Residual Attention Groups (LSRAG). Each LSRAG places a SOCA module within a stack of several convolution layers and one ReLU layer between two residual blocks.

The modules and layers inside the NLRG are illustrated below.

[Figure: NLRG structure]

SSRG: consists of G LSRAG modules that use a Share-Source Skip Connection (SSC).

LSRAG: consists of M residual blocks that use SSC.

SOCA: exploits feature inter-dependencies.

As the overall structure shows, the network uses a large number of residual blocks. This makes a deeper CNN possible, but it can also create a bottleneck. LSRAG was proposed for this reason, but LSRAG alone did not perform well enough, so SSC is additionally used to facilitate training and to let low-frequency information pass through.

g-th LSRAG (H_g): [Equation: g-th LSRAG]

g-th LSRAG, m-th residual block: [Equation: m-th residual block of the g-th LSRAG]

Local source skip connection: [Equation: local source skip connection]
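The nesting of the two kinds of skip connection can be sketched as follows. This is a toy model under assumptions: `residual_block` stands in for a real convolutional block, the SOCA module is omitted, the skip weights `w_local` and `w_ssc` are hypothetical scalars, and the recurrence used (each group output is the shared shallow feature times `w_ssc` plus the group applied to the previous output) is an assumed form rather than a quotation from the paper.

```python
import numpy as np

def residual_block(x, weight):
    """Toy residual block: x + weight * relu(x); a stand-in for real conv layers."""
    return x + weight * np.maximum(x, 0.0)

def lsrag(x, block_weights, w_local):
    """M residual blocks plus a local-source skip from the group's own input."""
    out = x
    for weight in block_weights:
        out = residual_block(out, weight)
    return w_local * x + out

def ssrg(f0, group_weights, w_local, w_ssc):
    """G groups; every group also receives the shared shallow feature f0."""
    f = f0
    for block_weights in group_weights:
        f = w_ssc * f0 + lsrag(f, block_weights, w_local)
    return f

f0 = np.array([1.0, -2.0, 3.0])
zero_groups = [[0.0, 0.0]] * 3   # G = 3 groups, M = 2 blocks, all block weights zero
identity_out = ssrg(f0, zero_groups, w_local=0.0, w_ssc=0.0)
shared_out = ssrg(f0, zero_groups, w_local=0.0, w_ssc=1.0)
```

With every learned weight at zero the stack reduces to the identity, and switching on the shared skip adds one copy of `f0` per group. This is one way to see how the skip paths let the shallow, low-frequency content through regardless of what the blocks learn.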

RL-NL: non-local neural networks provide a way to compute long-range dependencies over the whole image in high-level tasks. A global-level non-local operation, however, has problems such as excessive computation, so RL-NL carries out the operation at the region level instead of the global level.
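The region-level idea can be sketched by running a plain non-local operation separately inside each cell of a grid. The code below is a simplified illustration: it uses raw dot-product similarity with no learned embeddings, and the `grid` parameter and function names are assumptions made for this sketch.

```python
import numpy as np

def non_local(x):
    """Dot-product non-local op on x of shape (C, N): each position becomes
    itself plus a softmax-weighted sum over all N positions."""
    scores = x.T @ x                                      # (N, N) pairwise similarity
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)
    return x + x @ attn.T

def region_level_non_local(feat, grid=2):
    """Apply non_local independently inside each cell of a grid x grid partition."""
    c, h, w = feat.shape
    rh, rw = h // grid, w // grid   # assumes h and w are divisible by grid
    out = np.empty_like(feat)
    for i in range(grid):
        for j in range(grid):
            rows = slice(i * rh, (i + 1) * rh)
            cols = slice(j * rw, (j + 1) * rw)
            region = feat[:, rows, cols].reshape(c, rh * rw)
            out[:, rows, cols] = non_local(region).reshape(c, rh, rw)
    return out

rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))
out = region_level_non_local(feat, grid=2)
```

For an 8x8 map, a global operation needs a 64x64 similarity matrix (4096 entries), while a 2x2 grid needs four 16x16 matrices (1024 entries in total), which is where the saving in computation comes from.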

The NLRG built in this way has a very large depth and receptive field, and it can be summarized by the following equation.

[Equation: NLRG]

- Up-scale module

This stage performs up-scaling based on the information obtained from the steps above. Several options exist, so the choice should take the trade-off between complexity and performance into account.

[Equation: up-scale module]

The paper uses the pixel shuffle method, which is commonly used in recent CNN-based SR.
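As an illustration of what pixel shuffle does, here is a minimal NumPy sketch that rearranges channels into spatial positions. The function name is chosen for this example, and the channel ordering is meant to follow the layout used by `torch.nn.PixelShuffle`; whether the paper's code uses exactly this operator is an assumption.

```python
import numpy as np

def pixel_shuffle(feat, r):
    """Rearrange feat of shape (C*r*r, H, W) into (C, H*r, W*r)."""
    c_in, h, w = feat.shape
    c = c_in // (r * r)
    x = feat.reshape(c, r, r, h, w)      # split channels into (C, r, r)
    x = x.transpose(0, 3, 1, 4, 2)       # reorder to (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

# Four channels at a single pixel become one 2x2 patch.
tiny = np.arange(4).reshape(4, 1, 1)
patch = pixel_shuffle(tiny, 2)           # [[[0, 1], [2, 3]]]
```

The convolution before the shuffle produces `r * r` times as many channels as the output needs, so the upsampling itself involves no interpolation, only a rearrangement.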

- Reconstruction

This stage maps the features to the SR image using a single convolution layer.

[Equation: reconstruction]

The loss function (L1 loss) is as follows.

[Equation: L1 loss function]
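For reference, the four stages and the L1 objective can be written compactly. The symbols below ($H_{SF}$, $H_{NLRG}$, $H_{\uparrow}$, $H_{R}$, $F_0$, $F_{DF}$, $\Theta$, $N$) are assumed notation for this sketch, chosen to match the stage names in this review, and should be checked against the paper before being relied on.

```latex
\begin{aligned}
F_0 &= H_{SF}(I_{LR}) && \text{(shallow feature extraction)} \\
F_{DF} &= H_{NLRG}(F_0) && \text{(NLRG-based deep feature extraction)} \\
F_{\uparrow} &= H_{\uparrow}(F_{DF}) && \text{(up-scale module)} \\
I_{SR} &= H_{R}(F_{\uparrow}) = H_{SAN}(I_{LR}) && \text{(reconstruction)} \\
L(\Theta) &= \frac{1}{N} \sum_{i=1}^{N} \left\| H_{SAN}\!\left(I_{LR}^{i}\right) - I_{HR}^{i} \right\|_{1} && \text{(L1 loss over } N \text{ training pairs)}
\end{aligned}
```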

4. Experiment & Result

Experiment

Setup

  • Number of LSRAGs in the SSRG: G = 20

  • Number of residual blocks in each LSRAG: M = 10, with a SOCA module (1x1 convolution filters, reduction ratio 16) and convolution filters (3x3, 64 channels)

  • Up-scale module: pixel shuffle method

  • Training set: DIV2K
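The setup above can be collected into a small configuration object, which also makes the implied sizes easy to compute. The class and field names are hypothetical; only the values come from the list.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SANSetup:
    num_groups: int = 20         # G: LSRAGs inside the SSRG
    blocks_per_group: int = 10   # M: residual blocks inside each LSRAG
    channels: int = 64           # 3x3 convolution filters with 64 channels
    reduction: int = 16          # reduction ratio of the 1x1 convolutions in SOCA
    upscale: str = "pixel_shuffle"
    train_set: str = "DIV2K"

    @property
    def total_residual_blocks(self) -> int:
        return self.num_groups * self.blocks_per_group

    @property
    def soca_hidden_channels(self) -> int:
        return self.channels // self.reduction

setup = SANSetup()
```

With these values, `setup.total_residual_blocks` is 200 and `setup.soca_hidden_channels` is 4, which shows how deep the network is and how narrow the SOCA bottleneck becomes.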

Result

- Zoom visual from Urban 100

[Figure: zoomed visual results on Urban100]

The paper's model (h) is the closest to the HR image (a), which confirms that its visual quality and image detail are better than those of the other SR models (b) to (g).

- Urban 100

[Figure: visual comparison on Urban100]

The figure above is a visual comparison for 4x SR with the BI model on the Urban100 dataset. In each case, the first image is the HR (original), the tenth is the result of the paper's SAN, and the ninth is RCAN, the prior method whose principle is closest to SAN.

The two cases in the figure show that SAN brings a meaningful improvement in visual quality over prior work.

5. Conclusion

The paper proposes SAN for more accurate SR. By using the NLRG structure, SAN captures long-distance dependencies and structural information in the network, and SSC is additionally used within the NLRG to let low-frequency information pass through, which improves learning.

In addition, the paper proposes the SOCA module, which learns feature interdependencies through global covariance pooling in order to obtain more discriminative representations.

Experiments with the BI and BD degradation models confirmed that SAN achieves good SR results both quantitatively and visually.

Take home message

This was a good opportunity to deepen my understanding of, and background knowledge on, the attention mechanism. Since I have been doing research on SR, it feels like the range of mechanisms I can apply has become wider. I realized that studying related fields can be a great help to one's current research.

Author / Reviewer information

Author

양승훈 (Seunghoon Yang)

  • KAIST Mechanical Engineering

  • https://github.com/SeunghoonYang

  • shyang9512@kaist.ac.kr

Reviewer

  1. Korean name (English name): Affiliation / Contact information

  2. Korean name (English name): Affiliation / Contact information

  3. ...

Reference & Additional materials

  1. T. Dai, J. Cai, Y. Zhang, S. Xia and L. Zhang, "Second-Order Attention Network for Single Image Super-Resolution," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 11057-11066, doi: 10.1109/CVPR.2019.01132.

  2. https://github.com/daitao/SAN.git

  3. Ding Liu, Bihan Wen, Yuchen Fan, Chen Change Loy, and Thomas S Huang. Non-local recurrent network for image restoration. In NIPS, 2018.

  4. Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, and Yun Fu. Residual dense network for image super-resolution. In CVPR, 2018.

  5. Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, and Yun Fu. Image super-resolution using very deep residual channel attention networks. In ECCV, 2018.
