Local Implicit Image Function [Eng]

Chen et al. / Learning Continuous Image Representation with Local Implicit Image Function / CVPR 2021


📑 1. Problem Definition

Image as a Function

The basics of image representation start from viewing an image as a function. A function returns some value for a given input: the value of $y$ changes with $x$. The function can be simple, like the polynomial, exponential, and trigonometric functions in Figure 1, or extremely complicated, like the one in Figure 2.

Figure 1
Figure 2

For a function of simple shape, it is easy to guess its formula.

For an image, where the RGB value differs at every pixel position, finding a function that maps a given position to its R, G, B values is hard.

Image -> Function : an image can be viewed as a function that returns an RGB value for each pixel coordinate $(x, y)$. As Figure 2 suggests, an image function is clearly very complicated at a glance, and finding a matching polynomial or combination of sine and cosine functions is close to impossible. Finding a function that maps positions to image values is therefore far from easy, and there have been attempts to learn it with neural networks. This field is called Neural Implicit Representation (NIR).

Why do we need NIR?

๊ตณ์ด ์ด๋ฏธ์ง€๋ฅผ ํ•จ์ˆ˜๋กœ ํ•™์Šต์‹œํ‚ค๋Š” ๋ชฉ์ ์€ ๋‹ค์Œ๊ณผ ๊ฐ™์ด 2๊ฐ€์ง€๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค.

  1. If the number of neural network parameters is smaller than the image's data size, we get a compression effect.

  2. An image is inherently discrete (pixel 1, pixel 2, ...); by representing it as a continuous function, we can obtain RGB values at positions between pixels. (⭐)

ํฌ์ŠคํŒ…์—์„œ ์†Œ๊ฐœํ•˜๋Š” ๋…ผ๋ฌธ๋„ CVPR 2021์— ์ถœํŒ๋œ NIR ๊ด€๋ จ ๋…ผ๋ฌธ์œผ๋กœ (โญ) ๋‘ ๋ฒˆ์งธ ๋ชฉ์  (Continuous Representation)์„ ๋‹ค๋ค˜์Šต๋‹ˆ๋‹ค.๋ณธ ํฌ์ŠคํŒ…์€ ๋…ผ๋ฌธ์˜ ๋‘ ๊ฐ€์ง€ contribution์„ ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.

  • A method for learning a continuous representation from a discrete image

  • A method for producing higher resolutions from the continuous representation

📑 2. Local Implicit Image Function (LIIF)

Definition

A function that infers the RGB value at a pixel $x$ can be written as $s = f_\theta(x)$: the model infers the RGB (or grayscale) value from the pixel's position. The proposed model, the Local Implicit Image Function (LIIF), uses latent codes: given information about an image $M \in \mathbb{R}^{H\times W \times D}$, its goal is to learn a continuous image $I$. Conditioning on a latent code as well as the position $x$, the LIIF model is:

$s = f_\theta (z, x)$
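As a heavily simplified sketch of this signature, the following assumes a tiny two-layer MLP for $f_\theta$; the layer sizes, initialization, and activation are illustrative choices, not the paper's architecture:

```python
import numpy as np

# Toy stand-in for f_theta(z, x): an MLP mapping a latent code z concatenated
# with a (relative) coordinate x to an RGB value. All sizes are assumptions.
rng = np.random.default_rng(0)
D, HIDDEN = 64, 256                     # latent dim and hidden width (assumed)
W1 = rng.normal(scale=0.05, size=(D + 2, HIDDEN))
W2 = rng.normal(scale=0.05, size=(HIDDEN, 3))

def f_theta(z, x):
    """z: (D,) latent code, x: (2,) coordinate -> (3,) RGB prediction."""
    h = np.maximum(np.concatenate([z, x]) @ W1, 0.0)   # ReLU hidden layer
    return h @ W2

s = f_theta(rng.normal(size=D), np.array([0.3, -0.1]))
print(s.shape)  # (3,)
```

In LIIF, $x$ will later be the offset from a latent code's position rather than an absolute coordinate.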

Latent Code for continuous position

Given a $[0, H]\times [0, W]$ image, there is one latent code per pixel, so there are $H \times W$ latent codes in total. They are called latent codes because $H\times W$ is the size of the low-resolution image, which has fewer pixels than the original. Given a desired continuous position $x$, we simply select the latent codes near it. In Figure 4, four latent codes, not one, are selected for the position $x$; the paper calls this a local ensemble, and the reason for it is covered in the Local Ensemble part of Section 4.

Figure 3
Figure 4

With a 4x4 grid of pixels, the 4x4 latent codes are evenly distributed, one per position.

For a continuous position $x$, $z^*$ is chosen as the four latent codes nearest to $x$.
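The nearest-code lookup can be sketched as below; the assumption that latent code $z_{ij}$ sits at the pixel center $(i + 0.5, j + 0.5)$, and the index clamping at the border, are illustrative details:

```python
import numpy as np

# Map a continuous position x in [0, H] x [0, W] to the indices of the four
# surrounding latent codes (the 2x2 neighborhood used by the local ensemble).
# Latent code z_ij is assumed to sit at the pixel center (i + 0.5, j + 0.5).
def nearest_four(x, H, W):
    i0 = int(np.clip(np.floor(x[0] - 0.5), 0, H - 2))   # clamp at the border
    j0 = int(np.clip(np.floor(x[1] - 0.5), 0, W - 2))
    return [(i0 + di, j0 + dj) for di in (0, 1) for dj in (0, 1)]

print(nearest_four((1.7, 2.2), H=4, W=4))  # [(1, 1), (1, 2), (2, 1), (2, 2)]
```

The four returned indices are the $z^*$ codes fed to the local ensemble.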

๐Ÿง Latent code๊ฐ’์— ๋Œ€ํ•œ ๋ช‡ ๊ฐ€์ง€ ์˜๋ฌธ์ ์„ ์ง‘๊ณ  ๋„˜์–ด๊ฐ€๊ฒ ์Šต๋‹ˆ๋‹ค.

Q1. What are the latent code values (or their initial values)?

A1. The feature vectors obtained by encoding the image with a pretrained encoder (EDSR or RDN).

Q2. When there are multiple images, are the latent codes shared?

A2. (No) Since images are encoded by the pretrained model, each image gets its own latent codes.

Q3. Do the latent codes change during LIIF training?

A3. (Yes) They are not frozen.

Continuous Representation using Latent Code

์ด๋ฏธ์ง€์— ๋Œ€ํ•œ Latent Code๊ฐ€ ๊ณ ์ •๋˜์–ด ์žˆ์œผ๋ฏ€๋กœ ์ด๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ Continuous Image์˜ xx ์ขŒํ‘œ์— ๋Œ€ํ•œ RGB ๊ฐ’์€ Latent Code์˜ ์œ„์น˜ vโˆ—v* ์™€ xx์˜ ์ฐจ์ด๋ฅผ ์ž…๋ ฅ์œผ๋กœ ๋„ฃ์–ด์„œ ๊ณ„์‚ฐ๋ฉ๋‹ˆ๋‹ค. Latent code์™€ ์ƒ๋Œ€์œ„์น˜๋ฅผ ๋„ฃ๋Š” continous representation์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

$I(x) = \sum_{t \in \{ 00, 01, 10, 11 \}} \frac{S_t}{S} \cdot f_\theta (z_t^*, x - v_t^*)$

Because the input is the offset from a latent code, feeding in continuous offsets yields a continuous representation of the image. As shown in Figure 5, the continuous positions $x$ can be chosen freely, and the continuous relative position $x - v_t^*$ is computed for each.

Figure 5
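A minimal sketch of this weighted sum follows. The decoder is a placeholder (it ignores the offset) so the example stays self-contained; the key part is that each weight $S_t / S$ uses the area of the rectangle between $x$ and the diagonally opposite latent code, so nearer codes get larger weights:

```python
import numpy as np

# Sketch of the local ensemble I(x) = sum_t (S_t / S) * f_theta(z_t, x - v_t).
def f_theta(z, rel):
    return z[:3]   # placeholder decoder: return an "RGB" slice of the code

def liif_query(x, codes, centers):
    """codes/centers: dicts keyed by t in {"00", "01", "10", "11"}."""
    # S_t is the area spanned by x and the code diagonally opposite to z_t
    opposite = {"00": "11", "01": "10", "10": "01", "11": "00"}
    areas = {t: abs((x[0] - centers[opposite[t]][0]) *
                    (x[1] - centers[opposite[t]][1])) for t in centers}
    S = sum(areas.values())
    return sum(areas[t] / S * f_theta(codes[t], x - centers[t]) for t in centers)

x = np.array([1.25, 1.25])
centers = {"00": np.array([1.0, 1.0]), "01": np.array([1.0, 2.0]),
           "10": np.array([2.0, 1.0]), "11": np.array([2.0, 2.0])}
codes = {t: np.full(8, float(i)) for i, t in enumerate(centers)}
rgb = liif_query(x, codes, centers)
print(rgb.shape)  # (3,)
```

With this placeholder decoder and a 1x1 cell of codes, the query reduces to bilinear interpolation of the four code values.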

📑 3. Pipeline

์œ„์—์„œ Latent Code์™€ LIIF ํ•จ์ˆ˜์˜ ์˜๋ฏธ๋ฅผ ์‚ดํŽด๋ดค์Šต๋‹ˆ๋‹ค. ์ฃผ์–ด์ง„ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด์„œ ํ•ด๋‹น ๋ชจ๋ธ์„ ํ•™์Šต์‹œํ‚ค๊ธฐ ์œ„ํ•ด์„œ ์ €์ž๋Š” Self-Supervised Learning ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•˜์˜€์Šต๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ์™€ ํ•™์Šต ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด์„œ ์•Œ์•„๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

  1. โœ”๏ธ Data Preparation ๋‹จ๊ณ„

  2. โœ”๏ธ Training ๋‹จ๊ณ„

Data Preparation

Data Preparation์—์„œ๋Š” Down-sampling๋œ ์ด๋ฏธ์ง€(์ด๋ฏธ์ง€์˜ ํ”ฝ์…€ ์ˆ˜ ๊ฐ์†Œ)์™€ ์˜ˆ์ธกํ•  pixel ์œ„์น˜ xhrx_{hr} ์™€ RGB ๊ฐ’ shrs_{hr} ์„ ์ค€๋น„ํ•ฉ๋‹ˆ๋‹ค. Figure 6 ์— ๋‚˜ํƒ€๋‚˜์žˆ๋“ฏ์ด, ์ฃผ์–ด์ง„ ์ด๋ฏธ์ง€๋ฅผ Down-samplingํ•˜์—ฌ ํฌ๊ธฐ๋ฅผ ์ค„์ด๊ณ  ์ด ์ •๋ณด๋กœ๋ถ€ํ„ฐ ์‚ฌ์ด์ฆˆ๊ฐ€ ํฐ ์›๋ž˜ ์ด๋ฏธ์ง€์˜ ํ”ฝ์…€์— ๋Œ€ํ•œ RGB๋ฅผ ์˜ˆ์ธกํ•ฉ๋‹ˆ๋‹ค. ์ฆ‰, Higer resolution์„ ํƒ€๊ฒŸํŒ…ํ•˜์—ฌ ํ•™์Šตํ•˜๊ณ ์ž ํ•ฉ๋‹ˆ๋‹ค. ํ”ฝ์…€์— ๋Œ€ํ•œ ๋‚ด์šฉ์€ ๋ฐ‘์—์„œ ์กฐ๊ธˆ ๋” ์ž์„ธํžˆ ์„ค๋ช…ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

Figure 6
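The preparation step can be sketched as follows; the average-pooling down-sampling and the normalized $(-1, 1)$ coordinate convention are assumptions for illustration:

```python
import numpy as np

# Sketch of data preparation: down-sample an HR image, then sample ground-truth
# pixels (x_hr, s_hr) from the HR image as prediction targets.
rng = np.random.default_rng(0)
hr = rng.random((96, 96, 3))               # stand-in high-resolution image
r = 2                                      # down-sampling factor

# average-pool r x r blocks -> 48 x 48 low-resolution input
lr = hr.reshape(96 // r, r, 96 // r, r, 3).mean(axis=(1, 3))

# sample target pixels from the HR image
idx = rng.integers(0, 96, size=(256, 2))
x_hr = (idx + 0.5) / 96 * 2 - 1            # normalized coordinates in (-1, 1)
s_hr = hr[idx[:, 0], idx[:, 1]]            # ground-truth RGB values

print(lr.shape, x_hr.shape, s_hr.shape)    # (48, 48, 3) (256, 2) (256, 3)
```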

Training

During training, the down-sampled image ($48\times48$) is fed into the pretrained encoder to extract feature vectors. These serve as the latent codes, and the pretrained encoder preserves the spatial size of the image. As in Figure 7, the $x_{hr}$ obtained in the data-preparation stage and the latent codes are fed into the LIIF model to predict the desired RGB values $s_{hr}$. Computing the $L1$ loss against the ground truth then trains the model.

Figure 7
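A toy version of one training step, with a linear decoder standing in for the encoder-plus-LIIF model and plain subgradient descent on the $L1$ loss (the real model trains an MLP with an optimizer; everything here is a simplification):

```python
import numpy as np

# Toy training loop: a linear "decoder" maps (latent, coordinate) features to
# RGB, updated with the subgradient of the L1 loss against the ground truth.
rng = np.random.default_rng(0)
feats = rng.random((256, 66))      # stand-ins for [latent code, x_hr] inputs
s_hr = rng.random((256, 3))        # ground-truth RGB targets
W = np.zeros((66, 3))              # decoder weights (a real LIIF uses an MLP)

def l1_loss(W):
    return np.abs(feats @ W - s_hr).mean()

for _ in range(500):               # subgradient descent on the L1 loss
    grad = feats.T @ np.sign(feats @ W - s_hr) / feats.shape[0]
    W -= 0.002 * grad

print(l1_loss(W) < l1_loss(np.zeros((66, 3))))  # True: the L1 loss decreased
```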

๐Ÿง input์˜ ํฌ๊ธฐ๋Š” 48x48 ์ธ๋ฐ, 224x224 ๋ฅผ ์–ป๋Š” ๋ฐฉ๋ฒ•์€ ๋ฌด์—‡์ผ๊นŒ?

📑 4. Additional Engineering

Additional techniques can raise LIIF's performance further. Three are proposed, and using all three gives the best results:

  1. โœ”๏ธ Featuer Unfolding : Latent Code๋ฅผ ์ฃผ๋ณ€ 3x3 Latent Code ์™€ Concatenation

  2. โœ”๏ธ Local Ensemble : continuous position xx์— ๋Œ€ํ•ด์„œ 4๊ฐœ์˜ Latetn Code ์„ ํƒ

  3. โœ”๏ธ Cell Decoding : RGB๋ฅผ Predictionํ•  ๋•Œ, ์›ํ•˜๋Š” cell size ์ถ”๊ฐ€.

Feature Unfolding

Each feature (latent code) produced by the encoder is concatenated with its 3x3 neighborhood, increasing the representational power at each position. The input dimension therefore grows by a factor of 9.

$\hat{M}_{jk} = \mathrm{Concat}(\{ M_{j+l, k+m} \}_{l,m \in \{-1,0,1\}})$
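A sketch of the unfolding with zero padding at the border (the padding choice is an assumption):

```python
import numpy as np

# Feature unfolding: every latent code M[j, k] (dimension D) is replaced by the
# concatenation of its 3x3 neighborhood, giving dimension 9D.
def unfold3x3(M):
    H, W, D = M.shape
    P = np.pad(M, ((1, 1), (1, 1), (0, 0)))   # zero-pad the spatial borders
    # gather the 9 shifted copies and concatenate along the channel axis
    return np.concatenate(
        [P[1 + l:1 + l + H, 1 + m:1 + m + W]
         for l in (-1, 0, 1) for m in (-1, 0, 1)], axis=-1)

M = np.random.default_rng(0).random((4, 4, 8))
M_hat = unfold3x3(M)
print(M_hat.shape)  # (4, 4, 72): the channel dimension grew 9-fold
```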

Local Ensemble

Selecting latent codes by distance has one problem: as Figure 8 shows, at the point where selection switches over to the next latent code, two positions $x$ can receive different latent codes even though they are very close to each other. To resolve this, as in Figure 9, the four surrounding latent codes are selected instead.

Figure 8
Figure 9

๋งŒ์ผ ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด Latent Code ํ•˜๋‚˜๋งŒ ๊ณ ๋ฅธ๋‹ค๋ฉด, ๋ฒ”์œ„๋ฅผ ๋„˜์–ด๊ฐ€๋ฉด์„œ Latent Code๊ฐ€ ๊ธ‰๋ณ€ํ•˜๋Š” ํ˜„์ƒ์ด ๋‚˜ํƒ€๋‚ฉ๋‹ˆ๋‹ค.

Choosing the nearest code in each of the four quadrants means that only half of them change when a selection boundary is crossed. For the left $x$, the nearby latent codes $z_{12}, z_{13}, z_{22}, z_{23}$ are selected; for the right $x$, the nearby $z_{13}, z_{14}, z_{23}, z_{24}$ are selected.

Cell Decoding

LIIF ๋ชจ๋ธ์€ ์œ„์น˜์— ๋Œ€ํ•œ ์ •๋ณด์™€ ๊ทผ์ฒ˜ Latent Code์˜ ์ •๋ณด๋ฅผ ์ค๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ ์šฐ๋ฆฌ๊ฐ€ ์–ด๋А ์ •๋„์˜ Resolution์„ ๋ชฉํ‘œ๋กœ ํ•˜๋Š”์ง€ ์•Œ๋ ค์ฃผ์ง€ ๋ชปํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด์„œ, 48ร—4848\times 48 ์—์„œ 224ร—224224 \times 224 ๋กœ ํ•ด์ƒ๋„๋ฅผ ๋†’์ผ ๋•Œ, ์ขŒํ‘œ์— ๋Œ€ํ•œ ์ •๋ณด๋Š” ์ฃผ์ง€๋งŒ, ์šฐ๋ฆฌ๊ฐ€ ๋ชฉํ‘œ๋กœ ํ•˜๋Š” Decoding Cell์˜ ์‚ฌ์ด์ฆˆ๋ฅผ ์ฃผ์ง€ ๋ชปํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ์‹œ์—์„œ๋Š” ํ•ด๋‹น ์œ„์น˜๋กœ๋ถ€ํ„ฐ 2ร—22\times2์˜ ํฌ๊ธฐ๋ฅผ ์›ํ•œ๋‹ค๋Š” ๊ฒƒ์„ ์•Œ๋ ค์ค˜์•ผ ํ•ฉ๋‹ˆ๋‹ค. Cell Decoding์„ ํฌํ•จํ•œ LIIF ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค. ๊ธฐ์กด Pixcel๊ฐ’์— Cell ํฌ๊ธฐ๋ฅผ ์ถ”๊ฐ€์ ์œผ๋กœ ๋ถ™์—ฌ์„œ ์ž…๋ ฅ์œผ๋กœ ๋„ฃ์–ด์ค๋‹ˆ๋‹ค.

$s = f_{cell} (z, [x, c])$
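A sketch of how the input $[x, c]$ might be assembled; the convention $c = (2/H, 2/W)$ in normalized $[-1, 1]$ coordinates is common for implicit models and is an assumed detail here:

```python
import numpy as np

# Cell decoding input: alongside the query coordinate x, the decoder also
# receives the target cell size c. Upscaling 48x48 -> 224x224 therefore
# queries smaller cells than 48x48 -> 96x96.
def make_cell_input(x, target_hw):
    c = np.array([2.0 / target_hw[0], 2.0 / target_hw[1]])
    return np.concatenate([x, c])   # [x, c], fed to f_cell together with z

inp = make_cell_input(np.array([0.3, -0.1]), target_hw=(224, 224))
print(inp.shape)  # (4,)
```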

📑 5. Experiments

High Resolution Benchmark

Figure 10 ์€ High Resolution Benchmark์ธ DIV2K ๋ฐ์ดํ„ฐ์…‹์— ๋Œ€ํ•ด์„œ ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค. ์ฒซ ๋ฒˆ์งธ Row Group์€ EDSR ์ธ์ฝ”๋”๋ฅผ, ๋‘ ๋ฒˆ์งธ Row Group์€ RDN ์ธ์ฝ”๋”๋ฅผ ์‚ฌ์šฉํ•œ ๊ฒฝ์šฐ๋ฅผ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค.

  • When the EDSR encoder is used, LIIF outperforms the other high-resolution methods, and on out-of-distribution scales the proposed model does even better. These are cases where a model trained to upscale by ×1 to ×4 is asked for a higher resolution than it was trained on. We conjecture that the LIIF model performs better because it predicts based on positions relative to the latent codes.

  • With the RDN encoder, performance is similar in-distribution, but likewise higher out-of-distribution.

💡 In conclusion, the LIIF model clearly outperforms other models when higher resolutions are requested.

Figure 10

๐Ÿง Difference between RDN and EDSR

RDN์€ Residual Deep Network๋ฅผ ๋‚˜ํƒ€๋‚ด๋ฉฐ EDSR์€ Enhanced Deep Residual Networks์œผ๋กœ RDN ์ดํ›„ ๊ฐœ๋ฐœ๋œ ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค. ๋‘˜ ๋‹ค Low Resolution์œผ๋กœ๋ถ€ํ„ฐ High Resolution์„ ํƒ€๊ฒŸํŒ…ํ•˜๋Š” CNN + Upsampling ๊ตฌ์กฐ์ธ ๊ฒƒ์€ ๋™์ผ์ง€ํžˆ๋งŒ, EDSR์€ Batch-Normalizaiton์„ ์‚ฌ์šฉํ•˜์ง€ ์•Š์œผ๋ฉฐ, ํŒŒ๋ผ๋ฏธํ„ฐ ์ˆ˜๊ฐ€ RDN๋ณด๋‹ค ์ ์œผ๋ฉด์„œ ๋” ์ข‹์€ ์„ฑ๋Šฅ์„ ๋‚ด๋Š” ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค. High Resolution์„ ์œ„ํ•ด, ์ด๋ฏธ์ง€๋ฅผ ์ธ์ฝ”๋”ฉํ•˜๋Š” ๋Œ€ํ‘œ์ ์ธ ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค.

Continuous Representation

Continuous Representation์„ ์ž˜ ํ•™์Šตํ–ˆ๋‹ค๋ฉด ์ด๋ฏธ์ง€๋ฅผ ํ™•๋Œ€ํ–ˆ์„ ๋•Œ๋„ ๋Š๊ธฐ์ง€ ์•Š๊ณ  ์ œ๋Œ€๋กœ ๋ณด์—ฌ์•ผ ํ•ฉ๋‹ˆ๋‹ค. ๋‹ค๋ฅธ NIR์ด๋‚˜ High resolution ๋ชจ๋ธ๋“ค๊ณผ ๋น„๊ตํ–ˆ์„ ๋•Œ, LIIF์˜ ์ด๋ฏธ์ง€๋Š” ๋”์šฑ ์—ฐ์†์ ์ธ ํ˜•ํƒœ๋กœ ๋‚˜ํƒ€๋‚˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ๋‹ค๋ฅธ ๋ชจ๋ธ๋“ค์ด ์•ฝ๊ฐ„์˜ ๋Š๊ธฐ๋Š” ํ˜„์ƒ์ด๋‚˜, Blur ํšจ๊ณผ๊ฐ€ ์žˆ๋Š” ๋ฐ˜๋ฉด, LIIF ๋ชจ๋ธ์€ ์ƒ์„ฑ๋œ ์ด๋ฏธ์ง€๊ฐ€ ๊ต‰์žฅํžˆ ๋ถ€๋“œ๋Ÿฌ์šด ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Figure 11

📑 6. Conclusion

์ด ๋…ผ๋ฌธ์—์„œ๋Š” ์—ฐ์†์ ์ธ ์ด๋ฏธ์ง€ ํ‘œํ˜„์„ ์œ„ํ•œ Local Implicit Image Function(f(z,xโˆ’v)f(z, x-v))์„ ์ œ์•ˆํ•˜์˜€์Šต๋‹ˆ๋‹ค. Latent code์˜ ์œ„์น˜์—์„œ ํŠน์ • ์œ„์น˜๊นŒ์ง€ ๋–จ์–ด์ง„ ์ ์˜ RGB ๊ฐ’์„ ์œ ์ถ”ํ•จ์œผ๋กœ์จ continuous image representation์„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ๋งŒ๋“ค์—ˆ์Šต๋‹ˆ๋‹ค. ๋˜ํ•œ ์ด๋ฏธ์ง€ ๊ฐœ๋ณ„์ด ์•„๋‹Œ, ์ด๋ฏธ์ง€๋ฅผ pre-trained encoder๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ด๋ฏธ์ง€์— ๋Œ€ํ•œ feature vector๋ฅผ latent code์˜ ๊ธฐ๋ฐ˜์œผ๋กœ ์‚ฌ์šฉํ•จ์œผ๋กœ์จ, ๋‹ค์–‘ํ•œ ์ด๋ฏธ์ง€์— ๋Œ€ํ•ด ์ ์šฉ๊ฐ€๋Šฅํ•œ Training ๊ธฐ๋ฒ•์„ ์ œ์•ˆํ•˜์˜€์Šต๋‹ˆ๋‹ค.

Because an image holds an RGB value for every pixel position, very large images are hard to store due to their data size. If NIR advances further and a much smaller model can memorize an image, then instead of sending the image itself, sending the neural network may become possible in the future.

Take Home Message

Implicit neural representation usually aims to fit a function directly to the given data, so a new function must be trained for every new datum. This paper shows that, because deep learning can extract a feature vector from an image, training can be generalized by taking the feature vector as input. Interpreting the continuous domain as distances from features is also a nice approach.

📑 Author / Reviewer information

Author

  1. ๋ฐ•๋ฒ”์ง„ (Bumjin Park): KAIST / bumjin@kaist.ac.kr

Reviewer

  • None

📰 References & Additional materials
