Supervised Contrastive Replay [Kor]

Mai, Zheda / Supervised contrastive replay: Revisiting the nearest class mean classifier in online class-incremental continual learning / CVPR 2021

Supervised Contrastive Replay: Revisiting the Nearest Class Mean Classifier in Online Class-Incremental Continual Learning[Kor]

1. Introduction

Continual Learning (CL)

CL is a problem setting whose goal is a model that keeps learning from a continuously arriving data stream. Current deep learning models lose most of their performance on earlier datasets once they are trained on a new one. This phenomenon is called Catastrophic Forgetting (CF). For example, if a model trained on CIFAR-10 is then trained on MNIST, its MNIST accuracy is high but its CIFAR-10 accuracy collapses. (With naive training on MNIST alone, CIFAR-10 accuracy falls to nearly 0%.) However good the model was on CIFAR-10 before, the drop is drastic. Datasets that arrive one after another, such as CIFAR-10 and MNIST here, are called tasks.

CF must be solved as deep learning is deployed in more and more places. Once a trained model is served in a real product, new data keeps accumulating. Training further on that new data alone, however, can make the model worse. To avoid this, the model has to be retrained on all the data used previously plus the newly collected data, which is extremely inefficient computationally. This is one reason why movie-style AI that finds data on its own and steadily becomes smarter does not exist today.

CL is the problem setting that tries to solve CF. Zheda Mai, the author of this paper, has recently published a number of strong papers in CL, repeatedly proposing methods close to the state of the art. Among them, this paper is especially attractive: although it relies on a trick, it reports accuracy that was previously hard to imagine in CL.

Experience Replay(ER)

Experience Replay is arguably the dominant approach in CL today. It is widely studied because it performs well despite its simplicity and leaves plenty of room for modular improvement. The method is simple: a few samples from previous tasks are stored in an external memory, and when a new task arrives the model is trained on the new data together with the samples in the external memory.

Naturally, the larger the external memory, the better it prevents performance loss on previous tasks. The ultimate goal of ER is to reduce CF as much as possible with as little external memory as possible.

The current standard ER setup can be summarized by the following two points.

  • Train on one batch from the current task together with one batch from the external memory.

  • The external memory is usually small, so training on it and the current data as they are creates a class imbalance between the two and hurts performance. Balancing the ratio between the two during training is a tip that improves ER.
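The two points above can be sketched as a small helper that pairs each incoming batch with an equally sized batch drawn at random from memory. This is only an illustration: `make_replay_batch`, the `(sample_id, label)` tuples, and the 1:1 ratio are assumptions made for the example, not code from the paper.

```python
import random

def make_replay_batch(current_batch, memory, mem_batch_size, rng=random):
    """Concatenate one incoming batch with one batch sampled from memory."""
    k = min(mem_batch_size, len(memory))      # memory may still be nearly empty
    memory_batch = rng.sample(memory, k) if k > 0 else []
    return list(current_batch) + memory_batch

# Toy data: (sample_id, label) pairs stand in for images and labels.
memory = [("old_%d" % i, i % 2) for i in range(20)]             # old classes 0, 1
current_batch = [("new_%d" % i, 2 + i % 2) for i in range(10)]  # new classes 2, 3

batch = make_replay_batch(current_batch, memory, mem_batch_size=10,
                          rng=random.Random(0))
n_old = sum(1 for _, y in batch if y < 2)
n_new = len(batch) - n_old
print(n_old, n_new)  # old and new samples contribute equally to the update
```

Drawing the memory batch with the same size as the incoming batch keeps old and new classes balanced within each update, regardless of how small the memory is relative to the current task.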

2. Method

Problems of the Softmax Classifier in CL

The core contribution and main claim of this paper concern the weaknesses of the softmax classifier. The softmax classifier gives the best results in many areas, but the authors argue that it is a poor fit for CL, for the following reasons.

  • It is not flexible when new classes arrive

    • A softmax layer needs the number of classes fixed from the start, which does not suit a CL setting where the number of incoming tasks is unknown. (In practice, most current CL research does know how many tasks will arrive. The experiments later make this clearer.)

  • Representation and classification are not coupled

    • When the encoder changes, the softmax layer has to be retrained.

  • Task-recency bias

    • Several earlier studies observed that the softmax classifier tends to be biased toward the most recent task. Because the data distribution in CL is skewed toward the current task, this can be fatal for performance.

Nearest Class Mean(NCM) Classifier

To address these problems, the authors propose using the NCM classifier, which is common in few-shot learning and is also known as a prototype classifier. After training ends, this classifier computes and stores the mean of the features of each class's training data. Each stored mean acts as a prototype. At test time, a sample is assigned to the class whose prototype is nearest.

The NCM classifier resolves the softmax problems above and is a very good match for CL, which suffers from data scarcity much as few-shot learning does. In fact, simply switching to the NCM classifier substantially improves most CL methods.

$$u_c = \frac{1}{n_c}\sum_i f(x_i) \cdot 1\{y_i = c\}$$

$$y^* = \arg\min_{c=1,...,t} \|f(x) - u_c\|$$

These are the equations used by the NCM classifier. Here $c$ denotes a class, $n_c$ is the number of samples of class $c$, and $1\{y = c\}$ is 1 only when $y$ equals $c$. The mean feature of the data stored in memory is computed for each class, and inference assigns the input to the class whose mean is nearest.
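As a minimal sketch of the two NCM equations, the NumPy code below computes one mean feature vector per class and classifies queries by Euclidean distance to those means. The function names `class_means` and `ncm_predict` and the toy 2-D features are assumptions for illustration; in practice the features would come from the encoder $f$.

```python
import numpy as np

def class_means(features, labels):
    """Return {class: mean feature vector}, i.e. u_c for every class c."""
    return {int(c): features[labels == c].mean(axis=0) for c in np.unique(labels)}

def ncm_predict(query, means):
    """Assign each query row to the class whose mean is nearest (Euclidean)."""
    classes = np.array(sorted(means))
    protos = np.stack([means[c] for c in classes])                  # (C, d)
    dists = np.linalg.norm(query[:, None, :] - protos[None, :, :], axis=2)  # (N, C)
    return classes[dists.argmin(axis=1)]

# Toy features standing in for f(x) of samples stored in memory.
feats = np.array([[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.0, 4.2]])
labels = np.array([0, 0, 1, 1])

means = class_means(feats, labels)
pred = ncm_predict(np.array([[0.1, 0.3], [3.5, 4.0]]), means)
print(pred)
```

Because the prototypes are recomputed from stored samples, adding a class only adds one more entry to `means`; no output layer has to be resized or retrained.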

Supervised Contrastive Replay

SCR is a way to get more out of the NCM classifier. The NCM classifier performs inference based on distances between representations. Contrastive learning, which pushes different classes farther apart and pulls samples of the same class closer together, can therefore help NCM considerably. The authors add a simple augmented view of the training data and run contrastive learning on the combined data, using memory samples and current samples together.

$$L_{SCL}(Z_I) = \sum_{i \in I} \frac{-1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{j \in A(i)} \exp(z_i \cdot z_j / \tau)}$$

The loss is given above. For a mini-batch $B = \{x_k, y_k\}_{k=1,...,b}$, let $\tilde{B} = \{\tilde{x}_k = Aug(x_k), y_k\}_{k=1,...,b}$ and $B_I = B \cup \tilde{B}$. $I$ is the set of indices of $B_I$, and $A(i) = I \setminus \{i\}$. $P(i) = \{p \in A(i) : y_p = y_i\}$. This looks complicated but is straightforward once unpacked: $P(i)$ is the set of samples other than $i$ that share its label, that is, the positive samples. $Z_I = \{z_i\}_{i \in I}$ with $z_i = Model(x_i)$, and $\tau$ is a temperature parameter for scaling.
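To make the sets $A(i)$ and $P(i)$ concrete, here is a hedged NumPy sketch of the supervised contrastive loss as defined above, in its usual form with a leading minus sign. The name `supcon_loss`, the L2 normalisation of the embeddings, and the default `tau=0.1` are assumptions of this sketch. It takes the embeddings of $B_I$ as given and does not perform augmentation or backpropagation.

```python
import numpy as np

def supcon_loss(z, y, tau=0.1):
    """Supervised contrastive loss for embeddings z (n, d) and labels y (n,)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalise (assumption)
    n = len(y)
    sim = z @ z.T / tau                                 # z_i . z_j / tau
    not_self = ~np.eye(n, dtype=bool)                   # A(i) = I \ {i}
    pos = (y[:, None] == y[None, :]) & not_self         # P(i): same label, not i
    # Stable log of the denominator: sum over j in A(i) of exp(z_i . z_j / tau).
    sim_masked = np.where(not_self, sim, -np.inf)
    m = sim_masked.max(axis=1, keepdims=True)
    log_denom = m + np.log(np.exp(sim_masked - m).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom
    n_pos = pos.sum(axis=1)
    has_pos = n_pos > 0                                 # skip anchors with empty P(i)
    per_anchor = -(log_prob * pos).sum(axis=1)[has_pos] / n_pos[has_pos]
    return per_anchor.sum()                             # sum over anchors i in I

y = np.array([0, 0, 1, 1])
z_clustered = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
z_mixed = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]])
loss_clustered = supcon_loss(z_clustered, y)
loss_mixed = supcon_loss(z_mixed, y)
print(loss_clustered < loss_mixed)
```

The loss is smaller when same-class embeddings sit close together, which is exactly the geometry a nearest-class-mean classifier benefits from.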

The implementation runs experiments on Split CIFAR-10, which can be regarded as a standard continual learning benchmark. Two implementations are provided: Experience Replay, a widely used baseline, and Experience Replay with the NCM classifier proposed in this paper.

Environment

Running the experiments in Colab is recommended.

Setting of Continual Learning

This chapter prepares the basic setup for continual learning evaluation. The dataset is Split CIFAR-10, that is, CIFAR-10 divided into 5 tasks. The paper uses Reduced_ResNet18 as the base model, but this implementation uses a small CNN for simplicity. The code for this chapter builds Split CIFAR-10 and defines Reduced_ResNet18.
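As a rough illustration of how CIFAR-10 can be divided into 5 tasks, the sketch below assigns two classes to each task and selects the sample indices that belong to a task. Splitting in label order and the helper names `split_classes` and `task_indices` are assumptions; the actual notebook may partition the data differently.

```python
def split_classes(num_classes=10, num_tasks=5):
    """Partition class ids into consecutive, equally sized groups (one per task)."""
    per_task = num_classes // num_tasks
    return [list(range(t * per_task, (t + 1) * per_task)) for t in range(num_tasks)]

def task_indices(labels, task_classes):
    """Indices of the samples whose label belongs to the given task."""
    wanted = set(task_classes)
    return [i for i, y in enumerate(labels) if y in wanted]

tasks = split_classes()            # five tasks with two classes each
labels = [3, 0, 9, 1, 2, 8, 0, 5]  # toy label list standing in for CIFAR-10 targets
first_task = task_indices(labels, tasks[0])
print(tasks)
print(first_task)
```

The index lists can then be used to build one data subset per task, which the model visits in order.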

The base CNN model defined in the code is Reduced ResNet18. One notable point is the features function, which skips the final FC layer. features is used later to implement the NCM classifier.

Experience Replay

This part implements Experience Replay, one of the most widely used baselines in continual learning. It is interesting to vary options such as memory size, number of training epochs, and learning rate and see how the performance changes.

The external memory is implemented first. The memory can be implemented in any way, but a dedicated class is used so that randomly choosing which data enters the memory and which data is drawn from it stays simple.

Deciding which samples enter the memory and which are retrieved is an important part of any ER method. Basic ER handles both at random, whereas extensions such as MIR, GSS, and ASER mainly modify this part.
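A memory class with random insertion and random retrieval could look roughly like the following. Using reservoir sampling for the update step is an assumption of this sketch, as are the class name `ReplayMemory` and its method names; it is one common way to keep every sample seen so far with equal probability.

```python
import random

class ReplayMemory:
    """Fixed-size external memory with random update and random retrieval."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.data = []          # stored (x, y) pairs
        self.seen = 0           # number of samples offered so far
        self.rng = random.Random(seed)

    def update(self, x, y):
        """Reservoir sampling: each sample seen so far is kept with equal probability."""
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append((x, y))
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = (x, y)

    def retrieve(self, batch_size):
        """Draw a random batch without replacement (at most the stored amount)."""
        k = min(batch_size, len(self.data))
        return self.rng.sample(self.data, k)

mem = ReplayMemory(capacity=5, seed=0)
for i in range(100):
    mem.update(i, i % 10)       # toy stream: integer "images" with labels 0..9
replay = mem.retrieve(3)
print(len(mem.data), len(replay))
```

Methods such as MIR, GSS, and ASER would replace the random choices in `update` or `retrieve` with their own selection criteria while keeping the same interface.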

With the memory in place, the next steps are training, testing, and the continual learning setting. For convenience, training and testing are written as separate functions, and the continual learning process is defined in a separate ER function.

On a Colab CPU this takes about 20 minutes. With a memory size of 1000 and 1 epoch, a final average accuracy of about 34-36 is a good result. The average reported in the paper is about 37. Lowering the learning rate to around 0.05-0.08 gives values close to the authors' result.

Use NCM Classifier

Implementing contrastive learning as well is impractical on a CPU-only setup, so this implementation covers only the NCM classifier and is meant to confirm the accuracy gain it brings.

With NCM_ER, a run takes about 21 minutes on a Colab CPU. With a memory size of 1000 the accuracy is about 38-41; it is fine if this is lower than the authors' reference value. With careful hyperparameter tuning, the accuracy can be brought close to the authors' result.

Take Home Message

Continual learning still has a long way to go, and many techniques that are already proven on mainstream vision tasks, such as contrastive learning and transformers, have not yet been applied to it. Looked at closely, continual learning still has plenty of room to grow.

Author

권민찬 (MINCHAN KWON)

  • KAIST AI

  • https://kmc0207.github.io/CV/

  • kmc0207@kaist.ac.kr
