SimCLR [Kor]

Ting Chen et al. / A Simple Framework for Contrastive Learning of Visual Representations / ICML 2020


1. Problem definition

As several papers have demonstrated, pre-training on large-scale image datasets can improve performance on computer vision tasks. Such methods are at the core of self-supervised learning, a family of techniques that turn an unsupervised learning problem into a supervised one by generating surrogate labels from an unlabeled image dataset. However, current self-supervised techniques for image data are complex, require substantial changes to the architecture or the training procedure, and have not been widely adopted.

This paper studies the main components of self-supervised learning and presents SimCLR, a basic framework that both simplifies and improves on previous approaches to self-supervised representation learning on images. With this method the paper also achieves state-of-the-art (SOTA) performance. The simplicity of the approach means that it can be integrated easily into existing supervised learning pipelines.

2. Motivation

Learning effective visual representations without human supervision is a long-standing research problem. Most mainstream approaches fall into one of two classes: generative or discriminative. Generative approaches learn to generate, or otherwise model, pixels in the input space. Pixel-level generation is computationally very expensive, however, and may not be necessary for representation learning.

Discriminative approaches learn image representations with objective functions similar to those used in supervised learning. They differ from supervised learning in that both the inputs and the labels are derived from an unlabeled dataset, and the network is trained on those. In these approaches, the researcher-defined task (pretext task) has largely been designed heuristically.

Idea

This work introduces SimCLR, a simple framework for contrastive learning of visual representations.

The method first learns generic image representations from an unlabeled dataset. It can then be fine-tuned with a small number of labeled images to achieve strong performance on a given classification task.

SimCLR randomly samples examples from the original dataset and transforms each example twice using a combination of simple augmentations (random cropping, random color distortion, and Gaussian blur), creating two sets of corresponding views. The rationale for applying these simple transformations to individual images is as follows:

  1. They encourage a consistent representation of the same image under transformation.

  2. Because the pre-training data has no labels, it is not known in advance which image contains which object.

  3. The authors found that these simple transformations are sufficient for the network to learn good representations, although more sophisticated transformation policies can also be incorporated.

Next, SimCLR computes the image representation with a convolutional neural network variant based on the ResNet architecture. It then computes a nonlinear projection of that representation with a fully connected network (an MLP). This projection amplifies the invariant features and maximizes the network's ability to identify different transformations of the same image. Both the CNN and the MLP are updated with stochastic gradient descent to minimize the loss function of the contrastive objective. After pre-training on unlabeled images, the output of the CNN can be used directly as the image representation, or the network can be fine-tuned with labeled images to achieve strong performance on downstream tasks.

3. Method

The Contrastive Learning Framework

AI604_1

This section describes SimCLR, the learning framework used in the paper. Inspired by recent contrastive learning algorithms, it maximizes agreement between differently augmented views of the same data example. This is done through a contrastive loss in the latent space.

Figure 2 shows the SimCLR framework, which consists of four components. First, a stochastic data augmentation module randomly transforms a given data example into two correlated views. These two views are called a positive pair. This work applies three simple augmentations in sequence: random cropping, random color distortion, and random Gaussian blur.
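The augmentation step can be illustrated with a deliberately simplified sketch. It assumes a grayscale image stored as a nested list of floats in [0, 1], uses an intensity scaling as a stand-in for color distortion, and leaves out resizing and Gaussian blur. The helper names `random_crop`, `intensity_jitter`, and `two_views` are hypothetical and do not come from the paper.

```python
import random

def random_crop(img, size, rng):
    """Cut a random size x size window out of a 2D image (list of rows)."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - size)    # randint is inclusive on both ends
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]

def intensity_jitter(img, strength, rng):
    """Toy stand-in for color distortion: scale all pixels, then clip to [0, 1]."""
    scale = 1.0 + rng.uniform(-strength, strength)
    return [[min(1.0, max(0.0, p * scale)) for p in row] for row in img]

def two_views(img, rng, size=4, strength=0.4):
    """Apply the same random pipeline twice to obtain a positive pair."""
    def augment():
        return intensity_jitter(random_crop(img, size, rng), strength, rng)
    return augment(), augment()

rng = random.Random(0)
# Hypothetical 8x8 grayscale image with values in [0, 1].
image = [[(r * 8 + c) / 63 for c in range(8)] for r in range(8)]
view_a, view_b = two_views(image, rng)
```

Both views come from the same source image, so they form a positive pair. Views of other images in the batch serve as negatives.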

A neural network based encoder f extracts representation vectors from the augmented data examples. Because this work aims for simplicity, the framework places no constraints on the choice of network architecture. The authors use the commonly adopted ResNet as f.

A small neural network projection head g maps the image representations to the space where the contrastive loss is applied. Specifically, an MLP with one hidden layer and a ReLU activation provides the nonlinear mapping whose output is fed to the loss function.
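As a minimal sketch of such a projection head, the code below computes z = W2 · ReLU(W1 · h) for a single representation vector h. The toy dimensions (8 in, 4 out), the random initialisation, and the names `projection_head` and `make_matrix` are assumptions for illustration only. Bias terms are omitted.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def make_matrix(rows, cols, rng):
    """Small random weights; the scaling is an arbitrary illustrative choice."""
    scale = cols ** -0.5
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def projection_head(h, W1, W2):
    """One hidden layer with ReLU, followed by a linear output layer."""
    return matvec(W2, relu(matvec(W1, h)))

rng = random.Random(0)
h = [rng.gauss(0, 1) for _ in range(8)]   # stand-in for the encoder output f(x)
W1 = make_matrix(8, 8, rng)               # hidden layer weights
W2 = make_matrix(4, 8, rng)               # output layer weights
z = projection_head(h, W1, W2)            # vector passed to the contrastive loss
```

After pre-training, the head g is discarded and the encoder output h is used as the representation for downstream tasks.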

A contrastive loss function is defined for a contrastive prediction task. Given a set {x_k} that contains a positive pair x_i and x_j, the task is to identify x_j among the elements of {x_k} other than x_i, for a given x_i.

The loss function for a positive pair is therefore defined as follows.

AI604_2
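For readers without access to the figure, the NT-Xent loss for a positive pair (i, j), as given in the SimCLR paper, can be written as below. Here z denotes the output of the projection head g, sim is cosine similarity, τ is the temperature parameter, and N is the batch size, so a batch contains 2N augmented examples.

```latex
\ell_{i,j} = -\log
  \frac{\exp\left(\mathrm{sim}(z_i, z_j)/\tau\right)}
       {\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp\left(\mathrm{sim}(z_i, z_k)/\tau\right)},
\qquad
\mathrm{sim}(u, v) = \frac{u^{\top} v}{\lVert u \rVert \, \lVert v \rVert}
```

The final loss is averaged over all positive pairs in the batch, counting both (i, j) and (j, i).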

The algorithm below is the main training algorithm of SimCLR.

AI604_3
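The loss computation at the heart of the training algorithm can be sketched in plain Python. This is an illustrative, unoptimised version, not the authors' implementation. It assumes the 2N projections are ordered so that `z[2k]` and `z[2k+1]` are the two views of example k, and the default `tau=0.5` is an arbitrary choice.

```python
import math

def cosine(u, v):
    """Cosine similarity, i.e. the dot product of L2-normalised vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nt_xent(z, tau=0.5):
    """Average NT-Xent loss over all 2N anchors in the batch."""
    n2 = len(z)
    total = 0.0
    for i in range(n2):
        j = i + 1 if i % 2 == 0 else i - 1          # index of the positive partner
        logits = [cosine(z[i], z[k]) / tau for k in range(n2) if k != i]
        pos = cosine(z[i], z[j]) / tau
        m = max(logits)                             # log-sum-exp for numerical stability
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - pos
    return total / n2

# Positives aligned and negatives orthogonal: low loss.
aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
# Positives orthogonal and a negative aligned: high loss.
misaligned = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
loss_aligned = nt_xent(aligned)
loss_misaligned = nt_xent(misaligned)
```

In a real training loop this value would be differentiated with respect to the parameters of f and g. The sketch only evaluates the loss.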

Training with Large Batch Size

To keep things simple, the model is not trained with a memory bank. Instead, the training batch size is varied from 256 to 8192. A batch size of 8192 provides 16382 negative examples per positive pair, drawn from both augmented views.
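The count of 16382 follows from simple arithmetic: a batch of N images yields 2N augmented views, and for a given positive pair the two views of that pair are excluded, leaving 2(N - 1) negatives. A short check, using a hypothetical helper name:

```python
def negatives_per_positive_pair(batch_size):
    # 2N augmented views in total, minus the two views of the positive pair.
    return 2 * (batch_size - 1)

counts = {n: negatives_per_positive_pair(n) for n in (256, 4096, 8192)}
```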

Training with a large batch size can be unstable when using SGD/Momentum with linear learning rate scaling. This work therefore uses the LARS optimizer.

Data Augmentation for Contrastive Representation Learning

AI604_4

The data augmentations used are random crop and resize, color distortion, and Gaussian blur.

To study the effect of data augmentation systematically, the authors also considered transformations beyond random crop and resize, color distortion, and Gaussian blur, such as rotation, cutout, and contrast and channel changes. They measured the performance of the framework when augmentations were applied individually or in pairs. Because ImageNet images come in different sizes, crop and resize is always applied.

No single transformation is sufficient to define a prediction task that yields the best representations. Two transformations, random crop and random color distortion, had the most pronounced effect. Neither cropping nor color distortion gives high performance on its own, but composing the two leads to state-of-the-art results.

Architectures for Encoder and Head

In SimCLR, an MLP-based nonlinear projection is applied before the loss function of the contrastive objective is computed. This helps identify the invariant features of each input image and maximizes the network's ability to identify different transformations of the same image. In the authors' experiments, using this nonlinear projection improved representation quality and raised the performance of a linear classifier trained on SimCLR-learned representations by more than 10%.

AI604_5

Loss Functions and Batch Size

This work compares the NT-Xent loss with other commonly used contrastive loss functions, such as the logistic loss.

AI604_6

Table 2 shows the objective functions together with the gradients with respect to the inputs of each loss function.

The authors found that L2 normalization and an appropriately chosen temperature can help model training.

4. Experiment & Result


Experimental setup

The paper lays out an evaluation protocol that focuses on understanding how the framework behaves under different design choices.

  • Dataset

    ImageNet ILSVRC-2012 is used for most of the unsupervised pre-training in this work, that is, for training the encoder network f (Figure 2) without labels. Some additional pre-training experiments use CIFAR-10, and transfer learning is also tested.

  • Baselines

    MoCo, PIRL, CPC v2, Local Agg, and BigBiGAN.

  • Test setting

    Optimizer : LARS optimizer

    learning rate : 4.8 ( = 0.3 * BatchSize/256)

    weight decay : 10^-6

    Batch Size : 4096 for 100 epochs.

    Using linear warmup for the first 10 epochs.
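The learning rate above follows the linear scaling rule stated in the setting, 0.3 * BatchSize / 256. The sketch below reproduces that arithmetic and adds a simple linear warmup over the first 10 epochs. The exact shape of the warmup (per-epoch steps reaching the peak at the end of epoch 10) is an assumption, and any decay after warmup is not modeled. The function names are hypothetical.

```python
def scaled_lr(batch_size, base_lr=0.3):
    """Linear scaling rule: lr = base_lr * batch_size / 256."""
    return base_lr * batch_size / 256

def warmup_lr(epoch, peak_lr, warmup_epochs=10):
    """Ramp linearly to peak_lr over the warmup epochs, then hold it constant."""
    if epoch >= warmup_epochs:
        return peak_lr
    return peak_lr * (epoch + 1) / warmup_epochs

peak = scaled_lr(4096)                              # approximately 4.8
schedule = [warmup_lr(e, peak) for e in range(12)]  # first 12 epochs (0-indexed)
```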

  • Evaluation metric

  • Default Setting

    Data augmentation uses random crop and resize, color distortion, and Gaussian blur. The base encoder network (f in Figure 2) is ResNet-50. The projection head g is a 2-layer MLP that maps the image representation to a 128-dimensional latent space.

    The loss function is NT-Xent.

Result

  • Comparison with State-of-the-art

    The results show that the proposed framework achieves SOTA performance.

AI604_7
AI604_8
AI604_9

5. Conclusion

This paper presents a simple framework, and an instantiation of it, for contrastive visual representation learning.

Combining its findings, the paper improves considerably over previous methods for self-supervised learning, contrastive learning, and transfer learning.

The approach differs from standard supervised learning.

The paper demonstrates the following four points.

  • Composition of data augmentations plays a critical role in defining effective predictive tasks.

  • Introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations.

  • Representation learning with a contrastive cross-entropy loss benefits from normalized embeddings and an appropriately adjusted temperature parameter.

  • Contrastive learning benefits more from larger batch sizes and more training steps than supervised learning does.

Take home message

Data augmentation plays a very important role in deep-learning-based computer vision.

Earlier self-supervised techniques for image data were complex, required substantial changes to the architecture or the training procedure, and were not widely adopted.

This work learns generic image representations from an unlabeled dataset and then fine-tunes with a small number of labeled images, achieving SOTA performance on a given classification task.

Author / Reviewer information


Author

김하준 (Kim Hajun)

  • Contact: hajun0219@kaist.ac.kr, GitHub: https://github.com/Hajun0219/

  • Affiliation: KAIST Mechanical Engineering

  • Introduction: I study robotics. I am interested in control, path planning, and state estimation using optimization-based or learning-based frameworks.


Reference & Additional materials

    1. Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton. "A Simple Framework for Contrastive Learning of Visual Representations". Proceedings of the 37th International Conference on Machine Learning, PMLR 119:1597-1607, 2020.

    2. https://github.com/Hajun0219/awesome-reviews-kaist.git
