BlockDrop [Eng]

Wu et al. / BlockDrop: Dynamic Inference Paths in Residual Networks / CVPR 2018

1. Problem definition

์ตœ๊ทผ ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ๋“ค์€ ์ •ํ™•๋„๊ฐ€ ํฌ๊ฒŒ ํ–ฅ์ƒ๋˜๋ฉฐ ๋‹ค์–‘ํ•œ dataset์—์„œ ํฐ ์„ฑ๊ณผ๋ฅผ ์ด๋ฃจ์—ˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ด๋Ÿฌํ•œ ๋ชจ๋ธ๋“ค์€ ์ž์œจ ์ฃผํ–‰, ๋ชจ๋ฐ”์ผ ์‹ค์‹œ๊ฐ„ ์ƒํ˜ธ์ž‘์šฉ๊ณผ ๊ฐ™์€ ์‹ค์ œ ํ™˜๊ฒฝ์—์„œ ์ ์šฉ๋˜๊ธฐ๊ฐ€ ํž˜๋“ค๋‹ค๋Š” ๋ฌธ์ œ์ ์„ ๊ฐ€์ง€๊ณ  ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค. ๊ทธ ์ด์œ ๋Š” ๋†’์€ ์ •ํ™•๋„๋ฅผ ์œ„ํ•ด์„œ๋Š” ๋” ๊นŠ๊ณ  ๋ณต์žกํ•œ ๋„คํŠธ์›Œํฌ ๊ตฌ์กฐ๋ฅผ ์œ ์ง€ํ•ด์•ผ ํ•˜๋Š”๋ฐ, ๋„คํŠธ์›Œํฌ๋ฅผ ๋ณต์žกํ•˜๊ฒŒ ์œ ์ง€ํ•˜๋ฉด์„œ ์‹ค์‹œ๊ฐ„ ์ •๋„์˜ ๋น ๋ฅธ ์†๋„๋ฅผ ์œ ์ง€ํ•˜๋Š” ๊ฒƒ์ด ์‰ฝ์ง€ ์•Š๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค. ์ด์— ๋”ฐ๋ผ ์—ฌ๋Ÿฌ ๋ชจ๋ธ ๊ฒฝ๋Ÿ‰ํ™” ๊ธฐ๋ฒ•๋“ค์ด ์ œ์•ˆ๋˜์—ˆ์œผ๋‚˜, ํ•ด๋‹น ๋…ผ๋ฌธ์—์„œ๋Š” ์ด๋Ÿฌํ•œ ๋„คํŠธ์›Œํฌ์˜ ๊ตฌ์กฐ๊ฐ€ one-size-fits-all ๋„คํŠธ์›Œํฌ ๊ตฌ์กฐ๋ผ๋Š” ์ ์„ ๋ฌธ์ œ์ ์œผ๋กœ ์ง€์ ํ•˜์˜€์Šต๋‹ˆ๋‹ค. (One size fits all)

2. Motivation

์ธ๊ฐ„์˜ ์ธ์‹ ์‹œ์Šคํ…œ์€ ์‚ฌ๋ฌผ ์ธ์‹์„ ํ•˜๋‚˜์˜ ๊ธฐ์ค€๋Œ€๋กœ ํ•˜๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๋ผ, ์‚ฌ๋ฌผ์˜ ์ข…๋ฅ˜๋‚˜ ์ฃผ๋ณ€์˜ ๋ฐฐ๊ฒฝ์— ๋”ฐ๋ผ ์‹œ๊ฐ„๊ณผ ์ค‘์š”๋„๋ฅผ ๋‹ค๋ฅด๊ฒŒ ๋ฐฐ์ •ํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, ๋ณต์žกํ•œ ์ƒํ™ฉ๊ณผ ๋ฌผ์ฒด๋ฅผ ์ธ์‹ํ•ด์•ผ ํ•  ๊ฒฝ์šฐ์—๋Š” ํ‰์†Œ๋ณด๋‹ค ๋งŽ์€ ์‹œ๊ฐ„๊ณผ ๊ด€์‹ฌ์„ ๋ฌด์˜์‹์ค‘์— ๋” ํฌ๊ฒŒ ํ• ์• ํ•˜๊ณ , ๊ฐ„๋‹จํ•œ ์Šค์บ”์œผ๋กœ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ๋Š” ๊ฒฝ์šฐ์—๋Š” ํฐ ์‹œ๊ฐ„๊ณผ ๊ด€์‹ฌ์„ ๋‘์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋งฅ๋ฝ์—์„œ, ๋ณธ ๋…ผ๋ฌธ์€ ์ธํ’‹ ์ด๋ฏธ์ง€์˜ ๋ถ„๋ฅ˜ ๋‚œ์ด๋„์— ๋”ฐ๋ผ ๋„คํŠธ์›Œํฌ์˜ ๋ ˆ์ด์–ด๋ฅผ ์„ ํƒ์ ์œผ๋กœ ์ œ๊ฑฐํ•˜๋Š” BlockDrop [1] ๊ธฐ๋ฒ•์„ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค.

ResNet์€ ๋‘ ๊ฐœ ์ด์ƒ์˜ ์ปจ๋ณผ๋ฃจ์…˜ ๋ ˆ์ด์–ด๋กœ ๊ตฌ์„ฑ๋œ ๋ฆฌ์‚ฌ์ด์ฅฌ์–ผ ๋ธ”๋ก๊ณผ, ๋‘ ๋ธ”๋ก ์‚ฌ์ด์˜ ์ง์ ‘ ๊ฒฝ๋กœ๋ฅผ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•˜๋Š” Skip-connection์œผ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ Skip-connection์€, ResNet์ด ๋™์ž‘ํ•  ๋•Œ ์ƒ๋Œ€์ ์œผ๋กœ ์–•์€ ๋„คํŠธ์›Œํฌ์˜ ์•™์ƒ๋ธ”์ฒ˜๋Ÿผ ์ž‘๋™ํ•˜๋„๋ก ํ•˜์—ฌ ResNet์˜ ํŠน์ • ๋ฆฌ์‚ฌ์ด์ฅฌ์–ผ ๋ธ”๋ก์ด ์ œ๊ฑฐ๋˜๋Š” ๊ฒฝ์šฐ์—๋„ ์ผ๋ฐ˜์ ์œผ๋กœ ์ „์ฒด ์„ฑ๋Šฅ์— ์•ฝ๊ฐ„์˜ ๋ถ€์ •์ ์ธ ์˜ํ–ฅ๋งŒ ๊ฐ€์ ธ์˜ฌ ์ˆ˜ ์žˆ๋„๋ก ํ•ฉ๋‹ˆ๋‹ค.

ํ•œํŽธ, Residual Network์˜ ๋ ˆ์ด์–ด๋ฅผ ์ œ๊ฑฐ (drop) ํ•˜๋Š” ๊ฒƒ์€ ์ผ๋ฐ˜์ ์œผ๋กœ Dropout [2] ๊ณผ DropConnect [3] ์™€ ๊ฐ™์ด ๋ชจ๋ธ์„ ํ•™์Šตํ•˜๋Š” ๊ณผ์ •์—์„œ ์ด๋ฃจ์–ด์ง‘๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฐฉ๋ฒ•๋“ค์€ ๋ชจ๋‘ ์ธํผ๋Ÿฐ์Šค ๊ณผ์ •์—์„œ๋Š” ๋ ˆ์ด์–ด๋ฅผ dropํ•˜์ง€ ์•Š๊ณ  ๊ณ ์ •์‹œํ‚จ ์ฑ„๋กœ ์‹คํ—˜์„ ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค. ๋งŒ์•ฝ ์ธํผ๋Ÿฐ์Šค ๊ณผ์ •์—์„œ ๋ ˆ์ด์–ด๋ฅผ ํšจ์œจ์ ์œผ๋กœ dropํ•œ๋‹ค๋ฉด ์„ฑ๋Šฅ์€ ๊ฑฐ์˜ ์œ ์ง€ํ•œ ์ฑ„๋กœ ์ธํผ๋Ÿฐ์Šค ๊ณผ์ •์—์„œ Speed up์„ ๊ธฐ๋Œ€ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์†๋„ ๊ฐœ์„ ์„ ๋ชฉํ‘œ๋กœ ๋ ˆ์ด์–ด๋ฅผ ์ธํ’‹ ์ด๋ฏธ์ง€์— ๋”ฐ๋ผ ํšจ์œจ์ ์œผ๋กœ ๋“œ๋žํ•˜๋Š” ์—ฐ๊ตฌ๋ฅผ ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค.

Concept Figure

Residual Networks Behave Like Ensembles of Relatively Shallow Networks [4]

์œ„ ๋…ผ๋ฌธ์—์„œ๋Š” ResNet์ด ํ…Œ์ŠคํŠธ ๊ณผ์ •์—์„œ layer dropping์— resilientํ•˜๋‹ค๋Š” ๊ฒƒ์„ ๋ณด์˜€์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์„ฑ๋Šฅ ์ €ํ•˜๋Š” ์ตœ์†Œํ™”ํ•˜๋ฉด์„œ ๋ ˆ์ด์–ด๋ฅผ ์ œ๊ฑฐํ•  ์ˆ˜ ์žˆ๋Š” dynamicํ•œ ๋ฐฉ๋ฒ•์€ ๋…ผ๋ฌธ์—์„œ ๊ตฌ์ฒด์ ์œผ๋กœ ์ œ์‹œ๋˜์ง€ ์•Š์•˜์Šต๋‹ˆ๋‹ค. ๋ฐ˜๋ฉด, Data-Driven Sparse Structure Selection for Deep Neural Networks [5] ๋…ผ๋ฌธ์—์„œ๋Š” Sparsity constraint๋ฅผ ํ™œ์šฉํ•˜์—ฌ ์–ด๋–ค ๋ฆฌ์‚ฌ์ด์ฅฌ์–ผ ๋ธ”๋ก์„ ์ œ๊ฑฐํ•  ๊ฒƒ์ธ์ง€ ๊ฒฐ์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•˜์˜€์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ฃผ์–ด์ง„ ์ธํ’‹ ์ด๋ฏธ์ง€์— dependentํ•˜๊ฒŒ, ์ฆ‰ instance-specificํ•˜๊ฒŒ ์–ด๋–ค ๋ธ”๋ก์„ ์ œ๊ฑฐํ•  ๊ฒƒ์ธ์ง€ ๊ฒฐ์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•˜์ง€๋Š” ๋ชปํ•˜์˜€์Šต๋‹ˆ๋‹ค.

Idea

์ตœ์ ์˜ block dropping ๊ตฌ์กฐ๋ฅผ ์ฐพ๊ธฐ ์œ„ํ•ด ํ•ด๋‹น ๋…ผ๋ฌธ์€ reinforcement learning์„ ํ™œ์šฉํ•ฉ๋‹ˆ๋‹ค. ๊ฐ•ํ™”ํ•™์Šต์„ ํ†ตํ•ด ์ฃผ์–ด์ง„ ์ด๋ฏธ์ง€์— ์ ์ ˆํ•œ ๋ธ”๋ก ๊ตฌ์„ฑ์„ ์ฐพ์•„๋‚ด์ฃผ๋Š” binary vector๋ฅผ ์ƒ์„ฑํ•˜๊ณ , ์ด๋ฅผ ํ†ตํ•ด ํ•ด๋‹น ๋…ผ๋ฌธ์€ ์ธํผ๋Ÿฐ์Šค ๊ณผ์ •์—์„œ ์„ฑ๋Šฅ ์ €ํ•˜๊ฐ€ ๊ฑฐ์˜ ์—†๋Š” ์ƒํƒœ๋กœ speed up์„ ์ด๋ค„๋ƒ…๋‹ˆ๋‹ค.

3. Method

์ž…๋ ฅ ์ด๋ฏธ์ง€๊ฐ€ ์ฃผ์–ด์กŒ์„ ๋•Œ ์ตœ์ ์˜ block dropping ์ „๋žต์„ ์ฐพ๊ธฐ ์œ„ํ•ด์„œ ํ•ด๋‹น ๋…ผ๋ฌธ์€ binary policy vector๋ฅผ ์ถœ๋ ฅํ•˜๋Š” policy network๋ฅผ ๊ตฌ์„ฑํ•ฉ๋‹ˆ๋‹ค. ํ•™์Šต ๊ณผ์ •์—์„œ ๋ฆฌ์›Œ๋“œ๋Š” block usage์™€ ์˜ˆ์ธก ์ •ํ™•๋„๋ฅผ ๋ชจ๋‘ ๊ณ ๋ คํ•˜์—ฌ ๊ฒฐ์ •๋ฉ๋‹ˆ๋‹ค.

Policy Network

์ผ๋ฐ˜์ ์ธ ๊ฐ•ํ™”ํ•™์Šต๊ณผ๋Š” ๋‹ค๋ฅด๊ฒŒ, ํ•ด๋‹น ๋…ผ๋ฌธ์€ all actions at once ๋ฐฉ์‹์œผ๋กœ ์ •์ฑ…์„ ํ•™์Šตํ•ฉ๋‹ˆ๋‹ค. ์ž…๋ ฅ ์ด๋ฏธ์ง€ x์™€ K๊ฐœ์˜ ๋ธ”๋ก์„ ๊ฐ€์ง€๋Š” ResNet์ด ์žˆ์„ ๋•Œ, block dropping ์ •์ฑ…์€ ๋‹ค์Œ๊ณผ ๊ฐ™์ด K์ฐจ์›์˜ ๋ฒ ๋ฅด๋ˆ„์ด ๋ถ„ํฌ๋กœ ์ •์˜๋ฉ๋‹ˆ๋‹ค.

Bernoulli Distribution

์œ„ ์‹์—์„œ f๋Š” policy network์— ํ•ด๋‹นํ•˜๊ณ , ์ด์— ๋”ฐ๋ฅธ s๋Š” ํŠน์ • ๋ธ”๋ก์ด drop๋  likelihood๋ฅผ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ์ด ๊ฒฝ์šฐ์— u๋Š” 0 ๋˜๋Š” 1์˜ ๊ฐ’์„ ๊ฐ€์ง€๋Š” drop ์—ฌ๋ถ€๋ฅผ ๋”ฐ์ง€๋Š” action์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ํšจ์œจ์ ์ธ block usage์™€ ๋™์‹œ์— ์ •ํ™•๋„๋ฅผ ๋†’์ด๊ธฐ ์œ„ํ•ด์„œ ์•„๋ž˜์™€ ๊ฐ™์€ reward function์„ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค.

Reward

๋ฆฌ์›Œ๋“œ ์ˆ˜์‹์˜ ์ฒซ์งธ์ค„์€ ์ „์ฒด ๋ธ”๋ก ์ค‘์—์„œ ๋“œ๋ž๋œ ๋ธ”๋ก์˜ ๋น„์œจ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ์ด๋•Œ, ์œ„์™€ ๊ฐ™์€ ํ˜•ํƒœ๋กœ ์ ์€ ์–‘์˜ ๋ธ”๋ก์„ ์‚ฌ์šฉํ•˜๋Š” ์ •์ฑ…์— ํฐ ๋ฆฌ์›Œ๋“œ๋ฅผ ์ฃผ์–ด์„œ block dropping์„ ๊ถŒ์žฅํ•˜๋Š” ๋ฐฉํ–ฅ์œผ๋กœ ํ•™์Šต์ด ์ง„ํ–‰๋ฉ๋‹ˆ๋‹ค. ๋˜ํ•œ, ๋ฆฌ์›Œ๋“œ ์ˆ˜์‹์˜ ๋‘˜์งธ์ค„์€ ํ‹€๋ฆฐ ์˜ˆ์ธก์— ํ•ด๋‹นํ•˜๋Š” ๊ฒฝ์šฐ๋ฅผ ์˜๋ฏธํ•˜๋Š”๋ฐ, ์ด๋•Œ ํ‹€๋ฆฐ ์˜ˆ์ธก์— ๋Œ€ํ•ด ๊ฐ๋งˆ์˜ ํŽ˜๋„ํ‹ฐ๋ฅผ ์ฃผ์–ด์„œ ์ •ํ™•๋„๋ฅผ ๋†’์ด๋Š” ๋ฐฉํ–ฅ์œผ๋กœ ํ•™์Šต์ด ์ง„ํ–‰๋˜๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค.

4. Experiment & Result

Experimental Setup

CIFAR-10, CIFAR-100์˜ ๊ฒฝ์šฐ pretrained resnet์€ resnet-32์™€ resnet-110์œผ๋กœ ์‹คํ—˜์ด ์ง„ํ–‰๋˜์—ˆ์œผ๋ฉฐ, ImageNet์˜ ๊ฒฝ์šฐ pretrained resnet์€ resnet-101์œผ๋กœ ์‹คํ—˜์ด ์ง„ํ–‰๋˜์—ˆ์Šต๋‹ˆ๋‹ค. Policy Network์˜ ๊ฒฝ์šฐ CIFAR์— ๋Œ€ํ•ด์„œ๋Š” resnet-8์„ ์‚ฌ์šฉํ•˜์˜€๊ณ  ImageNet์— ๋Œ€ํ•ด์„œ๋Š” resnet-10์„ ์‚ฌ์šฉํ•˜์˜€๋Š”๋ฐ, ImageNet์—์„œ๋Š” input image๋ฅผ 112x112๋กœ downsamplingํ•˜์—ฌ policy network์— ์ „๋‹ฌํ•˜์˜€์Šต๋‹ˆ๋‹ค.

Result

ํ•ด๋‹น ๋…ผ๋ฌธ์€ ์ž„์˜๋กœ residual block์„ drop์‹œํ‚จ random ๋ฐฉ๋ฒ•๊ณผ ์ˆœ์„œ์ƒ ์•ž์— ์žˆ๋Š” residual block์„ drop ์‹œํ‚จ first ๋ฐฉ๋ฒ• ๋“ฑ์„ baseline์œผ๋กœ ํ•˜๊ณ  ๋ณธ ๋…ผ๋ฌธ์—์„œ ์ œ์•ˆํ•˜๋Š” BlockDrop ๋ฐฉ๋ฒ•๊ณผ์˜ ์„ฑ๋Šฅ์„ ๋น„๊ตํ•˜์˜€์Šต๋‹ˆ๋‹ค. CIFAR-10์—์„œ ResNet-32๋ฅผ pretrained backbone์œผ๋กœ ํ•˜๋Š” ๊ฒฝ์šฐ Full ResNet์˜ ์„ฑ๋Šฅ(accuracy)์ด 92.3์ด์—ˆ๋‹ค๋ฉด FirstK๋Š” 16.6์˜ ์„ฑ๋Šฅ์„ ๋ณด์˜€๊ณ  RandomK๋Š” 20.5์˜ ์„ฑ๋Šฅ์„ ๋ณด์˜€์œผ๋ฉฐ BlockDrop์€ 88.6์˜ ์„ฑ๋Šฅ์„ ๋ณด์˜€์Šต๋‹ˆ๋‹ค.

Policy vs. Heuristic

์ธํผ๋Ÿฐ์Šค ๊ณผ์ •์—์„œ์˜ ์†๋„ ๊ฐœ์„ ์— ๋Œ€ํ•œ ์‹คํ—˜ ๊ฒฐ๊ณผ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค. ๋ชจ๋ธ ๊ฒฝ๋Ÿ‰ํ™” ๊ธฐ๋ฒ• ์ค‘์—์„œ ACT, SACT, PFEC, LCCL์„ baseline ๋ชจ๋ธ๋กœ ํ•˜์—ฌ FLOPs-accuracy ์ปค๋ธŒ๋ฅผ ๋น„๊ตํ•˜์˜€์œผ๋ฉฐ, SACT์™€ ๋™์ผ ์ˆ˜์ค€์˜ ์ •ํ™•๋„๋ฅผ ์œ ์ง€ํ•˜๊ธฐ ์œ„ํ•ด์„œ 50%์˜ FLOPs๋งŒ์„ ํ•„์š”๋กœ ํ•˜๋Š” ๊ฒƒ์„ ํ™•์ธํ•˜์˜€์Šต๋‹ˆ๋‹ค.

Policy vs. SOTA

5. Conclusion

๋ณธ ๋…ผ๋ฌธ์€ ResNet์„ ํ™œ์šฉํ•  ๋•Œ ๋” ๋น ๋ฅธ ์†๋„๋กœ inferenceํ•  ์ˆ˜ ์žˆ๋„๋ก Residual Block์„ instance specificํ•˜๊ฒŒ dropํ•˜๋Š” BlockDrop์„ ์ œ์•ˆํ•˜์˜€๊ณ  CIFAR ๋ฐ ImageNet์— ๋Œ€ํ•œ ๊ด‘๋ฒ”์œ„ํ•œ ์‹คํ—˜์„ ์ˆ˜ํ–‰ํ•˜์—ฌ efficiency-accuracy trade-off์—์„œ ์ƒ๋‹นํ•œ ์ด์ ์ด ์žˆ์Œ์„ ๊ด€์ฐฐํ•˜์˜€์Šต๋‹ˆ๋‹ค. ๋˜ํ•œ ์•„๋ž˜์˜ ๊ฒฐ๊ณผ๋ฅผ ํ†ตํ•ด BlockDrop์˜ policy๊ฐ€ ์ด๋ฏธ์ง€์˜ semanticํ•œ information์„ ์„ฑ๊ณต์ ์œผ๋กœ ์ธ์ฝ”๋”ฉํ•œ๋‹ค๋Š” ๊ฒƒ์„ ํ™•์ธํ•˜์˜€์Šต๋‹ˆ๋‹ค.

Qualitative Result

Take home message

์ด ๋…ผ๋ฌธ์€ inference ์†๋„ ํ–ฅ์ƒ์„ ์œ„ํ•ด instance specificํ•˜๊ฒŒ residual block์„ dropํ•˜๋Š” ๋ฐฉ๋ฒ•์„ RL ๊ธฐ๋ฐ˜์œผ๋กœ ํ™œ์šฉํ•˜์˜€์Šต๋‹ˆ๋‹ค.

Author / Reviewer information

Author

์ดํ˜„์ˆ˜ (Hyunsu Rhee)

  • KAIST AI

  • ryanrhee@kaist.ac.kr

Reviewer

  1. Korean name (English name): Affiliation / Contact information

  2. Korean name (English name): Affiliation / Contact information

  3. โ€ฆ

Reference & Additional materials

  1. Wu et al., "BlockDrop: Dynamic Inference Paths in Residual Networks", CVPR 2018.

  2. Srivastava et al., "Dropout: A Simple Way to Prevent Neural Networks from Overfitting", JMLR 2014.

  3. Wan et al., "Regularization of Neural Networks using DropConnect", ICML 2013.

  4. Veit et al., "Residual Networks Behave Like Ensembles of Relatively Shallow Networks", NeurIPS 2016.

  5. Huang & Wang, "Data-Driven Sparse Structure Selection for Deep Neural Networks", ECCV 2018.
