GraSP [Kor]

Chaoqi Wang et al. / Picking Winning Tickets Before Training by Preserving Gradient Flow / ICLR 2020

1. Problem definition

์ด๋ฏธ์ง€ ๋ชจ๋ธ์˜ ๋น ๋ฅธ ํ•™์Šต๊ณผ ๋ณ„๊ฐœ๋กœ, ๋งŽ์€ ์—ฐ์‚ฐ๋Ÿ‰๊ณผ ์ฆ๊ฐ€ํ•˜๋Š” ๋ชจ๋ธ ํฌ๊ธฐ, ๊ทธ๋ฆฌ๊ณ  ์ฆ๊ฐ€ํ•˜๋Š” ๋ฐ์ดํ„ฐ์…‹์˜ ํฌ๊ธฐ๋Š” ๋ชจ๋ธ ํ•™์Šต(Training)๊ณผ ์ถ”๋ก (inference)์— ๋งŽ์€ ์ œํ•œ์ ์„ ์ค๋‹ˆ๋‹ค. Figure 1: Parameter / FLops figure ์ด๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ๋ชจ๋ธ์— ์‚ฌ์šฉ๋˜๋Š” parameter๋ฅผ ์ตœ์†Œํ™”ํ•˜๋Š” ๊ฒฝ๋Ÿ‰ํ™” ๊ธฐ๋ฒ•์ด ๋‹ค์–‘ํ•œ ๋ฐฉํ–ฅ์œผ๋กœ ์—ฐ๊ตฌ๋˜์–ด ์™”์Šต๋‹ˆ๋‹ค.

  • (Han et al.) exploited the fact that the weights of a trained model are in practice quite sparse, and studied pruning-based methods (Model Pruning) that remove inactive neurons and their connections.

  • (Hinton et al.) studied Knowledge Distillation, in which a compact model is trained to agree with a fully trained model, transferring the knowledge of a large model into a small one.

  • (Polino et al.) studied model quantization, which represents model parameters with fewer bits. Through such model compression techniques, ways to minimize the test-time resource requirement continue to be investigated.

2. Motivation

  1. Lottery ticket hypothesis

However, the methods above reduce a model's size based on the parameters of an already trained model, so the training-time resource requirement remains large. Researchers therefore conjectured that retraining with only the structure of the compressed model could greatly reduce training-time resources. Training with the compressed structure alone, however, brought severe performance degradation. Against this backdrop, The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, published at ICLR 2019, combined two techniques, Iterative Magnitude Pruning and Re-Init, to nearly recover the original performance with a compressed model structure. But because it starts from the large model and prunes it gradually, a more efficient method was still needed.

  • (Morcos et al.) experimentally reported that the proposed method works not only on CIFAR-10 and MNIST, but also across a variety of datasets, models, and optimizers.

  • (Haoran et al.) proposed a faster compression scheme for training: train quickly with a large learning rate and reuse the resulting pruning structure (mask).

Figure 2: Lottery ticket hypothesis

Idea

๋ณธ ๋…ผ๋ฌธ์€, ๋‹จ์ˆœํžˆ ๋ชจ๋ธ parameter์˜ ํฌ๊ธฐ(magnitude)๋‚˜ ํ™œ์„ฑ๋„(activation) ๊ธฐ๋ฐ˜์œผ๋กœ ์—ฐ๊ฒฐ์ ์„ ๋Š์–ด๋‚ด๋Š” ๊ฒƒ์ด ์•„๋‹Œ, gradient๊ธฐ๋ฐ˜ ๋ฐฉ๋ฒ•๋ก ์„ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค. ๋‰ด๋Ÿฐ์˜ output์˜ ํฌ๊ธฐ๊ฐ€ ์ž‘์•„๋„, ํ•ด๋‹น ๋‰ด๋Ÿฐ์— ์—ฐ๊ฒฐ๋œ ํ•˜์œ„ ๋‰ด๋Ÿฐ๋“ค์—๊ฒŒ ์ •๋ณด ์ „๋‹ฌ(information flow)์„ ํ•ด์ฃผ๋Š” ์ค‘์š”ํ•œ node์ผ์ˆ˜ ์žˆ๋‹ค๋Š” ์•„์ด๋””์–ด๊ฐ€ ๊ทธ ๊ธฐ๋ฐ˜์ž…๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์ด ๋…ผ๋ฌธ์—์„œ ์ œ์•ˆํ•œ ์•Œ๊ณ ๋ฆฌ์ฆ˜์ธ GraSP๋Š”, gradient์˜ norm์„ ๊ณ„์‚ฐํ•˜๊ณ , ๊ทธ norm์— ๊ฐ€์žฅ ๋ณ€ํ™”๋ฅผ ๋œ ์ฃผ๋Š” connection์„ ์ œ๊ฑฐํ•˜๋Š” ๊ฒฝ๋Ÿ‰ํ™”๋œ ๋ชจ๋ธ ๊ตฌ์กฐ๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. Figure 3: GraSP

3. Method

First, the gradient norm can be written as follows:

$$\Delta L(\theta) = \nabla_\theta L(\theta)^{\top} \nabla_\theta L(\theta) = g^{\top} g$$

Next, as shown by (LeCun et al.), when the parameters are perturbed by $\delta$, the change in the gradient norm is:

$$S(\delta) = \Delta L(\theta + \delta) - \Delta L(\theta) = 2\,\delta^{\top} H g + \mathcal{O}(\|\delta\|_2^2)$$

Here $H$ is the Hessian of the loss with respect to the parameters and $g$ is the gradient. Adapting this formula to the pruning task, the perturbation for a removed parameter is fixed to the negative of its original value ($\delta_q = -\theta_q$), so the trailing $\mathcal{O}(\|\delta\|_2^2)$ term is dropped. The paper therefore scores each parameter as

$$S(-\theta_q) = -\,\theta_q\,[Hg]_q$$

and removes every parameter except those most important for preserving gradient flow, i.e., it prunes the connections whose removal changes the gradient norm the least. In short, the algorithm discovers the compressed structure with a single forward pass and gradient computation.
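As a concrete illustration, below is a minimal PyTorch sketch of this scoring step (my own simplified reading, not the authors' official implementation; the helper names `grasp_scores` and `grasp_masks` are hypothetical). It obtains $Hg$ with a double backward pass and scores each weight as $-\theta \odot (Hg)$, pruning the highest-scoring weights, i.e., those whose removal disturbs gradient flow the least:

```python
import torch

def grasp_scores(model, loss_fn, inputs, targets):
    """Score each weight as -theta * (Hg); a higher score means the weight
    is safer to prune (its removal least reduces gradient flow)."""
    weights = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(inputs), targets)
    # First backward pass: g = dL/dtheta, keeping the graph so we can
    # differentiate through it a second time.
    grads = torch.autograd.grad(loss, weights, create_graph=True)
    # Hessian-vector product Hg: differentiate g^T stop_grad(g) w.r.t. theta.
    gtg = sum((g * g.detach()).sum() for g in grads)
    Hg = torch.autograd.grad(gtg, weights)
    return [-(w.detach() * h) for w, h in zip(weights, Hg)]

def grasp_masks(scores, sparsity):
    """Build 0/1 masks pruning the `sparsity` fraction with the highest scores."""
    flat = torch.cat([s.flatten() for s in scores])
    num_prune = int(sparsity * flat.numel())  # sketch assumes num_prune >= 1
    cutoff = torch.topk(flat, num_prune).values[-1]
    return [(s < cutoff).float() for s in scores]
```

The masks would then be applied once before training begins, e.g. by multiplying each weight tensor with its mask (`p.data.mul_(m)`) and re-applying the mask after every update.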

4. Experiment & Result

๋ณธ ๋…ผ๋ฌธ์€ ๋น„์Šทํ•œ ์‹œ๊ธฐ์— ๋‚˜์˜จ SNIP์ด๋ผ๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜๊ณผ ์„ฑ๋Šฅ ๋น„๊ต๋ฅผ ํ•˜์˜€์Šต๋‹ˆ๋‹ค. ๋‘ ์•Œ๊ณ ๋ฆฌ์ฆ˜ ๋ชจ๋‘ ํ•œ๋ฒˆ์˜ ํ•™์Šต์„ ํ†ตํ•ด์„œ ๊ฒฝ๋Ÿ‰ํ™”๋œ ๊ตฌ์กฐ๋ฅผ ์ฐพ์•„๋‚ด๋Š” ๋ฐฉ๋ฒ•์ด๊ณ , ๊ทธ๋ ‡๊ธฐ์— ์ตœ์ ์˜ ์„ฑ๋Šฅ๋ณด๋‹ค๋Š”, ์–ผ๋งˆ๋‚˜ ๊ธฐ์กด ์•Œ๊ณ ๋ฆฌ์ฆ˜(Lottery ticket, Deep Compression)๋“ค์˜ ์„ฑ๋Šฅ์„ ์œ ์ง€ํ•˜๋Š”์ง€๊ฐ€ ์ค‘์š”ํ•œ ์ง€ํ‘œ์ž…๋‹ˆ๋‹ค. ๊ฒฐ๊ณผ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

Result

Figure 7: chart1
Figure 8: chart2
Figure 9: chart3

๋‹ค์–‘ํ•œ ๋ฐ์ดํ„ฐ์…‹๊ณผ, ๋ชจ๋ธ์—์„œ ๊ฒฝ๋Ÿ‰ํ™” ์„ฑ๋Šฅ์ด stableํ•˜๊ฒŒ ์ข‹๊ฒŒ ๋‚˜์˜ด์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋˜ํ•œ, Lottery Ticket Hypothesis๋‚˜(LT), OBD, MLPrune์™€ ๊ฐ™์ด iterativeํ•˜๊ณ , training-time resource๊ฐ€ ๋งŽ์ด ํ•„์š”ํ•œ ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ์„ฑ๋Šฅ๊ณผ ํฐ ์ฐจ์ด๊ฐ€ ์—†์Œ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Beyond the headline numbers, the authors also report training curves and how the gradient norm, the quantity the paper emphasizes, evolves. The right-hand graph of the figure below shows experimentally that pruning with GraSP preserves the gradient norm far better than the other methods.

Figure 10: gradient norm comparison

The authors also examined how many parameters were pruned in each layer. Pruning-based methods are known to prune many neurons in the upper convolution layers (Layer 10 and above), because neuron outputs become sparser toward the top of the network. The figure below shows, however, that GraSP keeps more channels than its competitor SNIP.

Figure 11: layer-wise pruning ratios
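To reproduce this kind of layer-wise inspection, one could count the surviving weights per layer from the masks computed earlier (a hedged sketch reusing the hypothetical `grasp_masks` output from Section 3; mask order follows `model.parameters()`):

```python
def layer_keep_ratios(model, masks):
    """Print the fraction of weights each prunable layer keeps."""
    prunable = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
    for (name, _), m in zip(prunable, masks):
        print(f"{name}: kept {m.mean().item():.1%} of {m.numel()} weights")
```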

5. Conclusion

ํ•œ๋ฒˆ์˜ ํ•™์Šต๋งŒ์œผ๋กœ ํšจ์œจ์ ์ธ ๋„คํŠธ์›Œํฌ ๊ตฌ์กฐ๋ฅผ ๋ฐœ๊ฒฌํ• ์ˆ˜ ์žˆ๋‹ค๋Š” GraSP ์•Œ๊ณ ๋ฆฌ์ฆ˜์€, ๋‹จ์ˆœํžˆ ๋‰ด๋Ÿฐ์ด ํ™œ์„ฑํ™” ๋œ๊ฒƒ๋งŒ ์ค‘์š”ํ•œ ๊ฒƒ์ด ์•„๋‹ˆ๋ผ, ๋‰ด๋Ÿฐ๊ณผ ๋‰ด๋Ÿฐ ์‚ฌ์ด์— ์–ผ๋งˆ๋‚˜ ๋งŽ์€ ์ •๋ณด๋ฅผ ์œ ์ง€ํ•˜๋А๋ƒ๊ฐ€ ์ค‘์š”ํ•˜๋‹ค๋Š” gradient flow ๊ธฐ๋ฐ˜ ๊ฒฝ๋Ÿ‰ํ™” ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•˜์˜€์Šต๋‹ˆ๋‹ค. ๋˜ํ•œ ์ด ์—ฐ๊ตฌ๋Š” ํ›„์† ์—ฐ๊ตฌ์— ์˜ํ–ฅ์„ ์ฃผ์–ด, ํ•œ๋ฒˆ๋„ ํ•™์Šตํ•˜์ง€ ์•Š๊ณ ๋„ ๊ฒฝ๋Ÿ‰ํ™”๋œ ๊ตฌ์กฐ๋ฅผ ๋ฐœ๊ฒฌํ•˜๋Š” ๋ฐฉ๋ฒ• ๋˜ํ•œ ์ œ์•ˆํ•˜์˜€์Šต๋‹ˆ๋‹ค.

Take home message

ํ™œ์„ฑํ™”๋˜์ง€ ์•Š์€ ๋‰ด๋Ÿฐ์ด๋ผ๋„ ํ•˜์œ„ ๋‰ด๋Ÿฐ์—๊ฒŒ ๋” ๋งŽ์€ ์ •๋ณด๋ฅผ ์ค„ ์ˆ˜ ์žˆ์œผ๋ฉฐ, ํŠนํžˆ ๋‹จ ํ•œ๋ฒˆ์œผ๋กœ ๊ฒฝ๋Ÿ‰ํ™”๋œ ๊ตฌ์กฐ๋ฅผ ์ฐพ์•„์•ผ ํ•˜๋Š” ์ƒํ™ฉ์—์„œ๋Š” ๋‰ด๋Ÿฐ ํ™œ์„ฑ๋„๋งŒ์ด ๋‹ต์ด ์•„๋‹ˆ๋‹ค.

Author / Reviewer information

Author

Seungwoo Lee

  • Affiliation: KAIST EE

  • Research interests: Graph Neural Networks

Reviewer

  1. Korean name (English name): Affiliation / Contact information

  2. Korean name (English name): Affiliation / Contact information

  3. ...

Reference & Additional materials

  1. Chaoqi Wang, Guodong Zhang, Roger Grosse, Picking Winning Tickets Before Training by Preserving Gradient Flow, In ICLR 2020

  2. Jonathan Frankle, Michael Carbin, The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, In ICLR 2019

  3. Namhoon Lee, Thalaiyasingam Ajanthan, Philip H. S. Torr, SNIP: Single-shot Network Pruning based on Connection Sensitivity, In ICLR 2019

  4. Hidenori Tanaka, Daniel Kunin, Daniel L. K. Yamins, Surya Ganguli, Pruning neural networks without any data by iteratively conserving synaptic flow, In NeurIPS 2020
