Improving the Transferability of Adversarial Samples With Adversarial Transformations [Kor]

Wu, Weibin, et al. / Improving the Transferability of Adversarial Samples with Adversarial Transformations / CVPR 2021

1. Problem definition

์ ๋Œ€์  ์˜ˆ์ œ (Adversarial Samples)

์ ๋Œ€์  ์˜ˆ์ œ๋Š” ์‚ฌ๋žŒ์˜ ๋ˆˆ์œผ๋กœ๋Š” ์ธ์‹ํ•  ์ˆ˜ ์—†๋Š” ๋ฏธ์„ธํ•œ ์žก์Œ(perturbation)์„ ์˜๋„์ ์œผ๋กœ ์›๋ž˜์˜ ์ž…๋ ฅ์— ๋”ํ•ด ์ƒ์„ฑํ•œ ์˜ˆ์ œ์ด๋‹ค. ์ด๋ ‡๊ฒŒ ์ƒ์„ฑ๋œ ์˜ˆ์ œ๋Š” ์‹ ๊ฒฝ๋ง์„ ๋†’์€ ํ™•๋ฅ ๋กœ ์˜ค๋ถ„๋ฅ˜ํ•˜๋„๋ก ํ•œ๋‹ค.

๊ตฌ์ฒด์ ์œผ๋กœ ์•„๋ž˜ ๊ทธ๋ฆผ๊ณผ ๊ฐ™์ด ์›๋ณธ ์ด๋ฏธ์ง€ $ x $์— ๋ฏธ์„ธํ•œ ์žก์Œ $ \delta $๋ฅผ ๋”ํ•ด ์ ๋Œ€์  ์˜ˆ์ œ $ x_{adv} $๋ฅผ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ๋‹ค.

$$x_{adv} = x + \delta$$

[Figure: an adversarial example generated by adding an imperceptible perturbation to the original image]

์ ๋Œ€์  ๊ณต๊ฒฉ (Adversarial Attacks)

์ ๋Œ€์  ๊ณต๊ฒฉ์€ ์˜๋„์ ์œผ๋กœ ์ƒ์„ฑ๋œ ์ ๋Œ€์  ์˜ˆ์ œ๋ฅผ ์ด์šฉํ•˜์—ฌ ๋„คํŠธ์›Œํฌ๊ฐ€ ์˜ค์ž‘๋™ํ•˜๋„๋ก ํ•˜๋Š” ๊ณต๊ฒฉ์ด๋‹ค. ์ ๋Œ€์  ๊ณต๊ฒฉ์€ ๊ณต๊ฒฉ์ž๊ฐ€ ๊ฐ€์ง€๊ณ  ์žˆ๋Š” ๋„คํŠธ์›Œํฌ์˜ ์ •๋ณด์— ๋”ฐ๋ผ ํฌ๊ฒŒ ๋‘๊ฐ€์ง€๋กœ ๋‚˜๋ˆŒ ์ˆ˜ ์žˆ๋‹ค.

  • White-box attack: an adversarial attack mounted in a setting where the attacker knows the architecture and parameters of the target model.

  • Black-box attack: an adversarial attack mounted in a setting where the attacker has no access to the internals of the target model.

์ ๋Œ€์  ์˜ˆ์ œ ์ƒ์„ฑ

์›๋ณธ ์ด๋ฏธ์ง€ $ x $, ์›๋ณธ ํด๋ž˜์Šค $ y $, ์ ๋Œ€์  ์˜ˆ์ œ $ x_{adv} $, ์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜๊ธฐ (image classifier) $ f(x) $ ๋ผ๊ณ  ํ•˜์ž. ์ ๋Œ€์  ์˜ˆ์ œ๋Š” ๋‹ค์Œ ๋‘ ์กฐ๊ฑด์„ ๋งŒ์กฑํ•ด์•ผ ํ•œ๋‹ค.

$$\arg\max f(x_{adv}) \neq y,$$

$$||x_{adv} - x||_p \leq \epsilon$$

Denoting the loss function of the classifier $ f $ by $ J(f(x), y) $, the generation (attack) process can be written as:

$$\max_{x_{adv}} J(f(x_{adv}), y),$$

$$s.t.\ ||x_{adv} - x||_p \leq \epsilon$$

์ด์™€ ๊ฐ™์ด, ์†์‹ค ํ•จ์ˆ˜๋ฅผ ์ฆ๊ฐ€์‹œ์ผœ ์ ๋Œ€์  ์˜ˆ์ œ๋ฅผ ์ƒ์„ฑํ•œ๋‹ค.

2. Motivation

Transfer-based Adversarial Attacks

์†Œ์Šค ๋ชจ๋ธ์„ ์ด์šฉํ•ด ์ƒ์„ฑํ•œ ์ ๋Œ€์  ์˜ˆ์ œ๋กœ ํƒ€๊ฒŸ ๋ชจ๋ฐ์„ ๊ต๋ž€ํ•˜๋Š” ๊ณต๊ฒฉ์ด๋‹ค. black box ๊ณต๊ฒฉ์—์„œ ํ•™์Šต ๋ฐ์ดํ„ฐ์— ์ ‘๊ทผํ•  ์ˆ˜ ์žˆ์ง€๋งŒ ํƒ€๊ฒŸ ๋ชจ๋ธ์—๋Š” ์ ‘๊ทผ ํ•  ์ˆ˜ ์—†๋Š” ๊ฒฝ์šฐ, ์ „์ด์„ฑ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ๊ณต๊ฒฉํ•  ์ˆ˜ ์žˆ๋‹ค. ์ „์ด์„ฑ์ด ๋†’์€ ์ ๋Œ€์  ์˜ˆ์ œ๋Š” ์ „์ด์„ฑ ๊ธฐ๋ฐ˜ ์ ๋Œ€์  ๊ณต๊ฒฉ์˜ ์„ฑ๊ณต๋ฅ ์„ ์ฆ๊ฐ€์‹œํ‚จ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ ๋Œ€์  ์˜ˆ์ œ๊ฐ€ ์†Œ์Šค ๋ชจ๋ธ์— ๊ณผ์ ํ•ฉ(overfitting)๋œ ๊ฒฝ์šฐ, ๋‚ฎ์€ ์ „์ด์„ฑ์„ ๊ฐ€์ง€๊ฒŒ ๋œ๋‹ค.

  • **Transferability**: the property that an adversarial sample generated with one model A (the source model) also acts adversarially against several structurally different models _B, C, D, E, ..._ (the target models).

์ž…๋ ฅ์˜ ๋‹ค๋ณ€ํ™” (Input Transformation)

์ ๋Œ€์  ์˜ˆ์ œ์˜ ์ „์ด์„ฑ์„ ํ–ฅ์ƒ์‹œํ‚ค๋Š” ๋ฐฉ๋ฒ• ์ค‘ ํ•˜๋‚˜๋กœ, ์ ๋Œ€์  ์˜ˆ์ œ์˜ ์ƒ์„ฑ๊ณผ์ •์—์„œ ์†Œ์Šค ๋ชจ๋ธ์˜ ์ž…๋ ฅ์„ ๋ณ€ํ™˜ํ•˜์—ฌ ์†Œ์Šค ๋ชจ๋ธ์— ๊ณผ์ ํ•ฉ๋˜๋Š” ๊ฒƒ์„ ๋ฐฉ์ง€ํ•˜๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค.

  • Translation-Invariant Method (TIM): creates multiple images by translating the input image by a few pixels along the $ x $- and $ y $-axes, then uses them to generate the adversarial sample.

  • Scale-Invariant Method (SIM): creates multiple images by scaling the pixel values of the input image, then uses that image set to generate the adversarial sample.

  • Diverse Input Method (DIM): randomly resizes the input image, pads it with zeros, and generates the adversarial sample from the resulting image (see the sketch after this list).
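As referenced above, here is a hedged sketch of the DIM-style transformation (random resizing plus zero padding); the output size and transformation probability are assumed values, not the original paper's settings.

```python
# A sketch of DIM-style input diversity: random resize, then random zero padding.
import torch
import torch.nn.functional as F

def diverse_input(x, out_size=330, prob=0.5):
    """With probability `prob`, resize x to a random size and zero-pad to out_size."""
    if torch.rand(1).item() >= prob:
        return x  # keep the original input
    rnd = torch.randint(x.shape[-1], out_size, (1,)).item()   # random target size
    x_resized = F.interpolate(x, size=(rnd, rnd), mode="nearest")
    pad = out_size - rnd
    left = torch.randint(0, pad + 1, (1,)).item()
    top = torch.randint(0, pad + 1, (1,)).item()
    # pad order for 4D tensors: (left, right, top, bottom)
    return F.pad(x_resized, (left, pad - left, top, pad - top), value=0.0)
```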

Idea

์†Œ๊ฐœํ•˜๋Š” ๋…ผ๋ฌธ์—์„œ๋Š” ์ž…๋ ฅ ์ด๋ฏธ์ง€๊ฐ€ ์†Œ์Šค ๋ชจ๋ธ์— ๊ณผ์ ํ•ฉ๋˜๋Š” ๊ฒƒ์„ ๋ฐฉ์ง€ํ•˜๊ธฐ ์œ„ํ•œ ์ ๋Œ€์  ๋ณ€ํ™˜ ๋„คํŠธ์›Œํฌ (Adversarial Transformation Network)๋ฅผ ์ œ์•ˆํ•œ๋‹ค. ๋ชจ๋“  ์ž…๋ ฅ ์ด๋ฏธ์ง€์— ๊ฐ™์€ ๋ณ€ํ™˜์„ ์ ์šฉํ•˜๊ฑฐ๋‚˜, ๋ณ€ํ™˜์˜ ์ •๋„๋งŒ ๋ฐ”๊ฟ”์„œ ์ ์šฉํ•˜๋Š” ๊ฒƒ์€ ๊ทธ ๋ณ€ํ™˜ ์ž์ฒด์— ๊ณผ์ ํ•ฉ๋˜์–ด, ์ „์ด์„ฑ์„ ํ–ฅ์ƒ์‹œํ‚ค๋Š” ๋ฐ ํ•œ๊ณ„๊ฐ€ ์žˆ์„ ์ˆ˜ ์žˆ๋‹ค. ๊ฐ ์ž…๋ ฅ ์ด๋ฏธ์ง€์— ๋Œ€ํ•ด ์ ํ•ฉํ•œ ๋ณ€ํ™˜์„ ์ ์šฉํ•˜์—ฌ ํšจ๊ณผ์ ์œผ๋กœ ์†Œ์Šค ๋ชจ๋ธ์— ๋Œ€ํ•œ ๊ณผ์ ํ•ฉ์„ ํ”ผํ•˜๊ณ  ์ ๋Œ€์  ์˜ˆ์ œ์˜ ์ „์ด์„ฑ์„ ๋†’์ด๊ณ ์ž ํ•œ๋‹ค.

[Figure: overview of the proposed method]

3. Method

์ ๋Œ€์  ๋ณ€ํ™˜ ๋„คํŠธ์›Œํฌ (Adversarial Transformation Network)

์ ๋Œ€์  ๋ณ€ํ™˜ ๋„คํŠธ์›Œํฌ๋Š” 2๊ฐœ ์ธต์˜ CNN์œผ๋กœ ์ด๋ฃจ์–ด์ ธ blur, sharpening ๋“ฑ์˜ ๋ณ€ํ™˜ ํšจ๊ณผ๋ฅผ ๋‚˜ํƒ€๋‚ผ ์ˆ˜ ์žˆ๋‹ค. illustration

์ ๋Œ€์  ๋ณ€ํ™˜ ๋„คํŠธ์›Œํฌ๋Š” ์ ๋Œ€์  ์˜ˆ์ œ๊ฐ€ ์†Œ๊ฐœํ•˜๋Š” ๋…ผ๋ฌธ์—์„œ๋Š” ์ ๋Œ€์  ์˜ˆ์ œ์˜ ์ƒ์„ฑ๊ณผ์ •์„ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋‚˜ํƒ€๋‚ธ๋‹ค. H๋Š” ์ ๋Œ€์  ๋ณ€ํ™˜ ๋„คํŠธ์›Œํฌ๋ฅผ ๋‚˜ํƒ€๋‚ธ๋‹ค.

$$\min_{\theta_H} \max_{x_{adv}} J(f(H(x_{adv})), y), \qquad (1)$$

$$s.t.\ ||x_{adv} - x||_p \leq \epsilon,$$

$$\arg\max f(H(x)) = y.$$

์†Œ์Šค ๋ชจ๋ธ์˜ ์†์‹ค ํ•จ์ˆ˜๋ฅผ ์ฆ๊ฐ€์‹œ์ผœ ์ ๋Œ€์  ์˜ˆ์ œ๋ฅผ ์—…๋ฐ์ดํŠธํ•œ ํ›„, ๋ฐ˜๋Œ€๋กœ ์†์‹ค ํ•จ์ˆ˜๋ฅผ ๊ฐ์†Œ์‹œ์ผœ ์ ๋Œ€์  ๋ณ€ํ™˜ ๋„คํŠธ์›Œํฌ์˜ ํŒŒ๋ฆฌ๋ฏธํ„ฐ๋ฅผ ์—…๋ฐ์ดํŠธํ•œ๋‹ค. ์ด๋ ‡๊ฒŒ ํ•™์Šต๋œ ์ ๋Œ€์  ๋ณ€ํ™˜ ๋„คํŠธ์›Œํฌ๋Š” ์ ๋Œ€์ ์ธ ์˜ˆ์ œ๊ฐ€ ์˜ฌ๋ฐ”๋ฅด๊ฒŒ ๋ถ„๋ฅ˜๋  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๋Š” ๋ณ€ํ™˜์„ ํ‘œํ˜„ํ•œ๋‹ค.

๊ฐ ์ž…๋ ฅ ์ด๋ฏธ์ง€์— ๋Œ€ํ•ด ํ•™์Šต๋œ ๋ณ€ํ™˜ ๋„คํŠธ์›Œํฌ๋Š” ์ ๋Œ€์  ์˜ˆ์ œ๋ฅผ ์—…๋ฐ์ดํŠธํ•˜๋Š” ๊ณผ์ •์—์„œ ์ ๋Œ€์  ์˜ˆ์ œ๊ฐ€ ์ ๋Œ€์ ์ด์ง€ ์•Š๋„๋ก ํ•˜๋Š” ๋ณ€ํ™˜์„ ์ ์šฉํ•œ๋‹ค. ์ด๋Š” ์ ๋Œ€์  ์˜ˆ์ œ ์ƒ์„ฑ๊ณผ์ •์—์„œ ๋„์›€์ด ๋˜์ง€ ์•Š๋Š” (harmful)ํ•œ ๋ณ€ํ™˜์„ ์ ์šฉ์‹œ์ผœ, ์ƒ์„ฑ๋œ ์ ๋Œ€์  ์˜ˆ์ œ๊ฐ€ ์–ด๋– ํ•œ ์ด๋ฏธ์ง€ ์™œ๊ณก์—๋„ ๊ฐ•ํ•˜๋„๋ก (robust) ํ•˜๊ธฐ ์œ„ํ•จ์ด๋‹ค.

Since only adopting a fixed transformation may lead to poor generalization to unknown ones, we endeavor to address the issue of explicitly modeling the applied image transformations by figuring out the most harmful image transformations to each adversarial image. We expect that if the generated adversarial samples can resist the toughest image deformations, they can also survive under other weaker distortions.

์œ„์˜ ์‹(1)์—์„œ ์ ๋Œ€์  ์˜ˆ์ œ๋ฅผ ์—…๋ฐ์ดํŠธํ•˜๋Š” inner loop ์—์„œ ์‚ฌ์šฉ๋˜๋Š” loss๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

$$L_{fool} = -J(f(T(x_{adv})), y) - \beta J(f(x_{adv}), y).$$

Here $ T $ denotes the transformation. The adversarial sample $ x_{adv} $ is updated in the direction that increases the loss on both the adversarial sample and its transformed version.

์‹ (1)์˜ outer loop ์—์„œ ์‚ฌ์šฉ๋˜๋Š” loss๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

$$L_{T} = J(f(T(x_{adv})), y) + \alpha_1 J(f(T(x)), y) + \alpha_2 ||x_{adv} - T(x_{adv})||^2.$$

์ด๋ฏธ์ง€์˜ ์ •๋ณด๋ฅผ ์œ ์ง€ํ•˜๋ฉด์„œ ์ ๋Œ€์  ์˜ˆ์ œ $ x_{adv} $์˜ ์ ๋Œ€์ ์ธ ํšจ๊ณผ๋ฅผ ์—†์• ๋„๋ก ํ•˜๋Š” $ T $๋ฅผ ์ฐพ๋Š”๋‹ค. ์ด๋Ÿฌํ•œ ๋ณ€ํ™˜์€ ์ž…๋ ฅ ์ด๋ฏธ์ง€ ํ•˜๋‚˜ ํ•˜๋‚˜์— ์ ์šฉ๋˜์–ด adaptiveํ•œ ๋ณ€ํ™˜์„ ๋งŒ๋“ค ์ˆ˜ ์žˆ๋‹ค.

์ ๋Œ€์  ๋ณ€ํ™˜ ๋„คํŠธ์›Œํฌ ํ•™์Šต ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค. algorithm1

์ ๋Œ€์  ์˜ˆ์ œ ์ƒ์„ฑ ( Syntehsizing Adversarial Samples)

์ ๋Œ€์  ์˜ˆ์ œ์˜ ์ƒ์„ฑ๊ณผ์ •์€ ๋‹ค๋ฅธ ์ž…๋ ฅ ๋‹ค๋ณ€ํ™” ๋ฐฉ๋ฒ•๊ณผ ์œ ์‚ฌํ•˜๋‹ค. ์ฐจ์ด์ ์œผ๋กœ, ์ƒ์„ฑ๊ณผ์ •์—์„œ ์‚ฌ์šฉ๋˜๋Š” loss์— ๋‘๋ฒˆ์งธ term์„ ์ถ”๊ฐ€ํ–ˆ๋‹ค.

$$L_{attack} = J(f(x_{adv}), y) + \gamma J(f(T(x_{adv})), y)$$

์ ๋Œ€์  ์˜ˆ์ œ๋ฅผ ์ƒ์„ฑํ•˜๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค. algorithm2

4. Experiment & Result


Experimental setup

  • Dataset: ImageNet

  • ์ ๋Œ€์  ๋ณ€ํ™˜ ๋„คํŠธ์›Œํฌ ํ•™์Šต์—๋Š” ILSVRC 2012 training set์„ ์‚ฌ์šฉ.

  • ์ ๋Œ€์  ์˜ˆ์ œ ์ƒ์„ฑ์—๋Š” ILSVRC 2012 validation set ์ค‘ ๊ฐ ๋‹ค๋ฅธ ์นดํ…Œ๊ณ ๋ฆฌ์—์„œ ๋žœ๋คํ•˜๊ฒŒ ๊ณ ๋ฅธ 1000์žฅ์˜ ์ด๋ฏธ์ง€ ์‚ฌ์šฉ. ์ด๋Š” [2]{}์—์„œ ์‚ฌ์šฉ๋œ ์ด๋ฏธ์ง€์™€ ๊ฐ™์Œ.

  • Baselines: FGSM, I-FGSM, MI-FGSM, NI-FGSM, TIM, SIM, DIM

  • Source models:

    • ResNet v2 (Res-v2)

    • Inception v3 (Inc-v3)

    • Inception v4 (Inc-v4)

    • Inception-ResNet v2 (IncRes-v2)

  • Training setup:

    • Adversarial transformation network architecture: $ Conv_{3\times 3} \to LeakyReLU \to Conv_{3\times 3} $ (a minimal sketch follows after this list)

    • Transformation network training iterations: $ K_{inner} = 10 $, $ K_{outer} = 10 $

    • $ \epsilon = 16 $

    • Adversarial sample generation iterations: $ K = 10 $

    • $ \alpha_1 = 1.0, \alpha_2 = 10.0, \beta = 1.0, \gamma = 1.0 $

  • Evaluation metric: **Attack success rate**, the fraction of adversarial samples that the target model misclassifies. The lower the target model's accuracy on adversarial samples, the higher the attack success rate; a higher success rate indicates higher transferability of the adversarial samples.
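As referenced in the training setup above, here is a minimal sketch of the two-layer transformation network; the hidden channel width and padding are assumptions not stated in the setup.

```python
# A minimal sketch of the transformation network: Conv 3x3 -> LeakyReLU -> Conv 3x3.
import torch.nn as nn

class TransformationNetwork(nn.Module):
    def __init__(self, channels=3, hidden=16):   # hidden width is an assumption
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=3, padding=1),
            nn.LeakyReLU(),
            nn.Conv2d(hidden, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.net(x)  # an image-to-image transformation T(x)
```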

Result

[Table: attack success rates against normally trained target models]
[Table: attack success rates against defense methods]

์œ„์˜ ๊ฒฐ๊ณผ์—์„œ ๋Œ€๋ถ€๋ถ„์˜ ๊ฒฝ์šฐ์—์„œ ** ATTA(Ours) ** ์˜ ๊ณต๊ฒฉ ์„ฑ๊ณต๋ฅ ์ด ๋†’๋‹ค๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. ๋‘๋ฒˆ์งธ ํ…Œ์ด๋ธ”์—์„œ ๋ฐฉ์–ด(defense) method์— ๋Œ€ํ•ด์„œ๋„ ์ข‹์€ ๊ฒฐ๊ณผ๋ฅผ ๋ณด์ธ๋‹ค.

5. Conclusion

The experiments show that generating adversarial samples with the transformations expressed by the adversarial transformation network improves their transferability. This is the first work to use a CNN to represent an adaptive transformation for each input image. A limitation, however, is that the range of transformations a CNN can express is bounded.

Take home message

Adaptive methods are an effective way to improve performance.

Author / Reviewer information

Author

**Son Minji (์†๋ฏผ์ง€)**

  • KAIST Electrical Engineering


Reference & Additional materials

  1. Wu, Weibin, et al. "Improving the transferability of adversarial samples with adversarial transformations." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.

  2. Dong, Yinpeng, et al. "Evading defenses to transferable adversarial examples by translation-invariant attacks." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.

  3. Lin, Jiadong, et al. "Nesterov accelerated gradient and scale invariance for adversarial attacks." arXiv preprint arXiv:1908.06281 (2019).

  4. Xie, Cihang, et al. "Improving transferability of adversarial examples with input diversity." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
