AMP [Eng]

Xue Bin Peng et al. / AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control / ACM Transactions on Graphics (Proc. ACM SIGGRAPH 2021)


1. Problem definition

์‹ค์ œ ์ƒ๋ช…์ฒด์ฒ˜๋Ÿผ ์ž์—ฐ์Šค๋Ÿฝ๊ฒŒ ์›€์ง์ด๋Š” ๋ฌผ๋ฆฌ์  ๋ชจ๋ธ์€ ์˜ํ™” ๋ฐ ๊ฒŒ์ž„ ๋“ฑ์—์„œ ํ•„์ˆ˜์ ์ธ ์š”์†Œ์ด๋‹ค. ์ด๋Ÿฌํ•œ ์‹ค๊ฐ๋‚˜๋Š” ์›€์ง์ž„์— ๋Œ€ํ•œ ์š”๊ตฌ๋Š” VR์˜ ๋“ฑ์žฅ์œผ๋กœ ๋”์šฑ ์ปค์กŒ๋‹ค. ๋˜ํ•œ, ์ด๋Ÿฌํ•œ ์ž์—ฐ์Šค๋Ÿฌ์šด ์›€์ง์ž„์€ ์•ˆ์ „๊ณผ ์—๋„ˆ์ง€ ํšจ์œจ์„ฑ์„ ๋‚ด์žฌํ•˜๊ณ  ์žˆ๊ธฐ์— ๋กœ๋ด‡๊ณผ ์—ฐ๊ด€๋œ ์ฃผ์š” ๊ด€์‹ฌ์‚ฌ์ด๋‹ค. ์ด๋Ÿฌํ•œ ์ž์—ฐ์Šค๋Ÿฌ์šด ์›€์ง์ž„์˜ ์˜ˆ์‹œ๋Š” ํ’๋ถ€ํ•œ ๋ฐ˜๋ฉด, ๊ทธ ํŠน์„ฑ์„ ์ดํ•ดํ•˜๊ณ  ๋ฐํ˜€๋‚ด๋Š” ๊ฒƒ์€ ๋‚œํ•ดํ•˜๋ฉฐ ์ด๋ฅผ ์ปจํŠธ๋กค๋Ÿฌ์— ๋ณต์ œํ•˜๋Š” ๊ฒƒ์€ ๋”์šฑ ์–ด๋ ต๋‹ค.

Indeed, gaits produced without imitation learning (e.g., by plain PPO) tend to accomplish only the stated goal while behaving in ways that are clearly unsuitable once stability and functionality are considered: walking with bent knees, swinging the arms unnaturally, and so on. Fixing this purely through reward engineering would likely require an extremely complex reward design; instead, the problem can be addressed by encouraging behavior similar to that of real creatures, whose movements already incorporate these considerations. This is why imitation learning has gained traction in robotics.

However, simply making the agent track a motion means it can learn nothing beyond the single clip it was trained on. This work aims to develop a system in which the user specifies a high-level task objective, while the low-level style of the resulting motion is derived from unstructured motion-capture examples.

2. Motivation

๋™๋ฌผ์˜ ์ž์—ฐ์Šค๋Ÿฌ์šด ์›€์ง์ž„์€ ์•ˆ์ •์ ์ด๊ณ  ํšจ์œจ์ ์ด๋ฉฐ, ๋ณด๊ธฐ์— ์ž์—ฐ์Šค๋Ÿฝ๋‹ค. ์ด๋ฅผ ๋ฌผ๋ฆฌ์  ํ™˜๊ฒฝ์—์„œ ๊ตฌํ˜„ํ•˜๋Š” ๊ฒƒ์€ ๋กœ๋ณดํ‹ฑ์Šค ๋ฐ ๊ฒŒ์ž„ ๋“ฑ ๋‹ค์–‘ํ•œ ๋ถ„์•ผ์—์„œ ์—ฐ๊ตฌ๋˜์–ด์™”๋‹ค. ๋ณธ ์ฑ•ํ„ฐ์—์„œ๋Š” ๋Œ€ํ‘œ์ ์ธ ๋ฐฉ๋ฒ•๋ก  ๋„ค ๊ฐ€์ง€๋ฅผ ์†Œ๊ฐœํ•˜๊ณ ์ž ํ•œ๋‹ค.

Kinematic Methods: Kinematic approaches generate character motion from motion clips such as motion-capture recordings. A typical design builds a controller that, given a motion dataset, plays back the clip appropriate to the current situation; prior work has used generators such as Gaussian processes and neural networks for this purpose. Many studies have shown that, given a sufficient amount of high-quality data, kinematic methods can reproduce a wide range of complex movements realistically. Their limitation is the exclusive reliance on recorded datasets: kinematic methods are hard to apply to novel situations, and collecting enough data for complex tasks and environments is not easy.

Physics-Based Methods: Physics-based methods generate character motion using physics simulation or the equations of motion. Here, optimization techniques such as trajectory optimization and reinforcement learning are typically used to produce motion by optimizing an objective function. Designing an objective function that induces natural movement, however, is very difficult. Researchers have optimized terms such as symmetry, stability, and energy expenditure, and have used actuator models resembling biological structures, but fully natural motion has remained out of reach.

Imitation Learning: Because of the difficulty of objective design noted above, imitation learning, which leverages recordings of natural movement, has been actively studied. In motion generation, the imitation objective typically minimizes the difference between the generated motion and the reference motion data. To keep the generated motion synchronized with the reference, phase information is sometimes fed to the policy as an additional input. These methods, however, struggle to learn from multiple motion clips; in particular, when phase variables are used, synchronizing several clips can be nearly impossible. Such algorithms also rely on a pose error metric for motion-tracking optimization. This metric is usually hand-designed, and designing one metric that applies to every clip when teaching a character multiple motions is not easy. Adversarial imitation learning offers an alternative: the characteristics of the given motions can be learned through an adversarial training process without manual metric design. Unfortunately, adversarial training can be highly unstable. The authors' previous work succeeded in generating natural motion by using an information bottleneck to restrain the discriminator from learning too quickly, but that method still required phase information for synchronization and therefore could not train a policy on multiple reference motions.

Latent Space Models: Latent space models can also act as a motion prior; these models learn to produce specific controls from a latent representation. By training the latent representation to encode the behaviors in the reference motion data, natural motion can be generated. Control can also be layered: the latent space model serves as a low-level controller, while a separate high-level controller is trained on top of the latent space. However, because the generated motion refers to the real movements only implicitly through the latent representation, the influence of the high-level controller can still produce unnatural movements.

Idea

์„ ํ–‰๋œ ์—ฐ๊ตฌ๋“ค์—์„œ ์„ค๋ช…๋œ ๊ฒƒ๊ณผ ๊ฐ™์ด, ์ด์ „์˜ ์—ฐ๊ตฌ๋“ค์€ ์ž์—ฐ์Šค๋Ÿฌ์šด ์›€์ง์ž„์˜ ์ƒ์„ฑ์— ์–ด๋ ค์›€์ด ์žˆ๊ฑฐ๋‚˜ ํ•œ ๊ฐ€์ง€ ๋™์ž‘๋งŒ ํ•™์Šต์— ์ฐธ๊ณ ๊ฐ€ ๊ฐ€๋Šฅํ•˜๋‹ค๋Š” ๋ฌธ์ œ์ ์ด ์žˆ์—ˆ๋‹ค. ๋ณธ ์—ฐ๊ตฌ์˜ ์›๋ฆฌ๋Š” ํ•™์Šต๋œ ์—์ด์ „ํŠธ๊ฐ€ ํ•˜๋Š” ํ–‰๋™์ด "์‹ค์ œ ์ƒ๋ช…์ฒด์˜ ํ–‰๋™"์˜ ๋ฒ”์ฃผ์— ํฌํ•จ๋˜๋„๋ก ํ•˜๋Š” ๊ฒƒ์ด๋‹ค. ์ฆ‰, ์ •์ฑ…์—์„œ ์ƒ์„ฑ๋œ ํ–‰๋™์˜ ํ™•๋ฅ ๋ถ„ํฌ๊ฐ€ ์‹ค์ œ ์ƒ๋ช…์ฒด์˜ ํ™•๋ฅ ๋ถ„ํฌ์™€ ์œ ์‚ฌํ•˜๋„๋ก ๋งŒ๋“œ๋Š” ๊ฒƒ์ด ์ด ๋…ผ๋ฌธ์˜ ํ•ต์‹ฌ์ด๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ๋‹ค. ์ด๋Š” GAN์˜ ๋ชฉํ‘œ์™€ ๋งค์šฐ ์œ ์‚ฌํ•˜๋ฉฐ, ์‹ค์ œ๋กœ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ดํ•ดํ•  ๋•Œ์—๋„ "action" domain์—์„œ์˜ GAN์œผ๋กœ ์ดํ•ดํ•˜๋ฉด ํŽธํ•  ๊ฒƒ์ด๋‹ค. ์ด๋Ÿฌํ•œ ๋ชฉํ‘œ๋Š” style reward ์„ค๊ณ„๋ฅผ ํ†ตํ•˜์—ฌ ์„ฑ์ทจํ•˜๊ฒŒ ๋˜๋ฉฐ, style reward์˜ ํŒ๋‹จ ๊ทผ๊ฑฐ๋Š” distribution์˜ ์œ ์‚ฌ์„ฑ ํŒ๋‹จ, ์ฆ‰ discriminator๋ฅผ ํ†ตํ•˜์—ฌ ์ด๋ฃจ์–ด์ง„๋‹ค. ์ด ์—ฐ๊ตฌ์—์„œ ์ €์ž๋Š” Generative Adversarial Learning์„ ๊ธฐ๋ฐ˜์œผ๋กœ ์ฃผ์–ด์ง„ task์— ๋”ฐ๋ผ ์‹ค์ œ ๋™์ž‘์„ ์ฐธ๊ณ ํ•  ์ˆ˜ ์žˆ๋Š” ์—์ด์ „ํŠธ์˜ ์ƒ์„ฑ์„ ๋ชฉํ‘œ๋กœ ํ•œ๋‹ค. ์ด๋Ÿฌํ•œ ๋ชฉํ‘œ๋ฅผ ์œ„ํ•˜์—ฌ ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ task reward์™€ ํ•จ๊ป˜ ์‹œ๋ฎฌ๋ ˆ์ด์…˜ ๋ชจ์…˜๊ณผ ์‹ค์ œ ๋ชจ์…˜์˜ ์œ ์‚ฌ์„ฑ์— ๋Œ€ํ•œ style reward๋ฅผ ํฌํ•จํ•˜๊ฒŒ ๋œ๋‹ค.

๋‹ค์Œ ์žฅ์—์„œ style reward์— ๋Œ€ํ•œ ์ƒ์„ธํ•œ ์„ค๋ช… ๋ฐ ์ „์ฒด ์•Œ๊ณ ๋ฆฌ์ฆ˜์— ๋Œ€ํ•ด ์„ค๋ช…ํ•  ๊ฒƒ์ด๋‹ค.

3. Method

Background

๋กœ๋ณดํ‹ฑ์Šค ์‹œ๋ฎฌ๋ ˆ์ด์…˜์— ๊ด€ํ•˜์—ฌ

๋กœ๋ด‡ ๋ถ„์•ผ์˜ ์‹œ๋ฎฌ๋ ˆ์ด์…˜์€ ๊ธฐ๋ณธ์ ์œผ๋กœ Agent๊ฐ€ ์ฃผ์–ด์ง„ ํ™˜๊ฒฝ์—์„œ Goal(ex. ๊ฑท๊ธฐ)์„ ์ž˜ ์ˆ˜ํ–‰ํ•˜๋Š” ๊ฒƒ์„ ๋ชฉํ‘œ๋กœ ํ•œ๋‹ค. ๊ธฐ๋ณธ์ ์œผ๋กœ ํ™˜๊ฒฝ์€ ๋ฌผ๋ฆฌ ์—”์ง„ ๊ธฐ๋ฐ˜์œผ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์œผ๋ฉฐ, ์—ฌ๊ธฐ์—์„œ Agent๊ฐ€ ์ˆ˜ํ–‰ํ•˜๊ฒŒ ๋˜๋Š” action์€ "๋ชจํ„ฐ์— ๋“ค์–ด๊ฐ€๋Š” ์ž…๋ ฅ(์ „๋ฅ˜, ๊ฐ„ํ˜น ํ† ํฌ๋กœ ํ‘œํ˜„)"์ด๋‹ค.

์ฆ‰, ๋กœ๋ด‡์˜ ์‹œ๋ฎฌ๋ ˆ์ด์…˜์ด๋ž€ ๋กœ๋ด‡(์ฃผ์–ด์ง„ ๋ชจํ„ฐ ์ „๋ฅ˜์— ๋Œ€ํ•œ ๊ด€์ ˆ ์›€์ง์ž„ ์ˆ˜ํ–‰)๊ณผ ํ™˜๊ฒฝ(๋ฌผ๋ฆฌ์  ์ถฉ๋Œ๊ณผ ์ค‘๋ ฅ ๋“ฑ)์€ ์ด๋ฏธ ์ •ํ•ด์ ธ ์žˆ๋Š” ์ƒํƒœ์—์„œ, "๊ด€์ธก๋œ ํ˜„์žฌ ์ƒํƒœ(observed state)"๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๋กœ๋ด‡์˜ ์ปจํŠธ๋กค๋Ÿฌ๊ฐ€ ๊ฐ ๋ชจํ„ฐ์— "์–ด๋–ค ์ถœ๋ ฅ(action)"์„ ๋‚ด๋ณด๋‚ด์•ผ ๋กœ๋ด‡์ด ํ™˜๊ฒฝ์—์„œ ๋ชฉํ‘œ๋ฅผ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ์„์ง€๋ฅผ ์„ค๊ณ„ํ•˜๋Š” ๊ณผ์ •์ด๋‹ค.

Goal-directed reinforcement learning

๋ชฉํ‘œ ๊ธฐ๋ฐ˜ ๊ฐ•ํ™”ํ•™์Šต์€ ์„ค๊ณ„๋œ reward function์„ ๊ธฐ๋ฐ˜์œผ๋กœ, reward๋ฅผ ์ตœ๋Œ€๋กœ ๋งŒ๋“œ๋Š” agent๋ฅผ ์ƒ์„ฑํ•˜๋Š” ๊ฒƒ์ด ๊ทธ ๋ชฉํ‘œ์ด๋‹ค. (๊ธฐ๋ณธ์ ์ธ ๊ฐ•ํ™”ํ•™์Šต์˜ ์šฉ์–ด๋“ค์€ ์„ค๋ช…์„ ์ƒ๋žตํ•œ๋‹ค.)

$$J(\pi) = \mathbb{E}_{p(\tau \mid \pi)}\left[\sum_{t=0}^{T-1} \gamma^{t} r_{t}\right]$$

๊ฒฐ๊ณผ์ ์œผ๋กœ, agent๋Š” ์œ„ ์ˆ˜์‹์œผ๋กœ ์ •์˜๋œ optimization objective๋ฅผ ์ตœ๋Œ€์น˜๋กœ ํ•˜๋Š” policy๋ฅผ ํ•™์Šตํ•˜๊ฒŒ ๋œ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” PPO ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ๊ธฐ๋ฐ˜์œผ๋กœ agent๋ฅผ ํ•™์Šต์‹œํ‚จ๋‹ค.

Generative Adversarial Imitation Learning

์ด ์—ฐ๊ตฌ์˜ ํ•ต์‹ฌ์€ GAIL ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์‚ฌ์šฉํ•œ motion prior์˜ ์ƒ์„ฑ์ด๋‹ค.

GAIL ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ objective๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

$$\min_{\pi} \max_{D}\; \mathbb{E}_{d^{M}(s,a)}\left[\log D(s,a)\right] + \mathbb{E}_{d^{\pi}(s,a)}\left[\log\left(1 - D(s,a)\right)\right]$$

The reward is then defined by the expression below.

$$r_{t} = -\log\left(1 - D(s_{t}, a_{t})\right)$$

(๋ฐ”ํƒ•์ด ๋˜๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ GAN๊ณผ ๊ฐ™์œผ๋ฉฐ, data๊ฐ€ ์•„๋‹Œ state-action์„ ๋Œ€์ƒ์œผ๋กœ ํ•œ๋‹ค) ์œ„์™€ ๊ฐ™์€ optimization์„ ํ†ตํ•˜์—ฌ agent๋Š” ์‹ค์ œ ๋ชจ์…˜ ์บก์ณ ๋ฐ์ดํ„ฐ์˜ distribution๊ณผ ์ตœ๋Œ€ํ•œ ๊ตฌ๋ถ„์ด ๋ถˆ๊ฐ€๋Šฅํ•œ action์„ ์ƒ์„ฑํ•˜๊ฒŒ ๋œ๋‹ค.

Notations

Basic notation — $g$: goal, $s$: state, $a$: action, $\pi$: policy

๋…ผ๋ฌธ์˜ Notatations MM: ์‹ค์ œ ์‚ฌ๋žŒ์˜ ๋ชจ์…˜ํด๋ฆฝ ๋ฐ์ดํ„ฐ ๋„๋ฉ”์ธ dMd^{M}: ์‹ค์ œ ์‚ฌ๋žŒ ํ–‰๋™์˜ probability distribution dฯ€d^{\pi}: ์ •์ฑ…์„ ํ†ตํ•ด ์ƒ์„ฑ๋œ probability distribution

System

[Figure 2: Overview of the AMP system]

์œ„์˜ ๊ทธ๋ฆผ์€ ๋ณธ ๋…ผ๋ฌธ์˜ ์ „์ฒด ์‹œ์Šคํ…œ ๊ตฌ์กฐ๋„์ด๋‹ค. ์ด๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ์ œ์•ˆ๋œ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์„ค๋ช…ํ•  ๊ฒƒ์ด๋‹ค. ์•ž์„œ ๋งํ•œ ๊ฒƒ๊ณผ ๊ฐ™์ด, ๋ณธ ๋…ผ๋ฌธ์˜ ์ „์ฒด ๊ตฌ์กฐ๋Š” PPO agent๋ฅผ ํ•™์Šต์‹œํ‚ค๋Š” ๊ฒƒ์ด ์ฃผ์š” ๋‚ด์šฉ์ด๋‹ค. ์œ„ agent๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ reward function๋ฅผ ์ตœ๋Œ€ํ™” ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•™์Šต๋œ๋‹ค.

$$r(s_{t}, a_{t}, s_{t+1}, g) = w^{G}\, r^{G}(s_{t}, a_{t}, s_{t+1}, g) + w^{S}\, r^{S}(s_{t}, s_{t+1})$$

์œ„ ์ˆ˜์‹์—์„œ rGr^G๋Š” high-level์˜ ๋ชฉํ‘œ(ex. ํŠน์ • ์ง€์  ํ–ฅํ•˜๊ธฐ, ๊ณต ๋“œ๋ฆฌ๋ธ” ๋“ฑ)์— ๋Œ€ํ•œ reward์ด๋ฉฐ, ์ด๋Š” ์ง์ ‘ ๋””์ž์ธ๋œ ๊ฐ„๋‹จํ•œ ์ˆ˜์‹์ด ๋  ๊ฒƒ์ด๋‹ค. ๋ฐ˜๋ฉด์—, rSr^S๋Š” agent๊ฐ€ ์ƒ์„ฑํ•˜๋Š” ์›€์ง์ž„์— ๋Œ€ํ•œ style-reward์ด๋‹ค. Style reward๋ฅผ ํ†ตํ•˜์—ฌ agent๋Š” ์ตœ๋Œ€ํ•œ ์ฃผ์–ด์ง„ motion data์™€ ์œ ์‚ฌํ•œ ๋™์ž‘์„ ์ƒ์„ฑํ•˜๋„๋ก ํ•™์Šต๋œ๋‹ค. ์ด style reward์˜ ๊ฒฐ์ •์ด ๋ณธ ์—ฐ๊ตฌ์˜ ํ•ต์‹ฌ ๋‚ด์šฉ์ด ๋  ๊ฒƒ์ด๋‹ค. wGw^G์™€ wSw^S๋Š” ๊ฐ reward์— ๋Œ€ํ•œ ๊ฐ€์ค‘์น˜์ด๋‹ค. ๋ณธ ์—ฐ๊ตฌ์—์„œ ๋ชจ๋“  ๋‚ด์šฉ์€ ๋‘ ๊ฐ€์ค‘์น˜ ๋ชจ๋‘ 0.5๋กœ ์„ค์ •ํ•˜์—ฌ ์ง„ํ–‰๋˜์—ˆ๋‹ค.

Style reward

์•ž์„œ ๋ฐํ˜”๋“ฏ, style reward๋Š” GAIL ์•Œ๊ณ ๋ฆฌ์ฆ˜์—์„œ ํŒ๋‹จ๋œ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ๋ชจ์…˜ ํด๋ฆฝ๋“ค์€ action์ด ์•„๋‹Œ state์˜ ํ˜•ํƒœ๋กœ ์ œ๊ณต๋œ๋‹ค. ๋”ฐ๋ผ์„œ action์ด ์•„๋‹Œ state transitions์— ๊ธฐ๋ฐ˜ํ•˜์—ฌ ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด ์ตœ์ ํ™”๋˜๋ฉฐ, ์ด๋Š” GAIL objective๋ฅผ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋ณ€๊ฒฝํ•˜๊ฒŒ ๋œ๋‹ค.

$$\min_{\pi} \max_{D}\; \mathbb{E}_{d^{M}(s,s')}\left[\log D(s,s')\right] + \mathbb{E}_{d^{\pi}(s,s')}\left[\log\left(1 - D(s,s')\right)\right]$$

์ด์— ๋”ํ•ด์„œ, ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์„ ํ–‰ ์—ฐ๊ตฌ์— ๊ธฐ๋ฐ˜ํ•˜์—ฌ vanishing gradient์˜ ๋ฐฉ์ง€๋ฅผ ์œ„ํ•˜์—ฌ cross-entropy ๊ฐ€ ์•„๋‹Œ least-squares loss์— ๊ธฐ๋ฐ˜ํ•˜์—ฌ discriminator๋ฅผ ์ตœ์ ํ™”ํ•œ๋‹ค.

$$\arg\min_{D}\; \mathbb{E}_{d^{M}(s,s')}\left[\left(D(s,s') - 1\right)^{2}\right] + \mathbb{E}_{d^{\pi}(s,s')}\left[\left(D(s,s') + 1\right)^{2}\right]$$

One major cause of instability in GAN-trained dynamics is function approximation error in the discriminator. To mitigate this, nonzero gradients can be penalized; the final objective with the gradient penalty is given below.

$$\arg\min_{D}\; \mathbb{E}_{d^{M}(s,s')}\left[\left(D(s,s') - 1\right)^{2}\right] + \mathbb{E}_{d^{\pi}(s,s')}\left[\left(D(s,s') + 1\right)^{2}\right] + \frac{w^{\mathrm{gp}}}{2}\, \mathbb{E}_{d^{M}(s,s')}\left[\left\|\nabla_{\phi} D(\phi)\Big|_{\phi=(s,s')}\right\|^{2}\right]$$
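A minimal PyTorch sketch of the gradient-penalty term above, penalizing the discriminator's gradient at real transitions (`disc` and `real_obs` are placeholder names of ours):

```python
import torch

def gradient_penalty(disc, real_obs):
    """E[ ||grad_phi D(phi)||^2 ] evaluated on samples from the dataset."""
    real_obs = real_obs.clone().requires_grad_(True)
    d_out = disc(real_obs).sum()                   # scalar for autograd.grad
    (grad,) = torch.autograd.grad(d_out, real_obs, create_graph=True)
    return grad.pow(2).sum(dim=-1).mean()
```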

The style reward is then derived from this objective as follows.

$$r^{S}(s_{t}, s_{t+1}) = \max\left[0,\; 1 - 0.25\left(D(s_{t}, s_{t+1}) - 1\right)^{2}\right]$$

This quantity is used as the style reward defined earlier.
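In code, the clamped mapping above looks like the following sketch (NumPy; under the least-squares discriminator, an output of 1 means the transition looks real):

```python
import numpy as np

def style_reward(d_value):
    """AMP style reward r^S = max(0, 1 - 0.25 * (D - 1)^2)."""
    return np.maximum(0.0, 1.0 - 0.25 * (np.asarray(d_value) - 1.0) ** 2)

print(style_reward([1.0, 0.0, -1.0]))  # [1.   0.75 0.  ]
```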

Discriminator observations

We explained above that the discriminator operates on state transitions, so it needs a set of features to observe. This work uses the following set of features as the discriminator's input (observed states); a sketch of assembling them into a single vector follows the list.

  • Linear and angular velocity of the character's root (pelvis), in global coordinates

  • ๊ฐ joint์˜ local rotation / velocity

  • ๊ฐ end-effector์˜ local coordinate

Training

๋ณธ ์—ฐ๊ตฌ์˜ actor(generator), critic, ๊ทธ๋ฆฌ๊ณ  discriminator๋Š” ๋ชจ๋‘ 2-layer 1024 and 512 ReLU ๋„คํŠธ์›Œํฌ ๊ตฌ์กฐ์— ๊ธฐ๋ฐ˜ํ•œ๋‹ค.

The full training algorithm is as follows.

[Algorithm 1: AMP training — collect rollouts with the policy, score transitions with the discriminator to compute style rewards, update the discriminator on dataset vs. policy transitions, and update the policy with PPO]
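To make the loop concrete, here is a self-contained toy sketch of the reward-relabeling step performed in every iteration before the PPO update (every name and value is ours; a real implementation would use the trained discriminator and the simulator):

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_discriminator(s, s_next):
    """Stand-in for the trained least-squares discriminator; outputs near +1
    mean the transition looks like the motion-capture data."""
    return float(np.tanh(rng.normal()))

def style_reward(d):
    return max(0.0, 1.0 - 0.25 * (d - 1.0) ** 2)

w_G, w_S = 0.5, 0.5                       # weights used in the paper
for t in range(3):                        # a miniature "rollout"
    s, s_next = rng.normal(size=3), rng.normal(size=3)
    r_task = 1.0                          # placeholder task reward r^G
    r = w_G * r_task + w_S * style_reward(toy_discriminator(s, s_next))
    print(f"step {t}: combined reward = {r:.3f}")
```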

4. Experiment & Result

Experimental setup

  • Dataset

    Motion-capture data from several subjects was used as the real motion data against which the discriminator compares. For complex tasks, multiple motion clips were sometimes combined for a single task.

  • Baselines

    ๋น„๊ต์—๋Š” ์ €์ž์˜ ์ด์ „ ์—ฐ๊ตฌ์ธ Deepmimic ์•Œ๊ณ ๋ฆฌ์ฆ˜์œผ๋กœ ์ƒ์„ฑ๋œ ๋ฐ์ดํ„ฐ๊ฐ€ ์‚ฌ์šฉ๋˜์—ˆ๋‹ค. (ํ•ด๋‹น ์—ฐ๊ตฌ๊ฐ€ state-of-the-art ์ด์—ˆ๊ธฐ ๋•Œ๋ฌธ)

  • Training setup

    ์‹คํ—˜๋œ high-level task๋“ค์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

    • Target heading: the character moves along a specified heading direction at a target speed.

    • Target location: the character moves toward a specified target location.

    • Dribbling: to evaluate a complex task, the character must move a soccer ball to a target location.

    • Strike: to evaluate whether different motion clips can be combined, the character must hit a target object with a designated end-effector.

    • Obstacles: to evaluate interaction with visual perception in complex environments, the character must traverse terrain filled with obstacles.

  • Evaluation metric

    Task์— ๋Œ€ํ•œ ํ‰๊ฐ€๋กœ๋Š” task return ๊ฐ’์„ ์‚ฌ์šฉํ•˜์˜€์œผ๋ฉฐ, ์ฃผ์–ด์ง„ ๋™์ž‘๊ณผ์˜ ์œ ์‚ฌ์„ฑ ๋น„๊ต์—๋Š” average pose error๊ฐ€ ๊ณ„์‚ฐ๋˜์—ˆ๋‹ค. ํŠน์ • time step์—์„œ์˜ pose error์˜ ๊ณ„์‚ฐ์‹์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

    $$e_{\text{pose}}(t) = \frac{1}{N_{\text{joints}}} \sum_{j \in \text{joints}} \left\| \left(x_t^{j} - x_t^{\text{root}}\right) - \left(\hat{x}_t^{j} - \hat{x}_t^{\text{root}}\right) \right\|_2$$
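A small sketch of how such a per-frame error could be computed (our own reconstruction from the description above; the `(num_joints, 3)` array layout is an assumption):

```python
import numpy as np

def pose_error(joint_pos, ref_joint_pos, root_idx=0):
    """Mean distance between root-relative joint positions of the simulated
    character and the reference motion for one frame."""
    rel = joint_pos - joint_pos[root_idx]
    ref_rel = ref_joint_pos - ref_joint_pos[root_idx]
    return float(np.linalg.norm(rel - ref_rel, axis=-1).mean())
```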

Result

[Figure: Characters trained with AMP performing the evaluated tasks]

์ €์ž๊ฐ€ ๊ณต๊ฐœํ•œ ๋™์˜์ƒ์—์„œ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋“ฏ, ์ œ์‹œ๋œ ๋ฐฉ๋ฒ•๋“ค๋กœ ํ›ˆ๋ จ๋œ agent๋Š” ๋ณต์žกํ•œ ํ™˜๊ฒฝ๊ณผ ๋‹ค์–‘ํ•œ task๋“ค์— ๋Œ€ํ•˜์—ฌ ๊ต‰์žฅํžˆ ๋›ฐ์–ด๋‚œ ์„ฑ๋Šฅ์„ ๋ณด์˜€์œผ๋ฉฐ ์ƒ์„ฑ๋œ ์›€์ง์ž„ ๋˜ํ•œ ์‚ฌ๋žŒ์ฒ˜๋Ÿผ ์ž์—ฐ์Šค๋Ÿฌ์›€์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค.

์ œ์‹œ๋œ task๋“ค์— ๋Œ€ํ•œ return๊ฐ’์€ ๋‹ค์Œ๊ณผ ๊ฐ™์œผ๋ฉฐ, ์‹ค์ œ ์‹คํ–‰์—์„œ ๋ฌธ์ œ ์—†์ด ์—ฌ๋Ÿฌ ์›€์ง์ž„์„ ์กฐํ•ฉํ•˜์—ฌ task๋ฅผ ๋‹ฌ์„ฑํ•˜๋Š” ๋ชจ์Šต์„ ๋ณด์—ฌ์ค€๋‹ค.

[Table 1: Task returns achieved on each task]

๊ธฐ์กด์˜ state-of-the-art์™€ ๋น„๊ตํ•˜์˜€์„ ๋•Œ, ๋‹ค์Œ์˜ ํ‘œ์—์„œ ๋ณผ ์ˆ˜ ์žˆ๋“ฏ ๋™์ž‘์˜ ์žฌํ˜„์—์„œ๋Š” ์ •๋Ÿ‰์ ์œผ๋กœ ์กฐ๊ธˆ ๋‚ฎ์€ ์ˆ˜์น˜๋ฅผ ๋ณด์—ฌ์ค€๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ ˆ๋Œ€์ ์ธ ์ˆ˜์น˜๋กœ ๋ณด์•˜์„ ๋•Œ ๋ถ€์กฑํ•จ์ด ์—†๋Š” ์ˆ˜์ค€์ด๋ฉฐ, ํ•˜๋‚˜์˜ motion data๋งŒ์„ ์‚ฌ์šฉํ•˜๋Š” ๊ธฐ์กด์˜ ๋ฐฉ๋ฒ•๊ณผ ๋น„๊ตํ•˜์—ฌ ์ด ์—ฐ๊ตฌ์—์„œ๋Š” ์—์ด์ „ํŠธ๊ฐ€ task์— ๋”ฐ๋ผ ์—ฌ๋Ÿฌ motion data ์ค‘์— ํ•„์š”ํ•œ ๋™์ž‘์„ ์ˆ˜ํ–‰ํ•œ๋‹ค.

[Table 3: Average pose error compared with DeepMimic]

5. Conclusion

๋ณธ ๋…ผ๋ฌธ์€ ํ˜„์žฌ locomotion simulation์˜ state-of-the-art์ด๋‹ค.

์ด ๋…ผ๋ฌธ์˜ ๊ฐ€์žฅ ํฐ ๊ธฐ์—ฌ๋Š” agent๊ฐ€ ์—ฌ๋Ÿฌ ๋™์ž‘ ๋ฐ์ดํ„ฐ๋“ค์„ ํ•œ๋ฒˆ์— ํ•™์Šตํ•˜๋ฉฐ, ์ฃผ์–ด์ง„ ์ƒํ™ฉ์— ๋งž์ถฐ ํ•„์š”ํ•œ motion์„ ์ƒ์„ฑํ•œ๋‹ค๋Š” ๊ฒƒ์ด๋‹ค.

๋™์˜์ƒ์—์„œ ๋ณผ ์ˆ˜ ์žˆ๋“ฏ, strike task์—์„œ agent๋Š” ์ž์—ฐ์Šค๋Ÿฝ๊ฒŒ object๋กœ ๊ฑธ์–ด๊ฐ€ ์ฃผ๋จน์„ ๋ป—์–ด object๋ฅผ ํƒ€๊ฒฉํ•œ๋‹ค. ์ด ๋™์ž‘์˜ ํ•™์Šต์— ์‚ฌ์šฉ๋œ ๊ฒƒ์€ ์˜ค์ง ์‹ค์ œ ์‚ฌ๋žŒ์˜ ๊ฑท๋Š” ๋™์ž‘๊ณผ ์ฃผ๋จน์„ ๋ป—๋Š” ๋™์ž‘ ๋ฐ์ดํ„ฐ ๋ฟ์ด๋‹ค.

Likewise, relative to the astonishing complexity of feats like running across obstacle-filled terrain, the data required for training is simple and easy to obtain. Given that acquiring enough training data is one of the biggest difficulties in deep learning, this property is a considerable advantage.

์ด ์—ฐ๊ตฌ์˜ ๊ฒฐ๊ณผ๋Š” ๋กœ๋ด‡, ๊ฒŒ์ž„, ์—๋‹ˆ๋ฉ”์ด์…˜ ๋“ฑ ๋‹ค์–‘ํ•œ ๋ถ„์•ผ์— ํฐ ์ง„๋ณด๋ฅผ ๊ฐ€์ ธ๋‹ค ์ค„ ๊ฒƒ์œผ๋กœ ๊ธฐ๋Œ€๋œ๋‹ค.

Take home message

This work achieved top performance simply by combining existing algorithms appropriately.

Constant study of, and experimentation with, new research is what matters.

Author / Reviewer information

Author

์•ˆ์„ฑ๋นˆ (Seongbin An)

  • KAIST Robotics Program

  • I am really new to this field. Thanks in advance for any advice.

  • sbin@kaist.ac.kr

Reviewer

  1. Korean name (English name): Affiliation / Contact information

  2. Korean name (English name): Affiliation / Contact information

  3. ...

Reference & Additional materials

  1. Peng, Xue Bin, et al. "AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control." arXiv preprint arXiv:2104.02180 (2021).

  2. Schulman, John, et al. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017).

  3. Ho, Jonathan, and Stefano Ermon. "Generative adversarial imitation learning." Advances in neural information processing systems 29 (2016): 4565-4573.

  4. Peng, Xue Bin, et al. "Deepmimic: Example-guided deep reinforcement learning of physics-based character skills." ACM Transactions on Graphics (TOG) 37.4 (2018): 1-14.

  5. Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural information processing systems 27 (2014).
