Standardized Max Logits [Kor]

Jung et al. / Standardized Max Logits - A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation / ICCV 2021

English version of this article is available.

Hello! This post introduces Standardized Max Logits (SML), a paper accepted as an oral presentation at ICCV 2021. I participated in this paper as a co-first author together with Jungsoo Lee (M.S. student), and the paper tackles the out-of-distribution detection problem in road-driving semantic segmentation. Our method achieved state-of-the-art performance on the public Fishyscapes leaderboard (Fishyscapes).

1. Problem definition

Teaser Image

์ตœ๊ทผ ๋„๋กœ ์ฃผํ–‰ semantic segmentation์˜ ๋ฐœ์ „์€ ๋‹ค์–‘ํ•œ benchmarking dataset์—์„œ ํฐ ์„ฑ๊ณผ๋ฅผ ์ด๋ฃจ์—ˆ์Šต๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ ์ด๋Ÿฐ ๋…ธ๋ ฅ์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ  ์—ฌ์ „ํžˆ ์ด๋Ÿฌํ•œ ๋ชจ๋ธ๋“ค์€ ์‹ค์ œ ์ฃผํ–‰ ํ™˜๊ฒฝ์— ์ ์šฉ๋˜๊ธฐ ํž˜๋“ญ๋‹ˆ๋‹ค. ๊ทธ ์ด์œ ๋Š” ๋ชจ๋ธ์˜ ํ•™์Šต ์‹œ์— ์ €ํฌ๊ฐ€ ๊ฐ€์ •ํ•œ ๋ช‡ ๊ฐœ์˜ pre-define๋œ class๋งŒ์„ ์ด์šฉํ•ด์„œ ํ•™์Šตํ•˜๊ฒŒ ๋˜๊ณ , ์ด๋ ‡๊ฒŒ ํ•™์Šตํ•œ ๋ชจ๋ธ์€ input image์˜ ๋ชจ๋“  ํ”ฝ์…€์„ pre-define๋œ class์ค‘ ํ•˜๋‚˜๋กœ ์˜ˆ์ธกํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ, ์‹ค์ œ ์ฃผํ–‰ ์‹œ์— pre-define๋œ class๊ฐ€ ์•„๋‹Œ unexpected obstacle์ด ๋“ฑ์žฅํ•˜๋ฉด ์œ„ ๊ทธ๋ฆผ์—์„œ ๋ณด์ด๋‹ค์‹œํ”ผ ์ œ๋Œ€๋กœ ๋Œ€์ฒ˜ํ•  ์ˆ˜ ์—†๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ, anomalousํ•œ ์˜์—ญ์€ ์ฐพ์•„๋‚ด๋Š” ๊ฒƒ์ด ์•ˆ์ „์ด ์ค‘์š”ํ•œ application์ธ ๋„๋กœ ์ฃผํ–‰์—์„œ ํฐ ๋ฌธ์ œ์ด๋ฉฐ ์ €ํฌ์˜ ๋ฐฉ๋ฒ•๋ก ์€ ์ด๋Ÿฌํ•œ ์˜์—ญ์„ ๋”ฐ๋กœ ๋‹ค๋ฃฐ ์ˆ˜ ์žˆ๊ฒŒ ๋„์™€์ฃผ๋Š” ์‹œ๋ฐœ์  ์—ญํ• ์„ ํ•ด์ค๋‹ˆ๋‹ค.

์ž์„ธํ•œ ์„ค๋ช…์— ๋“ค์–ด๊ฐ€๊ธฐ ์•ž์„œ, semantic segmentation task์˜ ์ •์˜์— ๋Œ€ํ•ด ์„ค๋ช…ํ•ด๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ์ฃผ์–ด์ง„ input image xโˆˆXtrainHร—Wx\in{\mathbb{X}_{train}}^{H\times{W}}์™€ ๊ทธ ํ”ฝ์…€๋ณ„๋กœ์˜ ์ •๋‹ต์„ ๊ฐ€์ง€๊ณ  ์žˆ๋Š” yโˆˆYtrainHร—Wy\in{\mathbb{Y}_{train}}^{H\times{W}} ์— ๋Œ€ํ•˜์—ฌ ์šฐ๋ฆฌ๋Š” xx์— ๋Œ€ํ•œ ์˜ˆ์ธก ๊ฐ’ y^\hat{y}๋ฅผ ๋‚ด๋ฑ‰๋Š” segmentation model GG๋ฅผ cross-entropy loss๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ•™์Šตํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

CrossEntropy=โˆ’โˆ‘xโˆˆXylogโกy^,CrossEntropy = -\sum\limits_{x\in\mathbb{X}}{y\log{\hat{y}}},

์—ฌ๊ธฐ์—์„œ๋„ ์•Œ ์ˆ˜ ์žˆ๋‹ค์‹œํ”ผ, ๋ชจ๋ธ $G$๋Š” anomalousํ•œ ์˜์—ญ์— ๋Œ€ํ•ด์„œ๋„ pre-defined class๋กœ ์˜ˆ์ธกํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ์ ์„ ํ•ด๊ฒฐํ•˜๊ณ ์ž ์ €ํฌ์˜ ๋…ผ๋ฌธ์—์„œ๋Š” ๊ฐ ํ”ฝ์…€์— ๋Œ€ํ•ด anomaly score๋ฅผ ์˜ˆ์ธกํ•˜๋Š” ๊ฐ„๋‹จํ•˜๊ณ  ํšจ๊ณผ์ ์ธ ๋ฐฉ๋ฒ•๋ก ์„ ์ œ์‹œํ•˜๋ฉฐ ๋‹ค๋ฅธ ๋ฐฉ๋ฒ•๋ก ๋“ค๊ณผ ๋‹ฌ๋ฆฌ ์ถ”๊ฐ€์ ์ธ training์ด๋‚˜ ๋‹ค๋ฅธ network module์„ ์‚ฌ์šฉํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.

2. Preliminary

There has been a variety of prior work on out-of-distribution (OoD) detection. Among them, the methods we focus on are Maximum Softmax Probability (MSP) [1] and Max Logit [2]. Both are detection measures that exploit the fact that OoD pixels tend to have lower prediction scores than in-distribution pixels. MSP [1] is a seminal work that proposes to use the softmax of the network prediction as the anomaly score. However, because the exponential function grows rapidly, MSP suffers from anomalous inputs receiving high MSP scores (i.e., low anomaly scores). Max Logit [2] was proposed to resolve this problem: it uses the logit values before the softmax as the anomaly score, and since no exponential function is involved, it alleviates the over-confidence problem of MSP. Our work discusses a problem that Max Logit can have in semantic segmentation and proposes a way to solve it.
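The two scores can be contrasted in a few lines of NumPy; the softmax saturates on a confident pixel while the raw logits keep their scale (a hedged illustration, not the cited papers' code):

```python
import numpy as np

def msp_and_maxlogit(logits):
    """Per-pixel confidence scores from raw logits of shape (C, H, W).

    MSP: max softmax probability; Max Logit: max raw logit.
    OoD pixels tend to receive lower values of both.
    """
    z = logits - logits.max(axis=0, keepdims=True)
    softmax = np.exp(z) / np.exp(z).sum(axis=0, keepdims=True)
    msp = softmax.max(axis=0)        # in (1/C, 1]: squashed by the exponential
    max_logit = logits.max(axis=0)   # unbounded: keeps the raw logit scale
    return msp, max_logit

# two pixels: one confidently in-distribution, one ambiguous (potential OoD)
logits = np.array([[[9.0, 2.0]],
                   [[1.0, 1.5]],
                   [[0.0, 1.0]]])   # shape (C=3, H=1, W=2)
msp, ml = msp_and_maxlogit(logits)
```

Here the confident pixel's MSP is nearly saturated at 1, so its margin over the ambiguous pixel is compressed, while the Max Logit difference stays large.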

Semantic segmentation์˜ OoD ํƒ์ง€ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด, ๋‹ค์–‘ํ•œ ์—ฐ๊ตฌ๋“ค [3, 4, 5, 6, 7, 8]์ด ์ œ์•ˆ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ๋ช‡๋ช‡์˜ ์—ฐ๊ตฌ [3, 4]๋“ค์€ PASCAL VOC์—์„œ pre-defined class์— ํ•ด๋‹นํ•˜์ง€ ์•Š๋Š” object๋“ค์„ ์ฐพ์•„์„œ training dataset์ธ Cityscapes์— ํ•ฉ์„ฑํ•˜์—ฌ segmentation model์„ ํ•™์Šต์‹œ์ผฐ๊ณ  ๋‹ค๋ฅธ ์ข…๋ฅ˜์˜ ์—ฐ๊ตฌ [5, 6, 7, 8]๋“ค์€ image resynthesis ๋ฐฉ๋ฒ•์„ ์‚ฌ์šฉํ•˜์˜€์Šต๋‹ˆ๋‹ค. ์ด ๋ฐฉ๋ฒ•๋ก ๋“ค์€ image resynthesis ๋ชจ๋ธ์ด unseen object๋Š” ๋งž๊ฒŒ ์ƒ์„ฑํ•ด๋‚ด์ง€ ๋ชปํ•œ๋‹ค๋Š” ์ง๊ด€์—์„œ ์‹œ์ž‘๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ ์ด ๋‘ ๋ฐฉ๋ฒ•๋ก  ๋ชจ๋‘ ์ถ”๊ฐ€์ ์ธ OoD dataset์„ ํ•„์š”๋กœ ํ•˜๊ฑฐ๋‚˜ ๋˜๋Š” ์ถ”๊ฐ€์ ์ธ ํ•™์Šต์ด ํ•„์š”ํ•˜์˜€์Šต๋‹ˆ๋‹ค.

3. Motivation

Motivation

Findings from previous work

์ €ํฌ ์—ฐ๊ตฌ์˜ motivation์€ ์œ„ ์ด๋ฏธ์ง€๋ฅผ ํ†ตํ•ด ํ™•์ธํ•˜์‹ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์œ„์˜ ์ด๋ฏธ์ง€๋Š” pre-train๋œ segmentation network๋ฅผ Fishyscapes Lost&Found dataset์— inferenceํ•œ ๊ฒฐ๊ณผ์ž…๋‹ˆ๋‹ค. ๊ฐ๊ฐ์˜ bar๋Š” pixel ๊ฐ’๋“ค์˜ ๋ถ„ํฌ๋ฅผ ์˜๋ฏธํ•˜๊ณ , ์ฃผํ™ฉ์ƒ‰ bar๋Š” in-distribution (pre-defined classes) ๊ทธ๋ฆฌ๊ณ  ํŒŒ๋ž€์ƒ‰์€ unexpected (pre-define๋˜์ง€ ์•Š์€ class) pixel๋“ค๋ฅผ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ํšŒ์ƒ‰ ์˜์—ญ์€ in-distribution๊ณผ unexpected pixel๋“ค์ด ๊ฒน์น˜๋Š” ์˜์—ญ (false positives and false negatives)์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ๋ณด์—ฌ์ง€๋Š” ๋ฐ”์™€ ๊ฐ™์ด MSP์˜ ๊ฒฝ์šฐ over-confident ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•˜๋Š” ๊ฒƒ์„ ๋ณด์‹ค ์ˆ˜๊ฐ€ ์žˆ๊ณ  ๊ทธ ๊ฒฐ๊ณผ ๊ฐ€์žฅ ํฐ ํšŒ์ƒ‰ ์˜์—ญ์„ ๊ฐ–๋Š” ๊ฒƒ์„ ํ™•์ธํ•˜์‹ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. Max Logit์˜ ๊ฒฝ์šฐ, ๋ณด์‹œ๋Š” ๋ฐ”์™€ ๊ฐ™์ด ๊ฐ class๋ณ„๋กœ ๋ถ„ํฌ๊ฐ€ ์ƒ์ดํ•œ ๊ฒƒ์„ ๋ณด์‹ค ์ˆ˜๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ํ˜„์ƒ์€ anomaly detection์—์„œ ๋ฌธ์ œ๊ฐ€ ๋  ์ˆ˜ ์žˆ๋Š”๋ฐ, ๊ทธ ์ด์œ ๋Š” anomaly๋ฅผ ์ฐพ์•„๋‚ด๊ธฐ ์œ„ํ•ด ๊ฐ class๋ณ„๋กœ ๋‹ค๋ฅธ threshold๋ฅผ ์‚ฌ์šฉํ•˜์ง€ ์•Š๊ณ , ํ•˜๋‚˜์˜ threshold๋ฅผ ์‚ฌ์šฉํ•˜๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.

Idea

Based on these findings, we propose a new anomaly score called the Standardized Max Logit (SML). SML aligns the class-wise Max Logit distributions, which differ across classes, into a common distribution via standardization. As the image above shows, applying SML greatly reduces the overlapped region. Going beyond SML, our work also proposes additional modules that further improve the results by focusing on class boundaries and small irregularities.

4. Method

Method

์œ„ ๊ทธ๋ฆผ์—์„œ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค์‹œํ”ผ, ์ €ํฌ๋Š” ์šฐ์„  pre-train๋œ ๋ชจ๋ธ์„ ์ด์šฉํ•ด์„œ Max Logit๊ฐ’์„ ๊ตฌํ•ด๋ƒ…๋‹ˆ๋‹ค. ๊ทธ ํ›„์—, ์ €ํฌ๋Š” ์ด Max Logit๊ฐ’๋“ค์„ class๋ณ„๋กœ training statistics๋ฅผ ์ด์šฉํ•ด์„œ standardize๋ฅผ ํ•ด์ฃผ๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ๋” ๋‚˜์•„๊ฐ€, uncertainํ•œ boundary ์˜์—ญ์„ ๋” certainํ•œ ๊ฐ’์ธ ์ฃผ๋ณ€์˜ non-boundary ๊ฐ’๋“ค์„ ์ด์šฉํ•ด์„œ ์ „ํŒŒ์‹œ์ผœ์ฃผ๊ณ  ๋งˆ์ง€๋ง‰์œผ๋กœ dilated smoothing์„ ์ ์šฉํ•˜์—ฌ ์ž‘์€ irregular๋“ค์„ ์ œ๊ฑฐํ•ด์ค๋‹ˆ๋‹ค.

๋‹ค์Œ์˜ ๊ณผ์ •์€ ์ €ํฌ๊ฐ€ ์–ด๋–ป๊ฒŒ Max Logit๊ณผ prediction์„ ์–ป์—ˆ๋Š”์ง€ ์ˆ˜์‹์œผ๋กœ ํ‘œํ˜„ํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. ์ฃผ์–ด์ง„ input image XโˆˆR3ร—Hร—WX\in\mathbb{R}^{3\times{H}\times{W}}์™€ pre-define๋œ class์˜ ๊ฐœ์ˆ˜ CC์— ๋Œ€ํ•˜์—ฌ logit output์ธ FโˆˆRCร—Hร—WF\in\mathbb{R}^{C\times{H}\times{W}}๋Š” ๋„คํŠธ์›Œํฌ์˜ softmax layer ์ „์˜ output์œผ๋กœ ์ •์˜๋ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ, Max Logit LโˆˆRHร—WL\in\mathbb{R}^{H\times{W}}๊ณผ prediction Y^โˆˆRHร—W\hat{Y}\in\mathbb{R}^{H\times{W}}์€ cโˆˆ{1,...,C}c\in\{1, ..., C\}์— ๋Œ€ํ•˜์—ฌ input image์˜ h,wh, w ์œ„์น˜์—์„œ ์•„๋ž˜์™€ ๊ฐ™์ด ์ •์˜๋ฉ๋‹ˆ๋‹ค.

4-1. Standardized Max Logits (SML)

Standardization์„ ์œ„ํ•ด์„œ ์ €ํฌ๋Š” ์šฐ์„  training sample๋“ค์˜ statistics๋ฅผ ๊ตฌํ–ˆ์Šต๋‹ˆ๋‹ค. ๋ณด๋‹ค ๊ตฌ์ฒด์ ์œผ๋กœ, ์ €ํฌ๋Š” train sample๋“ค์˜ ๊ฐ class๋ณ„ ํ‰๊ท ๊ณผ ๋ถ„์‚ฐ์„ ๊ตฌํ–ˆ์Šต๋‹ˆ๋‹ค. ์ด ํ”„๋กœ์„ธ์Šค๋Š” ์•„๋ž˜์™€ ๊ฐ™์ด ์ •์˜๋ฉ๋‹ˆ๋‹ค.

$$\mu_c = \frac{\sum_i\sum_{h,w}\mathbb{1}(\hat{Y}^{(i)}_{h,w} = c)\cdot L^{(i)}_{h,w}}{\sum_i\sum_{h,w}\mathbb{1}(\hat{Y}^{(i)}_{h,w} = c)}, \qquad \sigma^2_c = \frac{\sum_i\sum_{h,w}\mathbb{1}(\hat{Y}^{(i)}_{h,w} = c)\cdot (L^{(i)}_{h,w} - \mu_c)^2}{\sum_i\sum_{h,w}\mathbb{1}(\hat{Y}^{(i)}_{h,w} = c)}$$

์ด ์‹์—์„œ ii๋Š” ii๋ฒˆ์งธ training sample ๊ทธ๋ฆฌ๊ณ  1(โ‹…)\mathbb{1}(\cdot)์€ indicator function์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.

Using the mean and variance obtained above, we compute the SML $\boldsymbol{S}\in\mathbb{R}^{H\times{W}}$ for a test image by standardizing its Max Logit values as follows.

$$\boldsymbol{S}_{h,w}=\frac{\boldsymbol{L}_{h,w}-\mu_{\hat{Y}_{h,w}}}{\sigma_{\hat{Y}_{h,w}}}$$

์ด๋ ‡๊ฒŒ SML์€ standardization์„ ํ†ตํ•ด Max Logit๊ฐ’๋“ค์„ ๊ฐ™์€ ์˜๋ฏธ๋ฅผ ๊ฐ–๋„๋ก ๋ฐ”๊ฟ”์ค๋‹ˆ๋‹ค. ์ •ํ™•ํžˆ๋Š” ๊ฐ ํ”ฝ์…€์— ๋Œ€ํ•œ ๊ฐ’๋“ค์„ ๊ทธ๋“ค์˜ class์•ˆ์—์„œ์˜ ์ƒ๋Œ€์ ์ธ ํฌ๊ธฐ ์ ์ˆ˜๋กœ ๋ฐ”๊ฟ”์ค๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ mapping์€ ์ €ํฌ๊ฐ€ ์ถ”๊ฐ€์ ์œผ๋กœ ์ œ์•ˆํ•˜๋Š” Boundary Suppression ๊ณผ Dilated Smoothing์ด ๋™์ž‘ํ•  ์ˆ˜ ์žˆ๋„๋ก ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•ด์ค๋‹ˆ๋‹ค.
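The standardization itself is then a single vectorized expression (a sketch with made-up statistics):

```python
import numpy as np

def standardize_max_logits(L, Y_hat, mu, sigma):
    """SML: standardize each pixel's Max Logit with its predicted class' stats.

    L: (H, W) max logits; Y_hat: (H, W) predicted classes;
    mu/sigma: per-class training statistics (illustrative inputs here).
    """
    return (L - mu[Y_hat]) / sigma[Y_hat]

mu = np.array([10.0, 4.0])     # hypothetical per-class means
sigma = np.array([2.0, 1.0])   # hypothetical per-class stds
L = np.array([[8.0, 5.0]])
Y_hat = np.array([[0, 1]])
S = standardize_max_logits(L, Y_hat, mu, sigma)   # [[-1.0, 1.0]]
```

Note that the raw Max Logits (8.0 vs. 5.0) rank the two pixels one way, while the standardized scores rank them the other way: each pixel is now judged relative to its own class' distribution.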

4-2. Iterative Boundary Suppression

Boundary Suppression

Boundary์˜์—ญ์€ class์˜ ์•ˆ์ชฝ ์˜์—ญ ๋Œ€๋น„ ๋”์šฑ uncertainํ•œ ํŠน์„ฑ์„ ๊ฐ€์ง€๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ๊ทธ ์ด์œ ๋Š” ์ด๋Ÿฌํ•œ boundary ์˜์—ญ์€ ํ•˜๋‚˜์˜ class์—์„œ ๋‹ค๋ฅธ class๋กœ์˜ ๋ณ€ํ™”๊ฐ€ ์ผ์–ด๋‚˜๋Š” ๊ณณ์ด๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ, ์ €ํฌ๋Š” Iterative Boundary Suppression์ด๋ผ๋Š” ๋ฐฉ๋ฒ•์„ ํ†ตํ•ด ์ด๋Ÿฌํ•œ uncertainํ•œ ์˜์—ญ์„ certainํ•œ ๊ฐ’์œผ๋กœ ๋ฐ”๊ฟ”์ฃผ๋Š” ๋ฐฉ๋ฒ•๋ก ์„ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค. ์œ„ ๊ทธ๋ฆผ์— ์„ค๋ช…๋œ ๊ฒƒ์ฒ˜๋Ÿผ ๋จผ์ € ์ €ํฌ๋Š” prediction map์—์„œ boundary ์˜์—ญ์„ ๊ตฌํ•ด๋ƒˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  Boundary Average Aware Pooling (BAP)๋ฅผ ์ ์šฉํ•˜์—ฌ boundary์ฃผ๋ณ€์˜ non-boundary๊ฐ’๋“ค์ด boundary ์˜์—ญ์„ ์—…๋ฐ์ดํŠธํ•˜๋„๋ก ํ•˜์˜€์Šต๋‹ˆ๋‹ค. ์ €ํฌ๋Š” ์ด๋Ÿฌํ•œ ๊ณผ์ •์„ boundary width์ธ rir_{i}๋ฅผ ์ค„์—ฌ๊ฐ€๋ฉฐ ๋ฐ˜๋ณต์ ์œผ๋กœ ์ ์šฉํ•˜์˜€์Šต๋‹ˆ๋‹ค.

๋”์šฑ ๊ตฌ์ฒด์ ์œผ๋กœ, ์ €ํฌ๋Š” initial boundary width๋ฅผ r0r_0๋กœ ์ •์˜ํ•˜์˜€๊ณ  ๋งค iteration๋งˆ๋‹ค ฮ”r\Delta{r}์”ฉ ์ค„์—ฌ๊ฐ€๋ฉฐ ์ ์šฉํ•˜์˜€์Šต๋‹ˆ๋‹ค. ii๋ฒˆ์งธ width์ธ rir_{i}์™€ prediction Y^\hat{Y}์— ๋Œ€ํ•˜์—ฌ, ์ €ํฌ๋Š” non-boundary mask M(i)โˆˆRHร—WM^{(i)}\in\mathbb{R}^{H\times{W}}๋ฅผ ๊ฐ pixel h,wh, w์— ๋Œ€ํ•˜์—ฌ ์•„๋ž˜์™€ ๊ฐ™์ด ์ •์˜ํ•˜์˜€์Šต๋‹ˆ๋‹ค.

$$\boldsymbol{M}^{(i)}_{h,w} = \begin{cases} 0, & \text{if } \exists{h^\prime, w^\prime}\ \text{s.t. } \boldsymbol{\hat{Y}}_{h, w} \neq \boldsymbol{\hat{Y}}_{h^\prime, w^\prime} \\ 1, & \text{otherwise} \end{cases}$$

where $h^\prime, w^\prime$ range over all positions satisfying $|h - h^\prime| + |w - w^\prime| \leq r_i$.

We then define BAP using the mask $M^{(i)}$ obtained above as follows.

$$BAP(\boldsymbol{S}^{(i)}_\mathcal{R}, \boldsymbol{M}^{(i)}_{\mathcal{R}}) = \frac{\sum_{(h,w)\in\mathcal{R}}{\boldsymbol{S}^{(i)}_{h,w} \times \boldsymbol{M}^{(i)}_{h,w}}}{\sum_{(h,w)\in\mathcal{R}}{\boldsymbol{M}^{(i)}_{h,w}}}$$

$\boldsymbol{S}^{(i)}_\mathcal{R}$ and $\boldsymbol{M}^{(i)}_\mathcal{R}$ denote the patches of $S^{(i)}$ and $\boldsymbol{M}^{(i)}$ within the receptive field $\mathcal{R}$, and $(h,w)\in\mathcal{R}$ denotes a pixel in $\mathcal{R}$. We repeat this process $n$ times so that the boundary regions are filled with confident neighboring values. We set the initial boundary width $r_0$ to 8, the reduction rate $\Delta{r}$ to 2, the number of iterations to 4, and the size of the receptive field $\mathcal{R}$ to $3\times3$. With this method, we could effectively remove the false positives and false negatives in the boundary regions.
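A naive (unvectorized) sketch of this procedure may help clarify it. The L1 ball is approximated with a square window here, and `boundary_mask`, `bap_step`, and `iterative_boundary_suppression` are illustrative names, not the official implementation:

```python
import numpy as np

def boundary_mask(Y_hat, width):
    """Non-boundary mask M: 0 where a differently-labeled pixel lies within
    `width` pixels (square window approximating the L1 ball), else 1."""
    H, W = Y_hat.shape
    M = np.ones((H, W))
    for h in range(H):
        for w in range(W):
            h0, h1 = max(0, h - width), min(H, h + width + 1)
            w0, w1 = max(0, w - width), min(W, w + width + 1)
            if (Y_hat[h0:h1, w0:w1] != Y_hat[h, w]).any():
                M[h, w] = 0.0
    return M

def bap_step(S, M):
    """One Boundary Average Aware Pooling pass (3x3 receptive field):
    each boundary pixel takes the mean of its non-boundary neighbors."""
    H, W = S.shape
    out = S.copy()
    for h in range(H):
        for w in range(W):
            if M[h, w] == 0:
                h0, h1 = max(0, h - 1), min(H, h + 2)
                w0, w1 = max(0, w - 1), min(W, w + 2)
                weight = M[h0:h1, w0:w1]
                if weight.sum() > 0:
                    out[h, w] = (S[h0:h1, w0:w1] * weight).sum() / weight.sum()
    return out

def iterative_boundary_suppression(S, Y_hat, r0=8, dr=2, n=4):
    """Apply BAP n times while shrinking the boundary width r_i."""
    for i in range(n):
        S = bap_step(S, boundary_mask(Y_hat, max(r0 - dr * i, 1)))
    return S
```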

4-3. Dilated Smoothing

์œ„์˜ Iterative Boundary Suppression์€ boundary ์˜์—ญ์— ๋Œ€ํ•ด์„œ๋งŒ ๋™์ž‘ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์ด๋ฏธ์ง€์— ์กด์žฌํ•˜๋Š” ๋‹ค๋ฅธ false positive์™€ false negative์— ๋Œ€ํ•ด์„œ๋Š” ์ œ๊ฑฐํ•˜์ง€ ๋ชปํ•ฉ๋‹ˆ๋‹ค. Gaussian smoothing์€ ์ด๋ฏธ์ง€ ๋‚ด์˜ ์ž‘์€ noise๋“ค์„ ํšจ๊ณผ์ ์œผ๋กœ ์ œ๊ฑฐํ•˜๋Š” ๊ฒƒ์œผ๋กœ ์•Œ๋ ค์ ธ์žˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋ž˜์„œ ์ €ํฌ๋Š” ์ž‘์€ irregular๋“ค (์ž‘์€ false positive, false negative๋“ค)์„ ์ œ๊ฑฐํ•˜๊ธฐ ์œ„ํ•ด Gaussian Smoothing์„ ์ ์šฉํ•˜์˜€์Šต๋‹ˆ๋‹ค. ๋” ๋‚˜์•„๊ฐ€ dilation์„ ์ฃผ์–ด ๋” ๋„“์€ receptive๋ฅผ ๋ฐ˜์˜ํ•  ์ˆ˜ ์žˆ๋„๋ก ๊ณ ์•ˆํ•˜์˜€์Šต๋‹ˆ๋‹ค.

5. Experiment & Result

Experimental setup

์„ฑ๋Šฅ ํ‰๊ฐ€๋ฅผ ์œ„ํ•ด, ์ €ํฌ๋Š” area under receiver operating characteristics (AUROC)์™€ average precision (AP)๋ฅผ ์ธก์ •ํ•˜์˜€์Šต๋‹ˆ๋‹ค. ๋˜ํ•œ true positive rate 95%์—์„œ์˜ false positive rate (FPR95_{95})์„ ์ธก์ •ํ•˜์˜€์Šต๋‹ˆ๋‹ค. Qualitative analysis๋ฅผ ์œ„ํ•ด ์ €ํฌ๋Š” TPR95_{95}์—์„œ์˜ threshold๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์‹œ๊ฐํ™”ํ•˜์˜€์Šต๋‹ˆ๋‹ค.

์ €ํฌ๋Š” ์ €ํฌ์˜ ๋ฐฉ๋ฒ•๋ก ์„ ์•„๋ž˜์˜ ๋ฐ์ดํ„ฐ์…‹๋“ค์— ๋Œ€ํ•˜์—ฌ ๊ฒ€์ฆํ•˜์˜€์Šต๋‹ˆ๋‹ค.

  • Fishyscapes Lost & Found [9] - a real-world driving dataset containing 37 types of small unexpected obstacles such as boxes and balls

  • Fishyscapes Static [9] - a dataset in which unexpected objects are composited onto Cityscapes validation images

  • Road Anomaly [5] - web-collected images of unusual hazards that can be encountered on the road while driving

Implementation Details

We choose DeepLabv3+ [10] as our segmentation architecture and ResNet101 [11] as our backbone. We set the output stride to 8, the batch size to 8, the initial learning rate to 1e-2, and the momentum to 0.9. We pretrain the segmentation model on the Cityscapes dataset for 60K iterations, using polynomial learning rate scheduling with a power of 0.9, and train it with the auxiliary loss proposed in PSPNet [12] with the loss weight $\lambda$ set to 0.4. For data augmentation, we apply color and positional augmentations, specifically color jittering, Gaussian blur, random horizontal flip, and random cropping. We also apply class-uniform sampling [13, 14] with a rate of 0.5.

Iterative Boundary Suppression์˜ ๊ฒฝ์šฐ, boundary mask๋Š” dilated๋œ prediction map์—์„œ eroded ๋œ prediction map์„ ๋นผ์„œ ๊ตฌํ•˜์˜€์œผ๋ฉฐ ๊ทธ ๊ณผ์ •์—์„œ filter๋Š” L1 filter๋ฅผ ์‚ฌ์šฉํ•˜์˜€์Šต๋‹ˆ๋‹ค. ๋˜ํ•œ ์ €ํฌ๋Š” initial boundary width r0r_0๋ฅผ 8, iteration ํšŸ์ˆ˜ nn์„ 4, dilation rate dd๋ฅผ 6, ๊ทธ๋ฆฌ๊ณ  receptive field R\mathcal{R}๊ณผ smoothing kernel์˜ ํฌ๊ธฐ๋ฅผ 3ร—33\times3๊ณผ 7ร—77\times7๋กœ ๊ฐ๊ฐ ์ •์˜ํ•˜์˜€์Šต๋‹ˆ๋‹ค.
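The dilation-minus-erosion trick can be sketched with simple 4-neighbour min/max passes, which grow or shrink an L1 ball by one pixel per pass (`boundary_from_morphology` is an illustrative name; this is a simplified stand-in, not the official implementation):

```python
import numpy as np

def boundary_from_morphology(pred, width=4):
    """Boundary mask = (dilated one-hot prediction) - (eroded one-hot).

    pred: (H, W) integer class map. A plus-shaped (L1) structuring element
    is realized via repeated 4-neighbour max (dilation) / min (erosion).
    """
    onehot = (pred[None] == np.arange(pred.max() + 1)[:, None, None]).astype(float)

    def shift_stack(x):
        # the pixel itself plus its 4 neighbours (edge-replicated)
        p = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
        return np.stack([p[:, 1:-1, 1:-1], p[:, :-2, 1:-1], p[:, 2:, 1:-1],
                         p[:, 1:-1, :-2], p[:, 1:-1, 2:]])

    dil, ero = onehot, onehot
    for _ in range(width):
        dil = shift_stack(dil).max(axis=0)   # dilation: the L1 ball grows by 1
        ero = shift_stack(ero).min(axis=0)   # erosion: the L1 ball shrinks by 1
    # boundary wherever dilation and erosion disagree for any class
    return ((dil - ero) > 0).any(axis=0).astype(float)
```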

As the final anomaly score, we use the last SML values after all of these steps, multiplied by $-1$. The official implementation is available at the following link: https://github.com/shjung13/Standardized-max-logits

Qualitative Result

LostandFound
Static

์œ„์˜ ์ด๋ฏธ์ง€๋“ค์€ MSP, Max Logit, ๊ทธ๋ฆฌ๊ณ  ์ €ํฌ ๋ฐฉ๋ฒ•๋ก ์˜ Fishyscapes Lost&Found์™€ Static ๋ฐ์ดํ„ฐ์…‹์— ๋Œ€ํ•œ qualitative ๊ฒฐ๊ณผ๋ฅผ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค. ํ•˜์–€์ƒ‰ pixel๋“ค์€ unexpected๋กœ ์˜ˆ์ธก๋œ pixel๋“ค์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ๋ณด์—ฌ์ง€๋Š” ๋ฐ”์™€ ๊ฐ™์ด, ์ €ํฌ์˜ ๋ฐฉ๋ฒ•๋ก ์€ ๊ธฐ์กด ๋ฐฉ๋ฒ•๋ก  ๋Œ€๋น„ false positive๋ฅผ ํšจ๊ณผ์ ์œผ๋กœ ์ง€์›Œ์ค๋‹ˆ๋‹ค.

Analysis

์œ„์˜ ์ด๋ฏธ์ง€๋Š” ์ €ํฌ์˜ SML, Iterative Boundary Suppression, ๊ทธ๋ฆฌ๊ณ  Dilated Smoothing ๊ฐ๊ฐ์„ ์ ์šฉํ–ˆ์„ ๋•Œ์˜ ๊ฒฐ๊ณผ๋ฅผ ๋‚˜ํƒ€๋‚ธ ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๋…ธ๋ž€์ƒ‰ ๋ฐ•์Šค์—์„œ๋Š” Iterative Boundary Suppression์ด ํšจ๊ณผ์ ์œผ๋กœ boundary ์˜์—ญ์„ ์ง€์šฐ๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๊ณ  ๋…น์ƒ‰ ๋ฐ•์Šค์—์„œ๋Š” ์ž‘์€ false positive๋“ค์ด ํšจ๊ณผ์ ์œผ๋กœ ์‚ฌ๋ผ์ง€๋Š” ๊ฒƒ์„ ๋ณด์‹ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Quantitative Results

We first present the results on the public leaderboard, and then the performance on various validation sets.

Leaderboard

์œ„์˜ ํ‘œ๋Š” Fishyscapes Lost&Found test set๊ณผ Static test set์— ๋Œ€ํ•œ ๊ฒฐ๊ณผ์ž…๋‹ˆ๋‹ค. ๋ณด์—ฌ์ง€๋Š” ๊ฒƒ์ฒ˜๋Ÿผ additional training๊ณผ ์ถ”๊ฐ€์ ์ธ OoD data๋ฅผ ์š”๊ตฌํ•˜์ง€ ์•Š๋Š” ๋ชจ๋ธ๋“ค ์ค‘์— ์ €ํฌ์˜ ๋ชจ๋ธ์ด Fishyscapes Lost&Found ๋ฐ์ดํ„ฐ์…‹์— ๋Œ€ํ•ด state-of-the-art์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ฃผ๋Š” ๊ฒƒ์„ ๋ณด์‹ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Validation

The table above shows the validation results on the Fishyscapes Lost&Found and Static validation sets and on the Road Anomaly dataset. Our method outperforms the other baselines.

Furthermore, our method comes with few parameters and little computational cost. Compared with SynthCP, an image-resynthesis-based method, the additional computation it requires is very small.

6. Conclusion

์ €ํฌ์˜ ๋ฐฉ๋ฒ•๋ก ์€ ๋„๋กœ ์ฃผํ–‰ ์ค‘ unexpected obstacle์„ ์ฐพ์•„๋‚ด๊ธฐ ์œ„ํ•œ ๊ฐ„๋‹จํ•˜๋ฉด์„œ ํšจ๊ณผ์ ์ธ ๋ฐฉ๋ฒ•๋ก ์„ ์ œ์‹œํ•ฉ๋‹ˆ๋‹ค. ์ €ํฌ์˜ ๋ฐฉ๋ฒ•๋ก ์€ inference time๊ณผ memory์— overhead๊ฐ€ ์ ์Šต๋‹ˆ๋‹ค. ๋” ๋‚˜์•„๊ฐ€, ์ €ํฌ์˜ ๋ฐฉ๋ฒ•๋ก ์€ ๊ธฐ์กด์˜ ๋‹ค๋ฅธ ๋ฐฉ๋ฒ•๋ก ๋“ค๊ณผ ์ƒํ˜ธ ๋ณด์™„์ ์œผ๋กœ ๋™์ž‘ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์—ฌ์ „ํžˆ ๋‹จ์ ๋„ ์กด์žฌํ•ฉ๋‹ˆ๋‹ค. ์ฒซ๋ฒˆ์งธ๋กœ, ์ €ํฌ๋Š” ๋ชจ๋ธ์˜ output์ธ Max Logit์˜ ๋ถ„ํฌ์— ์˜์กดํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ์ถ”๊ฐ€์ ์ธ training์„ ์š”๊ตฌํ•˜์ง€ ์•Š๊ธฐ ๋•Œ๋ฌธ์— pre-train๋œ ๋ชจ๋ธ์— ๋”ฐ๋ผ ์„ฑ๋Šฅ์ด ๋‹ฌ๋ผ์งˆ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋˜ํ•œ Dilated Smoothing์„ ํ†ตํ•˜๊ณ  ๋‚œ ํ›„, noise์ฒ˜๋Ÿผ ์ž‘์€ OoD ๋“ค์€ ์ œ๊ฑฐ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋‹จ์ ๋“ค์€ ์—ฌ์ „ํžˆ further work ์œผ๋กœ ๋‚จ์•„์žˆ์Šต๋‹ˆ๋‹ค.

๊ฐ์‚ฌํ•ฉ๋‹ˆ๋‹ค.

Take home message

Class๋“ค ๊ฐ„์˜ ์„œ๋กœ ๋‹ค๋ฅธ ๋ถ„ํฌ๋ฅผ ๋งž์ถฐ ์ฃผ๋Š” ๊ฒƒ์€ Out-of-Distribution ํƒ์ง€์— ํšจ๊ณผ์ ์ผ ์ˆ˜ ์žˆ๋‹ค.

Post-processing ๋ฐฉ๋ฒ•์„ ์ ์šฉํ•˜๋Š” ๊ฒƒ์€ ์ž„์˜์˜ main segmentation network์— ์ ์šฉํ•  ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ํšจ๊ณผ์ ์ผ ์ˆ˜ ์žˆ๋‹ค.

Semantic segmentation์˜ Out-of-Distribution ํƒ์ง€์—์„œ, boundary ์˜์—ญ์€ ๋‹ค๋ฅธ ์˜์—ญ ๋Œ€๋น„ uncertainํ•˜๊ณ , ์ด๋Ÿฌํ•œ ์˜์—ญ์„ ์•Œ๋งž๊ฒŒ ์ฒ˜๋ฆฌํ•˜๋Š” ๊ฒƒ์€ ๋ช‡๋ช‡์˜ ๊ฒฝ์šฐ์— ํšจ๊ณผ์ ์ผ ์ˆ˜ ์žˆ๋‹ค.

Author / Reviewer information

Author

์ •์ƒํ—Œ (Sanghun Jung)

  • KAIST AI

  • Personal page: https://shjung13.github.io

  • Github: https://github.com/shjung13

  • LinkedIn: https://www.linkedin.com/in/sanghun-jung-b17a4b1b8/

Reviewer

  1. Korean name (English name): Affiliation / Contact information

  2. Korean name (English name): Affiliation / Contact information

  3. ...

Reference & Additional materials

  1. Sanghun Jung, Jungsoo Lee, Daehoon Gwak, Sungha Choi, and Jaegul Choo. Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation. In Proc. of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 15425-15434, 2021.

  2. Github: https://github.com/shjung13/Standardized-max-logits

  3. Citation of related work

    1. [1] Dan Hendrycks and Kevin Gimpel. A baseline for detecting misclassified and out-of-distribution examples in neural networks. In Proc. of the International Conference on Learning Representations (ICLR), 2017.

    2. [2] Dan Hendrycks, Steven Basart, Mantas Mazeika, Mohammadreza Mostajabi, Jacob Steinhardt, and Dawn Song. Scaling out-of-distribution detection for real-world settings. arXiv preprint arXiv:1911.11132, 2020.

    3. [3] Petra Bevandić, Ivan Krešo, Marin Oršić, and Siniša Šegvić. Dense outlier detection and open-set recognition based on training with noisy negative images. arXiv preprint arXiv:2101.09193, 2021.

    4. [4] Robin Chan, Matthias Rottmann, and Hanno Gottschalk. Entropy maximization and meta classification for out-of-distribution detection in semantic segmentation. arXiv preprint arXiv:2012.06575, 2020.

    5. [5] Krzysztof Lis, Krishna Nakka, Pascal Fua, and Mathieu Salzmann. Detecting the unexpected via image resynthesis. In Proc. of IEEE international conference on computer vision (ICCV), pages 2151โ€“2161, 2019.

    6. [6] Krzysztof Lis, Sina Honari, Pascal Fua, and Mathieu Salzmann. Detecting road obstacles by erasing them. arXiv preprint arXiv:2012.13633, 2020.

    7. [7] Yingda Xia, Yi Zhang, Fengze Liu, Wei Shen, and Alan L. Yuille. Synthesize then compare: Detecting failures and anomalies for semantic segmentation. In Proc. of the European Conference on Computer Vision (ECCV), pages 145–161, 2020.

    8. [8] Toshiaki Ohgushi, Kenji Horiguchi, and Masao Yamanaka. Road obstacle detection method based on an autoencoder with semantic segmentation. In Proc. of the Asian Conference on Computer Vision (ACCV), pages 223โ€“238, 2020.

    9. [9] Hermann Blum, Paul-Edouard Sarlin, Juan Nieto, Roland Siegwart, and Cesar Cadena. The fishyscapes benchmark: Measuring blind spots in semantic segmentation. arXiv preprint arXiv:1904.03215, 2019.

    10. [10] Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proc. of the European Conference on Computer Vision (ECCV), pages 801โ€“818, 2018.

    11. [11] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. of IEEE conference on computer vision and pattern recognition (CVPR), pages 770โ€“778, 2016.

    12. [12] Hanchao Li, Pengfei Xiong, Jie An, and Lingxue Wang. Pyramid attention network for semantic segmentation. In Proc. of the British Machine Vision Conference (BMVC), page 285, 2018.

    13. [13] Samuel Rota Bulo, Lorenzo Porzi, and Peter Kontschieder. In-place activated batchnorm for memory-optimized training of dnns. In Proc. of IEEE conference on computer vision and pattern recognition (CVPR), pages 5639โ€“5647, 2018.

    14. [14] Yi Zhu, Karan Sapra, Fitsum A Reda, Kevin J Shih, Shawn Newsam, Andrew Tao, and Bryan Catanzaro. Improving semantic segmentation via video propagation and label relaxation. In Proc. of IEEE conference on computer vision and pattern recognition (CVPR), pages 8856โ€“8865, 2019.

  4. Other useful materials

  5. ...
