Outlier Exposure [Eng]

Hendrycks et al. / Deep Anomaly Detection using Outlier Exposure / ICLR 2019

1. Problem definition

Detecting out-of-distribution (OOD) data is of paramount importance. Despite extensive research in the area, outlier detectors cannot learn representations of, or gain knowledge about, every possible out-of-distribution input in advance. Anomaly detection has previously been approached with deep learning classifiers that assign anomaly scores to inputs using knowledge drawn only from in-distribution data. In the real world, however, the outlier distribution is unseen by the detector at training time, which motivates a complementary method that can handle such inputs.

2. Motivation

Hendrycks et al. proposed using the Maximum Softmax Probability (MSP) of the prediction to evaluate whether an example is out-of-distribution [1]; the MSP tends to be lower on out-of-distribution samples. It is also possible to attach a supplementary branch to a pre-trained classifier and produce an OOD score [2]. Applying adversarial perturbations to the input data [3] makes the maximum softmax probabilities of in-distribution and OOD data more distinguishable. Models often classify OOD samples with high confidence, so Lee et al. train a GAN concurrently with a classifier [4], which lowers the classifier's confidence on the GAN samples. Salakhutdinov et al. pre-trained a deep learning classifier on web images to learn feature representations [5]; Zeiler & Fergus pre-trained on ImageNet and found the learned representations useful for fine-tuning [6]. Similarly, Radford et al. pre-trained unsupervised networks on Amazon reviews to acquire customer sentiment representations that could be fine-tuned on sentiment analysis tasks [7].
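The MSP baseline [1] needs only a few lines: the score for an input is the maximum softmax probability of the classifier's prediction. A minimal NumPy sketch (the function name is illustrative, not from any released code):

```python
import numpy as np

def msp_score(logits):
    """Maximum Softmax Probability detector.

    Returns the max softmax probability per example; in-distribution
    inputs tend to score higher than OOD inputs.
    """
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return p.max(axis=-1)
```

A confidently classified input (e.g. logits `[8, 0, 0]`) scores near 1, while a maximally uncertain one (flat logits over k classes) scores 1/k, so thresholding this score separates the two regimes.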

Idea

Let $\mathcal{L}$ denote the cross-entropy loss for the original classification task and $\mathcal{L}_{OE}$ the loss on outlier samples drawn from the Outlier Exposure dataset $\mathcal{D}_{out}^{OE}$. The model is trained to minimize

$$\mathbb{E}_{(x,y)\sim\mathcal{D}_{in}}\Big[\mathcal{L}(f(x),y) + \lambda\,\mathbb{E}_{x'\sim\mathcal{D}_{out}^{OE}}\big[\mathcal{L}_{OE}(f(x'),f(x),y)\big]\Big],$$

where $\lambda$ is the hyperparameter to be tuned. The contributions of this paper include:

  • A comparison of baseline out-of-distribution detectors, such as the Maximum Softmax Probability and PixelCNN++, with and without the proposed Outlier Exposure method on computer vision tasks.

  • The authors demonstrate the benefits of OE on NLP tasks as well.

  • A comparison of the proposed OE approach with synthetic data exposure.
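For classifiers, the $\mathcal{L}_{OE}$ term in the training objective above is the cross-entropy between the model's softmax output on an outlier and the uniform distribution over classes. A minimal NumPy sketch of the combined objective (function names are illustrative, not from the paper's code):

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def oe_loss(in_logits, in_labels, oe_logits, lam=0.5):
    """Outlier Exposure objective, classifier variant (sketch).

    Standard cross-entropy on in-distribution samples plus lam times
    the cross-entropy between the softmax output on outlier samples
    and the uniform distribution over the k classes.
    """
    p_in = softmax(in_logits)
    ce = -np.log(p_in[np.arange(len(in_labels)), in_labels]).mean()
    p_oe = softmax(oe_logits)
    k = oe_logits.shape[-1]
    # cross-entropy to the uniform target: -(1/k) * sum_j log p_j
    oe_term = -(np.log(p_oe).sum(axis=-1) / k).mean()
    return ce + lam * oe_term
```

The `oe_term` is minimized when the model outputs a uniform distribution on outliers, which is exactly what drives the MSP of OOD inputs down relative to in-distribution inputs.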

3. Methodology

The following were the in-distribution datasets:

  • SVHN dataset

  • CIFAR

  • Tiny ImageNet

  • Places365

  • 20 Newsgroups

  • TREC

  • SST

The authors used three datasets for the Outlier Exposure task:

  • 80 Million Tiny Images: to prevent any overlap with the in-distribution data, all images that appear in the CIFAR datasets were removed from 80 Million Tiny Images.

  • ImageNet-22K: This dataset served as the Outlier Exposure dataset for Tiny ImageNet and Places365. Images from ImageNet-1K were removed to keep the test OOD data and the Outlier Exposure dataset disjoint.

  • WikiText-2: each text in WikiText-2 was treated as a sample.

4. Experiment & Results

5. Conclusion

Outlier Exposure, when combined with state-of-the-art out-of-distribution detection baselines, produces very reliable results in both computer vision and natural language processing. The proposed technique has also proven to be computationally efficient.

References:

[1] Dan Hendrycks and Kevin Gimpel. A baseline for detecting misclassified and out-of-distribution examples in neural networks. International Conference on Learning Representations, 2017.

[2] Terrance DeVries and Graham W. Taylor. Learning confidence for out-of-distribution detection in neural networks. arXiv preprint arXiv:1802.04865, 2018.

[3] Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

[4] Kimin Lee, Honglak Lee, and Jinwoo Shin. Training confidence-calibrated classifiers for detecting out-of-distribution samples. International Conference on Learning Representations, 2018.

[5] Ruslan Salakhutdinov, Joshua Tenenbaum, and Antonio Torralba. Learning to learn with compound HD models. In Neural Information Processing Systems, 2011.

[6] Matthew D. Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In European Conference on Computer Vision. Springer, 2014.

[7] Alec Radford, Rafal Jozefowicz, and Ilya Sutskever. Learning to generate reviews and discovering sentiment. arXiv preprint, 2017.
