SCAN [Eng]
Van Gansbeke et al. / SCAN: Learning to Classify Images without Labels / ECCV 2020
1. Problem definition
The goal of unsupervised image classification is to group images into clusters such that images within the same cluster belong to the same or similar semantic classes, while images in different clusters are semantically dissimilar. This must be done without access to ground-truth semantic labels at training time; the semantic classes, or even their total number, are not known a priori. This paper proposes SCAN (Semantic Clustering by Adopting Nearest neighbors), a two-step approach for unsupervised image classification.
2. Motivation
In this section, we cover the related works and main idea of the proposed method, SCAN.
What if we have no ground-truth semantic labels during training?
What if we do not know the number of semantic labels?
These questions are difficult to address, yet they arise frequently in real-world scenarios.
Thus, it is important to design a model that can learn semantic classes without any supervision, which we call unsupervised learning.
Related work
The task of unsupervised image classification has recently attracted considerable attention, and two dominant paradigms have emerged.
Representation Learning
Step 1: Self-supervised learning (e.g., SimCLR, MoCo)
Step 2: Clustering (e.g., K-Means)
Problems: K-Means can yield imbalanced (degenerate) clusters, and the learned features need not align with semantic labels
End-to-end Learning
Iteratively refine the clusters based on supervision from confident samples.
Maximize mutual information between an image and its augmentations.
Problems: Initialization-sensitive or heuristic mechanisms
Idea
To address the limitations of the existing methods, SCAN is designed as a two-step algorithm for unsupervised image classification.
Step 1: Learn feature representations and mine K-nearest neighbors.
Step 2: Train a clustering model to integrate nearest neighbors.
In step 1, instead of applying K-means directly to the image features, SCAN mines the nearest neighbors of each image. In step 2, SCAN encourages invariance with respect to the nearest neighbors, not only with respect to augmentations.
3. Method
Step 1: Learn feature representations and mine K-nearest neighbors.
Certain pretext tasks may yield undesired features for semantic clustering.
Thus, SCAN selects a pretext task that minimizes the distance between an image and its augmentations.
Instance discrimination satisfies this condition.
For each image, mine its K nearest neighbors in the learned embedding space, as sketched below.
The nearest neighbors tend to belong to the same semantic class.
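A minimal sketch of the neighbor-mining step, assuming embeddings for the whole dataset have already been extracted with the pretext model (the random `feats` tensor and `k=20` are illustrative stand-ins, not the paper's exact setup):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def mine_neighbors(features: torch.Tensor, k: int = 20) -> torch.Tensor:
    """Return indices of the k nearest neighbors of each sample,
    measured by cosine similarity in the pretext embedding space."""
    features = F.normalize(features, dim=1)   # L2-normalize embeddings
    sim = features @ features.t()             # pairwise cosine similarity (N, N)
    sim.fill_diagonal_(-1.0)                  # exclude each sample from its own neighbors
    _, idx = sim.topk(k, dim=1)               # k most similar samples per row
    return idx                                # (N, k) neighbor indices

# Toy usage: 1000 random 128-d features stand in for real pretext embeddings.
feats = torch.randn(1000, 128)
neighbors = mine_neighbors(feats, k=20)       # neighbors.shape == (1000, 20)
```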
Step 2: Train a clustering model to integrate nearest neighbors.
Adopt the nearest neighbors as the prior for semantic clustering.
The clustering loss consists of two terms, as sketched below. The first term enforces consistent cluster assignments between an image and its mined neighbors.
The second term maximizes the entropy of the average cluster assignment to avoid collapsing all samples into a single cluster.
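A minimal sketch of this two-term objective, following the loss described in the paper; the batch layout and the `entropy_weight` value here are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def scan_loss(anchor_logits, neighbor_logits, entropy_weight=5.0):
    """SCAN objective: a consistency term over mined neighbors plus an
    entropy term over the mean cluster assignment."""
    p_a = F.softmax(anchor_logits, dim=1)      # soft assignments of anchors (B, C)
    p_n = F.softmax(neighbor_logits, dim=1)    # soft assignments of their neighbors

    # First term: -log of the dot product of soft assignments; minimized
    # when an image and its neighbor are confidently assigned to the same cluster.
    consistency = -torch.log((p_a * p_n).sum(dim=1).clamp(min=1e-8)).mean()

    # Second term: entropy of the mean assignment over the batch; subtracting
    # it (i.e., maximizing entropy) prevents a single-cluster collapse.
    p_mean = p_a.mean(dim=0)
    entropy = -(p_mean * torch.log(p_mean.clamp(min=1e-8))).sum()

    return consistency - entropy_weight * entropy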
Fine-tune the clustering model.
Some of the nearest neighbors may not belong to the same cluster.
However, samples with highly confident predictions tend to be assigned to the correct cluster.
Select the confident images whose maximum soft assignment exceeds a threshold.
Fine-tune the clustering model on these confident images by minimizing the cross-entropy loss on their pseudo-labels, as sketched below.
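A minimal sketch of the self-labeling step; `model`, `images`, and the confidence `threshold` are placeholders, and the paper additionally trains on strongly augmented views of the confident samples, which is omitted here:

```python
import torch
import torch.nn.functional as F

def self_label_loss(model, images, threshold=0.99):
    """One self-labeling step: pseudo-label confident samples with the
    argmax of their soft assignment and fine-tune with cross-entropy."""
    with torch.no_grad():
        probs = F.softmax(model(images), dim=1)
        conf, pseudo = probs.max(dim=1)        # confidence and pseudo-label
        mask = conf > threshold                # keep only confident samples

    if mask.sum() == 0:                        # nothing confident in this batch
        return None
    logits = model(images[mask])               # forward pass with gradients
    return F.cross_entropy(logits, pseudo[mask])
```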
4. Experimental Results
In this section, we summarize the experimental results of this paper.
Experimental setup
Datasets: CIFAR10, CIFAR100-20, STL10, ImageNet
Backbone: ResNet-18
Pretext tasks: SimCLR and MoCo
Baselines: DeepCluster, IIC, GAN, DAC, etc.
Evaluation metrics: clustering accuracy (ACC), normalized mutual information (NMI), and adjusted Rand index (ARI); ACC matches clusters to classes with a Hungarian assignment, as sketched after this list.
The results are reported as the mean over 10 different runs of the models.
All experiments are performed under the same settings, e.g., augmentations, backbone, and pretext task.
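For reference, clustering accuracy is commonly computed by finding the best one-to-one mapping between predicted clusters and ground-truth classes with the Hungarian algorithm; a small sketch (a standard recipe, not taken from the paper's codebase):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Clustering accuracy (ACC): best one-to-one mapping between
    predicted clusters and ground-truth classes."""
    n = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((n, n), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1                                   # co-occurrence counts
    row, col = linear_sum_assignment(cost.max() - cost)   # maximize matched counts
    return cost[row, col].sum() / len(y_true)

# Toy usage: a perfect clustering up to label permutation scores 1.0.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([2, 2, 0, 0, 1, 1])
print(clustering_accuracy(y_true, y_pred))                # 1.0
```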
Results
Here are the results of SCAN.
Comparison with SOTA
SCAN outperforms prior work by large margins in ACC, NMI, and ARI.
Qualitative results
The obtained clusters are semantically meaningful.
Ablation study: Pretext tasks
SCAN selects a pretext task that minimizes the distance between an image and its augmentations.
RotNet does not satisfy this criterion: predicting rotations yields features that are covariant with the transformation rather than invariant to it.
Instance discrimination tasks satisfy the invariance criterion.
Ablation study: Self-labeling
Fine-tuning the network through self-labeling enhances the quality of clusters.
5. Conclusion
SCAN is a two-step algorithm for unsupervised image classification.
SCAN exploits the observation that nearest neighbors in a self-supervised embedding space tend to be semantically similar.
SCAN outperforms the SOTA methods in unsupervised image classification.
Take home message
Nearest neighbors are likely to be semantically similar.
Filtering confident images and using them for supervision enhances the performance.
Author / Reviewer information
Author
이건 (Geon Lee)
KAIST AI
geonlee0325@kaist.ac.kr
Reviewer
TBD
Reference & Additional materials
Van Gansbeke, Wouter, et al. "SCAN: Learning to Classify Images without Labels." European Conference on Computer Vision (ECCV). Springer, Cham, 2020.
Slides: https://wvangansbeke.github.io/pdfs/unsupervised_classification.pdf
Codes: https://github.com/wvangansbeke/Unsupervised-Classification