SCAN [Eng]

Van Gansbeke et al. / SCAN: Learning to Classify Images without Labels / ECCV 2020

1. Problem definition

The goal of unsupervised image classification is to group images into clusters such that images within the same cluster belong to the same or similar semantic classes, while images in different clusters are semantically dissimilar. This setting arises when there is no access to ground-truth semantic labels at training time, and when the semantic classes, or even their total number, are not known a priori. This paper proposes SCAN (Semantic Clustering by Adopting Nearest neighbors), a two-step approach for unsupervised image classification.

2. Motivation

In this section, we cover the related works and main idea of the proposed method, SCAN.

What if we have no ground-truth semantic labels during training?

What if we do not know the number of semantic labels?

Addressing these issues is very difficult, yet such settings are common in many real-world scenarios.

Thus, it is important to design a model that can learn semantic classes without any supervision, which we call unsupervised learning.

Related work

The task of unsupervised image classification has recently attracted considerable attention, with two dominant paradigms.

Representation Learning

  • Step 1: Self-supervised learning (e.g., SimCLR, MoCo)

  • Step 2: Clustering (e.g., K-Means)

  • Problems: Imbalanced clusters or mismatch with semantic labels

End-to-end Learning

  • Iteratively refine the clusters based on the supervision from confident samples.

  • Maximize mutual information between an image and its augmentations.

  • Problems: Initialization-sensitive or heuristic mechanisms

Idea

To address the limitations of the existing methods, SCAN is designed as a two-step algorithm for unsupervised image classification.

  • Step 1: Learn feature representations and mine K-nearest neighbors.

  • Step 2: Train a clustering model to integrate nearest neighbors.

In step 1, instead of applying K-means directly to the image features, SCAN mines the nearest neighbors of each image. In step 2, SCAN encourages invariance not only with respect to augmentations but also with respect to the mined nearest neighbors.

3. Method

Step 1: Learn feature representations and mine K-nearest neighbors.

  • Certain pretext tasks may yield undesired features for semantic clustering.

    • Thus, SCAN selects a pretext task that minimizes the distance between an image and its augmentations.

    • Instance discrimination satisfies this condition.

  • For each image, mine its K nearest neighbors in the learned feature space.

    • The nearest neighbors tend to belong to the same semantic class.
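Step 1 can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation: the function name and toy data are ours, and SCAN computes these neighborhoods once, over embeddings produced by the instance-discrimination pretext model.

```python
import numpy as np

def mine_nearest_neighbors(features, k=5):
    """Return the indices of the k nearest neighbors of every sample
    (excluding the sample itself), using cosine similarity.

    features: (N, D) array of embeddings from a pretext-trained encoder.
    """
    # L2-normalize so that the dot product equals cosine similarity.
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T            # (N, N) similarity matrix
    np.fill_diagonal(sim, -np.inf)     # exclude self-matches
    # Sort by descending similarity and keep the top-k columns.
    return np.argsort(-sim, axis=1)[:, :k]

# Toy example: two tight groups of points in 2-D feature space.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
neighbors = mine_nearest_neighbors(feats, k=1)
```

In the toy example, each point's nearest neighbor is the other point of its group, mirroring the observation that mined neighbors tend to share a semantic class.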

Step 2: Train a clustering model to integrate nearest neighbors.

  • Adopt the nearest neighbors as a prior for semantic clustering.

    • The first term of the clustering loss encourages each image and its neighbors to have consistent cluster assignments.

    • The second term maximizes the entropy of the mean cluster assignment to avoid assigning all samples to a single cluster.

  • Fine-tune the clustering model.

    • Some of the nearest neighbors may not belong to the same cluster.

    • However, highly confident predictions tend to be assigned to the correct cluster.

    • Select the confident images whose maximum soft assignment exceeds a threshold.

    • Fine-tune the clustering model on these confident images by minimizing the cross-entropy loss, using their predicted clusters as pseudo-labels.
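The two-term objective described above can be sketched as follows. This is an illustrative numpy version only; the variable names and the entropy weight are our choices, not the paper's exact implementation.

```python
import numpy as np

def scan_loss(probs, neighbor_probs, entropy_weight=2.0):
    """SCAN-style clustering objective on soft cluster assignments.

    probs:          (N, C) softmax outputs for anchor images.
    neighbor_probs: (N, C) softmax outputs for one mined neighbor per anchor.
    """
    # Consistency term: push each anchor and its neighbor toward the
    # same cluster by maximizing the dot product of their assignments.
    dot = np.sum(probs * neighbor_probs, axis=1)
    consistency = -np.mean(np.log(dot + 1e-10))
    # Entropy term: maximize the entropy of the mean assignment so that
    # all samples are not collapsed into a single cluster.
    mean_p = probs.mean(axis=0)
    neg_entropy = np.sum(mean_p * np.log(mean_p + 1e-10))
    return consistency + entropy_weight * neg_entropy

# Confident, balanced assignments incur a lower loss than collapsed ones.
balanced = np.array([[0.9, 0.1], [0.1, 0.9]])
collapsed = np.array([[0.9, 0.1], [0.9, 0.1]])
loss_balanced = scan_loss(balanced, balanced)
loss_collapsed = scan_loss(collapsed, collapsed)
```

The entropy term is what separates the two cases in the example: both have equally consistent neighbors, but the collapsed assignment concentrates all mass in one cluster and is penalized.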

4. Experimental Results

In this section, we summarize the experimental results of this paper.

Experimental setup

  • Dataset: CIFAR10, CIFAR100-20, STL10, ImageNet

  • Backbone: ResNet-18

  • Pretext task: SimCLR and MoCo

  • Baselines: DeepCluster, IIC, GAN, DAC, etc.

  • Evaluation metrics: Accuracy (ACC), NMI, and ARI

The results are reported as the mean from 10 different runs of the models.

All experiments are performed with the same setting, e.g., augmentation, backbone, and pretext tasks.
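As a point of reference for the metrics above, NMI can be computed from the contingency table of predicted clusters versus ground-truth classes. Below is a minimal numpy sketch using arithmetic-mean normalization; libraries may normalize differently.

```python
import numpy as np

def nmi(labels_true, labels_pred):
    """Normalized mutual information between two label assignments
    (arithmetic-mean normalization)."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    n = len(labels_true)
    classes = np.unique(labels_true)
    clusters = np.unique(labels_pred)
    # Contingency table of co-occurrence counts.
    cont = np.array([[np.sum((labels_true == c) & (labels_pred == k))
                      for k in clusters] for c in classes], dtype=float)
    pxy = cont / n                              # joint distribution
    px = pxy.sum(axis=1, keepdims=True)         # class marginal
    py = pxy.sum(axis=0, keepdims=True)         # cluster marginal
    nz = pxy > 0
    mi = np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))
    hx = -np.sum(px[px > 0] * np.log(px[px > 0]))
    hy = -np.sum(py[py > 0] * np.log(py[py > 0]))
    return mi / ((hx + hy) / 2) if (hx + hy) > 0 else 1.0

# A perfect clustering, up to a permutation of cluster ids, yields NMI = 1.
score = nmi([0, 0, 1, 1], [1, 1, 0, 0])
```

Note that NMI, unlike plain accuracy, is invariant to relabeling the clusters, which is why it is a standard choice for evaluating unsupervised clustering.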

Result

Here are the results of SCAN.

Comparison with SOTA

SCAN outperforms the prior work by large margins on ACC, NMI, and ARI.

Qualitative results

The obtained clusters are semantically meaningful.

Ablation study: Pretext tasks

SCAN selects a pretext task that minimizes the distance between an image and its augmentations.

  • RotNet does not satisfy this criterion: predicting rotations forces the features to distinguish an image from its rotated augmentations.

  • Instance discrimination tasks satisfy the invariance criterion.

Ablation study: Self-labeling

Fine-tuning the network through self-labeling enhances the quality of clusters.
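The self-labeling step can be sketched as a simple confidence filter. This toy numpy version (the function name and threshold value are illustrative, not the paper's) selects samples whose maximum soft assignment exceeds a threshold and turns their predicted clusters into pseudo-labels for cross-entropy fine-tuning.

```python
import numpy as np

def select_confident(probs, threshold=0.95):
    """Return indices and pseudo-labels of samples whose maximum soft
    assignment exceeds the threshold; these self-labeled samples are
    then used as cross-entropy targets for fine-tuning."""
    confidence = probs.max(axis=1)
    idx = np.where(confidence > threshold)[0]
    pseudo_labels = probs[idx].argmax(axis=1)
    return idx, pseudo_labels

# Toy soft assignments over two clusters: samples 0 and 2 are confident.
probs = np.array([[0.995, 0.005],
                  [0.60,  0.40],
                  [0.01,  0.99]])
idx, labels = select_confident(probs, threshold=0.95)
```

The ambiguous middle sample is filtered out, which matches the intuition above: only highly confident predictions are trusted as supervision.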

5. Conclusion

  • SCAN is a two-step algorithm for unsupervised image classification.

  • SCAN adopts the prior that nearest neighbors in feature space are likely to be semantically similar.

  • SCAN outperforms the SOTA methods in unsupervised image classification.

Take home message

Nearest neighbors are likely to be semantically similar.

Filtering confident images and using them for supervision enhances the performance.

Author / Reviewer information

Author

이건 (Geon Lee)

  • KAIST AI

  • geonlee0325@kaist.ac.kr

Reviewer

TBD

Reference & Additional materials

  • Van Gansbeke, Wouter, et al. "SCAN: Learning to Classify Images without Labels." European Conference on Computer Vision. Springer, Cham, 2020.

  • Slides: https://wvangansbeke.github.io/pdfs/unsupervised_classification.pdf

  • Code: https://github.com/wvangansbeke/Unsupervised-Classification
