LabOR [Eng]

Shin et al., "LabOR: Labeling Only if Required for Domain Adaptive Semantic Segmentation," ICCV 2021

To read this review in Korean, please see LabOR [Kor].

1. Problem definition

  • Domain Adaptation (DA)

    • Domain adaptation is a field of computer vision.

    • The main goal of DA is to train a neural network on a source dataset and achieve good accuracy on a target dataset that differs significantly from the source dataset.

  • Unsupervised Domain Adaptation (UDA)

    • UDA, which transfers knowledge from a labeled source dataset to an unlabeled target domain, has been actively studied.

    • However, UDA still has a long way to go to reach fully supervised performance.

  • Domain Adaptation with Few Labels

    • Because of this weakness of UDA, some researchers propose to use a small portion of the ground-truth labels of the target dataset.

  • Semantic segmentation

    • The task of grouping parts of an image that belong to the same object class; it is a form of pixel-level prediction.

2. Motivation

  • In order to reduce the effort of human annotators, this work studies domain adaptation with as few target labels as possible.

  • What points should be labeled to maximize the performance of the segmentation model?

  • This work aims to find these points, i.e., an efficient pixel-level sampling approach.

Related work

  1. Unsupervised Domain Adaptation

    • Adversarial learning approaches have aimed to minimize the discrepancy between the source and target feature distributions; this approach has been studied mainly on output-level alignment [AdaptSeg, ADVENT].

    • However, despite much research on UDA, its performance is still much lower than that of supervised learning.

  2. Domain Adaptation with Few Labels

    • In order to mitigate the aforementioned limitation, some researchers attempt to use a few target labels [Alleviating semantic-level shift, Active Adversarial Domain Adaptation, Playing for Data, DA_weak_labels]. These works aim to find the data (full-image labels) that would increase the performance of the model the most.

    • In contrast, this work focuses on the pixel-level labels that would have the best potential performance increase.

Idea

  • This work utilizes "the novel predictor" to find uncertain regions that require human annotation, and trains on these regions with ground-truth labels in a supervised manner.

  • Labeling these uncertain regions would yield the best potential performance increase, instead of randomly picking labels.

3. Method

3.1 Method Summary

  • Please refer to the annotated figure below together with the ordered method summary.

  • Each numbered step below corresponds to the green numbers in the figure.

  1. "The novel predictor"(pixel selector model (model B)) consists of a shared backbone model (Ex, Resnet50) and two pixel-level classifiers.

  2. A mini-batch of target data is passed through the backbone and both classifiers, yielding two segmentation results.

  3. From the two segmentation results, the inconsistency mask (the set of pixels where the two predictions disagree) is computed.

  4. Within the inconsistency mask, a human annotator gives the real labels, which serve as the supervision signal for the semantic segmentation model (model A).

  5. Both the newly labeled target data and the originally labeled source data are used to optimize the semantic segmentation model (model A), while output-level adversarial learning [AdaptSeg] is also utilized.

  6. For updating the classifiers in the pixel selector model (model B), a loss term pushes the parameters of the two classifiers away from each other, i.e., maximizing the discrepancy between the two classifiers [Maximum classifier discrepancy].
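
For concreteness, the sketch below shows how the inconsistency mask of step 3 could be computed in PyTorch. It is a minimal illustration, assuming hypothetical `backbone`, `clf1`, and `clf2` modules standing in for the shared feature extractor and the two pixel-level classifiers; the authors' actual implementation may differ.

```python
import torch

@torch.no_grad()
def inconsistency_mask(backbone, clf1, clf2, images):
    """Pixels where the two classifier heads of the pixel selector
    model (model B) disagree; these are the uncertain regions that
    become candidates for human annotation."""
    feats = backbone(images)            # (B, C, H, W) shared features
    pred1 = clf1(feats).argmax(dim=1)   # (B, H, W) class map from head 1
    pred2 = clf2(feats).argmax(dim=1)   # (B, H, W) class map from head 2
    return pred1 != pred2               # (B, H, W) boolean inconsistency mask
```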

3.2 Details of methods

  • Loss 1, 2: With the labeled source data and the few labeled target data, train the model by minimizing the cross-entropy loss.

  • Loss 3: Adversarial learning (details are in the AdaptSeg and IAST papers).

  • Eq. 4: Inconsistency mask (the mask of pixels with different prediction results).

  • Loss 5: Pseudo-label loss for the pixel selector model.

  • Loss 6: Maximization of the classifier discrepancy (details are in the MCDDA paper).
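
To make Loss 6 concrete, here is a minimal sketch following the Maximum Classifier Discrepancy formulation that this post cites (mean L1 distance between the two heads' probability maps); the exact loss in the LabOR paper may differ.

```python
import torch.nn.functional as F

def classifier_discrepancy(logits1, logits2):
    """Mean L1 distance between the class-probability maps of the
    two pixel-level classifier heads ((B, C, H, W) logits each)."""
    p1 = F.softmax(logits1, dim=1)
    p2 = F.softmax(logits2, dim=1)
    return (p1 - p2).abs().mean()

# The pixel selector model is updated to *maximize* this discrepancy,
# e.g. loss = ce_loss - lambda_d * classifier_discrepancy(l1, l2),
# where lambda_d is a hypothetical weighting hyperparameter.
```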

3.3 Segment-based and Point-based Labeling

  • This work proposes two different labeling strategies, namely “Segment based Pixel-Labeling (SPL)” and “Point based Pixel-Labeling (PPL).”

  • SPL labels every pixel on the inconsistency mask in a segment-like manner.

  • PPL places its focus more on labeling-effort efficiency by finding representative points. The process of finding these points is described below (see the sketch after the list).

    1. Define the set of uncertain pixels D^(k) for each class k.

    2. Compute the class prototype vector μ^(k) for each class k as the mean of the probability vectors of the pixels in D^(k).

    3. For each prototype vector, select the pixel whose predicted class-probability vector is most similar to the prototype.
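
The sketch below illustrates this selection for a single image. It is a minimal example, assuming `probs` is the (C, H, W) softmax output and `uncertain_mask` the (H, W) inconsistency mask from Sec. 3.1; the function name and the L1 similarity measure are illustrative assumptions, not the paper's exact procedure.

```python
import torch

def ppl_select_points(probs, uncertain_mask, num_classes):
    """Pick one representative point per class: the uncertain pixel
    whose probability vector is closest to the class prototype mu^(k)."""
    points = []
    pred = probs.argmax(dim=0)                  # (H, W) predicted class map
    for k in range(num_classes):
        sel = uncertain_mask & (pred == k)      # D^(k): uncertain pixels of class k
        if not sel.any():
            continue
        vecs = probs[:, sel]                    # (C, N_k) probability vectors
        proto = vecs.mean(dim=1, keepdim=True)  # class prototype mu^(k), (C, 1)
        idx = (vecs - proto).abs().sum(dim=0).argmin()  # most similar to prototype
        ys, xs = sel.nonzero(as_tuple=True)
        points.append((int(ys[idx]), int(xs[idx])))
    return points                               # [(y, x), ...] points to annotate
```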

4. Experiment & Result

Experimental setup

  • Dataset: the source dataset is GTA5 (a synthetic dataset) and the target dataset is Cityscapes (real-world data).

  • Implementation details: (1) backbone: ResNet-101, (2) segmentation model: DeepLab-V2

Result

  • Figure 1

    1. LabOR (PPL and SPL) significantly outperforms previous UDA models (IAST).

    2. SPL shows performance comparable with fully supervised learning.

  • Table 1

    1. The table shows the quantitative results of both methods, PPL and SPL, compared to other state-of-the-art UDA methods.

    2. Even compared to the fully supervised method, SPL is lower by only 0.1 mIoU.

    3. PPL also shows significant performance gains over previous state-of-the-art UDA methods and achieves comparable improvements to WDA.

  • Figure 2

    1. Qualitative results of SPL.

    2. The proposed method, SPL, produces correct segmentation results similar to the fully supervised approach.

5. Conclusion

  • This work proposes a new framework for domain adaptive semantic segmentation in a human-in-the-loop manner.

  • Two pixel-selection methods, “Segment based Pixel-Labeling (SPL)” and “Point based Pixel-Labeling (PPL),” are introduced.

  • Limitation (my thoughts): In SPL and PPL, human annotators need to label [2.2% of the area] and [40 points] per image, respectively. This sounds like very little annotation effort. But if I were a human annotator, I might feel that the effort of labeling a [2.2% area] or [40 points] equals that of labeling a full image using an interactive segmentation tool. In particular, the [2.2% area] is likely to be harder to label (see the image in Sec. 3.3).

Take home message (오늘의 교훈)

It may be more efficient to obtain a supervision signal at a low cost than to use complex unsupervised methods that achieve only very small performance gains.

Author / Reviewer information

Author

  1. 신인규 (Inkyu Shin)

    • KAIST / RCV Lab

    • https://dlsrbgg33.github.io/

  2. 김동진 (DongJin Kim)

    • KAIST / RCV Lab

    • https://sites.google.com/site/djkimcv/

  3. 조재원 (JaeWon Cho)

    • KAIST / RCV Lab

    • https://chojw.github.io/

Reference & Additional materials

  1. Citation of this paper

    • Shin et al., “LabOR: Labeling Only if Required for Domain Adaptive Semantic Segmentation,” ICCV 2021

  2. Reference for this post

    • AdaptSeg
    • ADVENT
    • IAST
    • Maximum classifier discrepancy (MCDDA)
    • Alleviating semantic-level shift
    • Active Adversarial Domain Adaptation
    • Playing for Data
    • DA_weak_labels
    • WDA
    • GTA5
    • Cityscapes
    • Interactive segmentation tool
    • Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation
    • D2ADA: Dynamic Density-aware Active Domain Adaptation for Semantic Segmentation
    • Unsupervised Domain Adaptation for Semantic Image Segmentation: a Comprehensive Survey
    • ADeADA: Adaptive Density-aware Active Domain Adaptation for Semantic Segmentation
    • MCDAL: Maximum Classifier Discrepancy for Active Learning