LabOR [Eng]

ν•œκ΅­μ–΄λ‘œ 쓰인 리뷰λ₯Ό 읽으렀면 μ—¬κΈ°λ₯Ό λˆ„λ₯΄μ„Έμš”.

1. Problem definition

  • Domain Adaptation (DA)

    • Domain adaptation is a field of computer vision.

    • The main goal of DA is to train a neural network on a source dataset and achieve good accuracy on a target dataset that differs significantly from the source dataset.

  • Unsupervised Domain Adaptation (UDA)

    • UDA, which transfers knowledge from a labeled source dataset to an unlabeled target domain, has been actively studied.

    • However, UDA still has a long way to go to reach fully supervised performance.

  • Domain Adaptation with Few Labels.

    • To address this weakness of UDA, some researchers propose using a small amount of ground-truth labels from the target dataset.

  • Semantic segmentation

    • The task of grouping parts of an image that belong to the same object class; it is a form of pixel-level prediction.

2. Motivation

  • To reduce the effort of the human annotator, this work studies domain adaptation with as few target labels as possible.

  • What points should be labeled to maximize the performance of the segmentation model?

  • This work aims to find these points, i.e., an efficient pixel-level sampling approach.

  1. Unsupervised Domain Adaptation

    • Adversarial learning approaches aim to minimize the discrepancy between the source and target feature distributions.

    • This approach has been studied mainly with output-level alignment [AdaptSeg, ADVENT].

    • However, despite much research on UDA, its performance remains far below that of supervised learning.

  2. Domain Adaptation with Few Labels

Idea

  • This work utilizes "the novel predictor" to find uncertain regions that require human annotation and trains these regions with ground-truth labels in a supervised manner.

  • Labeling these uncertain regions yields a larger potential performance increase than randomly picking pixels to label.

3. Method

3.1 Method Summary

  • Please refer to the annotated figure below together with the ordered method summary.

  • Each numbered step below corresponds to the green numbers in the figure; a code sketch of the pixel selector and the inconsistency mask follows the list.

  1. "The novel predictor"(pixel selector model (model B)) consists of a shared backbone model (Ex, Resnet50) and two pixel-level classifiers.

  2. A mini-batch of target data is passed through the backbone and both classifiers. Then we can extract two segmentation results.

  3. With two segmentation results, Inconsistent Mask(= Masks with different prediction results) is calculated.

  4. Among inconsistent mask, the human annotator give the real label, which is supervision signal for semantic segmentation model (model A)

  5. Both above-labeled target data and originally labeled source data are used for optimize semantic segmentation model (model A) while output-level adversarial learning[AdaptSeg] is also utilized.

  6. For updating classifiers in pixel selector model (model B), parameters in each classifier are applied to the loss to push away from each other, i.e, maximization of the discrepancy between the two classifiers [Maximum classifier discrepancy].
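
Below is a minimal PyTorch sketch of steps 2–4, written as an illustration rather than the authors' implementation; the class and variable names (PixelSelector, feat_dim, etc.) are assumptions.

```python
# Sketch (not the authors' code): a shared backbone with two pixel-level
# classifiers, plus the inconsistency mask where their predictions disagree.
import torch
import torch.nn as nn

class PixelSelector(nn.Module):
    """Model B: shared backbone + two pixel-level classifiers."""
    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone                    # e.g., a ResNet feature extractor
        self.cls1 = nn.Conv2d(feat_dim, num_classes, kernel_size=1)
        self.cls2 = nn.Conv2d(feat_dim, num_classes, kernel_size=1)

    def forward(self, x):
        feat = self.backbone(x)                     # (B, feat_dim, H', W') feature map
        return self.cls1(feat), self.cls2(feat)     # two segmentation logit maps

def inconsistency_mask(logits1, logits2):
    """Boolean mask of pixels where the two classifiers' argmax predictions differ."""
    return logits1.argmax(dim=1) != logits2.argmax(dim=1)
```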

3.2 Details of methods

  • Loss 1, 2: With the labeled source data and the few labeled target pixels, train the semantic segmentation model (model A) by minimizing the cross-entropy loss.

  • Loss 3: Adversarial learning loss (details in the AdaptSeg and IAST papers).

  • Eq. 4: Inconsistency mask (pixels where the two classifiers' predictions differ).

  • Loss 5: Pseudo-label loss for the pixel selector model.

  • Loss 6: Classifier discrepancy maximization (details in the MCDDA paper); a hedged sketch of Losses 5 and 6 follows this list.
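
The following sketch of Losses 5 and 6 for the pixel selector model is only illustrative; the exact formulations are in the paper, and the function names and the MCD-style L1 form of the discrepancy are assumptions made here.

```python
# Sketch (assumed forms, not the authors' code) of the pixel-selector losses.
import torch
import torch.nn.functional as F

def pseudo_label_loss(logits1, logits2, pseudo_labels, ignore_index=255):
    """Loss 5: train both classifiers of model B on pseudo labels of the target data."""
    return (F.cross_entropy(logits1, pseudo_labels, ignore_index=ignore_index)
            + F.cross_entropy(logits2, pseudo_labels, ignore_index=ignore_index))

def discrepancy_maximization_loss(logits1, logits2):
    """Loss 6: an MCD-style discrepancy (mean L1 distance between softmax outputs),
    negated so that minimizing this term pushes the two classifiers apart."""
    p1 = F.softmax(logits1, dim=1)
    p2 = F.softmax(logits2, dim=1)
    return -torch.mean(torch.abs(p1 - p2))
```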

3.3 Segment-based and Point-based

  • This work proposes two different labeling strategies, namely β€œSegment based Pixel-Labeling (SPL)” and β€œPoint based Pixel-Labeling (PPL).”

  • SPL labels every pixel on the inconsistency mask in a segment-like manner.

  • PPL places its focus more on labeling-effort efficiency by finding representative points. The process of finding these points is described below (a small code sketch follows the list).

    1. Define the set of uncertain pixels D^(k) for each class k.

    2. Compute the class prototype vector µ_(k) for each class k as the mean vector over D^(k).

    3. For each prototype vector, select the pixel whose predicted probability vector is most similar to the prototype.
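
A small code sketch of this selection procedure is given below. It is an illustration under assumptions: the paper does not fix names such as select_ppl_points, and the L1 distance used to measure similarity to the prototype is a guess.

```python
# Sketch (assumed, not the authors' code) of PPL point selection: per class,
# average the softmax vectors of the uncertain pixels predicted as that class,
# then pick the pixel whose softmax vector is closest to that prototype.
import torch

def select_ppl_points(probs, uncertain_mask):
    """probs: (C, H, W) softmax output; uncertain_mask: (H, W) bool inconsistency mask.
    Returns one (row, col) point per class present among the uncertain pixels."""
    C, H, W = probs.shape
    flat_probs = probs.reshape(C, -1)                    # (C, H*W)
    pred = flat_probs.argmax(dim=0)                      # predicted class per pixel
    uncertain = uncertain_mask.reshape(-1)
    points = {}
    for k in range(C):
        idx = torch.nonzero(uncertain & (pred == k), as_tuple=False).squeeze(1)
        if idx.numel() == 0:
            continue
        d_k = flat_probs[:, idx]                         # D^(k): uncertain pixels of class k
        mu_k = d_k.mean(dim=1, keepdim=True)             # class prototype vector
        best = idx[(d_k - mu_k).abs().sum(dim=0).argmin()]
        points[k] = (int(best) // W, int(best) % W)
    return points
```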

4. Experiment & Result

Experimental setup

  • Dataset: The source dataset is GTA5 (synthetic) and the target dataset is Cityscapes (real-world).

  • Implementation details: (1) ResNet-101 backbone, (2) DeepLab-V2 (a rough stand-in sketch is given below).
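
The paper builds its segmentation model on DeepLab-V2 with a ResNet-101 backbone. DeepLab-V2 is not shipped with torchvision, so the snippet below is only a rough stand-in that instantiates torchvision's DeepLab-V3 with the same backbone and the 19 Cityscapes classes; this substitution is my assumption, not the authors' setup.

```python
# Rough stand-in (assumption): torchvision offers DeepLab-V3, not DeepLab-V2.
from torchvision.models.segmentation import deeplabv3_resnet101

# ResNet-101 backbone, 19 Cityscapes evaluation classes, no pretrained head weights.
model_a = deeplabv3_resnet101(weights=None, num_classes=19)
```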

Result

  • Figure 1

    1. LabOR (PPL and SPL) significantly outperforms previous UDA models (IAST).

    2. SPL shows performance comparable to fully supervised learning.

    3. PPL achieves notable performance improvements compared to WDA (weakly-supervised domain adaptation).

  • Table 1

    1. It shows the quantitative results of both proposed methods, PPL and SPL, compared to other state-of-the-art UDA methods.

    2. Even compared to the fully supervised method, SPL is only 0.1 mIoU lower.

    3. PPL also shows significant performance gains over previous state-of-the-art UDA or WDA methods.

  • Figure 2

    1. Qualitative result of SPL

    2. The proposed method, SPL, produces segmentation results similar to those of the fully supervised approach.

5. Conclusion

  • This work proposes a new framework for domain adaptive semantic segmentation in a human-in-the-loop manner.

  • Two pixel-selection methods, “Segment based Pixel-Labeling (SPL)” and “Point based Pixel-Labeling (PPL),” are introduced.

  • Limitation (my thought)

    1. In SPL and PPL, human annotators need to label [2.2% of the image area] and [40 points] per image, respectively. This sounds like only a small amount of annotation effort. But if I were a human annotator, I might find the effort of labeling [2.2% of the area] or [40 points] comparable to labeling a full image with an interactive segmentation tool. In particular, labeling [2.2% of the area] is likely to be harder (see the image in Sec. 3.3).

Take home message

It may be more efficient to obtain a supervision signal at low cost than to use complex unsupervised methods that achieve only very small performance gains.

Author / Reviewer information

Author

  1. μ‹ μΈκ·œ (Inkyu Shin)

    • KAIST / RCV Lab

    • https://dlsrbgg33.github.io/

  2. 김동진 (DongJin Kim)

    • KAIST / RCV Lab

    • https://sites.google.com/site/djkimcv/

  3. μ‘°μž¬μ› (JaeWon Cho)

    • KAIST / RCV Lab

    • https://chojw.github.io/

Reference & Additional materials
