LabOR [Eng]
To read this review in Korean, click here.
Domain Adaptation (DA)
Domain adaptation is an active research field in computer vision.
The main goal of DA is to train a neural network on a source dataset and still achieve good accuracy on a target dataset that differs significantly from the source dataset.
Unsupervised Domain Adaptation (UDA)
UDA, which transfers knowledge from a labeled source dataset to an unlabeled target domain, has been actively studied.
However, UDA still has a long way to go to reach fully supervised performance.
Domain Adaptation with Few Labels
Because of this weakness of UDA, some researchers propose using a small portion of the ground-truth labels of the target dataset.
Semantic segmentation
The task of grouping together the parts of an image that belong to the same object class; it is a form of pixel-level prediction.
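As a concrete picture of pixel-level prediction, the minimal sketch below assumes a network that outputs one score per class for every pixel (a C×H×W NumPy array; random scores stand in for a real model). The helper name `segment` is hypothetical. Each pixel simply takes the class with the highest score, so pixels of the same class share a label.

```python
import numpy as np

def segment(logits: np.ndarray) -> np.ndarray:
    """Turn per-pixel class scores of shape (C, H, W) into a label map of shape (H, W)."""
    # Every pixel is assigned the class with the highest score.
    return logits.argmax(axis=0)

# Toy example: random scores for 3 classes on a 2x2 image (illustration only).
rng = np.random.default_rng(0)
scores = rng.normal(size=(3, 2, 2))
label_map = segment(scores)
print(label_map.shape)  # (2, 2)
```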
To reduce human annotation effort, this work studies domain adaptation with as few target labels as possible.
Which points should be labeled to maximize the performance of the segmentation model?
This work aims to find these points, i.e., an efficient pixel-level sampling approach.
Unsupervised Domain Adaptation
Adversarial learning approaches aim to minimize the discrepancy between the source and target feature distributions.
However, despite much research on UDA, its performance is still much lower than that of supervised learning.
Domain Adaptation with Few Labels
To mitigate this limitation, some researchers use a few target labels [Alleviating semantic-level shift, Active Adversarial Domain Adaptation, Playing for Data, DA_weak_labels].
These works aim to find the data (full-image labels) that would increase the model's performance the most.
In contrast, this work focuses on the pixel-level labels with the largest potential performance gain.
This work uses "the novel predictor" to find uncertain regions that require human annotation, and then trains on these regions with ground-truth labels in a supervised manner.
Labeling these uncertain regions, rather than randomly picked pixels, is expected to yield the largest performance gain.
Please refer to the annotated figure below together with the ordered method summary.
The steps below correspond to the green numbers in the figure.
"The novel predictor" (the pixel selector model, model B) consists of a shared backbone (e.g., ResNet50) and two pixel-level classifiers.
A mini-batch of target data is passed through the backbone and both classifiers, yielding two segmentation results.
From the two segmentation results, the inconsistent mask (the pixels where the two predictions differ) is computed.
For the pixels in the inconsistent mask, a human annotator provides the ground-truth labels, which serve as the supervision signal for the semantic segmentation model (model A).
Both the newly labeled target data and the originally labeled source data are used to optimize the semantic segmentation model (model A), while output-level adversarial learning [AdaptSeg] is also applied.
To update the classifiers in the pixel selector model (model B), a loss is applied to the parameters of the two classifiers that pushes them away from each other, i.e., it maximizes the discrepancy between the two classifiers [Maximum classifier discrepancy].
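The inconsistent-mask computation in the steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: random arrays stand in for the backbone features, each pixel-level classifier is modeled as a 1×1 linear projection, and `pixel_classifier` and `inconsistent_mask` are hypothetical names.

```python
import numpy as np

def pixel_classifier(weights, features):
    """Hypothetical linear pixel-level classifier (like a 1x1 convolution).
    weights: (C, D); features: (D, H, W) -> class scores (C, H, W)."""
    return np.einsum("cd,dhw->chw", weights, features)

def inconsistent_mask(features, w1, w2):
    """(H, W) boolean mask: True where the two classifiers predict different classes."""
    pred1 = pixel_classifier(w1, features).argmax(axis=0)
    pred2 = pixel_classifier(w2, features).argmax(axis=0)
    return pred1 != pred2

# Toy run: random stand-in "backbone features" shared by two random classifiers.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 4, 4))   # D=8 channels on a 4x4 feature map
w1, w2 = rng.normal(size=(2, 5, 8))  # two classifiers for C=5 classes
mask = inconsistent_mask(feats, w1, w2)
print(mask.shape, mask.mean())       # mask shape and fraction of disagreeing pixels
```

Because both classifiers read the same shared features, any disagreement comes only from their weights; this is why pushing the classifiers apart makes the mask highlight pixels whose features are ambiguous.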
Loss 1, 2: Train model A by minimizing the cross-entropy loss on the labeled source data and the few labeled target pixels.
Eq. 4: Inconsistent mask (the pixels where the two classifiers' predictions differ).
Loss 5: Pseudo-label loss for the pixel selector model (model B).
Loss 6: Classifier discrepancy maximization (details are in the MCDDA paper).
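A hedged NumPy sketch of the loss ingredients listed above. Assumptions: unlabeled pixels carry the value 255 (a common segmentation convention, not stated in this review), and the discrepancy follows the L1 distance between the two probability maps used in the MCD paper; the exact form used in LabOR may differ. All function names are hypothetical.

```python
import numpy as np

IGNORE = 255  # assumed label value for "not annotated"

def log_softmax(x, axis=0):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def masked_cross_entropy(logits, labels, ignore=IGNORE):
    """Mean pixel-wise cross-entropy over annotated pixels only.
    logits: (C, H, W) scores; labels: (H, W) class ids, `ignore` marks unlabeled pixels."""
    rows, cols = np.nonzero(labels != ignore)
    if rows.size == 0:
        return 0.0  # nothing annotated in this image
    log_p = log_softmax(logits)
    return float(-log_p[labels[rows, cols], rows, cols].mean())

def discrepancy(logits1, logits2):
    """MCD-style discrepancy: mean absolute difference between two probability maps."""
    p1, p2 = np.exp(log_softmax(logits1)), np.exp(log_softmax(logits2))
    return float(np.abs(p1 - p2).mean())

# A target image where only one pixel was annotated by the human.
labels = np.full((2, 2), IGNORE)
labels[0, 1] = 3
print(masked_cross_entropy(np.zeros((4, 2, 2)), labels))  # uniform scores give log(4), about 1.386
```

Under these assumptions, Loss 1, 2 apply `masked_cross_entropy` to source images (every pixel labeled) and target images (mostly `IGNORE`); Loss 5 could reuse the same function with model A's pseudo labels as targets; and Loss 6 updates the two classifiers to maximize `discrepancy`, e.g., by minimizing its negative.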
This work proposes two labeling strategies, namely "Segment based Pixel-Labeling (SPL)" and "Point based Pixel-Labeling (PPL)".
SPL labels every pixel in the inconsistent mask, in a segment-like manner.
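A minimal sketch of an SPL query, assuming the human annotator is simulated by revealing the ground truth (a common way to run such experiments) and that 255 marks unlabeled pixels; `spl_labels` is a hypothetical name.

```python
import numpy as np

IGNORE = 255  # assumed value for "no label"

def spl_labels(mask, ground_truth, ignore=IGNORE):
    """Segment based Pixel-Labeling: reveal the label of every pixel inside the
    inconsistent mask and leave all other pixels unlabeled."""
    return np.where(mask, ground_truth, ignore)

mask = np.array([[True, False], [False, True]])
gt = np.array([[3, 1], [0, 7]])
print(spl_labels(mask, gt).tolist())  # [[3, 255], [255, 7]]
```

The fraction of labeled pixels is simply `mask.mean()`, which is how an annotation budget such as a percentage of the image area can be measured.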
PPL focuses more on labeling efficiency by finding representative points. These points are found as follows:
Define the set of uncertain pixels D^(k) for each class k.
Compute the class prototype vector μ^(k) for each class k as the mean vector of D^(k).
For each prototype vector, select the pixels in D^(k) whose probability vectors are most similar to it.
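The three PPL steps above can be sketched as follows. Several details are assumptions, not taken from the review: a pixel belongs to D^(k) if it lies in the inconsistent mask and its predicted (argmax) class is k; "most similar" means smallest Euclidean distance to μ^(k); and only one point per class is chosen, whereas the method described here labels 40 points per image. `ppl_points` is a hypothetical name.

```python
import numpy as np

def ppl_points(probs, mask):
    """probs: (C, H, W) class probabilities; mask: (H, W) boolean inconsistent mask.
    Returns {class k: (row, col)} with one representative point per class."""
    pred = probs.argmax(axis=0)  # assumption: D^(k) is defined by the predicted class
    points = {}
    for k in range(probs.shape[0]):
        rows, cols = np.nonzero(mask & (pred == k))  # D^(k): uncertain pixels of class k
        if rows.size == 0:
            continue  # class k has no uncertain pixels
        vecs = probs[:, rows, cols].T  # (N_k, C) probability vectors
        mu = vecs.mean(axis=0)  # class prototype mu^(k)
        # Assumption: similarity = smallest Euclidean distance to the prototype.
        i = np.argmin(np.linalg.norm(vecs - mu, axis=1))
        points[k] = (int(rows[i]), int(cols[i]))
    return points

# Three uncertain pixels, all predicted as class 0; the middle-valued one is the prototype's nearest.
probs = np.array([[[0.9, 0.7, 0.8]], [[0.1, 0.3, 0.2]]])
mask = np.ones((1, 3), dtype=bool)
print(ppl_points(probs, mask))  # {0: (0, 2)}
```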
Implementation details: ResNet101 backbone with the Deeplab-V2 segmentation architecture.
Table 1
Table 1 shows the quantitative results of both of our methods, PPL and SPL, compared with other state-of-the-art UDA methods.
Even compared with the fully supervised method, SPL is only 0.1 mIoU lower.
PPL also shows significant performance gains over previous state-of-the-art UDA and WDA methods.
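Since the comparison above is stated in mIoU, here is a minimal sketch of how that metric is commonly computed from a confusion matrix. Assumptions: 255 marks ignored pixels, and classes absent from both prediction and ground truth are skipped. In benchmark evaluation the confusion matrix is usually accumulated over the whole validation set before dividing, and the result is reported as a percentage. `mean_iou` is a hypothetical name.

```python
import numpy as np

def mean_iou(pred, gt, num_classes, ignore=255):
    """Mean intersection-over-union over classes that appear in pred or gt."""
    valid = gt != ignore
    p, g = pred[valid], gt[valid]
    # Confusion matrix: rows = ground-truth class, columns = predicted class.
    cm = np.bincount(num_classes * g + p, minlength=num_classes**2)
    cm = cm.reshape(num_classes, num_classes)
    inter = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - inter
    present = union > 0
    return float((inter[present] / union[present]).mean())

pred = np.array([[0, 0, 1, 1]])
gt = np.array([[0, 1, 1, 1]])
print(mean_iou(pred, gt, num_classes=2))  # IoU 1/2 for class 0, 2/3 for class 1, mean about 0.583
```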
Figure 2
Qualitative result of SPL
The proposed method, SPL, produces correct segmentation results similar to those of the fully supervised approach.
This work proposes a new framework for domain adaptive semantic segmentation in a human-in-the-loop manner.
We introduce two pixel-selection methods, which we call "Segment based Pixel-Labeling" and "Point based Pixel-Labeling".
Limitation (my thought)
In SPL and PPL, human annotators need to label 2.2% of the image area and 40 points per image, respectively. This sounds like only a few annotations and little effort are required. But if I were a human annotator, I might feel that labeling 2.2% of the area or 40 points takes about as much effort as labeling a full image with an interactive segmentation tool. In particular, the 2.2% area is likely to be hard to label (see the image in Sec. 3.3).
It may be more efficient to obtain supervision signals at a low cost than to rely on complex unsupervised methods for very small performance gains.
신인규 (Inkyu Shin)
KAIST / RCV Lab
https://dlsrbgg33.github.io/
김동진 (DongJin Kim)
KAIST / RCV Lab
https://sites.google.com/site/djkimcv/
조재원 (JaeWon Cho)
KAIST / RCV Lab
https://chojw.github.io/
Citation of this paper