Noise2Self [Eng]

Batson et al. / Noise2Self: Blind Denoising by Self-Supervision / ICML 2019


---> To read this review in Korean, please click here.

In this paper, the authors propose a self-supervised method that removes noise using only noisy data, without prior knowledge of the noise properties and without clean image labels.

1. Problem definition

- Traditional denoising methods & Supervised-learning method :

1) Pre-trained noise properties To denoise with traditional methods, the noise properties of the input image had to be learned in advance. Such methods fit poorly when the input contains noise types they were not trained on. In addition, various hyperparameters must be adjusted, such as the degree of smoothness, the similarity threshold, and the rank of the matrix, and performance is greatly influenced by these hyperparameters.

2) Supervised learning (noisy image & clean image) Instead of modeling the noise explicitly, there is also a supervised-learning approach in which pairs (x, y) of a noisy image and a clean image of the same target are used for training (a data-driven prior).

$||f_{\Theta}(x)-y||^2$

As the loss above shows, the goal is to minimize the difference between the output of the denoising function $f_{\Theta}$ and the ground truth y. Although this approach works with convolutional neural networks and has shown good performance in many areas, it requires a long training time. Moreover, for biomedical data it is hard to train models with supervised learning, because clean images to use as ground truth are difficult to obtain.

3) Noise2Noise (noisy image & noisy image)

In the paper "Noise2Noise", the authors proposed training with a noisy image instead of a clean image as the label y. They achieved performance on par with existing denoising models without any clean image labels.

4) Proposed: self-supervised learning (no label images)

In this paper, the authors propose a self-supervised denoising method that performs better than traditional denoising methods and can be trained without clean images. It removes noise using only the noisy images themselves, without any labels.

  • Self-supervised loss: $L(f) = E||f(x)-x||^2$ In self-supervised learning, the noisy input x itself takes the place of the label y in the loss, as shown above. A simple proof of this identity is as follows.

$E||f(x)-x||^2 = E||f(x)-y+y-x||^2 = E||f(x)-y||^2 + E||x-y||^2$

Here x is the noisy image, f(x) is the output of the J-invariant function, and y is the clean image. The cross term vanishes because, for a J-invariant f and zero-mean noise, f(x) is independent of the noise x-y. Thus $E||f(x)-y||^2$ is the ground-truth loss and $E||x-y||^2$ is the variance of the noise, and the self-supervised loss is their sum. By this identity, the self-supervised loss can be computed without any label y, and since the noise variance is a constant, minimizing the self-supervised loss finds the optimal denoiser.
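This decomposition can be checked numerically. The sketch below (illustrative names; a deliberately trivial constant predictor stands in for a J-invariant denoiser, since its output is independent of the noise in x) verifies that the self-supervised loss equals the ground-truth loss plus the noise variance:

```python
import numpy as np

rng = np.random.default_rng(0)

y = rng.uniform(size=100_000)                 # "clean" signal
x = y + rng.normal(scale=0.1, size=y.shape)   # noisy observation, zero-mean noise

# A trivially J-invariant denoiser: its output ignores x entirely,
# so f(x) is independent of the noise x - y.
f_x = np.full_like(x, y.mean())

self_supervised = np.mean((f_x - x) ** 2)   # E||f(x) - x||^2
ground_truth    = np.mean((f_x - y) ** 2)   # E||f(x) - y||^2
noise_variance  = np.mean((x - y) ** 2)     # E||x - y||^2

# The two sides of the identity agree up to sampling error:
print(self_supervised, ground_truth + noise_variance)
```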

2. Motivation

Related work

Here are various ways to remove noise.

1) Traditional Methods

  • Smoothness : removes noise by replacing each pixel with the average of its surrounding pixels, making the center pixel similar to its neighbors.

  • Self-similarity : if there are similar patches in the image, the central pixel value is replaced with a weighted average over those patches. However, the hyperparameters have a large impact on performance, and on new datasets with unknown noise distributions the same performance is unlikely.
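As a minimal illustration of the smoothness idea above, the sketch below (toy data; a 3x3 box average built from shifted copies) shows local averaging reducing i.i.d. noise at the cost of blurring edges:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy image: a bright square on a dark background, plus i.i.d. Gaussian noise.
clean = np.zeros((32, 32))
clean[8:24, 8:24] = 1.0
noisy = clean + rng.normal(scale=0.3, size=clean.shape)

# "Smoothness" denoiser: replace each pixel with the mean of its 3x3 neighborhood.
shifts = [np.roll(np.roll(noisy, dy, 0), dx, 1)
          for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
smoothed = np.mean(shifts, axis=0)

# Averaging 9 pixels cuts the i.i.d. noise variance roughly ninefold,
# at the cost of blurring the square's edges.
print(np.mean((noisy - clean) ** 2), np.mean((smoothed - clean) ** 2))
```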

2) Use the Convolutional Neural Networks

  • Generative : differentiable generative models can denoise data using an adversarial loss.

  • Gaussianity : if the noise follows an independent and identically distributed (i.i.d.) Gaussian distribution, Stein's unbiased risk estimator can be used to train the neural network.

  • Sparsity : if the image is sparse, a compression algorithm can be used to denoise it. However, artifacts remain in the image, and seeking sparse features takes a long time.

  • Compressibility : noise is removed by compressing and then decompressing the noisy data.

  • Statistical Independence : a U-Net trained to predict one independent noisy measurement of the same data from another learns to predict the true signal (Noise2Noise).

Idea

Many denoising methods exist, from traditional ones such as smoothing to recent convolutional neural networks such as U-Nets. However, these methods require either knowing the noise properties in advance or having clean images. In this paper, the authors therefore propose a denoising method based on self-supervision rather than supervised learning.

3. Method

- Example : classic denoiser vs donut denoiser

  • classic denoiser : a median filter that replaces each pixel with the median over a disk of radius r → $g_{r}$

  • donut denoiser : the same as the classic denoiser except that the center pixel is removed, corresponding to the J-invariance described in the paper → $f_{r}$

In the graph above, you can see the difference between the two denoisers; r is the radius of each filter. For the donut denoiser (blue), the self-supervised minimum (red arrow) coincides with the ground-truth minimum (r = 3). The vertical gap between the self-supervised and ground-truth curves is the variance of the noise, which is consistent with the self-supervised loss equation seen above. For the classic denoiser (orange), in contrast, the self-supervised MSE keeps increasing and shows no correlation with the ground-truth results. In other words, the donut denoiser can be tuned with the self-supervised loss alone, whereas the classic denoiser can be tuned only when ground truth is available.
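This comparison can be reproduced in miniature. The sketch below (toy ramp image; 3x3 median filters built from shifted copies; all names are illustrative) implements a classic median and its donut variant, and shows that only for the donut does the self-supervised MSE track the ground-truth MSE plus the noise variance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a smooth horizontal ramp plus i.i.d. Gaussian noise.
clean = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))
sigma = 0.2
noisy = clean + rng.normal(scale=sigma, size=clean.shape)

def median_of_shifts(img, offsets):
    """Median over the given neighbor offsets at every pixel."""
    stack = np.stack([np.roll(np.roll(img, dy, 0), dx, 1) for dy, dx in offsets])
    return np.median(stack, axis=0)

all9 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
donut8 = [o for o in all9 if o != (0, 0)]   # center removed -> J-invariant

classic = median_of_shifts(noisy, all9)
donut = median_of_shifts(noisy, donut8)

def mse(a, b):
    # evaluate away from the borders so np.roll's wraparound plays no role
    return np.mean((a - b)[2:-2, 2:-2] ** 2)

for name, out in [("classic", classic), ("donut", donut)]:
    print(name,
          "self-supervised:", round(mse(out, noisy), 4),
          "ground-truth + noise var:", round(mse(out, clean) + sigma ** 2, 4))
```

For the donut filter the two printed numbers agree closely, so the filter could be tuned from the self-supervised loss alone; for the classic filter the self-supervised loss is too small, because its output is correlated with the noise at the center pixel.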

- J-invariant function : $f_{\Theta}$

$f_{\Theta}(x)_{J} := g_{\Theta}(1_{J} \cdot s(x) + 1_{J^c} \cdot x)_{J}$

The J-invariant function $f_{\Theta}$ is defined as above. $g_{\Theta}$ is any classical denoiser, and J (J ∈ 𝒥) is one subset of a partition of the pixels, acting like a mask that separates pixels from their neighbors. s(x) is the function that replaces each pixel with the average of its neighbors (interpolation). That is, $f_{\Theta}$ interpolates with s(x) only on the region J, keeps the original image x elsewhere, and then applies the classical denoiser. $f_{\Theta}(x)_{J}$ is independent of $x_{J}$ because $g_{\Theta}$ is applied after x has been interpolated on J. As a result, applying $g_{\Theta}$ after this interpolation performed better than applying the classical denoiser $g_{\Theta}$ to x directly.

4. Experiment & Result

Experimental setup

| Dataset | Hanzi | CellNet | ImageNet |
| --- | --- | --- | --- |
| Image size | 64x64 | 128x128 | 128x128 (RGB) |
| Batch size | 64 | 64 | 32 |
| Epochs | 30 | 50 | 1 |

They compared denoising performance under self-supervision with the J-invariant function applied. Three datasets are used: Hanzi, a Chinese-character dataset; CellNet, a microscopy dataset; and ImageNet.

U-Net and DnCNN were used to compare performance. In a U-Net, feature maps in the contracting path have the same spatial size as the corresponding maps in the expanding path, so the skip connections can combine the two. This mirrors the principle of self-supervised learning, where x and f(x) are computed on the same target data. A random partition of the pixels into 25 subsets is used for J-invariance, and Peak Signal-to-Noise Ratio (PSNR) is the evaluation metric; a larger PSNR means less loss of image quality.
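PSNR itself is straightforward to compute; a small sketch (assuming intensities in [0, 1], so MAX = 1):

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak Signal-to-Noise Ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((reference - estimate) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

# A uniform error of 0.1 gives MSE = 0.01, hence a PSNR of about 20 dB.
reference = np.zeros((8, 8))
estimate = reference + 0.1
print(psnr(reference, estimate))
```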

Result

The table above shows the PSNR results for each dataset and denoising architecture. Noise2Self (N2S) performed better than NLM and BM3D, which are traditional denoising methods, and shows performance similar to Noise2Truth (N2T), trained with clean targets, and Noise2Noise (N2N), trained with pairs of independent noisy images.

Looking at the denoised images themselves, N2S removed noise better than NLM and BM3D and gave results similar to N2N and N2T.

5. Conclusion

Noise2Self removes noise in a self-supervised way, unlike other denoising methods. Its advantages are that it needs no prior knowledge of the noise and can be trained without clean images. Its limitation is a trade-off between bias and variance that depends on how the size of J is set.

Take home message

Self-supervised learning can be used to learn without target data.

The noisy data and the output f(x) of the J-invariant function are independent of each other.

With self-supervised learning, denoising is possible using only the noisy data and the output of the J-invariant function, without clean data.

Author / Reviewer information

Author

Hyunmin Hwang / 황현민

  • KAIST AI

  • hyunmin_hwang@kaist.ac.kr

Reviewer

  1. Korean name (English name): Affiliation / Contact information

  2. Korean name (English name): Affiliation / Contact information

  3. ...

Reference & Additional materials

  • Batson, J., & Royer, L. (2019). Noise2Self: Blind Denoising by Self-Supervision. ICML 2019, arXiv:1901.11365.

  • Lehtinen, J., Munkberg, J., Hasselgren, J., Laine, S., Karras, T., Aittala, M., & Aila, T. (2018). Noise2Noise: Learning Image Restoration without Clean Data. arXiv:1803.04189.

  • Local averaging

  • Noise2Self GitHub repository

  • MIA: Josh Batson, Noise2Self: Blind denoising by self-supervision (YouTube video)

  • PSNR

  • Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI 2015, pp. 234-241. Springer.
