Chaining a U-Net With a Residual U-Net for Retinal Blood Vessels Segmentation [Eng]


1. Problem definition

  • The retina is the only part of the body where the cardiovascular system can be observed non-invasively.

  • Through it, information such as the development of cardiovascular disease and changes in the shape of microvascular vessels can be obtained.

  • This is an important indicator for ophthalmologic diagnosis.

  • In this work, a U-Net1 (+ residual U-Net2) model is used to distinguish the pixels corresponding to blood vessels in a retina image (image segmentation).

  • Morphological features of the blood vessels are obtained from this segmentation.

    ➔ Compared to previous methods, the trade-off between training time and performance is minimized.

U-Net: a fully convolutional, end-to-end model proposed for image segmentation in the biomedical field. It consists of a symmetrical network for obtaining the overall context information of the image and a network for accurate localization.


2. Motivation

Related work

Currently, most image segmentation algorithms are based on CNNs.

  • [Cai et al., 2016] uses a CNN based on the familiar VGG network.

  • [Dasgupta et al., 2017] performs a multi-label inference task by combining a CNN with structured prediction.

  • [Alom et al., 2018] introduces residual blocks supplemented with recurrent residual convolutional layers.

  • [Zhuang et al., 2019] stacks two U-Nets to increase the paths through the residual blocks.

  • [Khanal et al., 2019] balances background and vessel pixels using stochastic weights, and ambiguous pixels are handled once more by a second, reduced network.

Idea

This study proposes a structure in which U-Net1 and U-Net2 with residual blocks are chained.

  • The first part (U-Net1) extracts features.

  • The second part (U-Net2 with residual blocks) recognizes new features and resolves ambiguous pixels through its residual blocks.

3. Method

The workflow of this study is as follows.

  1. Image acquisition

  • Collect retinal pictures.

  2. Pre-processing

  • Feature extraction, pattern highlighting, normalization, etc.

  • Select the characteristics that will be fed to the CNN architecture.

  3. Weight adjustment and performance evaluation

  • This step is repeated iteratively to obtain the best results.

  4. Interpretation of the results

1. Pre-Processing

The quality of the image can be improved through preprocessing, which is a very important step for CNN to detect specific characteristics.

Step1

The RGB image is converted into a gray-scale image, which increases the contrast between blood vessels and the background. In this conversion, R, G, and B denote the color channels of the image; G (green) is weighted the most, since it is considered the least noisy and to contain most of the vessel features.
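
As a sketch (the paper's exact coefficients are not reproduced here, but the note above that green receives the largest weight is consistent with the standard luminance conversion):

$$\mathrm{Gray} = 0.299\,R + 0.587\,G + 0.114\,B$$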

Step2

This is the normalization stage of the data. This step is very useful for classification algorithms, and especially for backpropagation neural network structures. A decrease in training time can be expected if the values taken from each training data set are normalized.

In this study, two normalization methods were used.

  1. Min-Max normalization

  • The lowest value of a feature is mapped to 0, the maximum value to 1, and all other values are converted to values between 0 and 1. If a characteristic has a minimum value of 20 and a maximum value of 40, for example, 30 is changed to 0.5 since it is the midpoint between the two. This transformation is linear and preserves the relationships among the original values.

If the minimum-maximum normalization is performed for the value v, the following equation can be used.
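
Using the variable definitions below, this takes the standard min-max form:

$$v' = \frac{v - \mathrm{MIN}_A}{\mathrm{MAX}_A - \mathrm{MIN}_A}$$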

  • v': the normalized value

  • v: the original value

  • A: the attribute (here, the brightness of each channel; 0 is the darkest, 255 is the brightest)

  • MAX_A: the largest brightness value in the input data (image)

  • MIN_A: the smallest brightness value in the input data (image)

  2. Z-Score Normalization

  • Z-score normalization is a data normalization strategy that reduces the effect of outliers. If the value of a feature matches the average, it is normalized to 0; if it is smaller than the average, it becomes negative, and if it is larger, it becomes positive. The magnitude of these negative and positive values is determined by the standard deviation of the feature, so if the standard deviation of the data is large (the values are spread widely), the normalized values approach zero. Compared to min-max normalization, it handles outliers more effectively.

  • σ_A: the standard deviation of A

  • A': the average value of A
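
Using the notation above, the z-score normalization takes the standard form:

$$v' = \frac{v - A'}{\sigma_A}$$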

Step3

The third step is to apply Contrast Limited Adaptive Histogram Equalization (CLAHE), an effective way to uniformly improve the details of gray-scale retina images.

  • Images whose histogram values are concentrated in certain areas have low contrast and can be considered low-quality images.

  • When the histogram is uniformly spread across the entire range, the image is of good quality. The task of spreading distributions concentrated in a particular area is called histogram equalization.

  • While conventional histogram equalization, applied over all pixels at once, often fails to achieve the desired result, CLAHE obtains good-quality images by equalizing the image within small blocks of constant size (see the sketch after this list).
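
A minimal sketch of this step using OpenCV; the clip limit, tile size, and file name are illustrative assumptions, not the paper's exact settings.

```python
import cv2

# Load the (already gray-scale) retina image; the file name is assumed for illustration
gray = cv2.imread("retina.png", cv2.IMREAD_GRAYSCALE)

# Equalize the histogram locally, in 8x8 tiles, with contrast clipping
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(gray)
```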

Step4

The final step is to adjust the brightness through a gamma value. This spreads out brightness concentrated in a certain region, preventing it from hindering feature extraction.
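
A minimal sketch of gamma adjustment on an 8-bit gray-scale image; the gamma value is an illustrative assumption.

```python
import numpy as np

def adjust_gamma(image: np.ndarray, gamma: float = 1.2) -> np.ndarray:
    """Apply power-law (gamma) correction to an 8-bit image."""
    normalized = image.astype(np.float32) / 255.0   # scale to [0, 1]
    corrected = np.power(normalized, 1.0 / gamma)   # power-law transform
    return (corrected * 255.0).astype(np.uint8)     # back to the 8-bit range
```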

Patches are extracted from the pre-processed images to obtain a larger dataset for training the configured neural networks. In addition, various flips are applied to these patches to secure additional data.
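
The sketch below illustrates patch extraction with flips; the patch size and stride are illustrative assumptions.

```python
import numpy as np

def extract_patches(image: np.ndarray, size: int = 48, stride: int = 24):
    """Slide a window over the image and return patches plus their flipped copies."""
    patches = []
    for y in range(0, image.shape[0] - size + 1, stride):
        for x in range(0, image.shape[1] - size + 1, stride):
            patch = image[y:y + size, x:x + size]
            patches.append(patch)
            patches.append(np.fliplr(patch))  # horizontal flip
            patches.append(np.flipud(patch))  # vertical flip
    return patches
```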

2. Architecture

In this study, two chained U-Nets are used, and residual blocks are added to the second one.

U-Net:

The Contracting Path

  • Repeat 3x3 convolutions twice (no padding)

  • Activation function is ReLU

  • 2x2 max-pooling (stride: 2)

  • Doubles the number of channels per down-sampling

Expanding Path extends the feature map with operations opposite to Contracting Path.

The Expanding Path

  • 2x2 convolution ("up-convolution")

  • Repeat 3x3 convolutions twice (no padding)

  • The number of channels is halved at each up-sampling (up-convolution)

  • Activation function is ReLU

  • The up-convolved feature map is concatenated with the correspondingly cropped feature map from the contracting path

  • A 1x1 convolution is applied on the last layer

With the above configuration, the network is a 23-layer fully convolutional structure. Note that the size of the final output (the segmentation map) is smaller than that of the input image, because no padding is used in the convolution operations.
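
A minimal PyTorch sketch of the building blocks described above; the channel counts are illustrative, not the paper's exact configuration.

```python
import torch.nn as nn

class DoubleConv(nn.Module):
    """Two unpadded 3x3 convolutions, each followed by ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3),  # no padding, so the map shrinks
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# Contracting path step: DoubleConv, then 2x2 max-pooling (channels double each level)
down = nn.Sequential(DoubleConv(1, 64), nn.MaxPool2d(kernel_size=2, stride=2))

# Expanding path step: 2x2 up-convolution halving the channels; its output is then
# concatenated with the cropped contracting-path feature map and passed to a DoubleConv
up = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
```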

Residual block:

Residual blocks were also proposed to solve the degradation problem. Here, F(x) is the transformation obtained by applying two convolutional layers to the input x, and the original input is added back to it, giving the output feature map FM(x) = F(x) + x. Adding the original feature map alleviates the degradation problem that appears in the model. The residual blocks used in this work follow this scheme.

  • U-Net2 with residual blocks:

The output of the first U-Net is connected to the input of the second network. The number of channels and the image size at each level remain the same as in the decoding portion of the first half; however, both the contracting and the expanding parts add residual blocks at each new level. Since binary classification is performed at the end of the expanding path, a 1x1 convolution is applied.
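
A minimal PyTorch sketch of a residual block of the form FM(x) = F(x) + x; the channel count and the use of padding here are illustrative assumptions.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Add the original input back to the transformed features (skip connection)
        return self.relu(self.body(x) + x)
```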

Most of the pixels in the image are background and only a few represent vascular structures (class imbalance). For this reason, a weighted negative log-likelihood loss is used. The weight w varies randomly between 1 and α as training progresses (s denotes the step); this dynamic weight change prevents the network from falling into a local minimum. To obtain the log probabilities, a LogSoftmax function is applied to the last layer of the neural network.

This loss maximizes the overall probability of the data by assigning a high loss value when the classification is wrong or unclear and a low value when the prediction matches what the model expects. The logarithm performs the penalizing part: the lower the probability, the larger the magnitude of its logarithm, and since these probabilities lie between zero and one their logarithms are negative, so a negative sign converts them into positive values. To handle the class imbalance, a per-class weight is provided and applied to both the prediction and the reference.
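
A hypothetical sketch of such a class-weighted loss in PyTorch; the way the weight is drawn between 1 and α is an illustrative assumption, not the paper's exact schedule.

```python
import torch
import torch.nn as nn

log_softmax = nn.LogSoftmax(dim=1)                      # applied to the last layer's logits
alpha = 5.0
w = 1.0 + (alpha - 1.0) * torch.rand(1).item()          # vessel-class weight drawn between 1 and alpha
criterion = nn.NLLLoss(weight=torch.tensor([1.0, w]))   # [background, vessel] class weights

logits = torch.randn(4, 2, 48, 48)                      # (batch, classes, H, W) network output
target = torch.randint(0, 2, (4, 48, 48))               # per-pixel ground-truth labels
loss = criterion(log_softmax(logits), target)
```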

4. Experiment & Result

Dataset

  1. DRIVE

  • Each image resolution is 584*565 pixels with eight bits per color channel (3 channels).

  • 20 images for training set

  • 20 images for testing set

  2. CHASEDB

  • Each image resolution is 999*960 pixels with eight bits per color channel (3 channels).

Evaluation metric

The retinal images show a class imbalance, so suitable metrics must be selected. The researchers adopt recall, precision, F1-score, and accuracy:

  • Recall: tells us how many of the relevant samples are selected.

  • Precision: tells us how many of the predicted samples are relevant.

  • F1-score: the harmonic mean of recall and precision.

  • Accuracy: measures how many observations, both positive and negative, were correctly classified.
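
In terms of true/false positives and negatives (TP, FP, TN, FN), these metrics are:

$$\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP}$$

$$F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \qquad \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$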


Results

1. Performance

  • Based on the above metrics, performance is compared with previous studies (the paper's figure shows segmentation results for the DRIVE and CHASEDB datasets).

  • The high F1-score indicates that both precision and recall are high.

    • This makes the method suitable for vessel classification.

  • Accuracy is high, and the F1-score is the second highest among the compared methods.

  • In most cases, the predictions are consistent with the ground truth, and FP and FN are small.

2. Training time

  • This architecture saves a lot of time compared to [Khanal et al.]

    • Approximately 1 hour faster for DRIVE dataset

    • Approximately 10 hours faster for the CHASEDB dataset

3. Segmentation and the structural similarity index (SSIM)

The structural similarity index analyzes the viewing distance and edge information between ground-truth and test images. It quantifies the degradation of image quality (it is also used for image compression) with a value between 0 and 1, where higher values indicate better quality. It is used here to compare the first stage (U-Net1 only) against the full architecture after adding the residual blocks: Figure 6 compares U-Net1 with the ground truth, and Figure 7 compares the entire architecture (U-Net1 + U-Net2 with residual blocks) with the ground truth. The latter achieves a higher value.
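
A minimal sketch of computing SSIM with scikit-image; the placeholder arrays stand in for a ground-truth mask and a predicted segmentation of the same shape.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

# Placeholder arrays standing in for the binary ground-truth mask and the predicted mask
ground_truth = np.zeros((584, 565), dtype=np.uint8)
prediction = np.zeros((584, 565), dtype=np.uint8)

# Higher values indicate the prediction is structurally closer to the ground truth
score = ssim(ground_truth, prediction)
```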

4. Factors that reduce segmentation performance

In the region marked with the blue circle in the paper's figure, the blood vessels appear relatively chunked together. Separating such chunks is an important problem in image segmentation, and the proposed method distinguishes them reasonably well.

  • The DRIVE dataset has seven images containing lesion regions, which can be mistaken for blood vessels and segmented as such. In the paper's figure, the method appears to avoid the lesion area well, but does it avoid lesions in general?

➔ Quantified indicators are needed.


5. Conclusion

  1. The novelty of this study

  • The first is the addition of residual blocks to the U-Net architecture. This greatly contributes to mitigating the degradation problem.

  • Second, the information obtained from the first U-Net1 is linked to the residual blocks of the later U-Net (U-Net2 with residual blocks) to minimize information loss.

  2. This study balances performance and training time.

  • It shows performance similar to previous studies.

  • The greatly reduced training time is significant.

  3. Image pre-processing

  • Gray-scale conversion, normalization, CLAHE, and gamma adjustment are used to create high-quality input images.

  • Patches are extracted from the original images to augment and secure the data.

Take home message

High-accuracy image segmentation requires considerable effort and time. This paper makes good use of previously proposed architectures and, through this, obtains results with a short training time. In addition, high-quality input images are obtained through image pre-processing. As a result, the trade-off between training time and performance appears to be minimized.


Author

Korean Name (English name)

Reviewer

Reference & Additional materials

  1. [Original Paper] G. Alfonso Francia, C. Pedraza, M. Aceves and S. Tovar-Arriaga, "Chaining a U-Net With a Residual U-Net for Retinal Blood Vessels Segmentation," in IEEE Access, vol. 8, pp. 38493-38500, 2020

  2. [Blog] https://medium.com/@msmapark2/u-net-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0-u-net-convolutional-networks-for-biomedical-image-segmentation-456d6901b28a
