DAN-Net [Eng]

Tao Wang et al. / Dual-Domain Adaptive-Scaling Non-Local Network for CT Metal Artifact Reduction / MICCAI 2021


1. Problem definition

First, let's see how computed tomography (CT) images are created. An X-ray source rotates around the patient's body, and images are reconstructed from projection data, i.e., the measured attenuation of X-ray intensity after passing through the body at multiple angles (Fig. 1, left). By sorting the projection data, so-called sinograms are obtained (Fig. 1, middle). Sinograms represent the raw data from which transverse images are reconstructed by an image reconstruction algorithm suited for CT, e.g., filtered back-projection (FBP) (Fig. 1, right).
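
To make this pipeline concrete, here is a minimal sketch (my illustration, not the paper's code; assumes a recent scikit-image), where `radon` plays the role of the forward projection and `iradon` of the filtered back-projection:

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon, resize

# Toy phantom standing in for a CT slice (256x256, as in the paper's setup)
image = resize(shepp_logan_phantom(), (256, 256))

# Forward projection: one sinogram column per view angle (0 to 180 degrees)
theta = np.linspace(0.0, 180.0, 361)
sinogram = radon(image, theta=theta)

# Filtered back-projection reconstructs the transverse image
reconstruction = iradon(sinogram, theta=theta, filter_name="ramp")
```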

Due to metallic implants, reconstructed CT images can be contaminated by heavy artifacts that degrade imaging quality and severely compromise doctors' diagnoses. In particular, some artifacts and certain lesions can look similar, leading to misdiagnosis and making subsequent medical image analysis difficult.

The attenuation of X-ray intensity after passing through an object can be described by a constant called the linear attenuation coefficient. Assuming that the attenuation distribution is 2D, the linear attenuation image with metallic implants can be expressed as

$$\mu(E) = \mu_t(E) \odot (1-M) + \mu_m(E) \odot M$$

where $\mu_t(E)$ and $\mu_m(E)$ represent the attenuation images of the tissue and metal parts, respectively; $M$ denotes the metal mask; and $\odot$ is element-wise multiplication. The projection data $S_{ma}$ can then be calculated as

$$S_{ma} = -\ln \int \eta(E) \exp(-\mathcal{P}(\mu(E)))\,dE = -\ln \int \eta(E) \exp(-\mathcal{P}(\mu_t(E) \odot (1-M)))\,dE - \ln \int \eta(E) \exp(-\mathcal{P}(\mu_m(E) \odot M))\,dE = S_{tissue} + S_{metal}$$

where $\mathcal{P}$ is the forward projection operation and $\eta(E)$ denotes the intensity distribution at spectral energy $E$. Thus, $S_{ma}$ is the sum of the contributions of tissues and metal objects, denoted $S_{tissue}$ and $S_{metal}$, respectively. $S_{LI}$ denotes the linear-interpolation-corrected sinogram, which is only a rough estimate, so the method attempts to restore the valuable residual information from $S_{sub} = S_{ma} - S_{LI}$.
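
The paper starts from the classic linear interpolation (LI) correction. As a hedged sketch (function and variable names are mine), $S_{LI}$ can be obtained by linearly interpolating each projection view across the metal trace, and $S_{sub}$ is simply the residual:

```python
import numpy as np

def linear_interpolation_sinogram(s_ma: np.ndarray, metal_trace: np.ndarray) -> np.ndarray:
    """Rough LI correction: in each view (column), replace detector bins
    inside the metal trace by linear interpolation from bins outside it.
    s_ma: sinogram with metal, shape (n_bins, n_views)
    metal_trace: boolean metal-trace mask of the same shape
    """
    s_li = s_ma.copy()
    bins = np.arange(s_ma.shape[0])
    for v in range(s_ma.shape[1]):
        trace = metal_trace[:, v]
        if trace.any():
            s_li[trace, v] = np.interp(bins[trace], bins[~trace], s_ma[~trace, v])
    return s_li

# Residual information discarded by LI: S_sub = S_ma - S_LI
# s_sub = s_ma - linear_interpolation_sinogram(s_ma, metal_trace)
```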

To address this, the authors proposed DAN-Net, consisting of three components: adaptive scaling, a sinogram-domain network, and an image-domain network.

2. Motivation

Related work

During the past several decades, numerous metal artifact reduction (MAR) methods have been proposed.

  1. Conventional MAR methods can be grouped into three categories:

    • projection completion methods - treat projection data in the metal trace region as missing and fill in lost data with estimated values by different image inpainting methods or interpolation strategies. However, the metal trace boundaries may not be smooth.

    • iterative reconstruction methods - constrained optimization is used, but these methods are usually time-consuming and require manually well-designed regularizers.

    • image postprocessing methods - reduce metal artifacts in the image domain but usually cannot suppress the artifacts well and are apt to distort the anatomic structure.

  2. Deep-learning (DL)-based MAR methods include:

    • recovering the missing data in the metal trace.

    • reducing metal artifacts in the image domain.

    • end-to-end dual-domain networks - currently the mainstream approach for MAR. However, several problems in these methods still need to be addressed. First, the data inside the metal trace have a much higher amplitude than the data outside it, so the two regions can be regarded as following two different data distributions, which are difficult for a neural network to map into a single uniform distribution. A uniform data distribution is preferable because the corresponding grayscale images have better contrast, and it usually leads to more domain-invariant performance. Second, changes at the metal trace boundaries weaken the continuity of the first derivative of the projection data in that section; filtering further amplifies this discontinuity and generates extra artifacts. Finally, metal artifacts are non-local in the image domain, so a convolution operation may fail to reduce them well.

Idea

Therefore, a dual-domain Adaptive-scaling Non-local Network (DAN-Net) for MAR was proposed. There are three distinct contributions:

  1. The original sinogram is preprocessed using adaptive scaling. Then the scaled metal projections and the corresponding preprocessed CT images, reconstructed by filtered back-projection, are taken as the inputs of the dual-domain networks. This preprocessing preliminarily suppresses metal artifacts while maintaining tissue details.

  2. A novel residual sinogram learning strategy is proposed to avoid transforming two different data distributions into a uniform one and to improve the smoothness of the corrected projection.

  3. A non-local U-Net architecture is designed for image-domain enhancement, which can capture long-range dependencies of metal artifacts and further improve the image quality.

3. Method

Fig. 2 shows an overview of the proposed DAN-Net, which consists of the following three components.

Adaptive scaling

A linear attenuation operation is used to restore the data in the metal trace as

$$S_{pre} = S_{LI} + \lambda S_{sub}$$

where $\lambda$ is a scaling parameter controlling the trade-off between artifact reduction and detail preservation, and $S_{pre}$ denotes the corrected projection after adaptive scaling. The value $\lambda = 0.4$ was chosen experimentally. The corresponding adaptively scaled CT image is obtained as $X_{pre} = \mathcal{P}^{-1}(S_{pre})$, where $\mathcal{P}^{-1}$ denotes the FBP operation.
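
A minimal NumPy/scikit-image sketch of this step, reusing `iradon` from the earlier snippet as a stand-in for the paper's reconstruction layer (names are assumptions):

```python
import numpy as np
from skimage.transform import iradon

def adaptive_scaling(s_ma, s_li, theta, lam=0.4):
    """Adaptive scaling sketch: S_pre = S_LI + lambda * S_sub,
    X_pre = FBP(S_pre). The paper sets lambda = 0.4 experimentally."""
    s_sub = s_ma - s_li                                      # residual sinogram
    s_pre = s_li + lam * s_sub                               # scaled correction
    x_pre = iradon(s_pre, theta=theta, filter_name="ramp")   # X_pre = P^-1(S_pre)
    return s_pre, x_pre
```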

Sinogram Domain Network

$S_{res}$, the adaptively scaled residual sinogram, is taken as the input of a neural network $G_{sino}$, which uses U-Net as its backbone together with a mask pyramid network (MPN) to retain sufficient information about the metal projection. Since the main goal is to retrieve information within the metal trace, the residual sinogram is refined only inside the metal trace. The corrected sinogram can be written as

$$S_{sino} = G_{sino}(S_{res}, M_p) \odot M_t + S_{LI}$$

A loss $\mathcal{L}_{sino}$ is adopted to measure the difference between $S_{sino}$ and the ground truth $S_{gt}$:

$$\mathcal{L}_{sino} = \| (S_{sino} - S_{gt}) \odot M_t \|_1$$

Then $X_{sino} = \mathcal{P}^{-1}(S_{sino})$ is obtained using an analytical reconstruction layer. To alleviate the secondary artifacts in the reconstructed CT image, a reconstruction loss $\mathcal{L}_{FBP}$ between $X_{sino}$ and the ground-truth image $X_{gt}$ is utilized:

$$\mathcal{L}_{FBP} = \| (X_{sino} - X_{gt}) \odot (1-M) \|_1$$
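
Both losses are masked L1 terms; a hedged PyTorch sketch (tensor names and the mean-reduced `l1_loss`, which is proportional to the masked L1 norm, are my assumptions, not the authors' code):

```python
import torch.nn.functional as F

def sinogram_branch_losses(s_sino, s_gt, x_sino, x_gt, m_t, m):
    """m_t: metal-trace mask (sinogram domain); m: metal mask (image domain)."""
    # L_sino: L1 restricted to the metal trace
    l_sino = F.l1_loss(s_sino * m_t, s_gt * m_t)
    # L_FBP: L1 on the FBP reconstruction, excluding the metal region
    l_fbp = F.l1_loss(x_sino * (1 - m), x_gt * (1 - m))
    return l_sino, l_fbp
```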

Image Domain Net

To enhance the reconstructed CT images, another U-Net is used; for computational efficiency, its channel numbers are halved. Convolution is a local operator whose receptive field is limited by the filter size, while metal artifacts are non-local, so convolution-based postprocessing alone may fail to remove them well. To address this, non-local layers are added to the network $G_{im}$ after the second and third down-sampling steps, as seen in the image above. To focus on the artifact-impacted regions, $X_{sino}$ and $X_{pre}$ are concatenated as the input of $G_{im}$. Additionally, a residual learning strategy is adopted:

$$X_{im} = X_{pre} + G_{im}(X_{sino}, X_{pre})$$

$G_{im}$ is likewise optimized with an L1 norm in the image domain:

$$\mathcal{L}_{im} = \| (X_{im} - X_{gt}) \odot (1-M) \|_1$$
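
The layer follows the non-local neural network idea; a minimal embedded-Gaussian non-local block for 2D feature maps might look like this (a sketch under common conventions, not the authors' exact implementation):

```python
import torch
import torch.nn as nn

class NonLocalBlock2d(nn.Module):
    """Embedded-Gaussian non-local block: every spatial position attends
    to every other, capturing the long-range structure of metal artifacts."""
    def __init__(self, channels: int):
        super().__init__()
        inter = max(channels // 2, 1)                  # reduced embedding dim
        self.theta = nn.Conv2d(channels, inter, 1)
        self.phi = nn.Conv2d(channels, inter, 1)
        self.g = nn.Conv2d(channels, inter, 1)
        self.out = nn.Conv2d(inter, channels, 1)

    def forward(self, x):
        b, _, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # (B, HW, C')
        k = self.phi(x).flatten(2)                     # (B, C', HW)
        v = self.g(x).flatten(2).transpose(1, 2)       # (B, HW, C')
        attn = torch.softmax(q @ k, dim=-1)            # pairwise affinities
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                         # residual connection
```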

In summary, the total objective function is

$$\mathcal{L} = \mathcal{L}_{sino} + \alpha \mathcal{L}_{FBP} + \beta \mathcal{L}_{im}$$

where $\alpha$ and $\beta$ are the weighting parameters of the different components, empirically set to $\alpha = \beta = 1$.
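
Completing the loss sketch from above (illustrative only), the image-domain term and the weighted total can be combined as:

```python
import torch.nn.functional as F

def image_domain_loss(x_im, x_gt, m):
    # L_im: L1 on the enhanced image, excluding the metal region
    return F.l1_loss(x_im * (1 - m), x_gt * (1 - m))

def total_loss(l_sino, l_fbp, l_im, alpha=1.0, beta=1.0):
    # L = L_sino + alpha * L_FBP + beta * L_im (paper: alpha = beta = 1)
    return l_sino + alpha * l_fbp + beta * l_im
```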

4. Experiment & Result

Experimental setup

In this paper, the DeepLesion dataset was used, with metal masks generated from [3]. 1,000 CT images were randomly selected from DeepLesion and combined with 90 metal masks to synthesize 90,000 combinations for the training set. For evaluation, the remaining 200 CT images and 10 masks were used. The original CT images were resized to 256×256 for computational efficiency. To simulate Poisson noise, a polychromatic X-ray source was employed, and the incident X-ray beam was set to $2 \times 10^7$ photons [4]. Partial volume effects and scatter were also taken into consideration. Without loss of generality, the experiments were restricted to 2D parallel-beam geometry. For the sampling conditions, 367 detector bins and 361 sampling views were assumed, uniformly distributed from $0^{\circ}$ to $180^{\circ}$. The CT values were truncated to [0, 4095] to better approximate the real situation.
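
As a hedged, monochromatic simplification of this noise model (the paper's simulation is polychromatic and additionally models partial volume effects and scatter), Poisson noise can be injected by converting line integrals to photon counts and back:

```python
import numpy as np

def add_poisson_noise(sinogram: np.ndarray, i0: float = 2e7) -> np.ndarray:
    """i0: incident photon count per ray (the paper uses 2e7)."""
    counts = i0 * np.exp(-sinogram)                  # expected detected photons
    noisy = np.random.poisson(counts).astype(float)  # Poisson measurement noise
    noisy = np.maximum(noisy, 1.0)                   # avoid log(0)
    return -np.log(noisy / i0)                       # back to line integrals
```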

The newly proposed DAN-Net was compared with other state-of-the-art methods (LI, NMAR, CNNMAR, DuDoNet, and ADN), quantitatively using structural similarity (SSIM) and peak signal-to-noise ratio (PSNR), and qualitatively on two representative metallic implants of different sizes.
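
Both metrics are available in scikit-image; a small sketch of scoring one image pair (the data-range convention is my assumption):

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(x_corrected, x_gt):
    """Return the two reported metrics for one reconstructed slice."""
    data_range = float(x_gt.max() - x_gt.min())
    psnr = peak_signal_noise_ratio(x_gt, x_corrected, data_range=data_range)
    ssim = structural_similarity(x_gt, x_corrected, data_range=data_range)
    return psnr, ssim
```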

Result

The following table shows that DAN-Net achieves the best scores among the compared methods by suppressing the metal artifacts.

| Methods | Uncorrected | LI | NMAR | CNNMAR | DuDoNet | ADN | DAN-Net |
|---------|-------------|------|------|--------|---------|------|---------|
| PSNR (dB) | 15.33 | 30.74 | 30.83 | 32.15 | 36.82 | 33.60 | 40.61 |
| SSIM | 0.6673 | 0.9224 | 0.9270 | 0.9508 | 0.9777 | 0.9275 | 0.9872 |

For a qualitative comparison, see Fig. 3, where (A1)&(A2) through (H1)&(H2) denote the reference images and the results of Uncorrected, LI, NMAR, CNNMAR, DuDoNet, ADN, and DAN-Net, respectively.

LI and NMAR contain some radial artifacts, while the DL-based methods perform better. When the metal objects get larger, LI and NMAR introduce obvious new artifacts (marked in (C2) and (D2)). In the second case, the other methods fail to preserve the details around the metallic implants, while DAN-Net maintains these structural details more completely. The proposed method was also tested on clinical data, where the same trend was observed: DAN-Net suppresses most of the metal artifacts and preserves the fine-grained anatomical structures around the metals, consistent with the simulated-data results and demonstrating potential for real clinical application.

The following ablation table shows the effect of sinogram-domain residual learning, the image-domain non-local module, and adaptive scaling; as the results show, the full configuration of the proposed DAN-Net achieves the best scores.

| Methods | Sino-Net | Res-Sino-Net | IM-Net | Non-local-IM-Net | Ma-Dual-Net | DAN-Net |
|---------|----------|--------------|--------|------------------|-------------|---------|
| PSNR (dB) | 31.43 | 31.71 | 33.79 | 34.75 | 34.15 | 40.61 |
| SSIM | 0.9232 | 0.9494 | 0.9520 | 0.9720 | 0.9597 | 0.9872 |

5. Conclusion

DAN-Net was proposed for MAR in CT images. Thanks to adaptive scaling, artifacts are reduced up front with better preservation of tissue details. A residual sinogram learning strategy and a non-local module are additionally used to enhance CT image quality. Extensive experiments demonstrate that DAN-Net outperforms several state-of-the-art MAR methods both qualitatively and quantitatively. However, the method was only demonstrated on 256×256 images, whereas 512×512 images are standard in clinical practice, so I would like to see how it performs on larger images.

Take home message

Metal artifacts are non-local; therefore, convolution-based postprocessing methods may fail to remove them well.

Adaptive scaling is used to retrieve more structural details around the metallic implants.

It is difficult for ordinary networks to convert two different data distributions into a unified one; here, the mismatch is caused by the much higher attenuation coefficient of metal.

Author / Reviewer information

Author

Povolna Anna

  • KAIST Bio and Brain Engineering

Reviewer

  1. ...

  2. ...

  3. ...

Reference & Additional materials

  1. Zhang, Y., Yu, H.: Convolutional neural network based metal artifact reduction in x-ray computed tomography. IEEE Trans. Med. Imaging 37(6), 1370–1381 (2018)

  2. Tang, S., et al.: Application of projection simulation based on physical imaging model to the evaluation of beam hardening corrections in X-ray transmission tomography. J. X-ray Sci. Technol. 16(2), 95–117 (2008)

  3. Stiller, W.: Basics of iterative reconstruction methods in computed tomography: a vendor-independent overview. Eur. J. Radiol. 109, 147–154 (2018)

  • Paper: Dual-Domain Adaptive-Scaling Non-local Network for CT Metal Artifact Reduction
  • Code: DAN-Net GitHub