An adversarial attack is a technique that induces a model to make an incorrect prediction by intentionally adding noise to the image, as shown in the figure. Adversarial attacks are classified into targeted and non-targeted attacks. A targeted attack steers the prediction of the target model toward a specific class. A non-targeted attack does not aim at any particular class; it only causes a misprediction.
A white-box attack has access to the model under attack, including its weights, so it can compute the gradient of the loss function with respect to the input image. This gradient is used to create an adversarial image.
✔Transfer-Based Adversarial-Attack
If the model you want to attack is inaccessible, you can try a transfer-based adversarial attack, which relies on the transferability of adversarial images: an adversarial image created by a white-box attack on a source model often also fools the target model. Therefore, to improve the success rate of a transfer-based attack, it is very important to prevent overfitting, in which the adversarial image depends on the source model and is effective only against it.
The Diversity Input Method (DIM) generates an adversarial image by feeding the model an image that has undergone random resizing and random padding. This is based on the assumption that an adversarial image should remain adversarial even if its size and position change. This prevents adversarial images from overfitting to the source model, so they stay adversarial across multiple models.
2. Method
Diversity Input Method(DIM)✨
The core idea of the Diversity Input Method (DIM) is to reduce the adversarial image's dependence on the source model by using the gradient computed on an image transformed with random resizing and random padding. This transform will be called the DI transform. The image below compares the original image with the image after the DI transform.
The implementation of the DI transformation in this paper is as follows:
random padding : Randomly pad the image to the top, bottom, left, and right so that it is 330 × 330 × 3
The paper uses TensorFlow, and the image size is fixed to 330 × 330 × 3 after the DI transform. (After that, the image is resized again to match the model input size.) I use PyTorch and keep the paper's random resizing and random padding, but make the image after the DI transform the same size as the original image. This way, no post-processing step is needed.
The DI transform has the advantage that it can be combined with known transfer-based adversarial attacks (I-FGSM, MI-FGSM). An I-FGSM attack combined with the DI transform is referred to as DI-FGSM. Each attack method is introduced in the related work below.
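As a rough illustration of a size-preserving DI transform, here is a minimal sketch. It is not the implementation shown later in this post. It assumes a "shrink, then zero-pad back to the original size" scheme built on `torch.nn.functional`, and the function name `di_transform` and its parameters `max_pad` and `prob` are hypothetical.

```python
import torch
import torch.nn.functional as F


def di_transform(x, max_pad=4, prob=0.5):
    """Randomly shrink a batch and zero-pad it back to its original size.

    Assumption: the transform is applied to the whole batch at once with
    probability `prob`; otherwise the input is returned unchanged.
    """
    if torch.rand(1).item() >= prob:
        return x
    _, _, h, w = x.shape
    pad = int(torch.randint(0, max_pad + 1, (1,)))  # total padding per axis
    if pad == 0:
        return x
    # Random resizing: shrink the image by `pad` pixels along each axis.
    small = F.interpolate(x, size=(h - pad, w - pad), mode="nearest")
    # Random padding: split `pad` randomly between left/right and top/bottom.
    left = int(torch.randint(0, pad + 1, (1,)))
    top = int(torch.randint(0, pad + 1, (1,)))
    return F.pad(small, (left, pad - left, top, pad - top), value=0.0)


torch.manual_seed(0)
x = torch.rand(2, 3, 32, 32)       # CIFAR-10 sized dummy batch
out = di_transform(x, prob=1.0)    # always apply the transform
same = di_transform(x, prob=0.0)   # never apply the transform
```

Because the output has the same shape as the input, it can be passed straight to the source model without a further resize.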
Related work✨
1) Iterative Fast Gradient Sign Method (I-FGSM)
The fast gradient sign method (FGSM) creates an adversarial image X^{adv} by changing each pixel of the input image X by ε in the direction that increases the loss function L(X, y^{true}), where y^{true} is the true class.
$$X^{adv} = X + \epsilon \cdot \mathrm{sign}\left(\nabla_X L(X, y^{true})\right)$$
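The FGSM update can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions: pixel values are in [0, 1], the loss is cross-entropy, and `toy_model` is a hypothetical stand-in classifier rather than a model used in this post.

```python
import torch
import torch.nn as nn


def fgsm(model, x, y, eps=8 / 255):
    """One FGSM step: move each pixel by eps in the loss-increasing direction."""
    x = x.clone().detach().requires_grad_(True)
    loss = nn.CrossEntropyLoss()(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]   # gradient of the loss w.r.t. the input
    x_adv = x + eps * grad.sign()
    return torch.clamp(x_adv, 0, 1).detach()  # keep a valid image


torch.manual_seed(0)
# Hypothetical toy classifier, used only to make the sketch runnable.
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.rand(4, 3, 32, 32)
y = torch.randint(0, 10, (4,))
x_adv = fgsm(toy_model, x, y)
```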
The Iterative Fast Gradient Sign Method (I-FGSM) repeatedly applies an FGSM step that changes each pixel by α, while keeping the total change within ε.
One way to prevent overfitting to the source model is to use momentum (MI-FGSM). Like I-FGSM, MI-FGSM is iterative, but it accumulates the gradient information (g_t) from the first iteration to the current one and uses it to update the adversarial image. The difference is that the update uses the sign of g_t rather than the sign of the current gradient of the loss function.
Accumulating gradients helps the attack avoid getting stuck in poor local maxima, and it stabilizes the update direction across iterations. Therefore, MI-FGSM shows better transferability than I-FGSM.
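For reference, the standard I-FGSM update can be written with the same symbols as the FGSM equation. The notation Clip_X^ε, which clips the result so that it stays within ε of X, is an assumption introduced here and does not appear elsewhere in this post.

$$X_0^{adv} = X, \qquad X_{n+1}^{adv} = \mathrm{Clip}_X^{\epsilon}\left\{ X_n^{adv} + \alpha \cdot \mathrm{sign}\left(\nabla_X L(X_n^{adv}, y^{true})\right) \right\}$$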
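The momentum update described above can be sketched as follows. This is an illustrative sketch, not the code used in this post: the decay factor `mu`, the per-sample L1 normalisation of the gradient, and the toy classifier are assumptions made for the example.

```python
import torch
import torch.nn as nn


def mi_fgsm(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10, mu=1.0):
    """Iterative attack that updates with the sign of an accumulated gradient g."""
    loss_fn = nn.CrossEntropyLoss()
    x = x.clone().detach()
    x_adv = x.clone()
    g = torch.zeros_like(x)  # accumulated gradient g_t
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Normalise each sample's gradient by its L1 norm before accumulating.
        norm = grad.abs().sum(dim=(1, 2, 3), keepdim=True).clamp_min(1e-12)
        g = mu * g + grad / norm
        # Step with the sign of g_t, not the sign of the current gradient.
        x_adv = x_adv.detach() + alpha * g.sign()
        delta = torch.clamp(x_adv - x, -eps, eps)  # stay within eps of x
        x_adv = torch.clamp(x + delta, 0, 1).detach()
    return x_adv


torch.manual_seed(0)
# Hypothetical toy classifier, used only to make the sketch runnable.
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.rand(4, 3, 32, 32)
y = torch.randint(0, 10, (4,))
x_adv = mi_fgsm(toy_model, x, y)
```

Setting `mu=0` removes the accumulation, so the loop reduces to plain I-FGSM.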
3. Implementation
Python, version >= 3.6
PyTorch is used for the implementation
A manual seed is used to fix randomness (included in the example code below)
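Fixing the seed might look like the sketch below. The helper name `set_seed` is hypothetical; the calls it wraps are standard Python and PyTorch seeding functions.

```python
import random

import torch


def set_seed(seed=0):
    """Seed Python and PyTorch RNGs so random resizing/padding is repeatable."""
    random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)


set_seed(0)
a = torch.rand(3)
set_seed(0)
b = torch.rand(3)  # same values as `a` because the seed was reset
```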
🔨 Environment
The environment (env_di-fgsm.yml) required to implement the DI transform is provided as a yml file. You can create an Anaconda virtual environment from it by entering the following command.
# Environment setup using conda
conda env create -f env_di-fgsm.yml
📋DI-FGSM
DI-FGSM is implemented in this file. I used comments to explain the overall code. Tensor sizes are shown as examples based on the CIFAR-10 images (size: 32 × 32) used in the example file (Transfer Attack.py) introduced below.
The diverse_input function in the DIFGSM class is the core of DI-FGSM; it implements random resizing and random padding. The forward function calls diverse_input, and the gradient is then computed by backpropagation.
## DI-FGSM : DIFGSM.py
import torch
import torch.nn as nn
import torchvision.transforms as transforms
from torchvision.transforms import InterpolationMode
import torchgeometry as tgm
from attack import Attack


class DIFGSM(Attack):
    def __init__(self, model, eps=8/255, alpha=2/255, steps=20, di_pad_amount=31, di_prob=0.5):
        super().__init__("DIFGSM", model)
        self.eps = eps  # Maximum total change of one pixel over all steps (e.g. 8/255 on a [0, 1] scale)
        self.steps = steps  # Number of DI-FGSM steps
        self.alpha = alpha  # Maximum change of one pixel in one step (e.g. 2/255 on a [0, 1] scale)
        self.di_pad_amount = di_pad_amount  # Maximum amount that can be padded
        self.di_prob = di_prob  # Probability of applying the DI transform
        self._supported_mode = ['default', 'targeted']  # Targeted or untargeted attack

    def diverse_input(self, x_adv):
        x_di = x_adv  # size : [24,3,32,32]
        h, w = x_di.shape[2], x_di.shape[3]  # original image size, h: 32, w: 32
        # random total amount of padding
        pad_max = self.di_pad_amount - int(torch.rand(1) * self.di_pad_amount)  # pad_max : 2
        # random amount padded on the left
        pad_left = int(torch.rand(1) * pad_max)  # pad_left : 1
        # remaining amount padded on the right
        pad_right = pad_max - pad_left  # pad_right : 1
        # random amount padded on the top
        pad_top = int(torch.rand(1) * pad_max)  # pad_top : 1
        # remaining amount padded on the bottom
        pad_bottom = pad_max - pad_top  # pad_bottom : 1

        # four vertices of the original image
        # tensor([[[ 0., 0.], [31., 0.], [31., 31.], [ 0., 31.]]])
        points_src = torch.FloatTensor([[
            [0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1],
        ]])

        # four vertices of the image after DI transform
        # tensor([[[ 1., 1.], [30., 1.], [30., 30.], [ 1., 30.]]])
        points_dst = torch.FloatTensor([[
            [pad_left, pad_top], [w - pad_right - 1, pad_top],
            [w - pad_right - 1, h - pad_bottom - 1], [pad_left, h - pad_bottom - 1],
        ]])

        # Matrix used in the transformation process
        # tensor([[[0.9355, 0.0000, 1.0000], [0.0000, 0.9355, 1.0000], [0.0000, 0.0000, 1.0000]]])
        M = tgm.get_perspective_transform(points_src, points_dst)

        # The image is resized and padded so that the vertices of the original image go to the new vertices.
        # The matrix is moved to the input's device instead of calling .cuda() unconditionally.
        x_di = tgm.warp_perspective(x_di, torch.cat(x_di.shape[0] * [M]).to(x_di.device), dsize=(w, h))
        x_di = transforms.Resize((w, h), interpolation=InterpolationMode.NEAREST)(x_di)

        # The DI transform is applied to a sample only if its random value is less than di_prob;
        # otherwise the untransformed image is kept.
        cond = torch.rand(x_adv.shape[0]) < self.di_prob
        cond = cond.unsqueeze(-1).unsqueeze(-1).unsqueeze(-1)
        x_di = torch.where(cond.to(x_adv.device), x_di, x_adv)
        return x_di

    def forward(self, images, labels):
        """
        Overridden.
        """
        images = images.clone().detach().to(self.device)
        labels = labels.clone().detach().to(self.device)

        if self._targeted:  # targeted attack case, get target label
            target_labels = self._get_target_label(images, labels)

        loss = nn.CrossEntropyLoss()  # use Cross-Entropy loss for classification
        adv_images = images.clone().detach()

        for _ in range(self.steps):
            adv_images.requires_grad = True
            outputs = self.model(self.diverse_input(adv_images))  # model output for the DI-transformed image

            # Calculate loss
            if self._targeted:
                cost = -loss(outputs, target_labels)  # targeted attack case, use -loss function
            else:
                cost = loss(outputs, labels)  # untargeted attack case, use +loss function

            # Update adversarial images
            grad = torch.autograd.grad(cost, adv_images, retain_graph=False, create_graph=False)[0]
            grad = grad / torch.mean(torch.abs(grad), dim=(1, 2, 3), keepdim=True)
            adv_images = adv_images.detach() + self.alpha * grad.sign()  # I-FGSM step
            delta = torch.clamp(adv_images - images, min=-self.eps, max=self.eps)  # limit the total change to epsilon
            adv_images = torch.clamp(images + delta, min=0, max=1).detach()

        return adv_images
📋Example code
In Transfer Attack.py, I tested the performance of Transfer Attack using DI-FGSM.
##3: This part shows the attack process on the source model and its result. You can specify an attack as atk = DIFGSM(model, eps=16 / 255, alpha=2 / 255, steps=10, di_pad_amount=5).
##5, ##6: These parts show the clean accuracy of the target model on the validation set and its robust accuracy on the adversarial images created in ##3.
The clean accuracy of the target model on the validation set is 87.26%, which is relatively high.
In contrast, the robust accuracy of the target model on adversarial images generated with DI-FGSM on the source model is only 38.87%, indicating that the transfer-based adversarial attack succeeded.
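Clean and robust accuracy of the kind reported above could be measured with a helper like the one sketched here. The function name `accuracy` is hypothetical, and the toy check uses `nn.Identity` on hand-made logits in place of the real target model and data.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def accuracy(model, images, labels):
    """Return the percentage of inputs the model classifies correctly."""
    model.eval()
    preds = model(images).argmax(dim=1)
    return (preds == labels).float().mean().item() * 100.0


# Toy check: the "model" passes logits through, so row i predicts class i.
logits_as_input = torch.eye(4)
labels = torch.tensor([0, 1, 2, 0])  # the last label is deliberately wrong
acc = accuracy(nn.Identity(), logits_as_input, labels)
```

Clean accuracy would be this function applied to the target model and the original validation images; robust accuracy would be the same call with the adversarial images generated on the source model.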