Train Image Classification Network Robust to Adversarial Examples

This example uses:

This example shows how to train a neural network that is robust to adversarial examples using fast gradient sign method (FGSM) adversarial training.

Neural networks can be susceptible to a phenomenon known as adversarial examples [1], where very small changes to an input can cause it to be misclassified. These changes are often imperceptible to humans.

Techniques for creating adversarial examples include the FGSM [2] and the basic iterative method (BIM) [3], also known as projected gradient descent [4].

You can use adversarial training [5] to train networks that are robust to adversarial examples. This example shows how to:

Train an image classification network.
Investigate network robustness by generating adversarial examples.
Train an image classification network that is robust to adversarial examples.

Load Training Data

The digitTrain4DArrayData function loads images of handwritten digits and their labels.

rng("default")
[XTrain,TTrain] = digitTrain4DArrayData;

Extract the class names.

classNames = categories(TTrain);
numObservations = numel(TTrain);

Use the trainingPartitions function to split the data into training and validation sets. This function is attached as a supporting file. To access the function, open the example.

[idxTrain,idxValidation] = trainingPartitions(numObservations,[0.85 0.15]);

XValidation = XTrain(:,:,:,idxValidation);
TValidation = TTrain(idxValidation);

XTrain = XTrain(:,:,:,idxTrain);
TTrain = TTrain(idxTrain);

Construct Network Architecture

Create a convolutional neural network suitable for image classification.

layers = [
    imageInputLayer([28 28 1])

    convolution2dLayer(3,8,Padding="same")
    batchNormalizationLayer
    reluLayer

    maxPooling2dLayer(2,Stride=2)

    convolution2dLayer(3,16,Padding="same")
    batchNormalizationLayer
    reluLayer

    maxPooling2dLayer(2,Stride=2)

    convolution2dLayer(3,32,Padding="same")
    batchNormalizationLayer
    reluLayer

    fullyConnectedLayer(10)
    softmaxLayer];

Specify the training options. Choosing among the options requires empirical analysis. To explore different training option configurations by running experiments, you can use the Experiment Manager app.

optionsTrainNetwork = trainingOptions("sgdm", ...
    InitialLearnRate=0.01, ...
    MaxEpochs=5, ...
    Shuffle="every-epoch", ...
    ValidationData={XValidation,TValidation}, ...
    Plots="training-progress", ...
    Metrics="accuracy", ...
    Verbose=false);

Train the network using the trainnet function.

net = trainnet(XTrain,TTrain,layers,"crossentropy",optionsTrainNetwork);

Test Network

Test the classification accuracy of the network by evaluating network predictions on a test data set. Make predictions using the minibatchpredict function, and convert the classification scores to labels using the scores2label function.

[XTest,TTest] = digitTest4DArrayData; 

scores = minibatchpredict(net,XTest);
YTest = scores2label(scores,classNames);

accuracy = mean(YTest == TTest)

accuracy = 
0.9754

The network accuracy is very high.

Generate Adversarial Examples

Apply adversarial perturbations to the input images and see how doing so affects the network accuracy.

You can generate adversarial examples using techniques such as FGSM. FGSM is a simple technique that takes a single step in the direction of the gradient $\nabla_{X} L (X, T)$ of the loss function $L$ , with respect to the image $X$ you want to find an adversarial example for. The adversarial example is calculated as

$X_{adv} = X + α . sign (\nabla_{X} L (X, T))$ .

Parameter $α$ is the step size and controls how different the adversarial examples look from the original images. In this example, the values of the pixels are between 0 and 1, so an $α$ value of 0.01 alters each individual pixel value by up to 1% of the range. The value of $α$ depends on the image scale. For example, if your image is instead between 0 and 255, you need to multiply this value by 255.

BIM is a simple improvement to FGSM which applies FGSM over multiple iterations and applies a threshold. This method can yield adversarial examples with less distortion than FGSM. For more information about generating adversarial examples, see Generate Untargeted and Targeted Adversarial Examples for Image Classification. Create adversarial examples using the BIM.

Create lower and upper bounds using the test data.

epsilon = 0.1;

XLowerTest = dlarray(max(XTest-epsilon,0),"SSCB");
XUpperTest = dlarray(min(XTest+epsilon,1),"SSCB");

Define the step size and the number of iterations using the adversarialOptions function.

alpha = 0.01;
numAdvIter = 20;

optionsAdversarialTest = adversarialOptions("bim",NumIterations=numAdvIter,StepSize=alpha);

Use the findAdversarialExamples function to generate adversarial examples.

[examplesTest,mislabelsTest,iXTest] = findAdversarialExamples(net,XLowerTest,XUpperTest,TTest,Algorithm=optionsAdversarialTest);
trueLabelsTest = TTest(iXTest);

View the original image and the adversarial example side-by-side. The adversarial example is misclassified even though the adversarial image appears very similar to the original image.

figure
tiledlayout(1,3); 

nexttile;
imshow(XTest(:,:,:,iXTest(1)));
title({"Original Image","Label: " + string(trueLabelsTest(1))});

nexttile 
imshow(1-(extractdata(examplesTest(:,:,:,1)) - XTest(:,:,:,iXTest(1)))); 
title("Perturbation")

nexttile 
imshow(extractdata(examplesTest(:,:,:,1))); 
title({"Adversarial Example", "Label: " + string(mislabelsTest(1))});

Figure contains 3 axes objects. Hidden axes object 1 with title Original Image Label: 0 contains an object of type image. Hidden axes object 2 with title Perturbation contains an object of type image. Hidden axes object 3 with title Adversarial Example Label: 9 contains an object of type image.

Train Robust Network

You can train a network to be robust against adversarial examples. One popular method is adversarial training. Adversarial training involves finding adversarial examples and adding them to the training set.

Generate a new set of adversarial examples to add to the training data. For the network to be robust to perturbations of size $ϵ$ , perform FGSM training with a value slightly larger than $ϵ$ . For this example, perturb the training images using step size $α = 1.25 ϵ$ .

XLowerTrain = dlarray(max(XTrain-epsilon,0),"SSCB");
XUpperTrain = dlarray(min(XTrain+epsilon,1),"SSCB");

optionsAdversarialTrain = adversarialOptions("fgsm",StepSize=1.25*epsilon);

[examplesTrain,~,iXTrain] = findAdversarialExamples(net,XLowerTrain,XUpperTrain,TTrain,Algorithm=optionsAdversarialTrain);
trueLabelsTrain = TTrain(iXTrain);

Augment the training data set with the new adversarial examples.

augmentedXTrainData = cat(4,XTrain,examplesTrain);
augmentedTTrainData = cat(1,TTrain,trueLabelsTrain);

Train the network on the original and the adversarial data.

netRobust = trainnet(augmentedXTrainData,augmentedTTrainData,layers,"crossentropy",optionsTrainNetwork);

Test the classification accuracy of the robust network by evaluating network predictions on the original test data set. The robust network still performs well on the original test set.

scoresRobust = minibatchpredict(netRobust,XTest);
YTestRobust = scores2label(scoresRobust,classNames);

accuracy = mean(YTestRobust == TTest)

accuracy = 
0.9832

Compare Robust and Nonrobust Networks on Adversarial Examples

Compute the accuracy of the nonrobust and the robust network on the adversarial example data.

scoresTestAdv = minibatchpredict(net,examplesTest);
YTestAdv = scores2label(scoresTestAdv,classNames);
accuracyAdv = mean(YTestAdv == trueLabelsTest')

accuracyAdv = 
0

scoresTestAdvRobust = minibatchpredict(netRobust,examplesTest);
YTestAdvRobust = scores2label(scoresTestAdvRobust,classNames);
accuracyRobustTest = mean(YTestAdvRobust == trueLabelsTest')

accuracyRobustTest = 
0.9796

Plot the results. You can see that for the original network, the accuracy is severely degraded for the adversarial examples. However, the robust network still accurately classifies the images.

visualizePredictions(examplesTest,YTestAdv,YTestAdvRobust,trueLabelsTest);

Supporting Functions

Visualize images along with their predicted classes. Correct predictions use green text. Incorrect predictions use red text.

function visualizePredictions(XTest,YPred,YPredAdv,TTest)

figure
height = 3;
width = 3;
numImages = height*width;

% Select random images from the data.
indices = randperm(size(XTest,4),numImages);

XTest = extractdata(XTest);
XTest = XTest(:,:,:,indices);
YPred = YPred(indices);
YPredAdv = YPredAdv(indices);
TTest = TTest(indices);

% Plot images with the predicted label.
for i = 1:(numImages)
    subplot(height,width,i)
    imshow(XTest(:,:,:,i))

    % If the prediction is correct, use green. If the prediction is false,
    % use red.
    if YPred(i) == TTest(i)
        color = "\color{green}";
    else
        color = "\color{red}";
    end

    if YPredAdv(i) == TTest(i)
        colorAdv = "\color{green}";
    else
        colorAdv = "\color{red}";
    end

    title({"Nonrobust Net: " + color + string(YPred(i)), ...
        "Robust Net: " + colorAdv + string(YPredAdv(i))})
end

end

References

[1] Szegedy, Christian, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. “Intriguing Properties of Neural Networks.” Preprint, submitted February 19, 2014. https://arxiv.org/abs/1312.6199.

[2] Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. “Explaining and Harnessing Adversarial Examples.” Preprint, submitted March 20, 2015. https://arxiv.org/abs/1412.6572.

[3] Kurakin, Alexey, Ian Goodfellow, and Samy Bengio. “Adversarial Examples in the Physical World.” Preprint, submitted February 10, 2017. https://arxiv.org/abs/1607.02533.

[4] Madry, Aleksander, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. “Towards Deep Learning Models Resistant to Adversarial Attacks.” Preprint, submitted September 4, 2019. https://arxiv.org/abs/1706.06083.

[5] Wong, Eric, Leslie Rice, and J. Zico Kolter. “Fast Is Better than Free: Revisiting Adversarial Training.” Preprint, submitted January 12, 2020. https://arxiv.org/abs/2001.03994.