Deceiving machine learning models – The final frontiers of code review.

Now You See a Cute Little Panda, Now You Don’t.

In recent years, the computational power at the disposal of developers has increased to the point where, for a set of complex problems, it is now feasible to build numerical models that represent our understanding of the subject. These models are built using machine learning algorithms, and they pose a new type of challenge for code auditing and application security.

Understanding the target

As a case study, we have chosen the problem of classifying what type of object is in an image. Solving this kind of complex problem no longer requires years of development and expertise to specify, in a declarative manner, the context of the problem and all the nuances to which the software is going to be exposed. Everything that once had to be written by hand to classify an image (Computer Vision algorithms such as Canny edge detection or the Hough Line Transform, plus the heuristic hacks that developers often use to approximate the expected solution) can now be achieved by training a Convolutional Neural Network (CNN).

CNN Diagram.

A CNN is a type of neural network commonly used in computer vision applications. Without going into much detail, a CNN consists of an input layer (in our case, the representation of the image), an output layer (in our case, the classification of the image) and multiple hidden layers in between. The hidden layers convolve their input and apply an activation function (commonly ReLU). The convolution is a specialized kind of linear operation applied with the intention of extracting the relevant features of the original image. Once this is done, a “pooling” layer reduces the dimension of the data and passes it to the next hidden layer.
As we can see, a CNN is very flexible in its definition, and many different neural network architectures have been proposed. The effectiveness of the algorithm, however, depends on how the features are extracted, and this in turn depends heavily on how the convolution is done. To see why, we must understand how neural networks are trained.
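To make the convolution, activation and pooling steps concrete, here is a minimal pure-Python sketch of a single hidden layer applied to a toy 5x5 image. The function names, the hand-picked kernel and the toy image are assumptions made for illustration; in a real CNN the kernel values are learned, and a framework such as PyTorch does this work on tensors.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation of a single-channel image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Replace negative activations with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; incomplete trailing windows are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy image: dark on the left, bright on the right (a vertical edge).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-picked kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 1],
          [-1, 1]]

features = max_pool(relu(conv2d(image, kernel)))
print(features)
```

The 4x4 convolution output is reduced to a 2x2 feature map in which only the column containing the edge stays active.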

To train a Convolutional Neural Network (CNN) in a supervised manner, we need a fully labeled dataset that is representative of the problem we want to solve and large enough to test and correct the model until it predicts the expected results.

In every convolution, our CNN applies internal weights and biases to the input it processes. Learning consists of adjusting these weights to improve accuracy. This is done by computing a cost function and optimizing it, commonly with gradient descent [1].
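The idea of adjusting weights by following the gradient of a cost function can be sketched without any framework. The example below is an illustration under simplifying assumptions: it fits a one-weight linear model with a mean squared error cost instead of a CNN, and the function name, learning rate and data are made up for the example.

```python
def train_linear(xs, ys, lr=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction errors under the current weight and bias.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradient of J = (1/n) * sum((w*x + b - y)^2) w.r.t. w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step against the gradient to reduce the cost.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Labeled training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = train_linear(xs, ys)
print(round(w, 4), round(b, 4))
```

A CNN is trained with the same loop in spirit: only the model, the cost function and the number of parameters change.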

Misclassification attack

As we have seen, our program now holds a numerical representation of knowledge, so much of the understanding of the problem comes from understanding the underlying mathematical concepts. This alone is a hint that attacks against CNNs will most likely abuse the algorithm chosen to create them.

In this practical example, we will take a look at the paper “Explaining and Harnessing Adversarial Examples” by Ian J. Goodfellow, Jonathon Shlens and Christian Szegedy [2], where the “fast gradient sign method” (FGSM) is defined. This method proposes the creation of a perturbation vector.

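Mathematical definition of the Fast Gradient Sign Method vector:

η = ε · sign(∇_x J(θ, x, y))

Here θ denotes the parameters of the model, x the original image, y the label associated with x, J(θ, x, y) the cost function used to train the neural network, and ε a small constant that bounds the size of the perturbation. The paper motivates this with a linear model with weights w: for an altered image X = x + η we have wX = wx + wη, and the perturbation that maximizes the extra term wη under the constraint ‖η‖∞ ≤ ε is η = ε · sign(w). FGSM applies the same idea to the cost function linearized around x.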
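The definition above can be tried on a model small enough to follow by hand. The sketch below applies FGSM to a logistic regression classifier in pure Python. It is not the implementation from [4]: the weights, the input vector, ε and every function name are assumptions chosen so that the effect is visible, and the input gradient is written analytically instead of being obtained through automatic differentiation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def probability(w, b, x):
    """Model output: probability that x belongs to class 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss(w, b, x, y):
    """Binary cross-entropy J(theta, x, y), with y in {0, 1}."""
    p = probability(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def input_gradient(w, b, x, y):
    """Gradient of the loss w.r.t. the input x; for this model it is (p - y) * w."""
    p = probability(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, epsilon):
    """Return the adversarial input x + eta and eta = epsilon * sign(grad_x J)."""
    eta = [epsilon * sign(g) for g in input_gradient(w, b, x, y)]
    return [xi + ei for xi, ei in zip(x, eta)], eta


# Hypothetical model parameters and input, chosen for illustration only.
w, b = [2.0, -3.0, 1.0], 0.5
x, y = [0.4, 0.1, 0.3], 1

x_adv, eta = fgsm(w, b, x, y, epsilon=0.25)

print(probability(w, b, x) >= 0.5)      # prediction on the original input
print(probability(w, b, x_adv) >= 0.5)  # prediction on the perturbed input
```

Every component of the input moves by exactly ε, in the direction that increases the loss, which is enough to push this toy input across the decision boundary. With images, ε is typically kept small enough that the change cannot be seen, and the perturbed pixels are clipped back to the valid range.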


PyTorch, through torchvision, ships implementations of several CNN architectures [3], all of them pre-trained on the widely used ImageNet dataset and all of them vulnerable to this type of misclassification attack.

Our implementation [4] of this attack uses PyTorch’s pre-trained GoogLeNet to generate an image of a giant panda that the network misclassifies, even though the change is not discernible to humans. The image of the misclassification vector is amplified for demonstration purposes.

As always, PoC || GTFO.

Attack mitigations

Although these kinds of attacks are inherent to the type of algorithm used to train the neural network, there are methods to detect and protect against them. Studies have shown that the statistical distribution of the data in this and other types of adversarial examples can be detected, which increases the accuracy and reliability of our solutions.

Sources

[1] https://en.wikipedia.org/wiki/Gradient_descent
[2] https://arxiv.org/pdf/1412.6572.pdf
[3] https://pytorch.org/docs/stable/torchvision/models.html
[4] https://github.com/pucarasec/Fast_gradient_sign_method
