Machine Learning (ML) and cybersecurity intersect in significant ways. ML can be used to attack computer systems, for example by automatically discovering vulnerabilities or generating phishing e-mails. At the same time, ML can be used to defend them – think anomaly and network intrusion detection or spam filters – by learning patterns such as how people log in or type their password.
In this article, we focus on adversarial ML, i.e. the types of threats an ML system may be susceptible to. We review the three main types of attacks that aim to compromise an ML system and make it produce incorrect results or leak private information. Let’s start with some background information.
With ransomware making the headlines, we realize that the risks for companies are increasing as they rely more and more on digitalized systems for crucial processes. Surprisingly, many companies still aren’t concerned with cybersecurity or «haven’t had the time yet» to address the topic. No wonder, therefore, that cybercrime pays off.
ML has been developing at an extremely fast pace in recent years, and the number of discoveries related to attacks and vulnerabilities has grown in proportion. ML vulnerabilities have already been exploited in the wild, and it is only a matter of time before these risks become widespread. At Adnovum, we do our best to stay ahead of such trends and to ensure we can adequately share our expertise and provide support on such cutting-edge topics.
Training an ML model means relying on an algorithm to take its best shot at the problem you’re trying to solve by «simply» looking at data. In most cases, this means providing annotated examples so that the model learns to classify future, unseen examples correctly. How does this impact cybersecurity? The logic our computer runs to make decisions goes from something made by humans for humans – i.e. code – to a set of weights and parameters that perform a calculation and produce a result.
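To make this concrete, here is a minimal sketch using scikit-learn on a handful of made-up e-mails: after training, the spam «logic» is nothing more than a learned weight per word.

```python
# Minimal sketch (made-up spam data): after training, the decision logic is a set of
# learned weights rather than human-written rules.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

emails = ["win money now", "meeting at 10am", "cheap pills online", "project status update"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

model = LogisticRegression().fit(X, labels)

# One weight per word: this is all the "logic" the model has.
for word, weight in zip(vectorizer.get_feature_names_out(), model.coef_[0]):
    print(f"{word}: {weight:+.2f}")

print(model.predict(vectorizer.transform(["cheap money now"])))  # expected: [1], i.e. spam
```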
With regular software, you identify a bug, create a test case, and fix the code. With ML, static analysis is close to impossible. In addition, ML is probabilistic in nature: even with a 99% accurate model, one out of a hundred predictions will be wrong. In complex models, the parameters are not only notoriously difficult to interpret, they also number in the millions or billions, especially in the larger models used for computer vision and natural language processing. This means we have less control over some of the mechanisms cybersecurity usually relies on.
What types of attacks are we concerned with, then? Let us review the three main ones: data poisoning, adversarial input, and model privacy attacks. All three apply even when attackers have no direct access to the model and simply exploit the ways they can interact with it.
With data poisoning, an attacker tries to provide wrong data so that the ML model is trained to make wrong predictions.
This can be achieved in various ways, such as exploiting feedback loops that are used to improve the model. For example, if Gmail blindly trusted user reports on spam/not spam, it wouldn’t take long to influence the model to let specific spam through or reject a competitor’s mail. A well-known example of this type of exploit was the poisoning of Microsoft’s Tay bot on Twitter. Learning from conversations with other users, the bot quickly started to interact in a socially unacceptable fashion.
A data poisoning attack could be carried out to create a backdoor: The poisoned data would introduce subtle errors in the model’s predictions designed to remain unnoticed.
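As an illustration, here is a toy sketch of such label-flipping – made-up e-mails and a generic scikit-learn classifier, not any real spam filter – where attacker-controlled feedback teaches the model to let one specific phishing template through:

```python
# Toy label-flipping sketch (invented e-mails). The attacker reports their own phishing
# template as "not spam" via the feedback loop, so the retrained model learns to let
# exactly that campaign through.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = [
    "win money now", "meeting at 10am", "cheap pills online", "project status update",
    "verify your account at fake-bank.example",   # attacker's phishing template ...
    "verify your account at fake-bank.example",   # ... reported twice as "not spam"
]
honest_labels   = [1, 0, 1, 0, 1, 1]  # 1 = spam
poisoned_labels = [1, 0, 1, 0, 0, 0]  # same data, labels flipped by fake feedback

vec = CountVectorizer()
X = vec.fit_transform(emails)

honest_model = MultinomialNB().fit(X, honest_labels)
poisoned_model = MultinomialNB().fit(X, poisoned_labels)

test = vec.transform(["please verify your account at fake-bank.example"])
print("honest model:  ", honest_model.predict(test))    # [1] -> blocked
print("poisoned model:", poisoned_model.predict(test))  # [0] -> backdoor: let through
```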
Creating an adversarial input means designing an input so that it will be wrongly classified. Small modifications that a human would not necessarily notice (such as background noise or a slight perturbation) can mislead the algorithm and cause it to fail. Adversarial inputs can have a huge impact on real-world systems.
Take car autopilots, for example: a wrong interpretation of road signs could be disastrous. Small, targeted modifications, such as a post-it on a stop sign, can result in misclassifications. Luckily, models are becoming more robust thanks to increasing awareness.
Adversarial input attacks usually succeed because ML models don’t only learn the things humans see or hear; they pick up numerous other patterns underlying the data … patterns that are exploitable.
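The sketch below shows the intuition behind one of the simplest such techniques, the fast gradient sign method, on a tiny hand-made linear classifier (all numbers are invented). On real, high-dimensional inputs like images, the same trick works with per-pixel changes far too small for a human to notice.

```python
# Toy fast-gradient-sign-style sketch on a hand-made linear classifier (all numbers invented).
# With only 3 features the nudge has to be fairly large; on an image with tens of thousands
# of pixels, many tiny nudges add up to the same effect.
import numpy as np

w = np.array([1.5, -2.0, 0.5])   # assumed model weights
b = 0.1

def score(x):
    """Probability of class 1 under a logistic model."""
    return 1 / (1 + np.exp(-(w @ x + b)))

x = np.array([0.2, 0.6, 0.3])
print("original score:   ", round(score(x), 3))      # ~0.34 -> class 0

# The gradient of the score w.r.t. the input follows w, so nudging each feature
# by epsilon in the sign of w pushes the score up as fast as possible.
epsilon = 0.25
x_adv = x + epsilon * np.sign(w)
print("adversarial score:", round(score(x_adv), 3))  # ~0.59 -> flipped to class 1
```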
Attackers may also target an ML model not to trick it into wrong classifications, but to obtain information about the data it was trained on. Data sets can be an important intellectual property asset and may contain sensitive information (such as personal healthcare data). Researchers have shown that it is possible to guess whether certain data was used for training and even to roughly reconstruct complex training data, like faces. There are multiple approaches to address this concern, although they can impact the model’s metrics.
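As a rough illustration of the intuition behind membership inference – real attacks typically rely on shadow models and are considerably more elaborate – the sketch below overfits a model on synthetic data and shows that the confidence it returns already separates training records from unseen ones:

```python
# Rough sketch of confidence-based membership inference on synthetic data.
# Core signal: overfitted models are more confident on the records they were trained on.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_members, X_outsiders, y_members, _ = train_test_split(X, y, test_size=0.5, random_state=0)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_members, y_members)

conf_members = model.predict_proba(X_members).max(axis=1)      # records used for training
conf_outsiders = model.predict_proba(X_outsiders).max(axis=1)  # records the model never saw

# Members get systematically higher confidence, so even a simple threshold on the
# returned probability leaks whether a given record was in the training set.
print("mean confidence, members:  ", conf_members.mean().round(3))
print("mean confidence, outsiders:", conf_outsiders.mean().round(3))
```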
In addition to stealing training data, or at least getting a glimpse of it, attackers might steal the model itself. With enough queries and examples against a model that took years to develop, they may be able to rebuild one that does practically the same thing in no time.
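The following sketch mimics such a model extraction attack on synthetic data: the «victim» model is just a stand-in, and the attacker only sends queries and trains a surrogate on the answers.

```python
# Sketch of model extraction on synthetic data; the "victim" model is a stand-in.
# The attacker never sees the training data; they only send queries and record the answers.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=1)
victim = DecisionTreeClassifier(max_depth=5, random_state=1).fit(X, y)  # the target model

# Step 1: query the victim like any regular user would, over plausible input ranges.
rng = np.random.default_rng(42)
queries = rng.uniform(-3, 3, size=(5000, 10))
stolen_labels = victim.predict(queries)

# Step 2: train a surrogate purely on the victim's answers.
surrogate = LogisticRegression(max_iter=1000).fit(queries, stolen_labels)

# How often the copy agrees with the original on fresh inputs is a rough measure
# of how much of the model has been "stolen".
probe = rng.uniform(-3, 3, size=(2000, 10))
agreement = (surrogate.predict(probe) == victim.predict(probe)).mean()
print(f"surrogate agrees with the victim on {agreement:.0%} of new queries")
```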
ML initiatives clearly come with risks of their own, and attackers have multiple options to target an organization and compromise it. Remember: in all three types of attacks, hackers do not need direct access to the model; they simply exploit the ways they can interact with it. Scary? The good news: there are ways to protect yourself and your organization. Stay tuned and get in touch with us!