1.2. AI in the service of cybersecurity
AI systems are generally efficient, fulfilling a given task with less time and money than a human being. They are also scalable, as their computational power enables the completion of far more tasks in the same amount of time. For example, a typical facial recognition system is both efficient and scalable; once developed, it can be applied to numerous camera feeds at a significantly lower cost than that of human analysts employed to perform a similar job. This explains why cybersecurity experts are seriously looking into AI and its potential contribution to mitigating certain problems. For example, the machine learning used by many AI algorithms can help detect malware, which is increasingly difficult to identify and isolate due to its growing capacity to adapt to traditional security solutions (Veiga 2018).
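To make the malware-detection idea concrete, the toy sketch below classifies files with a minimal nearest-neighbor rule over two hypothetical static features (byte entropy and a count of suspicious API imports). The feature choice, values and labels are purely illustrative, not drawn from any real detector; production systems use far richer features and models.

```python
import math

# Toy training data: (byte entropy, count of suspicious API imports) -> label.
# Labels: 1 = malware, 0 = benign. All values are invented for illustration.
TRAINING_SET = [
    ((7.8, 12), 1),  # high entropy + many suspicious imports: packed malware
    ((7.5, 9), 1),
    ((4.2, 1), 0),   # typical benign executable
    ((5.0, 0), 0),
]

def classify(sample):
    """1-nearest-neighbor classification by Euclidean distance."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    nearest = min(TRAINING_SET, key=lambda item: dist(item[0], sample))
    return nearest[1]

print(classify((7.6, 10)))  # near the malware cluster -> 1
print(classify((4.5, 0)))   # near the benign cluster -> 0
```

The point of the sketch is that the decision is learned from examples rather than written as a fixed signature, which is what lets such detectors generalize to variants of known malware.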
The Capgemini Research Institute conducted a survey of 850 senior managers across seven industrial sectors; among the top management members included in this survey, 20% are information systems managers and 10% are responsible for information systems security. The companies surveyed are headquartered in France, Germany, the United Kingdom, the United States, Australia, India and Italy (Capgemini Research Institute 2019). Capgemini noted that, as companies digitalize, their cyberattack risk increases exponentially. In 2018, 21% of companies reported at least one cybersecurity breach leading to unauthorized access. The price paid by companies for cybersecurity breaches is heavy: 20% declared losses of over 50 million dollars. According to this survey, 69% of the companies estimate that they need AI to counteract cyberattacks. The majority of telecommunications companies (80%) declared that they relied on AI to identify threats and counteract attacks. According to the Capgemini report, the telecommunications sector declared the highest losses (over 50 million dollars), which has made AI a priority for counteracting costly breaches in this sector. Understandably, consumer goods retailers (78%) and banks (75%) came second and third in this ranking, as these sectors increasingly rely on digital models. Companies based in the United States place AI-based cybersecurity applications and platforms among their top priorities.
Figure 1.2. Organizations and countries relying on artificial intelligence to identify threats and counteract attacks
New vulnerabilities are discovered every day in current programs, and these may allow an attacker to infect and take control of a company’s entire network. In contrast to traditional software vulnerabilities (for example, buffer overflows), today’s intelligent systems exhibit a distinct set of weaknesses. These include, in particular, poisoned data inputs that corrupt the training of learning systems (Biggio et al. 2012), the exploitation of flaws in the design of autonomous systems’ objectives (Amodei et al. 2016) and adversarial inputs crafted to falsify the classification of machine learning systems (Szegedy et al. 2013). As these vulnerabilities show, intelligent systems may outperform humans, but their potential failures are also unrivaled.
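The third class of vulnerability can be illustrated with a minimal sketch: for a linear classifier, shifting each input feature by a small step against the weight vector (in the spirit of the adversarial inputs of Szegedy et al. 2013) is enough to flip the decision. The weights, features and threshold below are invented for illustration only.

```python
# Toy linear classifier: score = w . x, positive score -> "benign".
# Weights and features are invented for illustration only.
w = [1.0, -1.0, 0.5]
x = [0.4, 0.2, 0.1]  # legitimately classified as benign

def predict(features):
    score = sum(wi * xi for wi, xi in zip(w, features))
    return "benign" if score > 0 else "malicious"

# Adversarial perturbation: nudge every feature by a small epsilon in the
# direction that lowers the score (opposite the sign of its weight).
epsilon = 0.3
x_adv = [xi - epsilon * (1 if wi > 0 else -1) for wi, xi in zip(w, x)]

print(predict(x))      # the clean input is classified "benign"
print(predict(x_adv))  # the slightly perturbed input flips to "malicious"
```

The perturbation is small in every coordinate, yet the classification changes, which is precisely why such failures have no analog in traditional software vulnerabilities.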
An ideal cyberdefense would offer full protection to users while preserving system performance. Although this ideal may currently seem very distant, steps could be taken toward it by making cyberdefense more intelligent. The idea of using AI techniques in cybersecurity is not new. Landwehr (2008) states that, in their early days, computer security and AI did not seem to have much in common: researchers in the field of AI wanted computers to do by themselves what humans were able to do, whereas researchers in the security field tried to fix the flaws in computer systems, which they considered vulnerable. According to Schneier (2008), “The Internet is the most complex machine ever built. We barely understand how it works, not to mention how to secure it”. Given the rapid multiplication of new web applications and the increasing use of wireless networks (Barth and Mitchell 2008) and the Internet of Things, cybersecurity has become the most complex threat to society.
The need to secure web applications against attacks (such as Cross Site Scripting [XSS], Cross Site Request Forgery [CSRF] and code injection) is increasingly obvious and pressing. Over time, XSS and CSRF scripts have been used to conduct various attacks, some of which can be interpreted as direct bypasses of the original security policy. That policy seemed like a simple and effective protection, but it turned out that it could easily be bypassed while also blocking certain functionalities of modern websites. According to Crockford (2015), the security policies adopted by most browsers “block useful contents and authorize dangerous contents”. These policies are currently being reviewed. However, detecting attacks such as XSS, CSRF or code injection requires more than simple rules: it requires a context-dependent reasoning capacity.
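A tiny sketch makes the limit of simple rules visible: a hypothetical blacklist rule that blocks any input containing `<script` catches the classic XSS payload but is trivially bypassed by an event-handler payload that never uses a script tag. Both the rule and the payloads are illustrative; real filters use many such rules and still face this problem.

```python
import re

# Hypothetical single-rule XSS filter, for illustration only:
# block any input containing "<script" (case-insensitive).
def naive_xss_filter(payload: str) -> bool:
    """Return True if the payload is blocked by the rule."""
    return re.search(r"<script", payload, re.IGNORECASE) is not None

# The classic payload is caught by the rule.
print(naive_xss_filter('<script>alert(1)</script>'))     # True: blocked

# Trivial bypass: an event handler needs no <script> tag at all.
print(naive_xss_filter('<img src=x onerror=alert(1)>'))  # False: slips through
```

Deciding that the second payload is also an attack requires knowing where in the HTML document the input will land and how the browser will interpret it, which is exactly the context-dependent reasoning that a pattern-matching rule cannot provide.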
The use of AI in cybersecurity generally involves certain smart tools and their application to intrusion detection (Ahmad et al. 2016; Kalaivani et al. 2019) or other aspects of cybersecurity (Ahlan et al. 2015). This approach involves the use of AI techniques developed for problems that are entirely different from cybersecurity; this may work in certain cases, but it has inherent and strict limitations. Cybersecurity has specific needs, and meeting them requires new, specifically developed AI techniques. Obviously, AI has substantially evolved in certain fields, but there is still a need for learning and developing new intelligent techniques adapted to cybersecurity. In this context, according to Landwehr (2008), one “AI
branch related to computer security from its earliest days is automated reasoning, particularly as applied to programs and systems. Though the SATAN program of Dan Farmer and Wietse Venema, released in 1995, was not labeled as AI, it automated a search for vulnerabilities in system configurations that would otherwise have required far more human effort”. Ingham et al. (2007) proposed an inductive reasoning system for the protection of web applications. The works of Vigna and co-workers (Mutz et al. 2007; Cova et al. 2007, 2010; Kirda et al. 2009; Robertson et al. 2010) have also dealt with the protection of web applications against cyberattacks. Firewalls using deep packet inspection can be considered a form of AI instantiation in cybersecurity. Firewalls have been part of the cyberdefense arsenal for many years. Traditionally, their filtering relies on the port number, although more sophisticated techniques are also used in most cases (Mishra et al. 2011; Valentín and Malý 2014; Tekerek and Bay 2019). Today, however, firewalls cannot rely on the port number alone, as most web applications use the same port as the rest of the web traffic. Deep packet inspection is then the only option for identifying malicious code inside a legitimate application flow. The idea of filtering at the application layer of the Transmission Control Protocol/Internet Protocol (TCP/IP) model was introduced in the third generation of firewalls in the 1990s. The modest success of these technologies indicates that much more remains to be done before AI can make a significant difference in cybersecurity. Nevertheless, it is worth noting that using AI in cybersecurity is not necessarily a miracle solution. For example, malware-free attacks, which require no software download and conceal malicious activity inside legitimate cloud computing services, are on the rise, and AI is not yet able to counteract these types of network breach.
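The difference between port-based filtering and deep packet inspection can be sketched as follows: both packets below arrive on the same port (80), so a port rule cannot separate them, whereas inspecting the application-layer payload can. The signature list is invented for illustration and is far simpler than a real rule set.

```python
# Minimal sketch of deep packet inspection: instead of filtering on the
# port number, inspect the application-layer payload for known signatures.
# These signatures are illustrative, not drawn from any real rule set.
SIGNATURES = [b"cmd.exe", b"/etc/passwd", b"<script>"]

def inspect_packet(port: int, payload: bytes) -> str:
    """Return the firewall verdict for one packet."""
    # A port-based rule is useless here: both example packets use port 80.
    for sig in SIGNATURES:
        if sig in payload:
            return "drop"
    return "forward"

print(inspect_packet(80, b"GET /index.html HTTP/1.1"))           # forward
print(inspect_packet(80, b"GET /?q=<script>alert(1)</script>"))  # drop
```

Real deep packet inspection engines reassemble flows and match thousands of signatures efficiently, but the principle is the same: the decision is made on the payload, not on the port.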
1.3. AI applied to intrusion detection
Intrusion detection is defined as the process of intelligently monitoring the events occurring in a computer system or network and analyzing them in search of signs of security policy violations (Bace 2000). The main objective of intrusion detection systems is to protect the availability, confidentiality and integrity of the network. Intrusion detection systems are defined both by the method used