1.3.4.3. Markov chains
A Markov chain is a random process over a finite set of states in which transitions are memoryless: the next state depends only on the current state. During the learning phase, the transition probabilities are estimated from the normal behavior of the target system. Anomalies are then detected by comparing the anomaly score obtained for the observed sequences with a fixed threshold. In the case of a hidden Markov model (Hu et al. 2009; Zegeye et al. 2018; Liang et al. 2019), the system of interest is assumed to be a Markov process whose states and transitions are hidden, that is, not directly observable. Several methods in the literature address the intrusion detection problem by inspecting packet headers. Mahoney and Chan (2001) experimented with anomaly detection on DARPA network data by comparing the header fields of network packets. Several systems use the Markov model for intrusion detection: PHAD (Packet Header Anomaly Detector) (Mahoney and Chan 2001), LERAD (Learning Rules for Anomaly Detection) (Mahoney and Chan 2002a) and ALAD (Application Layer Anomaly Detector) (Mahoney and Chan 2002b). Zegeye et al. (2018) proposed an intrusion detection system based on the hidden Markov model. The network traffic analysis phase involves feature extraction, dimensionality reduction and vector quantization, which play an important role for large datasets, as the amount of data transmitted increases every day. Model performance on the KDD 99 dataset indicates an accuracy above 99%.
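The learning and detection phases of a plain (non-hidden) Markov chain detector can be sketched as follows. This is a minimal illustration, not the method of any of the cited systems: the state names, the Laplace smoothing constant `alpha`, the average negative log-likelihood used as the anomaly score and the threshold value are all assumptions made for the example.

```python
import math

def train_transitions(sequences, states, alpha=1.0):
    """Estimate transition probabilities from normal sequences.

    alpha is a Laplace smoothing constant (an assumption of this sketch)
    so that unseen transitions keep a small non-zero probability.
    """
    counts = {s: {t: 0 for t in states} for s in states}
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    probs = {}
    for s in states:
        total = sum(counts[s].values()) + alpha * len(states)
        probs[s] = {t: (counts[s][t] + alpha) / total for t in states}
    return probs

def anomaly_score(seq, probs):
    """Average negative log-likelihood per transition (higher = more unusual)."""
    steps = list(zip(seq, seq[1:]))
    if not steps:
        return 0.0
    return -sum(math.log(probs[a][b]) for a, b in steps) / len(steps)

def is_anomalous(seq, probs, threshold):
    """Compare the anomaly score of a sequence with a fixed threshold."""
    return anomaly_score(seq, probs) > threshold

# Hypothetical states and normal behavior, for illustration only.
states = ["open", "read", "write", "close"]
normal_sequences = [
    ["open", "read", "read", "close"],
    ["open", "write", "close"],
    ["open", "read", "write", "close"],
]
probs = train_transitions(normal_sequences, states)
threshold = 1.2  # hypothetical value; in practice tuned on validation data

usual = ["open", "read", "close"]
unusual = ["close", "write", "open", "write"]
print(is_anomalous(usual, probs, threshold))
print(is_anomalous(unusual, probs, threshold))
```

The score is normalized by sequence length so that long and short sequences can share one threshold; other scoring choices are equally possible.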
1.3.4.4. Support-vector machines
The support-vector machine is a technique used for solving various learning, classification and prediction problems. It implements the structural risk minimization (SRM) principle of Vapnik (1998), which minimizes the generalization error, that is, the true error on unseen examples. The basic support-vector machine addresses two-class problems, in which the data are separated by a hyperplane defined by a certain number of support vectors. Support vectors are a subset of the training data that defines the boundary between the two classes. When the two classes cannot be separated linearly, the support-vector machine maps the input data into a high-dimensional feature space by means of a kernel function. In this high-dimensional space, it is possible to build a hyperplane that separates the classes linearly (which corresponds to a curved surface in the lower-dimensional input space). Consequently, the kernel function plays an important role in the support-vector machine. In practice, various kernel functions can be used, such as linear, polynomial or Gaussian. A remarkable property of the support-vector machine is that its learning capacity does not depend on the dimensionality of the feature space. This means that the support-vector machine can generalize well even when given a large number of features. Mukkamala and Sung (2003b) showed the many advantages of the support-vector machine compared to other techniques. Support-vector machines surpass neural networks in terms of scalability, training time, running time and prediction accuracy. Mukkamala and Sung (2003a) also applied support-vector machines to feature selection for intrusion detection on the KDD dataset. They showed empirically that the features selected using the support-vector machine yielded results similar to those obtained with the full set of features. This reduction in the number of features reduces the computational effort. Chen et al. (2005) also showed that support-vector machines surpassed neural networks.
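The role of the kernel function can be illustrated with a small sketch. It defines the three kernels named above and shows, on a one-dimensional toy problem, how an explicit mapping makes the classes linearly separable. The data, the mapping x -> (x, x^2) and the hyperplane coefficients are assumptions chosen by hand for the example; no support-vector machine is actually trained here.

```python
import math

def linear_kernel(x, y):
    """Ordinary dot product of two vectors."""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel (x . y + coef0) ** degree."""
    return (linear_kernel(x, y) + coef0) ** degree

def gaussian_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def feature_map(x):
    """Explicit mapping of a 1-D input into a 2-D space: x -> (x, x^2)."""
    return (x[0], x[0] ** 2)

def decide(x, w=(0.0, 1.0), b=-1.0):
    """Sign of w . phi(x) + b: a hyperplane in the mapped space.

    w and b are hand-picked for this toy data, not learned by a solver.
    In the input space the boundary is |x| = 1, which is not a single
    threshold on x.
    """
    z = feature_map(x)
    return 1 if linear_kernel(w, z) + b > 0 else -1

# Toy data: class +1 lies on both sides of class -1, so no single
# threshold on x separates the two classes in the input space.
points = [(-2.0,), (-0.5,), (0.3,), (1.8,)]
labels = [1, -1, -1, 1]
predictions = [decide(p) for p in points]
print(predictions == labels)
```

A kernel-based solver never computes the mapping explicitly; it only evaluates the kernel between pairs of training points, which is what keeps high-dimensional feature spaces tractable.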
1.3.5. Clustering techniques
Clustering techniques organize the observed data into groups according to a given similarity or distance measure. Similarity can be measured using the cosine formula, the binary weighted cosine formula proposed by Rawat (2005) or other formulas. The most commonly used clustering procedure selects a representative point for each cluster. Each new data point is then assigned to a group according to its proximity to the corresponding representative point. There are at least two approaches to clustering-based anomaly detection. In the first approach, the anomaly detection model is trained on unlabeled data that include both normal and attack traffic. In the second approach, the model is trained on normal data only, and a profile of normal activity is created. The idea underlying the first approach is that abnormal or attack data represent a small percentage of the total data. If this hypothesis holds, anomalies and attacks can be detected from cluster size: large clusters correspond to normal data, and the remaining data points to attacks. Liao and Vemuri (2002) used the K-nearest neighbor (K-nn) approach, based on the Euclidean distance, to determine the cluster to which a data point belongs. The Minnesota intrusion detection system is a network-based anomaly detection approach that uses data mining and clustering techniques (Levent et al. 2004).
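The first approach, in which cluster size separates normal data from attacks, can be sketched as follows. This is an illustrative single-pass procedure, not the algorithm of any cited work: the two-dimensional points, the `radius` used to open new clusters and the `min_fraction` below which a cluster is considered suspicious are assumptions made for the example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_index(point, representatives):
    """Index of the representative closest to the point."""
    dists = [euclidean(point, r) for r in representatives]
    return min(range(len(dists)), key=dists.__getitem__)

def cluster(points, radius):
    """Single-pass clustering: a point joins the nearest representative
    if it lies within `radius`, otherwise it starts a new cluster."""
    representatives, members = [], []
    for p in points:
        if representatives:
            i = nearest_index(p, representatives)
            if euclidean(p, representatives[i]) <= radius:
                members[i].append(p)
                continue
        representatives.append(p)
        members.append([p])
    return representatives, members

def flag_small_clusters(members, min_fraction):
    """Mark clusters holding less than min_fraction of the data as suspicious."""
    total = sum(len(m) for m in members)
    return [len(m) / total < min_fraction for m in members]

def classify(point, representatives, flags):
    """Label a new point according to its nearest representative."""
    i = nearest_index(point, representatives)
    return "attack" if flags[i] else "normal"

# Hypothetical unlabeled data: mostly normal traffic plus a few outliers.
data = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (1.0, 0.8),
        (0.8, 1.0), (1.1, 1.1), (0.9, 0.9), (5.0, 5.0), (5.1, 4.9)]
representatives, members = cluster(data, radius=1.0)
flags = flag_small_clusters(members, min_fraction=0.3)
print(classify((1.05, 1.0), representatives, flags))
print(classify((4.8, 5.2), representatives, flags))
```

The sketch only works when the underlying hypothesis holds, namely that attacks are rare; a flood of attack traffic would form a large cluster and be labeled normal.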
Leung and Leckie (2005) proposed an unsupervised anomaly detection approach for network intrusion detection. The proposed algorithm, known as “fpMAFIA”, is a density- and grid-based clustering algorithm for large datasets. Its major advantage is that it can produce clusters of arbitrary shape and cover over 95% of the dataset with appropriate parameter values. The authors showed that the algorithm scales linearly with the number of records in the dataset. They evaluated the accuracy of the proposed algorithm and showed that it reaches a reasonable detection rate.
1.3.6. Hybrid techniques
Many researchers have suggested that the monitoring capacity of current intrusion detection systems could be improved by adopting a hybrid approach that combines anomaly detection and signature detection techniques (Lunt et al. 1992; Anderson et al. 1995; Fortuna et al. 2002; Hwang et al. 2007). Sabhnani and Serpen (2003) showed that no single classification technique can detect all the attack classes at an acceptable false alarm rate and with good detection accuracy. The authors used various techniques to classify the intrusions in the KDD 1998 dataset. Many researchers have shown that hybrid or ensemble-based classification can improve detection accuracy (Mukkamala et al. 2005; Chen et al. 2005; Aslahi-Shahri et al. 2016; Hamamoto et al. 2018; Hajimirzaei and Navimipour 2019; Sai Satyanarayana Reddy et al. 2019). A hybrid approach integrates various learning or decision-making models. Each learning model operates differently and uses a different set of features. Integrating various learning models yields better results than the individual learning or decision-making models and mitigates their individual limitations. A significant advantage of combining redundant and complementary classification techniques is that it increases robustness and accuracy in most applications.
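One simple way of integrating several models is a weighted vote, sketched below. The three rule-based classifiers, their feature names (`failed_logins`, `bytes_sent`, `duration`), their cutoffs and their weights are all hypothetical and stand in for trained models; the sketch does not reproduce any of the cited hybrid systems.

```python
def weighted_vote(predictions, weights):
    """Return the label whose supporting classifiers have the largest total weight."""
    totals = {}
    for label, weight in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)

# Hypothetical base classifiers, each looking at a different feature.
def login_rule(x):
    return "attack" if x["failed_logins"] > 3 else "normal"

def volume_rule(x):
    return "attack" if x["bytes_sent"] > 10000 else "normal"

def duration_rule(x):
    return "attack" if x["duration"] > 60 else "normal"

classifiers = [login_rule, volume_rule, duration_rule]
weights = [0.6, 0.25, 0.15]  # hypothetical; e.g. derived from validation accuracy

def ensemble_predict(x):
    """Collect one prediction per base classifier and combine them."""
    return weighted_vote([clf(x) for clf in classifiers], weights)

connection = {"failed_logins": 5, "bytes_sent": 200, "duration": 10}
print(ensemble_predict(connection))
```

With these weights the first classifier can outvote the other two, which shows why the choice of weights matters as much as the choice of base models.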
Various methods for combining classification techniques have been proposed in the literature (Menahem et al. 2009; Witten et al. 2016). Ensemble methods share a common objective: to build a combination of several models, instead of using a single model, in order to improve the results. Mukkamala et al. (2005) showed that the use of ensemble classifiers led to the best possible accuracy for each category of attack. Chebrolu et al. (2005) used the Classification And Regression Trees-Bayesian network (CART-BN) approach for intrusion detection. Zainal et al. (2009) proposed the hybridization of linear genetic programming, the adaptive neural fuzzy inference system and random forests for intrusion detection. They showed empirically that assigning appropriate weights to the classifiers in a hybrid approach improves the detection accuracy for all classes of network traffic compared to an individual classifier. Menahem et al. (2009) used various classifiers and tried to take advantage of their strengths. Hwang et al. (2007)