Following are three most frequently used Density-based Clustering techniques:
1 i) Mean-Shift
2 ii) OPTICS
3 iii) DBSCAN
c) Centroid-Based Clustering
Clusters that are represented by a vector are a part of centroid-based clustering. It is not a mandate requirement that these clusters should be a part of the given dataset. The number of clusters is inadequate to size k in k means clustering algorithm; therefore, it is essential to find centres of k cluster and allocate objects to their nearest centres. By taking different values of k random initializations, this algorithm runs multiple times to select the best of multiple runs (Giannotti et al., 2013). In k medoid clustering, clusters are firmly limited to the members of the dataset, whereas in k medians clustering, median is taken to form a cluster; the foremost drawback of these techniques is that we have to select the number of clusters beforehand.
d) Connection-Based (Hierarchical) clustering
As the name itself suggests, this type of clustering is performed on the basis of closeness or distance of objects. The most important key point to form these types of cluster is the distance between the objects by which they can be connected with each other and form clusters. Instead of single partitioning of dataset, these algorithms provide an in-depth hierarchy of merging clusters at particular distances. To represent clusters, a dendrogram is used. Merging distance of the clusters is shown on the y-axis and an object placement shows the x-axis to ensure that there should not be the mixing of clusters.
On the basis of the different ways with which distance is calculated, there are several types of connection-based clusters:
1 i) Single-Linkage Clustering
2 ii) Complete-Linkage
3 iii) Average-Linkage Clustering
e) Recent Clustering Techniques
For high dimensional data, the above-mentioned standard clustering techniques are not fit, therefore some new techniques are being discovered. These new techniques can be classified into two major categories, namely: Subspace Clustering and Correlation Clustering.
A small list of attributes that should be measured for the formation of a cluster is taken into consideration under subspace clustering. Correlation between the chosen attributes can also be performed with correlation clustering.
2.4 Privacy Preserving Data Mining (PPDM)
To extract the pertinent knowledge from large volumes of data and to protect all sensitive information of that database, we use privacy preserving data mining (PPDM). These techniques are created with the aim to confirm the protection of sensitive data so that privacy can be reserved with the efficient performance of all data mining operations. There are two classes of privacy concerned data mining techniques:
1 Data privacy
2 Information privacyModification of database for the protection of sensitive data of the individuals, we use data privacy technique. If there is a requirement for the modification of sensitive knowledge that can be deduced from the database, information privacy technique is preferred. To provide privacy to input, data privacy is preferable, whereas for providing privacy to output, the technique of information privacy is used. To reserve personal information from exposure is the main focus of a PPDM algorithm. It relies on the analysis of those mining algorithms that are attained during data privacy. Main objective of Privacy Preserving Data Mining is building algorithms that convert the original data in some useful means, so that there is no visibility of private data and knowledge even after a successful mining process. Privacy laws would allow the access in the case that some related satisfactory benefit is found resulting from the access.
2.5 Intrusion Detection Systems (IDS)
Onset detection of the intrusion is the main aim of an Intrusion detection system. There is a requirement of a high level of human knowledge and substantial amount of time to attain security in data mining. However, intrusion detection systems based on data mining need less expertise for better performance. To perceive network attacks in contrast to services that are vulnerable, intrusion detection system is very helpful. Applications-based data-driven attacks always privilege escalation (Thabtah et al., 2005), un-authorized logins and files accessibility is very sensitive in nature (Hong, 2012). Data mining process can be used as a tool for cyber security for the competent detection of malware from the code. Figure 2.3 shows the outline of an intrusion detection system. Several components such as, sensors, a console monitor and a central engine forms the complete intrusion detection system. Security events are generated by sensors whereas the task of console monitor is to monitor and control all events and alerts. The main function of the central engine is recording of events in a database and on the basis of these events, alerts can be created followed by certain set of rules. Following factors are responsible for the classification of an intrusion detection system:
1 i) Location
2 ii) Type of Sensors
3 iii)Technique used by the Central engine for generation of alerts.
Figure 2.3 An overview of intrusion detection system (IDS).
All the three components of an intrusion detection system can be integrated into a single device.
2.5.1 Types of IDS
Detection of an intrusion could be done either on a network or with an individual system and therefore we have three types of IDS, namely: Network Based, Host Based and Hybrid IDS.
2.5.1.1 Network-Based IDS
Computer networks have been targeted by enemies and criminals because of their progressively dynamic roles in modern societies. It is very important to find the best possible solutions for the sake of protection of our systems. Various techniques of intrusion prevention like programming errors avoidance, protection of information using encryption techniques and biometrics or passwords (Zhan et al., 2005) can be used as a first line of security. By using intrusion prevention technique as the only protection measure, our system is not 100% safe from combat attacks. To provide an additional security for computer system, the above-mentioned techniques are used. Various resources like accounts of users, their file systems and the system kernels of a target system must be protected by an intrusion detection system. For network-based intrusion detection systems, data source is the network packets. To listen and analyse network traffic as the packets travel across the network, the network-based intrusion detection system (NIDS) makes use of a network adapter. A network-based intrusion detection system is used to generate alerts for the detection of an intrusion which is outside of the boundary of its enterprise.
Advantages
Following are the advantages of a Network-Based IDS:
1 They can be made invisible to improve the security against attacks.
2 Large size of networks can be monitored by network-based IDS.
3 This IDS can give better output deprived of upsetting the usual working of a network.
4 It is easy to fit in an IDS into an existing network.
Limitations
Limitations of Network-Based IDS are as follows:
1 Virtual