The anomaly identification rule is constructed as follows:
Record Y_k is classified as regular if

$$AD(Y_k) \le AD_{cutoff}. \qquad (3.2)$$

If AD(Y_k) > AD_cutoff, then record Y_k is classified as anomalous. The anomaly detection cutoff is defined by the expected false discovery rate (expFD):

$$expFD = \frac{N\big(AD(Y_k) > AD_{cutoff};\; Y_k \in TrainSet\big)}{K}, \qquad (3.3)$$

where N(AD(Y_k) > AD_cutoff; Y_k ∈ TrainSet) is the number of records in the training set whose anomaly detection classifier values exceed the cutoff, and K is the total number of records in the training set.
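A minimal sketch of this cutoff rule, assuming the classifier values AD have already been computed on the training records (function and variable names are illustrative, not from the original paper):

```python
import numpy as np

def anomaly_cutoff(ad_train, exp_fd):
    """Cutoff such that a fraction exp_fd of training records exceeds it (Eq. 3.3)."""
    # The (1 - exp_fd) quantile of the training-set classifier values.
    return np.quantile(ad_train, 1.0 - exp_fd)

def classify(ad_values, cutoff):
    """Eq. 3.2: records with AD(Y) <= cutoff are regular, records above it are anomalous."""
    return ad_values > cutoff  # True -> anomalous

# Example with the expected false discovery rate of 15% used in Fig. 3.1:
# cutoff = anomaly_cutoff(ad_train, exp_fd=0.15)
# flags  = classify(ad_test, cutoff)
```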
For construction of the anomaly detection classifiers, we selected parameters based on the results of dissimilarity analysis (Katz et al. [8]). These parameters are Vp/Vs and Poisson's ratio.
3.3 Basic Anomaly Detection Classifiers
Three basic classifiers are introduced, analyzed and tested in this paper:
1. Distance from the center of the training set:

$$dist(Y) = \sqrt{\sum_{m} (y_m - c_{tr,m})^2}, \qquad (3.4)$$

where y_m and c_{tr,m} are the coordinates of the tested record and of the center of the training set, respectively. The center of the training set is defined as the mean over the training-set records, so its coordinates are of the form

$$c_{tr,m} = \frac{1}{K}\sum_{k=1}^{K} y_{k,m},$$

where y_{k,m} is the m-th coordinate of the k-th record in the training set and K is the total number of records in the training set.
2. Nearest-neighbor sparsity:

$$sparsity(Y) = \frac{1}{L}\sum_{l=1}^{L} dist(Y, neighbor_l), \qquad (3.5)$$

where dist(Y, neighbor_l) is the distance between the tested record Y and its l-th nearest neighbor from the training set, and L is the number of nearest neighbors used. The farther the tested record lies in parameter space from the records in the training set, the larger both the sparsity and the distance from the center of the training set become. These two classifiers are universal; their performance is not affected by the properties of the records in the training set, and they require no prior information about the anomaly type.
3. Divergence, defined as follows:

$$div(Y) = \sum_{m} a_m\,(c_{tr,m} - y_m), \qquad (3.6)$$

where a_m are coefficients chosen according to the anomaly type.
The divergence defined by Eq. 3.6 is of the "Bregman divergence" type (Bregman [20]). It is similar to a distance but satisfies neither the triangle inequality nor the symmetry condition. Applications of Bregman divergence to machine learning problems are presented, for example, in Banerjee et al. [21]. The Bregman-type divergence of Eq. 3.6 is a new, highly specialized AD classifier whose coefficients a_m depend on the anomaly type; it therefore requires prior information about the type of potential anomaly. This classifier may be efficient, for example, if all coordinates of the anomaly records tend to be smaller than the respective coordinates of the records in the training set. This is the case for parameters such as Vp/Vs and Poisson's ratio when the training set is compiled from records obtained in brine-filled sands or shales and the anomaly of interest is gas-filled sands. In this case, reasonable values for the coefficients in Eq. 3.6 are a_m = 1. A minimal sketch implementing all three classifiers is given below.
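The sketch assumes each record is a NumPy vector of the selected parameters (e.g., Vp/Vs and Poisson's ratio); the averaging over L neighbors in Eq. 3.5 and the signed linear form of Eq. 3.6 follow the reconstructions above:

```python
import numpy as np

def center_distance(y, train):
    """Eq. 3.4: Euclidean distance from the training-set center."""
    c_tr = train.mean(axis=0)              # coordinates of the training-set center
    return np.sqrt(np.sum((y - c_tr) ** 2))

def nn_sparsity(y, train, n_neighbors=3):
    """Eq. 3.5: average distance to the L nearest training records."""
    d = np.sqrt(np.sum((train - y) ** 2, axis=1))
    return np.sort(d)[:n_neighbors].mean()

def divergence(y, train, a=None):
    """Eq. 3.6: Bregman-type divergence with anomaly-specific coefficients a_m."""
    c_tr = train.mean(axis=0)
    if a is None:
        a = np.ones_like(c_tr)             # a_m = 1 when anomaly coordinates tend to be smaller
    return np.sum(a * (c_tr - y))

# train: (K, M) array of regular records; y: (M,) array holding one tested record
```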
We also construct and test adaptive aggregated anomaly classifiers designed to identify anomalies with unknown properties. They are built as a linear combination of the measured parameters:

$$AD_{aggr}(Y) = \sum_{m} s_m\, y_m. \qquad (3.7)$$
The weights s_m in Eq. 3.7 should be adjusted according to the properties of a specific anomaly. In this paper, we present a technique for optimization of these coefficients for detection of an anomaly with unknown properties.
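The paper's optimization technique for the weights s_m is not reproduced here; as a purely illustrative stand-in, the sketch below tunes the weights of the aggregated classifier of Eq. 3.7 by a grid search maximizing ROC AUC on a labeled validation set (the tuning criterion, grid, and function names are assumptions):

```python
import numpy as np
from itertools import product
from sklearn.metrics import roc_auc_score

def aggregated_score(Y, s):
    """Eq. 3.7: linear combination of the measured parameters with weights s_m."""
    return Y @ s

def grid_search_weights(Y_val, labels_val, grid=np.linspace(-1.0, 1.0, 21)):
    """Illustrative weight tuning: pick s maximizing ROC AUC on a labeled
    validation set (1 = anomaly, 0 = regular). Not the paper's technique."""
    best_s, best_auc = None, -np.inf
    for s in product(grid, repeat=Y_val.shape[1]):
        s = np.asarray(s)
        if not s.any():
            continue                      # skip the all-zero weight vector
        auc = roc_auc_score(labels_val, aggregated_score(Y_val, s))
        if auc > best_auc:
            best_s, best_auc = s, auc
    return best_s, best_auc
```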
3.4 Prior and Posterior Characteristics of Anomaly Detection Performance
To characterize anomaly detection quality, we introduce and distinguish two types of quality characteristics: (a) prior quality characteristics and (b) posterior (actual) quality characteristics. The only prior classification quality characteristic is the expected false discovery rate (expFD). Its value is assigned prior to performing the data analysis and is used to calculate the anomaly detection cutoff (AD cutoff) on the data in the training set. Posterior characteristics are calculated on a test set with identified regular and anomaly records. They include the true and false discovery rates as functions of the AD cutoff. The true and false discovery rates form a posterior ROC curve, which is used to evaluate the area under the ROC curve and to compare the efficiency of several anomaly detection classifiers.
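A minimal sketch of the posterior characteristics, assuming labeled test records (1 = anomaly, 0 = regular) and classifier values computed as above (names are illustrative):

```python
import numpy as np

def posterior_rates(ad_test, labels_test, cutoff):
    """Posterior rates at a given AD cutoff.
    TDR: fraction of anomaly records exceeding the cutoff.
    FDR: fraction of regular records exceeding the cutoff."""
    flagged = ad_test > cutoff
    tdr = flagged[labels_test == 1].mean()
    fdr = flagged[labels_test == 0].mean()
    return tdr, fdr

def posterior_roc(ad_test, labels_test):
    """Posterior ROC curve: (TDR, FDR) pairs over a sweep of cutoffs."""
    cutoffs = np.sort(np.unique(ad_test))[::-1]
    return np.array([posterior_rates(ad_test, labels_test, c) for c in cutoffs])
```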
We used the bootstrap for statistical analysis of the anomaly detection results and for comparative analysis of the properties of the posterior efficiency characteristics. At each bootstrap run, sampling with replacement was used to construct a random pair of training and test sets. The training set was selected from a pool of regular records; each test set contained both regular and anomaly records. The multiple pairs of training and test sets produced by random sampling were used to calculate quality characteristics of the AD classifiers, including mean and median values and the width of the quantile region of the analyzed AD characteristics, as well as parameters characterizing the relation between the expected false discovery rate and the posterior AD characteristics. The ROC curve analysis was done using multiple posterior ROC curves.
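A sketch of this bootstrap loop, assuming pools of regular and anomaly records and one of the classifiers defined earlier (the training-set size, number of runs, and summarized statistics are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_rates(regular, anomaly, classifier, exp_fd=0.15,
                    n_runs=1000, n_train=20):
    """Distributions of posterior TDR/FDR over bootstrap pairs of training/test sets."""
    tdrs, fdrs = [], []
    for _ in range(n_runs):
        # Training set: regular records sampled with replacement.
        idx = rng.integers(0, len(regular), size=n_train)
        train = regular[idx]
        # Test set: the remaining regular records plus the anomaly records.
        test_reg = regular[np.setdiff1d(np.arange(len(regular)), idx)]
        ad_train = np.array([classifier(y, train) for y in train])
        cutoff = np.quantile(ad_train, 1.0 - exp_fd)
        ad_reg = np.array([classifier(y, train) for y in test_reg])
        ad_anom = np.array([classifier(y, train) for y in anomaly])
        fdrs.append((ad_reg > cutoff).mean())
        tdrs.append((ad_anom > cutoff).mean())
    return np.array(tdrs), np.array(fdrs)

# Summary statistics, e.g. median and an 80% quantile region:
# tdr, fdr = bootstrap_rates(regular, anomaly, divergence)
# print(np.median(tdr), np.quantile(tdr, [0.1, 0.9]))
```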
Figure 3.1 illustrates the one-class anomaly detection methodology, which starts with the assignment of an expected FDR and the calculation of the anomaly classification cutoff. It also demonstrates the high performance of the divergence classifier.
The first 20 values in Figure 3.1, marked by circles (Index ≤ 20), are the values of the divergence classifier on the records from the training set. The points marked by triangles and crosses (Index > 20) show the values of the divergence classifier on the records from the test set. The horizontal dashed line shows the anomaly detection cutoff; records with values above the cutoff are classified as anomalous. One can observe that the distribution of the divergence values on the regular records in the test set is very similar to that in the training set. On the other hand, the divergence values for anomaly records are systematically higher than those for regular records. The anomaly detection cutoff corresponds to an expected false discovery rate of 15%. The expected FDR is calculated as the percentage of records in the training set that exceed the AD cutoff. The posterior true and false discovery rates are, respectively, the percentages of anomaly and regular records in the test set exceeding the AD cutoff. In this particular case, the posterior false discovery rate is smaller than the expected FDR and equals 6.6%, while the true discovery rate is high and equals 84%. The high true discovery rate is due to the large proportion of anomaly records characterized by positive divergence values; the low posterior FDR is due to the fact that the divergence values of a large proportion of regular records in the test set fall below the classification cutoff.
Figure 3.1 Divergence values for records in the training and test sets. The horizontal dashed line is the classification cutoff for an expected false discovery rate of 15%. The test set contains 30 regular records and 25 anomaly records from gas-filled sands. Each record contains two parameters (Vp/Vs and Poisson's ratio).