Table 3.3 gives more detailed results of the anomaly type hypothesis test. It lists p-values of the hypothesis tests for three types of AD classifiers and for several sizes of the training set. With a training set of 25 records, the p-values are smaller than the significance level for all three classifiers. As the size of the training set decreases, the p-value increases for all three AD classifiers. For the universal classifiers it exceeds 40% when the training set contains 10 records or fewer. For the aggregated classifier the p-value does not exceed 10% for any size of the training set, and it remains smaller than the significance level for training sets of 15 records or more. When the p-value is smaller than the significance level, hypothesis H0 is rejected, so the statistically significant test results agree with the suggestion that the detected anomaly is gas-filled sand. Statistically significant results of the anomaly type hypothesis tests were therefore obtained for training sets larger than 10 records in the case of the adaptive aggregated classifier, for a training set of 25 records in the case of the distance classifier, and for training sets of 20 and 25 records in the case of the sparsity classifier.
3.8 Conclusion
New algorithms and a methodology for machine learning one-class anomaly detection are presented. Detection of gas-filled sandstones, detection of abnormal pressure zones, and detection of gas-filled fractured carbonates are examples of geologic anomaly detection problems.
Three groups of AD classifiers were developed and tested for anomaly detection:
a. Universal classifiers applicable to detection of an anomaly of arbitrary type. Included among them are the distance from the center of the training set and the sparsity of neighbors from the training set.
b. Specialized methods designed for detection of an anomaly of a specific type. An example of this type of method is the Bregman type divergence classifier designed for the detection of gas-filled sand anomalies.
c. New types of adaptable classifiers. An example of a classifier of this type is the aggregated classifier.
All three groups of methods were tested on detection of the gas-filled sand anomaly, with regular records represented by records from brine-filled sands and shales.
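The two universal classifiers named in the list can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the Euclidean metric, the use of the mean distance to the k nearest training records as the sparsity measure, the function names, and the two-attribute records are all assumptions made for illustration.

```python
import math


def centroid(points):
    """Coordinate-wise mean of the training records."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def distance_score(x, training):
    """Euclidean distance from x to the center of the training set (assumed metric)."""
    return math.dist(x, centroid(training))


def sparsity_score(x, training, k=3):
    """Mean distance from x to its k nearest training records (assumed sparsity measure)."""
    nearest = sorted(math.dist(x, t) for t in training)[:k]
    return sum(nearest) / len(nearest)


# Hypothetical regular records with two made-up attributes each.
training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
regular = (0.5, 0.5)    # lies at the center of the training set
candidate = (4.0, 4.0)  # lies far from every training record

print(distance_score(regular, training), distance_score(candidate, training))
print(sparsity_score(regular, training), sparsity_score(candidate, training))
```

In this sketch both scores grow as a record moves away from the regular records, so a threshold on either score yields a one-class detector that needs no information about the anomaly type.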
Statistical analysis presented in this paper illustrates that the specialized divergence classifier outperforms universal AD classifiers in detecting gas-filled sand anomalies. Its posterior discovery rate was as high as 84%, with a false discovery rate smaller than 20%. Adaptable classifiers, which do not need information about properties of anomalies, have a posterior discovery rate of around 80% with a false discovery rate close to 20%. Universal methods have a lower posterior true discovery rate. The writers envision the main role of universal AD methods as detecting part of the anomaly, which is the first step in the adaptation of aggregated classifiers.
A combination of bootstrap and ROC curve analysis was used to evaluate the efficiency of the developed algorithms. In this approach, multiple pairs of training and test sets were generated by bootstrap resampling of the analyzed data set, and an ROC curve was generated for each pair. The resulting set of ROC curves was then analyzed statistically to estimate the median AUC and the lower and upper quantiles of the AUC values.
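The bootstrap and ROC procedure described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the chapter's code: the one-dimensional synthetic records, the distance-from-center score, the out-of-bag construction of the test set, the 5% and 95% quantile levels, and all function names are hypothetical. The AUC is computed through its rank interpretation, the probability that an anomaly scores higher than a regular record, which equals the area under the empirical ROC curve.

```python
import random
import statistics


def auc(regular_scores, anomaly_scores):
    """Area under the ROC curve, computed as the probability that an
    anomaly scores higher than a regular record (ties count as 1/2)."""
    wins = 0.0
    for a in anomaly_scores:
        for r in regular_scores:
            if a > r:
                wins += 1.0
            elif a == r:
                wins += 0.5
    return wins / (len(anomaly_scores) * len(regular_scores))


def distance_score(x, train):
    """Hypothetical 1-D score: distance from the center of the training set."""
    return abs(x - statistics.fmean(train))


def bootstrap_auc(regular, anomalies, train_size, n_boot=200, seed=0):
    """Median AUC and 5%/95% quantiles over bootstrap train/test pairs."""
    if train_size >= len(regular):
        raise ValueError("train_size must leave some regular records for testing")
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_boot):
        # Training set: regular records drawn with replacement.
        idx = [rng.randrange(len(regular)) for _ in range(train_size)]
        drawn = set(idx)
        train = [regular[i] for i in idx]
        # Test set: regular records not drawn for training, plus all anomalies.
        test_regular = [x for i, x in enumerate(regular) if i not in drawn]
        aucs.append(auc([distance_score(x, train) for x in test_regular],
                        [distance_score(x, train) for x in anomalies]))
    cuts = statistics.quantiles(aucs, n=20)  # cut points at 5%, 10%, ..., 95%
    return {"median": statistics.median(aucs), "lower": cuts[0], "upper": cuts[-1]}


# Synthetic stand-in data; the values carry no geological meaning.
data_rng = random.Random(1)
regular = [data_rng.gauss(0.0, 1.0) for _ in range(40)]
anomalies = [data_rng.gauss(3.0, 1.0) for _ in range(20)]
summary = bootstrap_auc(regular, anomalies, train_size=15)
print(summary)
```

Repeating the call for several values of `train_size` would show how the AUC distribution changes with the size of the training set.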
Several tests of the hypothesis related to the identification of anomaly type were completed. The instrument for testing the anomaly type hypothesis is the parameter named anomalyIdentifier, whose multiple values were generated via bootstrap. Poisson's ratio was used as the anomaly characterization function. When the adaptable classifier was used for anomaly detection, test results were highly significant for training sets larger than 10 records.
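A bootstrap test of this kind might look like the sketch below. The definition of the identifier used here (mean Poisson's ratio of the detected records minus that of the regular records), the form of H0 (the identifier is not negative, that is, the detected records do not have a lower Poisson's ratio), the p-value estimate, and the numerical values are all assumptions made for illustration; they are not taken from the chapter.

```python
import random
import statistics


def anomaly_identifier(detected, regular):
    """Hypothetical definition: mean Poisson's ratio of the detected
    records minus the mean Poisson's ratio of the regular records."""
    return statistics.fmean(detected) - statistics.fmean(regular)


def bootstrap_p_value(detected, regular, train_size, n_boot=1000, seed=0):
    """Share of bootstrap replicates consistent with H0 (identifier >= 0)."""
    rng = random.Random(seed)
    consistent_with_h0 = 0
    for _ in range(n_boot):
        d = rng.choices(detected, k=len(detected))  # resample detected records
        r = rng.choices(regular, k=train_size)      # resample a training set
        if anomaly_identifier(d, r) >= 0.0:
            consistent_with_h0 += 1
    return consistent_with_h0 / n_boot


# Made-up Poisson's ratio values, for illustration only.
regular_pr = [0.31, 0.33, 0.35, 0.30, 0.36, 0.34, 0.32, 0.38, 0.37, 0.33]
detected_pr = [0.14, 0.18, 0.12, 0.20, 0.16, 0.15]

p_separated = bootstrap_p_value(detected_pr, regular_pr, train_size=10)
p_same = bootstrap_p_value(regular_pr, regular_pr, train_size=10)
print(p_separated, p_same)
```

Under these assumptions H0 would be rejected when the estimated p-value falls below the chosen significance level. When both samples come from the same population, the estimate stays near one half and H0 is retained.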