2
Zoonotic Diseases Detection Using Ensemble Machine Learning Algorithms
Bhargavi K.
Department of Computer Science and Engineering, Siddaganga Institute of Technology, Tumakuru, India
Abstract
Zoonotic diseases are infectious diseases that spread from animals to humans, usually through infectious agents such as viruses, prions, and bacteria. Identifying and controlling the spread of zoonotic diseases is challenging for several reasons, including the absence of clear symptoms, the close similarity of signs across zoonoses, improper vaccination of animals, and poor public knowledge of animal health. Ensemble machine learning combines multiple machine learning algorithms to achieve better performance than individual, stand-alone algorithms. Ensemble learning algorithms such as the Bayes optimal classifier, bootstrap aggregating (bagging), boosting, Bayesian model averaging, Bayesian model combination, bucket of models, and stacking are helpful in identifying zoonotic diseases. Hence, this chapter discusses the application of these ensemble machine learning algorithms to identifying zoonotic diseases, along with their architecture, advantages, and applications. The efficiency achieved by the considered techniques is compared with respect to the performance metrics of throughput, execution time, response time, error rate, and learning rate. The analysis shows that Bayesian model combination and stacking achieve high efficiency in identifying zoonotic diseases.
Keywords: Zoonotic disease, ensemble machine learning, Bayes optimal classifier, bagging, boosting, Bayesian model averaging, Bayesian model combination, stacking
2.1 Introduction
Zoonotic diseases are infectious diseases that spread from animals to human beings, usually through infectious agents such as viruses, prions, and bacteria. The person who is infected first can, in turn, spread the disease to other people, and in this way the chain of infection builds. Zoonotic diseases are transferred through two modes of transmission: direct transmission, in which the disease passes from an animal to a human being, and intermediate transmission, in which the disease passes via an intermediate species that carries the pathogen. The emergence of zoonotic diseases is usually driven by political, economic, and social forces acting at regional, national, and global levels. Eight of the most common zoonotic diseases that spread from animals to humans over a wide geographical area are zoonotic influenza, salmonellosis, West Nile virus, plague, coronaviruses, rabies, brucellosis, and Lyme disease. Early identification of such infectious diseases is essential and can be supported by ensemble machine learning techniques [1, 2].
Identifying and controlling the spread of zoonotic diseases is challenging for several reasons: the absence of clear symptoms; the close similarity of signs across zoonoses; improper vaccination of animals; poor public knowledge of animal health; the high cost of controlling the worldwide spread of a disease; the unlikelihood of changing people's habits; the difficulty of prioritizing disease symptoms; lack of proper protective clothing; sudden rises in human morbidity; consumption of spoiled or contaminated food; the inability to control the spread of zoonotic microorganisms; the reemergence of zoonotic diseases at regular intervals; the difficulty of forming coordinated remedial policies; violations of international law on disease control; the high transaction cost of reaching disease control agreements; the difficulty of disease surveillance at national and international levels; the inability to trace the initial symptoms of influenza virus; the widespread nature of severe acute respiratory syndromes; the inability to provide sufficient resources; the influence of climate change on the spread of disease; the difficulty of prioritizing among zoonotic diseases; the increasing trend in the spread of disease from animals to humans; and continuous, close contact between humans and animals [3, 4].
Ensemble machine learning combines multiple machine learning algorithms to achieve better performance than individual, stand-alone algorithms [5, 6]. The ensemble learning approaches considered include the Bayes optimal classifier, bootstrap aggregating (bagging), boosting, Bayesian model averaging (BMA), Bayesian model combination, bucket of models, stacking, and remote sensing. Some of the advantages offered by ensemble machine learning over traditional machine learning are as follows: better prediction accuracy; high scalability of the solution, since it handles multiple nodes well; combination of multiple hypotheses to maximize output quality; sustainable solutions obtained by operating incrementally; efficient use of previous knowledge to produce diverse model-based solutions; reduced overfitting through sufficient training; models that mimic human-like behavior; the ability to analyze complex disease-spreading traces using combined machine learning models; fewer misclassified samples owing to the number of trained models; low sensitivity to outliers; improved performance through cross-validation of output data samples; high stability of the chosen hypothesis; high measurable performance in initial data collection; a lower tendency to converge to locally optimal solutions; non-hierarchical and overlapping behavior; the availability of several open source tools for practical implementation of the models; and so on [7–9].
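The idea of combining several learners can be illustrated with a minimal majority-voting sketch. It is not one of the specific algorithms listed above; the three rule-based base learners, the symptom field names (`fever`, `animal_contact`, `rash`), and the labels are hypothetical and serve only to show how individual predictions are merged into one ensemble decision.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the largest number of base learners."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(models, sample):
    """Ask every base learner for a label and combine them by majority vote."""
    return majority_vote([model(sample) for model in models])


# Hypothetical base learners: each is a single hand-written rule.
def fever_rule(sample):
    return "zoonotic" if sample["fever"] else "other"


def contact_rule(sample):
    return "zoonotic" if sample["animal_contact"] else "other"


def rash_rule(sample):
    return "zoonotic" if sample["rash"] else "other"


# Hypothetical patient record (illustrative values, not real data).
sample = {"fever": True, "animal_contact": True, "rash": False}
label = ensemble_predict([fever_rule, contact_rule, rash_rule], sample)
print(label)  # two of the three rules vote "zoonotic"
```

In practice the base learners would be trained models rather than fixed rules, and the combination step could be weighted, as in the Bayesian approaches discussed in the following sections.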
The main benefits of applying ensemble machine learning algorithms to the identification of zoonotic diseases are as follows: reduced variance and bias, with improved detection accuracy in a minimum number of training iterations; automatic identification of diseases; suitability for the medical domain through the use of base learners; easier identification of disease spread at an early stage; identification of the feature vector that yields maximum information gain; easy tuning of hyperparameters; lower treatment cost; adequate coverage of a large set of medical problems; early identification of recurring medical problems; efficient output resulting from high correlation between machine learning models; short training and execution times; high scalability of the ensemble models; the aggregated benefits of several models; strong non-linear decision-making ability; sustainable solutions for chronic diseases; faster convergence through automatic tuning of internal parameters; a reduced need to repeat clinical trials; prevention of disease spread through early intervention; the capability to record and store high-dimensional clinical datasets; easier recognition of neurological diseases; fewer misclassifications of medical images with poor image quality; the combined power of multiple machine learning models; and so on [10, 11].
2.2 Bayes Optimal Classifier
The Bayes optimal classifier is a popular machine learning model used for prediction. The technique is grounded in Bayes' theorem and is closely related to the maximum a posteriori (MAP) algorithm. The classifier is a probabilistic model that finds the most probable prediction for a new data instance: rather than relying on a single hypothesis, it weights the prediction of every candidate hypothesis by the probability of that hypothesis given the training data.
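As a minimal sketch of how a Bayes optimal prediction can be formed, the example below weights each hypothesis's prediction by its posterior probability and compares the result with the prediction of the single most probable (MAP) hypothesis. The three hypotheses, their posterior values, and the labels `infected` and `healthy` are illustrative assumptions, not values from this chapter.

```python
def bayes_optimal_predict(posteriors, predictions, labels):
    """Pick the label with the highest posterior-weighted vote.

    posteriors[i]         -- P(h_i | D), posterior of hypothesis i
    predictions[i][label] -- P(label | h_i), what hypothesis i predicts
    """
    scores = {
        label: sum(p * pred[label] for p, pred in zip(posteriors, predictions))
        for label in labels
    }
    return max(scores, key=scores.get), scores


def map_predict(posteriors, predictions, labels):
    """Pick the label predicted by the single most probable hypothesis."""
    best = max(range(len(posteriors)), key=lambda i: posteriors[i])
    return max(labels, key=lambda label: predictions[best][label])


labels = ["infected", "healthy"]
posteriors = [0.4, 0.3, 0.3]  # assumed P(h_i | D); they sum to 1
predictions = [
    {"infected": 1.0, "healthy": 0.0},  # h1 predicts "infected"
    {"infected": 0.0, "healthy": 1.0},  # h2 predicts "healthy"
    {"infected": 0.0, "healthy": 1.0},  # h3 predicts "healthy"
]

bo_label, bo_scores = bayes_optimal_predict(posteriors, predictions, labels)
map_label = map_predict(posteriors, predictions, labels)
print("Bayes optimal:", bo_label, bo_scores)
print("MAP hypothesis:", map_label)
```

Here the most probable single hypothesis (posterior 0.4) predicts `infected`, yet the combined weight of the other two hypotheses (0.6) makes `healthy` the Bayes optimal prediction, which is why the two approaches are related but not identical.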
The basic conditional probability equation gives the probability of one outcome given another. If A and B are two possible outcomes, the probability of A given that B has occurred is P(A|B) = (P(B|A) * P(A)) / P(B). The probabilistic frameworks used for prediction are broadly classified into two types: maximum a posteriori (MAP) estimation and maximum likelihood estimation (MLE). The objective of both frameworks is to locate the most promising hypothesis given the training data sample. Some of the zoonotic diseases that can be identified and treated well using the Bayes optimal classifier are anthrax, brucellosis, Q fever, scrub typhus, plague, tuberculosis, leptospirosis, rabies, hepatitis, Nipah virus, avian influenza, and so on [12, 13]. A high-level representation of the Bayes optimal classifier is shown in Figure 2.1. Over the space of the available data, the Bayes classifier performs multi-category classification, drawing soft boundaries between groups of data points to separate them into classes. It is observed that the accuracy of the Bayes optimal classifier keeps improving as the number of training iterations increases over time.
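Bayes' theorem and the difference between the two probabilistic frameworks named in this section can be illustrated with a short worked sketch. Here A is "the patient is infected" and B is "the screening test is positive". The prior (0.01), the test sensitivity (0.9), and the false-positive rate (0.1) are assumed values chosen only for illustration; P(B) is expanded with the law of total probability.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) = P(B|A) * P(A) / P(B), with P(B) from total probability."""
    evidence = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return p_b_given_a * prior_a / evidence


# Assumed, illustrative numbers (not taken from the chapter).
prior_infected = 0.01         # P(A)
p_pos_given_infected = 0.9    # P(B|A), test sensitivity
p_pos_given_healthy = 0.1     # P(B|not A), false-positive rate

p_infected_given_pos = posterior(
    prior_infected, p_pos_given_infected, p_pos_given_healthy
)

# Maximum likelihood: choose the hypothesis that maximizes P(B | hypothesis).
likelihoods = {
    "infected": p_pos_given_infected,
    "healthy": p_pos_given_healthy,
}
mle_choice = max(likelihoods, key=likelihoods.get)

# Maximum a posteriori: maximize P(B | hypothesis) * P(hypothesis).
priors = {"infected": prior_infected, "healthy": 1.0 - prior_infected}
unnormalized = {h: likelihoods[h] * priors[h] for h in likelihoods}
map_choice = max(unnormalized, key=unnormalized.get)

print(round(p_infected_given_pos, 4))
print("MLE:", mle_choice, "| MAP:", map_choice)
```

Under these assumptions the two frameworks disagree: MLE ignores the prior and selects `infected`, while MAP accounts for the rarity of the disease and selects `healthy`. With a uniform prior the two choices would coincide.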