Naive Bayes algorithm: This is a classification technique based on Bayes’ theorem with an assumption of independence among predictors. In simple terms, a naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
Example 2.1
An animal may be considered to be a tiger if it has four legs, weighs about 250 pounds, and has yellow fur with black strips. Even if these features depend on each other or upon the existence of the other features, all of these properties independently contribute to the probability that this animal is a tiger, and that is why it is known as “Naive.”
Along with simplicity, Naive Bayes is known to outperform even highly sophisticated classification methods. Bayes’ theorem provides a way of calculating posterior probability P(c|x) from P(c), P(x) and P(x|c). Here we start with
(2.6)
where
P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes).
P(c) is the prior probability of the class.
P(x|c) is the likelihood, which is the probability of a predictor given the class.
P(x) is the prior probability of the predictor.
Design Example 2.1
Suppose we observe a street guitar player who plays different types of music, say, jazz, rock, or country. Passersby leave a tip in a box in front of him depending on whether or not they like what he is playing. The player chooses to play different songs independent of the tips he receives. Below we have a training dataset of song and the corresponding target variable “tip” (which suggests possibilities of getting a tip for a given song). Now, we need to classify whether player will get a tip or not based on the song he is playing. Let us follow the steps involved in this task.
1 Convert the dataset into a frequency table Data TablesongjazzrockcountryjazzjazzrockcountrycountryjazztipnoyesyesyesyesyesnonoyescountryjazzrockrockcountryyesnoyesyesnoFrequency Tablesongnoyesrock4country32jazz23sum59
2 Create a Likelihood table by finding the probabilities, for example, rock probability = 0.29 and probability of getting a tip is 0.64.Likelihood Tablesongnoyesrock4=4/140.29country32=5/140.36jazz23=5/140.36sum59=5/14=9/140.360.64
3 Now, use the Naive Bayesian equation to calculate the posterior probability for each class. The class with the highest posterior probability is the outcome of prediction.
In our case, the player will get a tip if he plays jazz. Is this statement correct? We can solve it using the above method of posterior probability.
which has a relatively high probability. On the other hand, if he plays rock we have
R Code for Naive Bayes
require(e1071) #Holds the Naive Bayes Classifier Train <- read.csv(file.choose()) Test <- read.csv(file.choose()) #Make sure the target variable is of a two-class classification problem only levels(Train$Item:Fat_Content) model <- naiveBayes(Item:Fat_Content~., data = Train) class(model) pred <- predict(model,Test) table(pred)
Nearest neighbor algorithms: These are among the “simplest” supervised ML algorithms and have been well studied in the field of pattern recognition over the last century. They might not be as popular as they once were, but they are still widely used in practice, and we recommend that the reader at least consider the k‐nearest neighbor algorithm in classification projects as a predictive performance benchmark when trying to develop more sophisticated models. In this section, we will primarily talk about two different algorithms, the nearest neighbor (NN) algorithm and the k‐nearest neighbor (kNN) algorithm. NN is just a special case of kNN, where k = 1. To avoid making this text unnecessarily convoluted, we will only use the abbreviation NN if we talk about concepts that do not apply to kNN in general. Otherwise, we will use kNN to refer to NN algorithms in general, regardless of the value of k.
kNN is an algorithm for supervised learning that simply stores the labeled training examples,
Then, to make a prediction (class label or continuous target), the kNN algorithms find the k nearest neighbors of a query point and compute the class label (classification) or continuous target (regression) based on the k nearest (most “similar”) points. The overall idea is that instead of approximating the target function f(x) = y globally, during each prediction, kNN approximates the target function locally. In practice, it is easier to learn to approximate a function locally than globally.
Example 2.2
Figure 2.8 illustrates the NN classification algorithm in two dimensions (features x1 and x2). In the left subpanel, the training examples are shown as blue dots, and a query point that we want to classify is shown as a question mark. In the right subpanel, the class labels are also indicated, and the dashed line indicates the nearest neighbor of the query point, assuming a Euclidean distance metric. The predicted class label is the class label