      Twoing criterion: This binary criterion is defined as

twoing(a_i, d_1, d_2, S) = 0.25 \cdot \frac{|\sigma_{a_i \in d_1} S|}{|S|} \cdot \frac{|\sigma_{a_i \in d_2} S|}{|S|} \cdot \left( \sum_{c_i \in \mathrm{dom}(y)} \left| \frac{|\sigma_{a_i \in d_1 \wedge y = c_i} S|}{|\sigma_{a_i \in d_1} S|} - \frac{|\sigma_{a_i \in d_2 \wedge y = c_i} S|}{|\sigma_{a_i \in d_2} S|} \right| \right)^2

      When the target attribute is binary, the Gini and twoing criteria are equivalent. For multiclass problems, the twoing criterion prefers attributes with evenly divided splits.
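      As a quick illustration of how the twoing score is computed, the following Python sketch evaluates the formula above for a toy split; the representation of S as a list of (attribute value, class label) pairs and the helper name twoing are assumptions made for this example, not part of the original treatment.

from collections import Counter

def twoing(S, d1, d2):
    # S: list of (attribute_value, class_label) pairs (assumed representation);
    # d1, d2: the two sets of attribute values defining the binary split.
    S1 = [(a, y) for a, y in S if a in d1]
    S2 = [(a, y) for a, y in S if a in d2]
    if not S1 or not S2:
        return 0.0                      # degenerate split carries no information
    c1, c2 = Counter(y for _, y in S1), Counter(y for _, y in S2)
    # Sum of absolute differences between the class proportions in the two branches.
    diff = sum(abs(c1[c] / len(S1) - c2[c] / len(S2)) for c in set(c1) | set(c2))
    return 0.25 * (len(S1) / len(S)) * (len(S2) / len(S)) * diff ** 2

S = [(0, 'a'), (0, 'a'), (1, 'b'), (1, 'b')]
print(twoing(S, {0}, {1}))              # perfectly separating split -> 0.25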

      Orthogonality (ort) criterion: This binary criterion is defined as

ort(a_i, d_1, d_2, S) = 1 - \cos\theta\left(P_{y,1}, P_{y,2}\right)

      where P_{y,1} and P_{y,2} are the vectors of class probabilities of the target attribute y in the partitions d_1 and d_2, respectively.
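      A minimal sketch of the orthogonality score, assuming each partition is summarized by the class labels that fall into it; the function name ort and this interface are illustrative only.

import math
from collections import Counter

def ort(labels_d1, labels_d2):
    # Build the class-probability vectors P_{y,1} and P_{y,2} over a common class order.
    classes = sorted(set(labels_d1) | set(labels_d2))
    c1, c2 = Counter(labels_d1), Counter(labels_d2)
    p1 = [c1[c] / len(labels_d1) for c in classes]
    p2 = [c2[c] / len(labels_d2) for c in classes]
    # ort = 1 - cos(theta) between the two probability vectors.
    dot = sum(x * y for x, y in zip(p1, p2))
    norm = math.sqrt(sum(x * x for x in p1)) * math.sqrt(sum(y * y for y in p2))
    return 1.0 - dot / norm

print(ort(['a', 'b'], ['a', 'b']))      # identical distributions -> 0.0
print(ort(['a', 'a'], ['b', 'b']))      # disjoint distributions  -> 1.0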

      Kolmogorov–Smirnov (K–S) criterion: Assuming a binary target attribute, namely, dom(y) = {c1, c2}, the criterion is defined as

      (2.21) ks(a_i, d_1, d_2, S) = \left| \frac{|\sigma_{a_i \in d_1 \wedge y = c_1} S|}{|\sigma_{y = c_1} S|} - \frac{|\sigma_{a_i \in d_1 \wedge y = c_2} S|}{|\sigma_{y = c_2} S|} \right|
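      The K–S score of Eq. (2.21) can be sketched as follows; the list-of-pairs representation of S and the argument names are assumptions made for the example.

def ks(S, d1, c1, c2):
    # S: list of (attribute_value, class_label) pairs; d1: attribute values routed
    # to the first branch; c1, c2: the two class labels of the binary target.
    n_c1 = sum(1 for _, y in S if y == c1)
    n_c2 = sum(1 for _, y in S if y == c2)
    n_d1_c1 = sum(1 for a, y in S if a in d1 and y == c1)
    n_d1_c2 = sum(1 for a, y in S if a in d1 and y == c2)
    return abs(n_d1_c1 / n_c1 - n_d1_c2 / n_c2)

S = [(0, 'pos'), (0, 'pos'), (0, 'neg'), (1, 'neg'), (1, 'neg')]
print(ks(S, {0}, 'pos', 'neg'))         # |2/2 - 1/3| = 0.667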

      It was later suggested to extend this measure to handle target attributes with multiple classes and missing data values. The reported results indicate that the extended method outperforms the gain ratio criterion.

      Stopping criteria: The tree splitting (growing phase) continues until a stopping criterion is triggered. The following conditions are common stopping rules: (i) all instances in the training set belong to a single value of y; (ii) the maximum tree depth has been reached; (iii) the number of cases in the terminal node is less than the minimum number of cases for parent nodes; (iv) if the node were split, the number of cases in one or more child nodes would be less than the minimum number of cases for child nodes; and (v) the value of the best splitting criterion is not greater than a certain threshold.
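      The stopping rules (i)–(v) can be collected into a single check, as in the sketch below; all parameter names and default thresholds are illustrative choices, not values prescribed by the text.

def should_stop(labels, depth, child_sizes, best_split_score,
                max_depth=10, min_parent=10, min_child=5, threshold=1e-4):
    # labels: class labels reaching the node; child_sizes: sizes of the children
    # that the best candidate split would create; best_split_score: value of the
    # splitting criterion for that candidate split.
    if len(set(labels)) <= 1:                    # (i) all instances share one class
        return True
    if depth >= max_depth:                       # (ii) maximum tree depth reached
        return True
    if len(labels) < min_parent:                 # (iii) too few cases to split
        return True
    if any(n < min_child for n in child_sizes):  # (iv) a child node would be too small
        return True
    if best_split_score <= threshold:            # (v) best criterion below threshold
        return True
    return False

print(should_stop(['a', 'a', 'b'], depth=2, child_sizes=[2, 1], best_split_score=0.2))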

      Pruning methods: Employing tight stopping criteria tends to create small and underfitted decision trees. On the other hand, using loose stopping criteria tends to generate large decision trees that are overfitted to the training set. Pruning methods were developed to resolve this dilemma. There are various techniques for pruning decision trees; most of them perform a top-down or bottom-up traversal of the nodes, and a node is pruned if this operation improves a certain criterion. Next, we describe the most popular techniques.

      Cost-complexity pruning: This proceeds in two stages. In the first stage, a sequence of trees T0, T1, …, Tk is built on the training data, where T0 is the original tree before pruning and Tk is the root tree. In the second stage, one of these trees is chosen as the pruned tree, based on its generalization error estimate. The tree Ti + 1 is obtained by replacing one or more of the subtrees in the predecessor tree Ti with suitable leaves; the subtrees that are pruned are those that yield the lowest increase in apparent error rate per pruned leaf:

      (2.22) \alpha = \frac{\varepsilon\left(pr(T,t), S\right) - \varepsilon(T, S)}{|l(T)| - \left|l\left(pr(T,t)\right)\right|}

      where ε(T, S) denotes the error rate of the tree T over the sample S, ∣l(T)∣ denotes the number of leaves in T, and pr(T, t) denotes the tree obtained by replacing the node t in T with a suitable leaf. In the second stage, the generalization error of each pruned tree T0, T1, …, Tk is estimated, and the best pruned tree is then selected. If the given dataset is large enough, it is suggested to break it into a training set and a pruning set; the trees are constructed using the training set and evaluated on the pruning set. If the given dataset is not large enough, the cross-validation methodology is suggested, despite the computational complexity implications.
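      The α of Eq. (2.22) only needs the error rates and leaf counts before and after pruning a node, as in this sketch; the numbers in the example are made up for illustration.

def alpha(err_full, err_pruned, leaves_full, leaves_pruned):
    # Increase in apparent error per leaf removed when node t is replaced by a leaf.
    return (err_pruned - err_full) / (leaves_full - leaves_pruned)

# At each step the node with the smallest alpha is pruned; repeating this until only
# the root remains yields the sequence T0, T1, ..., Tk described above.
print(alpha(err_full=0.10, err_pruned=0.12, leaves_full=9, leaves_pruned=6))  # ~0.0067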

      Reduced error pruning: While traversing the internal nodes from the bottom to the top, the procedure checks, for each internal node, whether replacing it with the most frequent class reduces the tree’s accuracy. If not, the node is pruned. The procedure continues until any further pruning would decrease the accuracy. It can be shown that this procedure ends with the smallest accurate subtree with respect to a given pruning set.
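      A compact sketch of reduced error pruning, assuming a minimal Node structure whose label field holds the majority class observed during training and whose split field routes an instance to one of its children; these names are illustrative, not the author's API.

class Node:
    def __init__(self, split=None, children=None, label=None):
        self.split = split            # callable mapping an instance x to a child index
        self.children = children or []
        self.label = label            # majority class, used when the node acts as a leaf

def errors(node, pruning_set):
    # Number of misclassified (x, y) pairs from the pruning set under this subtree.
    if not node.children:
        return sum(1 for _, y in pruning_set if y != node.label)
    return sum(errors(child, [(x, y) for x, y in pruning_set if node.split(x) == i])
               for i, child in enumerate(node.children))

def reduced_error_prune(node, pruning_set):
    # Bottom-up pass: prune the children first, then try to collapse this node.
    for i, child in enumerate(node.children):
        reduced_error_prune(child, [(x, y) for x, y in pruning_set if node.split(x) == i])
    if node.children:
        leaf_errors = sum(1 for _, y in pruning_set if y != node.label)
        if leaf_errors <= errors(node, pruning_set):
            node.children = []        # replacing the subtree by a leaf does not hurt accuracy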

      Minimum error pruning: For each internal node t, the expected (m-probability) error rate if the subtree rooted at t were pruned to a leaf is

      (2.23) \varepsilon'(t) = 1 - \max_{c_i \in \mathrm{dom}(y)} \frac{|\sigma_{y = c_i} S_t| + \ell \cdot p_{apr}(y = c_i)}{|S_t| + \ell}

      where papr(y = ci) is the a priori probability of y taking the value ci, St denotes the training instances reaching node t, and ℓ denotes the weight given to the a priori probability. A node is pruned if pruning it does not increase the m-probability error rate.
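      Equation (2.23) translates directly into code; the dictionary-based interface and the example numbers below are assumptions made for illustration.

def expected_error(class_counts, priors, weight):
    # class_counts: {class: number of training cases of that class reaching node t};
    # priors: {class: a priori probability p_apr(y = c)}; weight: the factor l in (2.23).
    n = sum(class_counts.values())
    return 1.0 - max((class_counts.get(c, 0) + weight * priors[c]) / (n + weight)
                     for c in priors)

# A node with 8 cases of class 'a' and 2 of class 'b', uniform priors and weight 2:
print(expected_error({'a': 8, 'b': 2}, {'a': 0.5, 'b': 0.5}, weight=2))   # 0.25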

      Pessimistic pruning: The basic idea is that the error rate estimated using the training set is not reliable enough. Instead, a more realistic measure, known as the continuity correction for the binomial distribution, should be used: ε′(T, S) = ε(T, S) + ∣l(T)∣/(2 · ∣S∣). However, this correction still produces an optimistic error rate. Consequently, it was suggested to prune an internal node t if its error rate is within one standard error of that of a reference tree, namely

      (2.24) \varepsilon'\left(pr(T,t), S\right) \le \varepsilon'(T, S) + \sqrt{\frac{\varepsilon'(T, S)\left(1 - \varepsilon'(T, S)\right)}{|S|}}
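      A sketch of the pessimistic pruning test of Eq. (2.24), with the continuity-corrected error computed as described above; the function names and the example figures are illustrative assumptions.

import math

def corrected_error(misclassified, n_leaves, n_samples):
    # epsilon'(T, S) = epsilon(T, S) + |l(T)| / (2 * |S|)  (continuity correction)
    return misclassified / n_samples + n_leaves / (2.0 * n_samples)

def prune_pessimistic(err_pruned, err_full, n_samples):
    # Prune node t if the corrected error after pruning stays within one standard
    # error of the corrected error of the unpruned tree.
    std_err = math.sqrt(err_full * (1.0 - err_full) / n_samples)
    return err_pruned <= err_full + std_err

e_full = corrected_error(misclassified=10, n_leaves=12, n_samples=200)   # 0.08
e_pruned = corrected_error(misclassified=14, n_leaves=8, n_samples=200)  # 0.09
print(prune_pessimistic(e_pruned, e_full, n_samples=200))                # True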

