(4.30)
where βi,j,k denotes βi, j applied to document k and
The numerator of S1 is the average contribution of the phrase to the prediction of class 1 across all occurrences of the phrase. The denominator is the same statistic for class 2. Thus, if S1 is high, then w1, …, wk is a strong signal for class 1, and likewise a high S2 is a strong signal for class 2. It was proposed in [93] to use S as a score function to search for high‐scoring representative phrases that provide insight into the trained LSTM, and to use C to denote the class corresponding to a phrase.
In practice, the number of phrases is too large to feasibly compute the score for all of them. Thus, we approximate a brute‐force search through a two‐step procedure. First, we construct a list of candidate phrases by searching for strings of consecutive words j whose importance scores satisfy βi,j > c for any i and some threshold c. Then, we score and rank the set of candidate phrases, which is much smaller than the set of all phrases.
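The first step of this procedure can be illustrated with a minimal sketch. It assumes the importance scores for one document are already available as a nested list `beta[i][j]` (class i, word position j), and it interprets "strings of consecutive words" as maximal runs of above‐threshold positions; the function name `candidate_phrases`, the toy scores, and the threshold value are hypothetical and not taken from [93].

```python
from typing import List, Sequence, Tuple


def candidate_phrases(
    words: Sequence[str],
    beta: Sequence[Sequence[float]],
    c: float,
) -> List[Tuple[str, ...]]:
    """Collect maximal runs of consecutive words whose importance score
    exceeds the threshold c for at least one class i."""
    phrases: List[Tuple[str, ...]] = []
    run: List[str] = []
    for j, word in enumerate(words):
        # "for any i": one class exceeding the threshold is enough.
        if any(class_scores[j] > c for class_scores in beta):
            run.append(word)
        else:
            if run:
                phrases.append(tuple(run))
            run = []
    if run:  # close a run that reaches the end of the document
        phrases.append(tuple(run))
    return phrases


# Toy example with made-up importance scores (assumption, not real data).
words = ["the", "film", "was", "not", "good", "at", "all"]
beta = [
    [1.0, 1.1, 1.0, 0.4, 2.5, 1.0, 1.0],  # scores for class 1
    [1.0, 0.9, 1.0, 2.6, 0.4, 1.2, 1.8],  # scores for class 2
]
candidates = candidate_phrases(words, beta, c=1.5)
print(candidates)  # [('not', 'good'), ('all',)]
```

Only the candidates returned here would then be scored with S and ranked, which keeps the second step far cheaper than scoring every possible phrase.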
Rules‐based classifier: The extracted patterns from Section 4.1 can be used to construct a simple rules‐based classifier that approximates the output of the original LSTM. Given a document and a list of patterns sorted by descending score given by S, the classifier sequentially searches for each pattern within the document using simple string matching. Once it finds a pattern, the classifier returns the associated class given by C, ignoring the lower‐ranked patterns. The resulting classifier is interpretable, and despite its simplicity, retains much of the accuracy of the LSTM used to build it.
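The classifier described above can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation from the original work: each pattern is represented as a hypothetical `(phrase, class, score)` triple, matching is plain substring search, and the `default` return value for documents that match no pattern is an added assumption, since the text does not specify that case.

```python
from typing import Iterable, Optional, Tuple

# (phrase, class label C, score S); this layout is an assumption for the sketch.
Pattern = Tuple[str, int, float]


def classify(
    document: str,
    patterns: Iterable[Pattern],
    default: Optional[int] = None,
) -> Optional[int]:
    """Return the class of the highest-scoring pattern found in the document."""
    # Sort by descending score so the first match is the top-ranked pattern.
    ranked = sorted(patterns, key=lambda p: p[2], reverse=True)
    for phrase, label, _score in ranked:
        if phrase in document:  # simple string matching
            return label        # lower-ranked patterns are ignored
    return default              # no pattern matched (assumed fallback)


# Made-up patterns and scores, for illustration only.
patterns = [("not good", 2, 8.1), ("good", 1, 3.2), ("terrible", 2, 5.0)]

print(classify("the film was not good at all", patterns))  # 2
print(classify("a good film", patterns))                   # 1
print(classify("an average film", patterns))               # None
```

Because matching is a raw substring test, a pattern such as "good" would also fire inside a longer word; tokenizing the document first would avoid this at the cost of a slightly less simple rule.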
4.4 Accuracy and Interpretability
This section focuses on the trade‐off between accuracy and interpretability in fuzzy model‐based solutions. A fuzzy model based on an experience‐oriented learning algorithm is presented that is designed to balance these two aspects. It combines support vector regression (SVR), which generates the initial fuzzy model, with the available experience on the training data and a standard fuzzy model solution.
Fuzzy systems have been used for modeling and control in a number of applications. They are able to incorporate human knowledge, so that the information available for many real‐world systems can be discovered or described by fuzzy statements. Fuzzy modeling (FM) considers model structures in the form of fuzzy rule‐based systems and constructs them by means of different parametric system identification techniques. In recent years, interest in data‐driven approaches to FM has increased. On the basis of a limited training data set, fuzzy systems can be effectively modeled by means of learning mechanisms, and the fuzzy model after learning tries to infer the true information. The quality of the obtained fuzzy models is assessed against two contradictory requirements: (i) interpretability, the capability to express the behavior of the real system in a way that humans can understand, and (ii) accuracy, the capability to faithfully represent the real system. The search for the desired trade‐off is usually performed from one of two perspectives: using certain mechanisms either to improve the interpretability of initially accurate fuzzy models or to improve the accuracy of fuzzy models with good interpretability. In general, improving interpretability reduces the accuracy of initially accurate fuzzy models.
Support vector machines (SVMs) have been shown to generalize well to unseen data and to give a good balance between approximation and generalization. Thus, some researchers have been inspired to combine SVM with FM in order to take advantage of both approaches: human interpretability and good performance. As a result, support vector learning for FM has evolved into an active area of research. Before we discuss this hybrid algorithm, we discuss separately the basics of the FM and SVR approaches.
4.4.1 Fuzzy Models
A descriptive (linguistic) fuzzy model captures qualitative knowledge in the form of if‐then rules [106]:
Here,