36 Baraniuk, R.G., Cevher, V., and Wakin, M.B. (2010). Low‐dimensional models for dimensionality reduction and signal recovery: a geometric perspective. Proc. IEEE 98: 959–971.
37 Gray, R.M. (1984). Vector quantization. IEEE Acoust. Speech Signal Process. Mag. 1: 4–29.
38 Gersho, A. and Gray, R.M. (1992). Vector Quantization and Signal Compression. Kluwer Academic Publishers.
39 Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philos. Mag. 2: 559–572.
40 Wold, S., Esbensen, K., and Geladi, P. (1987). Principal component analysis. Chemom. Intell. Lab. Syst. 2: 37–52.
41 Dunteman, G.H. (1989). Principal Component Analysis. Sage Publications.
42 Jolliffe, I.T. (2002). Principal Component Analysis. Wiley.
43 Hyvärinen, A., Karhunen, J., and Oja, E. (2001). Independent Component Analysis. Wiley.
44 He, R., Hu, B.‐G., Zheng, W.‐S., and Kong, X.‐W. (2011). Robust principal component analysis based on maximum correntropy criterion. IEEE Trans. Image Process. 20: 1485–1494.
45 Li, X.L., Adali, T., and Anderson, M. (2011). Noncircular principal component analysis and its application to model selection. IEEE Trans. Signal Process. 59: 4516–4528.
46 Sorzano, C.O.S., Vargas, J., and Pascual‐Montano, A. (2014). A survey of dimensionality reduction techniques. https://arxiv.org/ftp/arxiv/papers/1403/1403.2877.pdf
47 Jenssen, R. (2010). Kernel entropy component analysis. IEEE Trans. Pattern Anal. Mach. Intell. 32: 847–860.
48 https://medium.com/machine‐learning‐101/chapter‐2‐svm‐support‐vector‐machine‐theory‐f0812effc72
49 https://cgl.ethz.ch/teaching/former/vc_master_06/Downloads/viscomp‐svm‐clustering_6.pdf
50 https://people.duke.edu/~rnau/regintro.htm
51 https://discuss.analyticsvidhya.com/t/what‐does‐min‐samples‐split‐means‐in‐decision‐tree/6233
52 https://medium.com/@mohtedibf/indepth‐parameter‐tuning‐for‐decision‐tree‐6753118a03c3
53 https://gist.github.com/PulkitS01/97c9920b1c913ba5e7e101d0e9030b0e
3 Artificial Neural Networks
3.1 Multi‐layer Feedforward Neural Networks
3.1.1 Single Neurons
A biological [1] and mathematical model of a neuron can be represented as shown in Figure 3.1, with the output of the neuron modeled as
$$ s = \sum_{i} w_i x_i + w_b \tag{3.1} $$
$$ y = f(s) \tag{3.2} $$
where $x_i$ are the inputs to the neuron, $w_i$ are the synaptic weights, and $w_b$ models a bias. In general, $f$ represents the nonlinear activation function. Early models used a sign function for the activation. In this case, the output $y$ is +1 or −1 depending on whether the total input at the node, $s$, exceeds 0 or not. Nowadays, a sigmoid function is commonly used rather than a hard threshold. The sigmoid, shown in Figure 3.1, is a differentiable squashing function usually evaluated as $y = \tanh(s)$.
One should immediately notice the similarity of Eqs. (3.1) and (3.2) to Eqs. (2.1) and (2.2), which define the operation of a linear predictor. This suggests that in this chapter we will take the problem of parameter estimation to the next level.
This engineering model is an oversimplified approximation of the biological model; in particular, it neglects temporal relations. The simplification is deliberate, because the goals of the engineer differ from those of the neurobiologist: the engineer must use models that are feasible for practical implementation. Even so, the computational abilities of an isolated neuron are extremely limited.
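The single-neuron model described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual form in which the total input is s = Σ w_i x_i + w_b and the output is y = f(s); the function names `neuron` and `sign`, and all numerical values, are hypothetical and chosen only for the example.

```python
import math

def sign(s):
    """Hard-threshold activation of early models: +1 if s exceeds 0, else -1."""
    return 1.0 if s > 0 else -1.0

def neuron(x, w, w_b, f=math.tanh):
    """Single neuron: total input s = sum_i w_i * x_i + w_b, output y = f(s)."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    s = sum(w_i * x_i for w_i, x_i in zip(w, x)) + w_b
    return f(s)

# Illustrative values only (not taken from the text).
x = [0.5, -1.0, 2.0]     # inputs x_i
w = [0.4, 0.3, -0.2]     # synaptic weights w_i
w_b = 0.1                # bias

y_sign = neuron(x, w, w_b, f=sign)   # hard threshold: output is +1 or -1
y_tanh = neuron(x, w, w_b)           # sigmoid-type squashing with tanh
```

With the same weights and inputs, the two activations agree in sign, but only the tanh output varies smoothly with s, which is what makes gradient-based parameter estimation possible.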
For electrical engineers, the most popular applications of single neurons are in adaptive finite impulse response (FIR) filters. Here,
Multi‐layer neural networks: A neural network is built up by incorporating the basic neuron model into different configurations. One example is the Hopfield network, where the output of each neuron can be connected to the input of every neuron in the network, including a self‐feedback connection. Another option is the multi‐layer feedforward network illustrated in Figure 3.2. Here, the neurons are arranged in layers, and the output of a neuron in a given layer is an input to all the neurons in the next layer. We may also have sparse connections or direct connections that bypass layers. In these networks, no feedback loops exist within the structure. These networks are sometimes referred to as backpropagation networks, after the algorithm commonly used to train them.
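The layer-to-layer flow of a fully connected feedforward network can be sketched as follows. This is a minimal illustration, assuming tanh activations in every layer and no bypass or sparse connections; the helper names `layer_forward` and `feedforward`, the layer sizes, and the weight values are hypothetical and not taken from the text.

```python
import math

def layer_forward(a_prev, W, b, f=math.tanh):
    """One layer: every neuron j receives all outputs of the previous layer.

    W[j] holds the weights of neuron j, b[j] its bias.
    """
    return [f(sum(w_ji * a_i for w_ji, a_i in zip(row, a_prev)) + b_j)
            for row, b_j in zip(W, b)]

def feedforward(x, layers):
    """Propagate the input x through the layers in order; no feedback loops."""
    a = x
    for W, b in layers:
        a = layer_forward(a, W, b)
    return a

# Illustrative 2-3-1 network (weights chosen arbitrarily).
layers = [
    ([[0.5, -0.5], [0.25, 0.75], [-1.0, 1.0]], [0.0, 0.1, -0.1]),  # 2 inputs -> 3 neurons
    ([[1.0, -1.0, 0.5]], [0.0]),                                    # 3 neurons -> 1 output
]
y = feedforward([1.0, 2.0], layers)
```

Because each layer only reads the outputs of the layer before it, the whole network is evaluated in a single forward sweep, in contrast to a Hopfield-type network where outputs feed back to inputs.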
Figure 3.1 From biological to mathematical simplified model of a neuron.
Source: CS231n Convolutional Neural Networks for Visual Recognition [1].
Figure 3.2 Block diagram of feedforward network.
Notation: A single neuron extracted from the l‐th layer of an L‐layer network is also depicted in Figure 3.2. Parameters