Univariate variance is a second-order statistical measure of the dispersion of the input observations about the sample mean. A natural generalization of the univariate variance to multivariate variables is the trace of the input covariance matrix. By retaining the m largest eigenvalues of the covariance matrix Cx, we guarantee a representation in the feature space that explains as much of the input-space variance as possible with only m variables. As already indicated in Section 2.1, w1 is the direction in which the data exhibit the largest variability, w2 is the direction with the largest variability once the variability along w1 has been removed, w3 is the direction with the largest variability once the variability along w1 and w2 has been removed, and so on. Thanks to the orthogonality of the wi vectors, and the resulting decorrelation of the feature vectors, the total variance explained by the PCA decomposition can be conveniently measured as the sum of the variances of the individual features, which equals the sum of the m largest eigenvalues of Cx.
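This identity can be checked numerically. The following is a minimal NumPy sketch, not taken from the text, in which the variable names (X, Xc, Cx, W, Z, m) are illustrative: it eigendecomposes the sample covariance Cx, projects the centered data onto the m leading directions wi, and confirms that the summed variances of the decorrelated features equal the sum of the m largest eigenvalues, i.e. the fraction of trace(Cx) that is retained.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))  # correlated toy data

Xc = X - X.mean(axis=0)              # center on the sample mean
Cx = np.cov(Xc, rowvar=False)        # input covariance matrix

eigvals, eigvecs = np.linalg.eigh(Cx)    # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # re-order from largest to smallest
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

m = 2
W = eigvecs[:, :m]                   # w1..wm: the m leading principal directions
Z = Xc @ W                           # decorrelated features

total_variance = np.trace(Cx)                 # multivariate generalization of variance
explained = Z.var(axis=0, ddof=1).sum()       # sum of the variances of the features
print(explained, eigvals[:m].sum())           # equal up to rounding error
print("explained fraction:", explained / total_variance)

Because the columns of W are orthonormal eigenvectors of Cx, the variance of each projected feature is the corresponding eigenvalue, so the printed quantities coincide.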