The second term in the right‐hand side of the above equation is
4.9 Principal Component Analysis
All suboptimal transforms, such as the DFT and DCT, decompose the signals into a set of coefficients that do not necessarily represent the constituent components of the signals. Moreover, the transform kernel is independent of the data, so these transforms are not efficient in terms of either decorrelation of the samples or energy compaction. Therefore, separation of the signal and noise components is generally not achievable using these suboptimal transforms.
Expansion of the data into a set of orthogonal components certainly achieves maximum decorrelation of the signals. This enables separation of the data into the signal and noise subspaces.
Figure 4.12 The general application of PCA.
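For a single‐channel EEG the Karhunen–Loève transform is used to decompose the ith channel signal into a set of weighted orthogonal basis functions:
$$x_i(n) = \sum_{k} w_{i,k}\,\phi_k(n), \quad \text{or in vector form} \quad \mathbf{x}_i = \mathbf{\Phi}\mathbf{w}_i \qquad (4.117)$$
where $\mathbf{\Phi} = \{\phi_k\}$ is the set of orthogonal basis functions. The weights $w_{i,k}$ are then calculated as:
$$\mathbf{w}_i = \mathbf{\Phi}^{-1}\mathbf{x}_i \qquad (4.118)$$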
The signal is often contaminated by noise, i.e. $x_i(n) = s_i(n) + v_i(n)$, where $v_i(n)$ is additive noise. This degrades the decorrelation process. The weights are then estimated so as to minimize a function of the error between the signal and its expansion by the orthogonal basis, i.e. $\mathbf{e}_i = \mathbf{x}_i - \mathbf{\Phi}\mathbf{w}_i$. Minimization of the error in this case is generally carried out by solving the least‐squares problem. In a typical application of PCA, as depicted in Figure 4.12, the signal and noise subspaces are separated by means of some classification procedure.
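The expansion and the least‐squares estimate of the weights can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration: the orthonormal basis is a random one obtained by QR factorization (not a basis derived from EEG data), the segment length and noise level are arbitrary, and the names `Phi`, `Phi_k`, `w_ls` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64                                    # samples in the segment (assumed)

# Hypothetical orthonormal basis: the columns of Phi play the role of phi_k.
Phi, _ = np.linalg.qr(rng.standard_normal((N, N)))

# Synthetic signal confined to the first 4 basis functions, plus additive noise.
w_true = np.zeros(N)
w_true[:4] = [3.0, -2.0, 1.5, 0.5]
s = Phi @ w_true                          # s_i(n)
x = s + 0.1 * rng.standard_normal(N)      # x_i(n) = s_i(n) + v_i(n)

# Full expansion: for an orthonormal basis, Phi^{-1} = Phi^T, so the weights
# are simply inner products of the signal with the basis functions.
w = Phi.T @ x
assert np.allclose(Phi @ w, x)            # x_i = Phi w_i is recovered exactly

# Reduced basis: least-squares weights that minimize ||x - Phi_k w||^2.
Phi_k = Phi[:, :4]
w_ls, *_ = np.linalg.lstsq(Phi_k, x, rcond=None)
e = x - Phi_k @ w_ls                      # error vector e_i

print("LS weights:", np.round(w_ls, 3))
print("residual norm:", np.linalg.norm(e))
```

With an orthonormal basis the least‐squares weights coincide with the corresponding inner products, so the reduced expansion simply discards the components assigned to the noise subspace.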
4.9.1 Singular Value Decomposition
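Singular value decomposition (SVD) is often used for solving the least‐squares (LS) problem. This is performed by decomposition of the M × M square autocorrelation matrix $\mathbf{R}$ into its eigenvalue matrix $\mathbf{\Lambda} = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_M)$ and an M × M orthogonal matrix of eigenvectors $\mathbf{V}$, i.e. $\mathbf{R} = \mathbf{V}\mathbf{\Lambda}\mathbf{V}^{H}$, where $(\cdot)^{H}$ denotes the Hermitian (conjugate transpose) operation. Moreover, if $\mathbf{A}$ is an M × M data matrix such that $\mathbf{R} = \mathbf{A}^{H}\mathbf{A}$, then there exist an M × M orthogonal matrix $\mathbf{U}$, an M × M orthogonal matrix $\mathbf{V}$, and an M × M diagonal matrix $\mathbf{\Sigma}$ with diagonal elements equal to $\lambda_i^{1/2}$, such that:
$$\mathbf{A} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^{H} \qquad (4.119)$$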
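Hence $\mathbf{\Sigma}^{2} = \mathbf{\Lambda}$. The columns of $\mathbf{U}$ are called left singular vectors and the rows of $\mathbf{V}^{H}$ are called right singular vectors. If $\mathbf{A}$ is a rectangular N × M matrix of rank k, then $\mathbf{U}$ will be N × N and $\mathbf{\Sigma}$ will be:
$$\mathbf{\Sigma} = \begin{bmatrix} \mathbf{S} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} \qquad (4.120)$$
where $\mathbf{S} = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k)$ and $\sigma_i = \sqrt{\lambda_i}$. For such a matrix the Moore–Penrose pseudo‐inverse is an M × N matrix $\mathbf{A}^{\dagger}$ defined as:
$$\mathbf{A}^{\dagger} = \mathbf{V}\mathbf{\Sigma}^{\dagger}\mathbf{U}^{H} \qquad (4.121)$$
where $\mathbf{\Sigma}^{\dagger}$ is an M × N matrix defined as:
$$\mathbf{\Sigma}^{\dagger} = \begin{bmatrix} \mathbf{S}^{-1} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} \qquad (4.122)$$
$\mathbf{A}^{\dagger}$ has a major role in the solutions of least‐squares problems, and $\mathbf{S}^{-1}$ is a k × k diagonal matrix with elements equal to the reciprocals of the singular values of $\mathbf{A}$, i.e.
$$\mathbf{S}^{-1} = \mathrm{diag}\!\left(\frac{1}{\sigma_1}, \frac{1}{\sigma_2}, \ldots, \frac{1}{\sigma_k}\right) \qquad (4.123)$$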
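The relations between the singular values, the eigenvalues of $\mathbf{A}^{H}\mathbf{A}$, and the pseudo‐inverse can be checked numerically. The following is a minimal sketch using NumPy; the matrix sizes, the random rank‐k test matrix, and the rank tolerance of `1e-10` are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, k = 8, 5, 3

# Rank-k N x M data matrix, built as the product of two thin random factors.
A = rng.standard_normal((N, k)) @ rng.standard_normal((k, M))

# Full SVD: U is N x N, Vh (= V^H) is M x M, s holds min(N, M) singular values.
U, s, Vh = np.linalg.svd(A, full_matrices=True)
rank = int(np.sum(s > 1e-10 * s[0]))      # numerical rank (tolerance assumed)

# The squared singular values equal the nonzero eigenvalues of R = A^H A.
lam = np.linalg.eigvalsh(A.conj().T @ A)[::-1]    # sorted in descending order
assert np.allclose(s[:rank] ** 2, lam[:rank])

# Sigma^dagger: M x N, with S^{-1} in the top-left k x k block, zeros elsewhere.
Sigma_pinv = np.zeros((M, N))
Sigma_pinv[:rank, :rank] = np.diag(1.0 / s[:rank])

# Moore-Penrose pseudo-inverse A^dagger = V Sigma^dagger U^H.
A_pinv = Vh.conj().T @ Sigma_pinv @ U.conj().T
assert np.allclose(A_pinv, np.linalg.pinv(A, rcond=1e-10))

print("numerical rank:", rank)
```

Only the k nonzero singular values are inverted; inverting the numerically zero ones would amplify rounding noise, which is why a rank tolerance is needed in practice.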
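In order to see the application of the SVD in solving the LS problem, consider the error vector $\mathbf{e}$ defined as:
$$\mathbf{e} = \mathbf{d} - \mathbf{A}\mathbf{h} \qquad (4.124)$$
where $\mathbf{d}$ is the desired signal vector and $\mathbf{A}\mathbf{h}$ is the estimate of $\mathbf{d}$. Replacing $\mathbf{A}$ by its SVD gives:
$$\mathbf{e} = \mathbf{d} - \mathbf{U}\mathbf{\Sigma}\mathbf{V}^{H}\mathbf{h} \qquad (4.125)$$
or equivalently
$$\mathbf{U}^{H}\mathbf{e} = \mathbf{U}^{H}\mathbf{d} - \mathbf{\Sigma}\mathbf{V}^{H}\mathbf{h} \qquad (4.126)$$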
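Since $\mathbf{U}$ is a unitary matrix, $\|\mathbf{e}\|^{2} = \|\mathbf{U}^{H}\mathbf{e}\|^{2}$. Hence, the vector $\mathbf{h}$ that minimizes $\|\mathbf{e}\|^{2}$ also minimizes $\|\mathbf{U}^{H}\mathbf{e}\|^{2}$. Finally, the unique solution as an optimum $\mathbf{h}$ (coefficient vector) may be expressed as [43]:
$$\mathbf{h} = \sum_{i=1}^{k} \frac{\mathbf{u}_i^{H}\mathbf{d}}{\sigma_i}\,\mathbf{v}_i \qquad (4.127)$$
where k is the rank of $\mathbf{A}$, and $\mathbf{u}_i$ and $\mathbf{v}_i$ are the ith columns of $\mathbf{U}$ and $\mathbf{V}$. Alternatively, when $\mathbf{A}$ has full column rank, the optimum least‐squares coefficient vector is:
$$\mathbf{h} = (\mathbf{A}^{H}\mathbf{A})^{-1}\mathbf{A}^{H}\mathbf{d} \qquad (4.128)$$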
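As a numerical illustration of the two forms of the least‐squares solution, the sketch below computes the coefficient vector from the singular vectors and from the normal equations and compares both with NumPy's own solver. The data are synthetic, and the sizes, noise level, and variable names (`h_svd`, `h_ne`, `h_np`) are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 50, 4

# Synthetic overdetermined problem d ~ A h (full column rank is assumed).
A = rng.standard_normal((N, M))
h_true = np.array([1.0, -0.5, 0.25, 2.0])
d = A @ h_true + 0.05 * rng.standard_normal(N)

# Thin SVD: the columns of U are u_i, the rows of Vh are v_i^H.
U, s, Vh = np.linalg.svd(A, full_matrices=False)
k = int(np.sum(s > 1e-10 * s[0]))         # numerical rank (tolerance assumed)

# SVD form: h = sum_i (u_i^H d / sigma_i) v_i over the k nonzero singular values.
h_svd = sum((U[:, i].conj() @ d) / s[i] * Vh[i].conj() for i in range(k))

# Normal-equation form: h = (A^H A)^{-1} A^H d, valid for full column rank.
h_ne = np.linalg.solve(A.conj().T @ A, A.conj().T @ d)

# Reference: NumPy's least-squares solver.
h_np, *_ = np.linalg.lstsq(A, d, rcond=None)

print("SVD solution:   ", np.round(h_svd, 4))
print("normal equations:", np.round(h_ne, 4))
```

The two forms agree when $\mathbf{A}$ has full column rank. The SVD form remains usable when $\mathbf{A}$ is rank deficient, where $\mathbf{A}^{H}\mathbf{A}$ is singular and the normal equations cannot be solved directly.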
Performing PCA is equivalent to performing an SVD on the covariance matrix. PCA uses the same concept as SVD and orthogonalization to decompose the data into its constituent uncorrelated orthogonal components such that the autocorrelation matrix is diagonalized. Each eigenvector represents a principal component and the individual eigenvalues are numerically related to the variance they capture in the direction of the principal components.
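The diagonalization described above can be sketched on synthetic multichannel data. This is only an illustration under stated assumptions: the three correlated "channels" are generated from a hypothetical mixing matrix, the covariance is the usual sample estimate, and none of the names correspond to a particular EEG toolbox.

```python
import numpy as np

rng = np.random.default_rng(3)
n_samples, n_channels = 500, 3

# Hypothetical mixing matrix used to create correlated channels.
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.0, 0.5, 0.0],
                   [0.5, 0.2, 0.1]])
X = rng.standard_normal((n_samples, n_channels)) @ mixing.T
Xc = X - X.mean(axis=0)                   # remove the mean of each channel

# Sample covariance matrix and its eigendecomposition (eigh: symmetric input).
C = (Xc.T @ Xc) / (n_samples - 1)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]         # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# For a symmetric positive semidefinite matrix, the singular values of C
# coincide with its eigenvalues.
_, s_C, _ = np.linalg.svd(C)
assert np.allclose(s_C, eigvals)

# Projecting onto the eigenvectors gives uncorrelated principal components:
# their covariance matrix is diagonal, with the eigenvalues as variances.
Y = Xc @ eigvecs
C_Y = (Y.T @ Y) / (n_samples - 1)
assert np.allclose(C_Y, np.diag(eigvals))

explained = eigvals / eigvals.sum()       # fraction of variance per component
print("explained variance ratio:", np.round(explained, 4))
```

The explained‐variance ratios indicate how much of the total variance each principal component captures, which is the quantity typically inspected when deciding how many components to assign to the signal subspace.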