2.2.1 Matching pursuit algorithm
The MP algorithm seeks a small set of atoms, and the corresponding coefficients, that represent a signal. Its essence is to find the atom with the highest correlation with the signal and then remove the component along this atom from the signal. This process is repeated iteratively until a stopping condition is met. More specifically, let us take a closer look at the steps of MP for signal representation. Let $k$ denote the iteration index, $y_k = D\alpha_k$ the current approximation, $y$ the signal, and $R^k y$ the current residual:
Initialization: $R^0 y = y$, $y_0 = 0$, $k = 1$.
Step 1. Calculate the inner products of the signal $y$ with each column (atom) of the dictionary $D$, i.e. $\langle R^0 y, d_j\rangle$ for $j = 1, \ldots, K$.
Step 2. Find the atom in the dictionary that yields the maximum inner product, i.e. the atom satisfying $|\langle R^0 y, d_{r_0}\rangle| \geqslant \sup_{j \in \{1, \ldots, K\}} |\langle R^0 y, d_j\rangle|$, where $r_0$ is the column index of the corresponding atom in $D$.
Hence, in the first iteration, an intermediate representation of the signal $y$ consists of two parts, as follows:
$$y = \langle R^0 y, d_{r_0}\rangle d_{r_0} + R^1 y. \qquad (2.6)$$
In this equation, the first part is the orthogonal projection of $R^0 y$ onto the span of the best-matching atom $d_{r_0}$, and the second part is the residue.
Step 3. Increment $k$ and repeat steps 1 and 2 for the $k$th residue $R^k y$.
So, in the $k$th iteration, the residue is decomposed as follows:
$$R^k y = \langle R^k y, d_{r_k}\rangle d_{r_k} + R^{k+1} y. \qquad (2.7)$$
Step 4. Stop the iterations once the residue satisfies the given convergence criterion. The signal is finally approximated as $y \approx \sum_{n=0}^{k} \langle R^n y, d_{r_n}\rangle d_{r_n}$.
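To make these steps concrete, the following is a minimal sketch of MP in Python with NumPy. The function name, the residual-norm stopping rule, and the assumption that the dictionary columns are normalized to unit length are illustrative choices, not part of the original text.

```python
import numpy as np

def matching_pursuit(y, D, tol=1e-6, max_iter=100):
    """Sketch of MP: greedily pick the atom most correlated with the residual.

    y   : (M,) signal vector
    D   : (M, K) dictionary whose columns are (assumed unit-norm) atoms
    tol : stop once the residual norm falls below this threshold
    """
    residual = y.copy()                      # R^0 y = y
    alpha = np.zeros(D.shape[1])             # coefficient vector
    for _ in range(max_iter):
        # Step 1: inner products of the current residual with every atom
        correlations = D.T @ residual
        # Step 2: atom with the largest (absolute) correlation
        r_k = int(np.argmax(np.abs(correlations)))
        coeff = correlations[r_k]
        # Peel this component off the residual and accumulate the coefficient
        alpha[r_k] += coeff                  # MP may revisit the same atom
        residual = residual - coeff * D[:, r_k]
        # Step 4: convergence criterion on the residual
        if np.linalg.norm(residual) < tol:
            break
    return alpha, residual
```

Because MP only removes the projection onto a single atom at each step, the same atom may be selected again in later iterations; this is the issue that OMP, discussed below, addresses.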
Let us take a very simple example to explain the search process of the MP algorithm. Suppose that the observed data are $y = (-1, -2)^{\mathrm{T}}$ and the dictionary is $D = \begin{pmatrix} 1.3 & -1 & 0 \\ 0.3 & 0.8 & -1 \end{pmatrix}$, each column of which is one atom. We also set the coding error threshold $E_r = 0.8$. Figure 2.2 shows the intuitive process by which the MP algorithm finds the optimal $\alpha$ to represent $y$, which contains four steps. Under this coding error threshold, the observed data $y$ are finally represented as $y \approx -1.3 \cdot d_2 + 2 \cdot d_3$, using two of the three atoms in the dictionary.
Figure 2.2. An example illustrating the search process of the MP algorithm when the observed signal $y$ and a dictionary $D$ are given; $r^k$ denotes the residue of the $k$th iteration.
One of the problems with MP is that it may pick the same atom multiple times. Orthogonal matching pursuit (OMP) (Pati et al 1993) corrects this issue by updating all the nonzero coefficients according to a least-squares criterion in each step, which makes the residual orthogonal to the chosen atoms. As a result, no atom is picked more than once, and this strategy leads to faster convergence.
So, at the $k$th iteration, OMP gives the signal representation as follows:
$$y = \sum_{n=1}^{k} \alpha_n^k d_n + R^k y, \quad \text{with } \langle R^k y, d_n\rangle = 0, \; n = 1, \ldots, k. \qquad (2.8)$$
Still, OMP is a greedy approach that iteratively makes the locally optimal choice in the hope of approximating the global minimum. This is a reasonable compromise for the sparse approximation problem, given that the global solution of this NP-hard problem is practically unreachable. Sequentially refining the entries of $\alpha$ to reduce the approximation error is, by contrast, computationally very efficient. Various extensions have been proposed to accelerate the convergence of OMP, such as compressive sampling OMP (CoSaMP) (Needell and Tropp 2009), regularized OMP (ROMP) (Needell and Vershynin 2010), and stagewise OMP (StOMP) (Donoho et al 2012).
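A minimal OMP sketch in the same spirit is given below. The function name and the use of `np.linalg.lstsq` are illustrative choices, but the key difference from MP, refitting all active coefficients by least squares so that the residual is orthogonal to the chosen atoms, follows equation (2.8).

```python
import numpy as np

def orthogonal_matching_pursuit(y, D, T0):
    """Sketch of OMP with a target of T0 nonzero coefficients.

    y  : (M,) signal vector
    D  : (M, K) dictionary with (assumed unit-norm) atoms as columns
    """
    residual = y.copy()
    support = []                              # indices of selected atoms
    alpha = np.zeros(D.shape[1])
    for _ in range(T0):
        # Pick the atom most correlated with the current residual
        correlations = D.T @ residual
        k = int(np.argmax(np.abs(correlations)))
        if k not in support:
            support.append(k)
        # Least-squares refit of all selected coefficients, so that the
        # residual becomes orthogonal to every chosen atom (equation (2.8))
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    alpha[support] = coeffs
    return alpha, residual
```

Refitting every active coefficient at each iteration is exactly what enforces the orthogonality condition in equation (2.8), and it is why an atom already in the support is never selected again.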
Currently, sparse approximation algorithms are widely used in image processing, audio processing, machine learning, and many other fields. An unknown signal is usually modeled as a sparse combination of a few atoms in a given dictionary. The use of sparsity-inspired models often yields state-of-the-art results in a variety of applications.
2.2.2 The K-SVD algorithm
As discussed before, OMP works well for a fixed dictionary. On the other hand, sparse representation should work better if we optimize the dictionary to fit the data, which brings us to the second problem, i.e. how to train a dictionary from data. K-SVD is a classical dictionary learning algorithm that designs an over-complete dictionary for sparse representation via singular value decompositions. It was first introduced by Aharon et al (2006). At its core, K-SVD can be regarded as a direct generalization of K-means clustering, in which each input signal is allowed to be represented by a linear combination of dictionary atoms.
Theoretically, an over-complete dictionary $D \in \mathbb{R}^{M \times K}$ can be directly chosen as a pre-specified transform matrix, such as over-complete wavelets, curvelets, steerable wavelet filters, short-time Fourier transforms, and others. However, dictionary learning driven by training data is a newer route, and has attracted major attention recently due to the effectiveness and efficiency of the resultant sparse representation. Specifically, given the training signals $Y \in \mathbb{R}^{M \times N}$ containing $N$ samples, the goal is to find a dictionary $D \in \mathbb{R}^{M \times K}$ that can be used to sparsely represent $Y$ with a coefficient matrix $\varphi \in \mathbb{R}^{K \times N}$. Note that, different from equation (2.4) in which $y$ and $\alpha$ are column vectors, $Y$ and $\varphi$ are both matrices, gathering a number of vectorized samples together to train the dictionary. The problem is expressed as
$$\min_{D, \varphi} \|Y - D\varphi\|_F^2 \quad \text{s.t.} \quad \|\varphi_i\|_0 \leqslant \varepsilon \;\; \forall i. \qquad (2.9)$$
This is also a nonconvex problem over the dictionary $D$ and the sparse coefficients $\varphi$. Here the metric on the representation residue, $\|\cdot\|_F^2$, is the squared Frobenius norm, i.e. the sum of squares of all the elements of the matrix. Moreover, finding a sparse code and constituting a dictionary for sparse representation have to be performed in a coordinated fashion. Next, let us introduce K-SVD as a typical example. The objective function of K-SVD is formulated as
$$\min_{D, \varphi} \left\{\|Y - D\varphi\|_F^2\right\} \quad \text{s.t.} \quad \forall i, \; \|\varphi_i\|_0 \leqslant T_0. \qquad (2.10)$$
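For clarity, the two ingredients of this objective can be evaluated directly in code. The snippet below is a small illustration under assumed toy shapes and names; it only shows the squared Frobenius norm of the representation residue and the per-column $\ell_0$ sparsity check, not a solver.

```python
import numpy as np

# Toy shapes, assumed for illustration: M-dimensional signals, K atoms, N samples
M, K, N, T0 = 8, 16, 100, 3
rng = np.random.default_rng(0)
Y = rng.standard_normal((M, N))            # training signals
D = rng.standard_normal((M, K))
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms
phi = np.zeros((K, N))                     # sparse coefficient matrix

# Squared Frobenius norm of the representation residue in equation (2.10)
objective = np.linalg.norm(Y - D @ phi, 'fro') ** 2

# Sparsity constraint: every column of phi has at most T0 nonzero entries
is_feasible = np.all(np.count_nonzero(phi, axis=0) <= T0)
```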
The K-SVD algorithm consists of two steps, namely, sparse coding and dictionary updating.
Step 1: The sparse coding stage.
Before the sparse coding procedure, an initial dictionary is generated from a traditional data sparsifier, such as an over-complete discrete cosine transform (DCT) dictionary. Then, we seek the sparse coefficient matrix $\varphi$ for the current dictionary. Figure 2.3 depicts the sparse coding stage, in which the red rectangles in the $\varphi \in \mathbb{R}^{K \times N}$ matrix mark the coefficients of the selected atoms. Usually, the OMP algorithm is used to compute the coefficients, since it can supply a solution with a predetermined number $T_0$ of nonzero entries. The OMP method was discussed in the previous section; here we focus on the next stage, and a code sketch of the present stage is given after figure 2.3.
Figure 2.3. The sparse coding stage of the dictionary learning method.
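As a sketch of this stage, one can initialize the dictionary with an over-complete DCT and then code every training signal column by column. Both helper names and the particular DCT construction below are illustrative assumptions, and the sketch reuses the `orthogonal_matching_pursuit` routine from the OMP sketch in section 2.2.1.

```python
import numpy as np

def overcomplete_dct_dictionary(M, K):
    """Over-complete 1D DCT dictionary, a common K-SVD initialization.
    This is one simple construction; details vary across implementations."""
    D = np.zeros((M, K))
    for k in range(K):
        atom = np.cos(np.arange(M) * k * np.pi / K)
        if k > 0:
            atom -= atom.mean()            # remove the DC component of non-constant atoms
        D[:, k] = atom / np.linalg.norm(atom)
    return D

def sparse_coding_stage(Y, D, T0):
    """Code each training signal (column of Y) with at most T0 atoms using OMP."""
    K, N = D.shape[1], Y.shape[1]
    phi = np.zeros((K, N))
    for i in range(N):
        alpha, _ = orthogonal_matching_pursuit(Y[:, i], D, T0)
        phi[:, i] = alpha
    return phi
```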
Step 2: The dictionary updating stage.
In this stage, all $K$ atoms in the dictionary are updated with $\varphi$ fixed. In practice, the $K$ atoms are updated one by one, which means that in each step only one atom