$$\|Y - D\varphi\|_F^2 = \bigg\|Y - \sum_{j=1}^{K} d_j\varphi_T^j\bigg\|_F^2 = \bigg\|\bigg(Y - \sum_{j \neq k} d_j\varphi_T^j\bigg) - d_k\varphi_T^k\bigg\|_F^2 = \big\|E_k - d_k\varphi_T^k\big\|_F^2. \tag{2.11}$$
Figure 2.4. Dictionary update stage. (1) Independent update of each column of the dictionary and (2) shrink operation.
In this equation, $D = [d_1, d_2, \ldots, d_K] \in \mathbb{R}^{M \times K}$ contains the $K$ atoms, and $E_k = Y - \sum_{j \neq k} d_j\varphi_T^j$ stands for the approximation error when the atom $d_k$ is removed from the dictionary. With this separation of variables, $D\varphi$ is decomposed into a sum of $K$ rank-1 matrices, of which only the $k$th term is in question for updating.
Based on the refined objective function in equation (2.11), whose goal is to find the $d_k$ and coefficient row $\varphi_T^k$ that best approximate $E_k$, the SVD is a straightforward method for updating a single column of $D$ together with $\varphi_T^k$. The SVD is a factorization of the form $E_k = U\Delta V^\top$, where $U$ and $V$ are orthogonal matrices and $\Delta$ is a diagonal matrix of singular values. The key point of the SVD method in this context is that a few principal components associated with the largest singular values can closely approximate the matrix. Accordingly, $d_k$ is taken as the first column of $U$, while $\varphi_T^k$ is taken as the first column of $V$ multiplied by the first diagonal element of $\Delta$, denoted as $\Delta(1,1)$.
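To make the rank-1 step concrete, here is a minimal NumPy sketch (the matrix sizes and variable names are illustrative stand-ins, not taken from the original text):

```python
import numpy as np

# Illustrative stand-in for the error matrix E_k of equation (2.11).
rng = np.random.default_rng(0)
E_k = rng.standard_normal((64, 200))  # toy M x N error matrix

U, S, Vt = np.linalg.svd(E_k, full_matrices=False)
d_k = U[:, 0]               # updated atom: first left singular vector
phi_Tk = S[0] * Vt[0, :]    # updated coefficient row: Delta(1,1) times first row of V^T

# By the Eckart-Young theorem, the outer product d_k phi_Tk is the closest
# rank-1 matrix to E_k in the Frobenius norm.
error = np.linalg.norm(E_k - np.outer(d_k, phi_Tk), "fro")
```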
However, directly decomposing $E_k$ by the SVD to update $d_k$ and the corresponding $\varphi_T^k$ cannot ensure the sparsity of the updated $\varphi_T^k$, because the SVD contains no sparsity-controlling regularizer (Sahoo and Makur 2013); in other words, the positions and values of the zero elements in the updated $\varphi_T^k$ may change. To address this loss of sparsity, K-SVD considers only those columns of $E_k$ that correspond to nonzero entries of $\varphi_T^k$, obtaining a reduced matrix denoted as $\bar{E}_k$. A simple solution is to restrict $E_k$ with a shrink operation, $\bar{E}_k = E_k\Omega_k$, whose process is illustrated in figure 2.4. In particular, $\Omega_k$ is defined as a selection matrix that restricts $E_k$ by discarding the columns associated with zero entries of $\varphi_T^k$. Similarly, we obtain $\bar{\varphi}_T^k$ by $\bar{\varphi}_T^k = \varphi_T^k\Omega_k$. Hence, rewriting the objective function in equation (2.11), we have the expected $\hat{d}_k$ and the corresponding $\hat{\varphi}_T^k$:
$$\big(\hat{d}_k, \hat{\varphi}_T^k\big) = \arg\min_{d_k,\, \varphi_T^k} \big\|E_k\Omega_k - d_k\varphi_T^k\Omega_k\big\|_F^2 = \arg\min_{d_k,\, \bar{\varphi}_T^k} \big\|\bar{E}_k - d_k\bar{\varphi}_T^k\big\|_F^2. \tag{2.12}$$
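In code, the shrink operation amounts to plain column selection. A minimal NumPy fragment (with hypothetical variable names) could read:

```python
import numpy as np

rng = np.random.default_rng(1)
E_k = rng.standard_normal((64, 200))          # toy error matrix
phi_Tk = np.zeros(200)
phi_Tk[[3, 17, 42]] = rng.standard_normal(3)  # a sparse coefficient row

# Omega_k keeps only the columns where phi_Tk is nonzero, so zero
# coefficients can never become nonzero during the update.
omega = np.flatnonzero(phi_Tk)
E_k_bar = E_k[:, omega]       # E_k Omega_k
phi_Tk_bar = phi_Tk[omega]    # phi_Tk Omega_k
```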
To solve equation (2.12) by approximating $\bar{E}_k$ with a rank-1 matrix, the SVD method is again suggested to find the new $d_k$ and $\bar{\varphi}_T^k$. In detail, $\bar{E}_k$ is decomposed as $\bar{E}_k = U\Delta V^\top$, $d_k$ is set to the first column of $U$, and $\bar{\varphi}_T^k$ to the first column of $V$ multiplied by $\Delta(1,1)$. By repeating this update over the whole dictionary, the process solves for $\varphi$ and $D$ iteratively. Once all $K$ atoms of $D$ have been updated column by column in this fashion to obtain an improved dictionary, we fix this dictionary and return to the sparse coding stage, alternating until a stopping condition is reached. It is noted that the K-SVD algorithm is flexible in that other pursuit methods can be used in the sparse coding stage. The workflow of the K-SVD algorithm is summarized in algorithm 2.1; a compact implementation sketch follows the algorithm.
Algorithm 2.1 The K-SVD algorithm for dictionary learning. Reprinted with permission from Aharon et al (2006). Copyright 2019 IEEE.
Input: A set of training data $Y = \{y_i\}_{i=1}^{N} \in \mathbb{R}^{M \times N}$
Output: An over-complete dictionary $D \in \mathbb{R}^{M \times K}$ and a sparse coefficient matrix $\varphi \in \mathbb{R}^{K \times N}$
1: Initialize the dictionary $D$ with $K$ randomly selected training signals
2: while not converged do
3: Sparse coding:
4: for each training signal $y_i \in Y$, $i = 1, 2, \ldots, N$, use OMP to compute the representation vector $\varphi_i$: do
5: $\min_{\varphi_i} \|y_i - D\varphi_i\|_2^2 \ \text{s.t.}\ \|\varphi_i\|_0 \leqslant T_0$
6: end for
7: Dictionary updating:
8: for $k = 1, 2, \ldots, K$, update the $k$th atom $d_k$ of $D$ and the $k$th row of the coding coefficients $\varphi_T^k$: do
9: Compute the representation error matrix: $E_k = Y - \sum_{j \neq k} d_j\varphi_T^j$
10: Obtain $\bar{E}_k$ and $\bar{\varphi}_T^k$ by discarding zero entries with the shrink operation: $\bar{E}_k = E_k\Omega_k$ and $\bar{\varphi}_T^k = \varphi_T^k\Omega_k$
11: Apply the SVD decomposition $\bar{E}_k = U\Delta V^\top$; update the atom $d_k$ with the first column of $U$, and the corresponding coefficient vector $\bar{\varphi}_T^k$ with the first column of $V$ multiplied by $\Delta(1,1)$
12: end for
13: end while
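The following self-contained NumPy sketch renders algorithm 2.1 with a simple greedy OMP routine. It is a didactic approximation under toy assumptions (a fixed number of iterations instead of a convergence test, no replacement of unused atoms), not the reference implementation of Aharon et al (2006):

```python
import numpy as np

def omp(D, y, T0):
    """Greedy orthogonal matching pursuit with at most T0 nonzero coefficients."""
    K = D.shape[1]
    residual = y.astype(float).copy()
    support, phi = [], np.zeros(K)
    for _ in range(T0):
        # Select the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        # Re-fit all selected coefficients by least squares.
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    phi[support] = coeffs
    return phi

def ksvd(Y, K, T0, n_iter=10, seed=0):
    """Learn D (M x K) and sparse codes Phi (K x N) from training data Y (M x N)."""
    M, N = Y.shape
    rng = np.random.default_rng(seed)
    # Step 1: initialize D with K randomly selected, normalized training signals.
    D = Y[:, rng.choice(N, size=K, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_iter):
        # Sparse coding stage (steps 3-6): OMP signal by signal.
        Phi = np.column_stack([omp(D, Y[:, i], T0) for i in range(N)])
        # Dictionary update stage (steps 7-12): atom by atom.
        for k in range(K):
            omega = np.flatnonzero(Phi[k, :])  # signals that use atom k
            if omega.size == 0:
                continue
            # Error matrix without atom k (step 9), restricted by the
            # shrink operation (step 10).
            E_k = Y - D @ Phi + np.outer(D[:, k], Phi[k, :])
            E_k_bar = E_k[:, omega]
            # Rank-1 SVD update (step 11).
            U, S, Vt = np.linalg.svd(E_k_bar, full_matrices=False)
            D[:, k] = U[:, 0]
            Phi[k, omega] = S[0] * Vt[0, :]
    return D, Phi
```

For example, `D, Phi = ksvd(Y, K=128, T0=5)` would learn 128 atoms with at most five nonzero coefficients per training signal.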
Despite the fact that the K-SVD algorithm converges quickly, it is still computationally expensive, since the SVD must be performed $K$ times and all $N$ training signals are used for sparse coding at each iteration. This demands heavy memory use, in particular when the set of training data is large. Some modifications of the original algorithm have been presented to alleviate the computational burden; for example, the use of batch-OMP for K-SVD presented by Rubinstein et al (2008). In the next section, we will introduce an application of the dictionary learning approach to CT reconstruction.
2.3 CT reconstruction via dictionary learning
It is suggested that dictionary learning for sparse representation effectively mimics how the human visual system (HVS) perceives structural/semantic information in natural images. In this section, we use low-dose CT reconstruction as an example to illustrate how the dictionary learning method is applied in the CT field. This case is described in two parts. In the first part, we introduce the statistical iterative reconstruction (SIR) framework extensively used in the CT field, which exemplifies the application of the Bayesian approach to CT imaging. In the second part, based on SIR, we illustrate the application of dictionary learning to CT reconstruction for preserving subtle features in images reconstructed from low-dose x-ray data. This example shows the power of dictionary learning as a machine learning technique.
2.3.1 Statistical iterative reconstruction (SIR) framework
X-ray photons recorded by the detectors can be regarded as evidence, and a reconstructed image can be seen as a hypothesis. In the Bayesian framework, given our statistical knowledge and the evidence, we should select the hypothesis that is the most reasonable/plausible.
X-ray photons are emitted from an x-ray source. During their propagation, they interact with tissues inside a patient. As a result, some x-ray photons are absorbed or re-directed, while the rest of them penetrate the patient, reaching an x-ray detector. Assuming a monochromatic x-ray beam, Beer’s law is expressed as
$$I_i = I_0\, e^{-\int_{L_i} x(l)\, \mathrm{d}l}, \tag{2.13}$$
where $I_i$ represents the recorded x-ray intensity at the $i$th detector element, $I_0$ is the initial flux, and $x$ is the linear attenuation coefficient along the $i$th x-ray path $L_i$ from which the intensity $I_i$ is produced. Taking the negative logarithm, $b_i = \ln(I_0/I_i)$, converts this equation into a line integral, which is discretized as
$$b_i = \sum_{j=1}^{N} a_{ij} x_j, \tag{2.14}$$
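As a small numerical check of equations (2.13) and (2.14) for a single ray (the intersection lengths $a_{ij}$ and attenuation values $x_j$ below are made up):

```python
import numpy as np

# Toy discretization of one ray: a_i holds the intersection lengths (cm) of
# ray i with the pixels, x the pixel attenuation coefficients (1/cm).
a_i = np.array([0.0, 0.7, 1.0, 1.0, 0.4, 0.0])
x = np.array([0.00, 0.02, 0.19, 0.21, 0.02, 0.00])

I0 = 1e5                    # initial photon flux
b_i = a_i @ x               # discretized line integral, equation (2.14)
I_i = I0 * np.exp(-b_i)     # Beer's law, equation (2.13)

# The log-transformed measurement recovers the line integral.
assert np.isclose(np.log(I0 / I_i), b_i)
```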