Based on equation (1.2), the iterative algorithm always targets a criterion, such as minimizing the difference between the measured data and calculated data. One approach is obtained by minimizing the functional:
\[ f(x) = \|Ax - b\|_2^2. \tag{1.3} \]
With this functional, the solution is updated iteratively until it converges. Landweber iteration is among the first deterministic iterative methods proposed to solve the inverse problem. It is essentially a steepest-descent method for minimizing the residual error. The Newton method, which uses both the first and second derivatives, can accelerate the search process. The conjugate gradient method lies between the gradient descent method and the Newton method: it addresses the slow convergence of the gradient descent method while avoiding the need to calculate and store the second derivatives required by the Newton method (Bakushinsky and Kokurin 2004, Landweber 1951). For more details, read further on the Bayesian approach to inverse problems.
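As a brief, hypothetical illustration of the conjugate gradient idea (a NumPy sketch with assumed toy sizes, not an implementation prescribed by the text), one can solve the normal equations \(A^{\mathsf T}Ax = A^{\mathsf T}b\) without ever forming or storing second derivatives:

```python
import numpy as np

def conjugate_gradient_normal(A, b, num_iters=50, tol=1e-10):
    """Solve the normal equations A^T A x = A^T b with the conjugate gradient method."""
    x = np.zeros(A.shape[1])
    r = A.T @ (b - A @ x)            # residual of the normal equations
    p = r.copy()
    rs_old = r @ r
    for _ in range(num_iters):
        Ap = A.T @ (A @ p)           # apply A^T A without forming it explicitly
        step = rs_old / (p @ Ap)     # optimal step length along direction p
        x = x + step * p
        r = r - step * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:    # residual small enough: stop
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Toy usage with purely illustrative sizes.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
b = A @ rng.standard_normal(10)
x_hat = conjugate_gradient_normal(A, b)
print(np.linalg.norm(A @ x_hat - b))  # should be close to zero
```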
Here, we illustrate the iterative algorithm that solves the inverse problem in the simplest way: the gradient descent method. The main idea behind the gradient descent search is to move a certain distance at a time in the direction opposite to the current gradient, which is equivalent to updating an intermediate image as follows:
\[ x^{k+1} = x^k - \alpha \nabla f(x^k), \tag{1.4} \]
where α is the step size controlling how far the solution moves in each iteration. The update is performed repeatedly until a stopping condition is met, yielding the final result.
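As a concrete sketch of the update in (1.4) (illustrative NumPy code with assumed toy dimensions and step size; for \(f(x)=\|Ax-b\|_2^2\) the gradient is \(\nabla f(x)=2A^{\mathsf T}(Ax-b)\)):

```python
import numpy as np

def gradient_descent(A, b, alpha=5e-3, num_iters=5000, tol=1e-8):
    """Minimize f(x) = ||Ax - b||_2^2 by the update x <- x - alpha * grad f(x)."""
    x = np.zeros(A.shape[1])                  # initial image (all zeros)
    for _ in range(num_iters):
        grad = 2.0 * A.T @ (A @ x - b)        # gradient of ||Ax - b||_2^2
        x_new = x - alpha * grad              # update rule (1.4)
        if np.linalg.norm(x_new - x) < tol:   # stopping condition
            return x_new
        x = x_new
    return x

# Toy usage with purely illustrative sizes.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
x_true = rng.standard_normal(10)
b = A @ x_true
x_hat = gradient_descent(A, b)
print(np.linalg.norm(x_hat - x_true))         # small value indicates convergence
```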
In practice, the measured data inevitably contain noise or error, which means that the inverse problem should be written as
\[ \tilde{b} = Ax + n, \tag{1.5} \]
where n denotes the data noise or error. As presented above, the CT measurement always contains noise and error introduced by non-idealities of the physical imaging model, which may cause inaccuracy and instability in CT image reconstruction. In this situation, the inverse problem may not be uniquely solvable and, given its ill-posed nature, its solution depends sensitively on the measurement error and on the initial image used as the starting point of the iterative process. Hence, how to compute a practically acceptable solution is the critical issue for any ill-posed inverse problem.
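To see this sensitivity numerically, here is a small hypothetical example (an assumed 2×2 system, not from the text) in which a nearly singular A amplifies a tiny measurement error into a large error in the least squares solution:

```python
import numpy as np

# A nearly singular (ill-conditioned) 2x2 system: a tiny data error
# produces a large error in the least squares solution.
rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
x_true = np.array([1.0, 1.0])
b_clean = A @ x_true

x_from_clean = np.linalg.lstsq(A, b_clean, rcond=None)[0]
n = 1e-4 * rng.standard_normal(2)                   # tiny measurement noise
x_from_noisy = np.linalg.lstsq(A, b_clean + n, rcond=None)[0]

print("condition number:", np.linalg.cond(A))       # very large
print("solution from clean data:", x_from_clean)    # close to x_true
print("solution from noisy data:", x_from_noisy)    # far from x_true
```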
If we still use the same least squares optimization method to find a solution, minimizing f(x) will only minimize the average energy of the noise or error. Thus, there would be a set of solutions, all of which are consistent with the measured data. How can we choose the good ones among all these solutions? Clearly, we need additional information to narrow the space of solutions. In other words, the more such prior information/knowledge we have, the smaller the space of solutions, and the better our chance of recovering the ground truth (Bertero and Boccacci 1998, Dashti and Stuart 2017, Stuart 2011).
Fortunately, profound yet elegant solutions have been found to settle this problem. Bayesian inference is one of the most important approaches for solving inverse problems. Instead of directly minimizing the error between the calculated and real tomographic data, the Bayesian approach takes the prior knowledge into consideration. Bayesian inference derives the posterior probability from a prior probability and a ‘likelihood function’ derived from a statistical model for the measured data.
Bayesian inference computes the posterior probability according to Bayes’ theorem:
\[ P(x \mid b) = \frac{P(b \mid x)\,P(x)}{P(b)}, \tag{1.6} \]
where P(x) is the prior probability, an estimate of the probability of the hypothesis x before the data, i.e. the current evidence b, are measured. P(x∣b) is the probability of the hypothesis x given the observed b. P(b∣x) is the probability of b given x, which is usually called the likelihood function. P(b) is sometimes termed the marginal likelihood or ‘model evidence’; it is a constant once the data are measured. In Bayesian inference, the recovery of the hidden variables can be achieved simply by maximizing the posterior probability, i.e. the maximum a posteriori (MAP) estimate:
\[ \hat{x}_{\mathrm{MAP}} = \arg\max_{x} P(x \mid b). \tag{1.7} \]
According to Bayes’ theorem, we can rewrite this formula as
\[ \hat{x}_{\mathrm{MAP}} = \arg\max_{x} P(b \mid x)\,P(x). \tag{1.8} \]
In optimization theory, applying a monotonically increasing function to the objective does not change the location of the optimum. Hence, we can apply a logarithmic operation to separate the first and second terms:
\[ \hat{x}_{\mathrm{MAP}} = \arg\max_{x} \left( \log P(b \mid x) + \log P(x) \right). \tag{1.9} \]
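As a worked special case of (1.9) (a sketch under assumed Gaussian models, not a form stated above), write the measured data as \(\tilde{b}\) from (1.5), let the noise n be additive Gaussian with variance \(\sigma^2\), and take a zero-mean Gaussian prior on x with covariance \(\tau^2 I\). Then

\[ \log P(\tilde{b} \mid x) = -\frac{1}{2\sigma^2}\|Ax - \tilde{b}\|_2^2 + \text{const}, \qquad \log P(x) = -\frac{1}{2\tau^2}\|x\|_2^2 + \text{const}, \]

so that maximizing (1.9) is equivalent to minimizing

\[ \hat{x}_{\mathrm{MAP}} = \arg\min_{x} \left( \tfrac{1}{2}\|Ax - \tilde{b}\|_2^2 + \lambda \|x\|_2^2 \right), \qquad \lambda = \frac{\sigma^2}{2\tau^2}, \]

which already has the regularized form introduced next.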
In the application of Bayesian inference to solve the inverse problem, Lagrangian optimization is extensively used, and the above objective function can be presented as
\[ F(x) = \phi(x, \tilde{b}) + \lambda\, \psi(x). \tag{1.10} \]
The first term is the penalty term that measures the data-fitting error, corresponding to the likelihood function. According to different data models, the first term can be specialized to \(\tfrac{1}{2}\|Ax-\tilde{b}\|_2^2\), \(\|Ax-\tilde{b}\|_1\), or \(\int (Ax - \tilde{b}\ln Ax)\,\mathrm{d}x\), which are statistically well suited to additive Gaussian noise, impulsive noise, and Poisson noise, respectively. The second term can be interpreted as the regularization functional, which plays the role of the prior probability P(x) in Bayesian inference. Just like the first term, the second term can also be specialized according to the statistical model for x. λ is the regularization parameter which balances these two terms. For tomographic imaging in particular, the first term is the logarithmic likelihood term, which mainly describes the statistical distribution of the original measured data, and the second term is the prior information term, which encodes some prior distribution of the images to be reconstructed.
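To make (1.10) concrete, the following is a minimal sketch (an assumed setup, not the authors' implementation) that minimizes the Gaussian-noise special case \(\phi(x,\tilde{b})=\tfrac12\|Ax-\tilde{b}\|_2^2\) with a simple Tikhonov regularizer \(\psi(x)=\|x\|_2^2\) by gradient descent; in practice \(\psi\) is usually a more informative image prior.

```python
import numpy as np

def regularized_reconstruction(A, b_noisy, lam=0.5, alpha=5e-3, num_iters=2000):
    """Minimize F(x) = 0.5*||Ax - b||_2^2 + lam*||x||_2^2 by gradient descent.

    The first term is the data-fidelity (likelihood) term, the second a simple
    Tikhonov prior; lam balances the two, as in (1.10).
    """
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b_noisy) + 2.0 * lam * x   # gradient of F(x)
        x = x - alpha * grad
    return x

# Illustrative usage: noisy data from a random system.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 15))
x_true = rng.standard_normal(15)
b_noisy = A @ x_true + 0.05 * rng.standard_normal(30)
x_hat = regularized_reconstruction(A, b_noisy)
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))  # relative error
```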
Perceiving the prior information of an image means that we should extract as much information from the image as possible. In an extreme case, if we had perfect knowledge of the image, the reconstruction process would no longer be needed, since we would already know everything about the image. In common cases, an image cannot be perfectly known before it is reconstructed. In practice, general information about images can be extracted and then assumed, and this prior information in turn constrains the candidate images allowed as a reconstruction outcome. In fact, an image, as a high-dimensional variable, is a point in a high-dimensional space, and natural images occupy only a very small portion of this space, although they vary greatly with dramatically different content. This observation implies that we can represent natural images by exploring their intrinsic distribution properties and utilizing them for image reconstruction. Generally speaking, these intrinsic properties exhibit themselves as correlations or redundancies among image regions, obeying some structured distributions in gray scales or colors.
In Bayesian inference, solving the inverse problem uses intrinsic distribution properties to narrow the search space of the unknown variables. These properties also have the ability to suppress image noise or measurement error in the inversion process, because such error disturbs the intrinsic properties. Hence, how to extract the intrinsic distribution properties and how to use them to solve an inverse problem are two key aspects of Bayesian inference. With natural image statistics, these key questions can be answered. Natural image statistics is a discipline that characterizes the statistical regularities of natural images using statistical models whose parameters are estimated from image samples, and it is widely used in image processing. In a simple form, natural images can be regarded as a linear combination of features or intrinsic properties. Rather than directly