where F is the matrix of size nd × nm associated with the linear operator f. A common approach to find the solution of the inverse problem associated with Eq. (1.47) is to estimate the model m that gives the minimum misfit between the data d and the theoretical predictions Fm, where the misfit is commonly measured by the L2‐norm of their difference:
‖d − Fm‖₂² = Σᵢ (dᵢ − (Fm)ᵢ)² (1.48)
The model m* that minimizes the L2‐norm is called the least‐squares solution because it minimizes the sum of the squares of the differences of measured and predicted data, and it is given by the following equation, generally called the normal equation (Aster et al. 2018):
m* = (FᵀF)⁻¹ Fᵀ d (1.49)
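As a minimal numerical illustration (not part of the original text), the least‐squares solution of Eq. (1.49) can be verified on a small synthetic linear problem; the matrix F, the true model, and the noise level below are arbitrary choices.

```python
import numpy as np

# Synthetic linear forward operator F (nd x nm) and a true model (arbitrary values)
rng = np.random.default_rng(0)
F = rng.normal(size=(50, 3))            # nd = 50 observations, nm = 3 model parameters
m_true = np.array([1.0, -2.0, 0.5])

# Noisy data: d = F m + Gaussian noise
d = F @ m_true + rng.normal(scale=0.1, size=50)

# Least-squares solution from the normal equation (Eq. 1.49): m* = (F^T F)^(-1) F^T d
m_ls = np.linalg.solve(F.T @ F, F.T @ d)

# A QR/SVD-based solver is numerically preferable to forming F^T F explicitly
m_lstsq, *_ = np.linalg.lstsq(F, d, rcond=None)

print(m_ls)        # close to m_true
print(m_lstsq)     # essentially the same solution
```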
If we consider the data points to be imperfect measurements with random errors, the inverse problem associated with Eq. (1.47) can be seen, from a statistical point of view, as a maximum likelihood estimation problem. Given a model m, we assign to each observation di a PDF fi(di∣m) for i = 1, … , nd and we assume that the observations are independent. The joint probability density of the vector of independent observations d is then:
f(d∣m) = ∏ᵢ fᵢ(dᵢ∣m), for i = 1, … , nd. (1.50)
The expression in Eq. (1.50) is generally called the likelihood function. In the maximum likelihood estimation, we select the model m that maximizes the likelihood function. If we assume a discrete linear inverse problem with independent and Gaussian distributed data errors (i.e. each error dᵢ − (Fm)ᵢ is Gaussian with mean 0 and variance σ²), then the likelihood function becomes:
L(m∣d) = (2πσ²)^(−nd/2) exp(−‖d − Fm‖₂² / (2σ²)), (1.51)
and the maximization of Eq. (1.51) is equivalent to the minimization of Eq. (1.48) (Tarantola 2005; Aster et al. 2018).
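The equivalence between maximum likelihood estimation with Gaussian errors and least squares can be checked numerically. The following sketch, on synthetic data with an assumed known noise standard deviation, minimizes the negative log‐likelihood of Eq. (1.51) and compares the result with the normal‐equation solution of Eq. (1.49); all values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
F = rng.normal(size=(40, 2))                  # synthetic linear operator
m_true = np.array([0.8, -1.5])
sigma = 0.2                                   # assumed known noise standard deviation
d = F @ m_true + rng.normal(scale=sigma, size=40)

# Negative log-likelihood for independent Gaussian errors (Eq. 1.51, up to a constant):
# -log L(m|d) = ||d - F m||^2 / (2 sigma^2) + const
def neg_log_likelihood(m):
    r = d - F @ m
    return 0.5 * np.sum(r**2) / sigma**2

m_ml = minimize(neg_log_likelihood, x0=np.zeros(2)).x    # maximum-likelihood estimate
m_ls = np.linalg.solve(F.T @ F, F.T @ d)                  # least-squares solution (Eq. 1.49)

print(m_ml)    # the two estimates coincide up to numerical tolerance
print(m_ls)
```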
The L2‐norm is not the only misfit measure that can be used in inverse problems. For example, when the data contain points inconsistent with the chosen mathematical model (namely the outliers), the L1‐norm is generally preferable to the L2‐norm because it is less sensitive to large residuals. However, from a mathematical point of view, the L2‐norm is often preferred because of the analytical tractability of the associated Gaussian distribution.
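As an illustrative sketch (not from the original text), the different sensitivity of the two norms to outliers can be seen on a simple straight‐line fit; the soft‐L1 loss used below is a smooth surrogate for the L1‐norm misfit, and the data values are arbitrary.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 30)
F = np.column_stack([np.ones_like(x), x])     # straight-line forward operator
d = F @ np.array([1.0, 2.0]) + rng.normal(scale=0.05, size=30)
d[5] += 3.0                                   # one gross outlier

def residuals(m):
    return F @ m - d

# L2 fit: strongly influenced by the outlier
m_l2 = least_squares(residuals, x0=[0.0, 0.0]).x

# Robust fit with a soft-L1 loss (a smooth surrogate for the L1-norm misfit)
m_l1 = least_squares(residuals, x0=[0.0, 0.0], loss='soft_l1', f_scale=0.05).x

print(m_l2)    # pulled away from the true coefficients (1, 2) by the outlier
print(m_l1)    # typically stays much closer to (1, 2)
```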
In science and engineering applications, many inverse problems are not linear; therefore, the analytical solution of the inverse problem might not be available. For non‐linear inverse problems, several mathematical algorithms are available, including gradient‐based deterministic methods, such as Gauss–Newton, Levenberg–Marquardt, and conjugate gradient; Markov chain Monte Carlo methods, such as Metropolis, Metropolis–Hastings, and Gibbs sampling; and stochastic optimization algorithms, such as simulated annealing, particle swarm optimization, and genetic algorithms. For detailed descriptions of these methods we refer the reader to Tarantola (2005), Sen and Stoffa (2013), and Aster et al. (2018).
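As a hedged sketch of one of these methods, the following random‐walk Metropolis sampler (a special case of Metropolis–Hastings with a symmetric proposal) samples the posterior of a toy non‐linear problem under an implicitly uniform prior; the forward model g, the step size, and the chain length are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy non-linear forward model g(m) = m0 * exp(-m1 * t); all values are illustrative.
t = np.linspace(0.0, 5.0, 60)
def g(m):
    return m[0] * np.exp(-m[1] * t)

m_true = np.array([2.0, 0.5])
sigma = 0.05
d = g(m_true) + rng.normal(scale=sigma, size=t.size)

# Log-likelihood for independent Gaussian errors; with a flat (uniform) prior the
# posterior is proportional to the likelihood.
def log_likelihood(m):
    return -0.5 * np.sum((d - g(m))**2) / sigma**2

# Random-walk Metropolis sampler (Metropolis-Hastings with a symmetric Gaussian proposal)
m = np.array([1.0, 1.0])                    # starting model
samples = []
for _ in range(20000):
    proposal = m + rng.normal(scale=0.05, size=2)
    if np.log(rng.uniform()) < log_likelihood(proposal) - log_likelihood(m):
        m = proposal                        # accept; otherwise keep the current model
    samples.append(m)

posterior_samples = np.array(samples[5000:])     # discard burn-in
print(posterior_samples.mean(axis=0))            # close to m_true
print(posterior_samples.std(axis=0))             # sampling-based uncertainty estimate
```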
1.7 Bayesian Inversion
From a probabilistic point of view, the solution of the inverse problem corresponds to estimating the conditional distribution m∣d. The conditional probability P(m∣d) can be obtained using Bayes' theorem (Eqs. 1.8 and 1.25):
P(m∣d) = P(d∣m) P(m) / P(d) (1.52)
where P(d∣m) is the likelihood function, P(m) is the prior distribution, and P(d) is the marginal distribution. The probability P(d) is a normalizing constant that guarantees that P(m∣d) is a valid PDF.
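As a simple numerical illustration of Eq. (1.52) (not from the original text), the posterior of a single scalar model parameter can be evaluated on a grid by multiplying the likelihood and the prior and normalizing by the evidence P(d); all numerical values below are arbitrary.

```python
import numpy as np

# Grid-based evaluation of Bayes' theorem (Eq. 1.52) for a scalar model parameter m
# and a single datum d = f*m + noise; every number here is an illustrative choice.
f = 2.0                    # scalar forward operator
sigma_d = 0.3              # standard deviation of the data error
mu_m, sigma_m = 0.0, 1.0   # Gaussian prior on m
d_obs = 1.5                # observed datum

m_grid = np.linspace(-4.0, 4.0, 2001)
dm = m_grid[1] - m_grid[0]

def gaussian(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s)**2) / (s * np.sqrt(2.0 * np.pi))

prior = gaussian(m_grid, mu_m, sigma_m)              # P(m)
likelihood = gaussian(d_obs, f * m_grid, sigma_d)    # P(d|m), evaluated at the observed d

evidence = np.sum(likelihood * prior) * dm           # P(d): the normalizing constant
posterior = likelihood * prior / evidence            # P(m|d)

print(np.sum(posterior) * dm)            # ~1: the posterior is a valid PDF
print(m_grid[np.argmax(posterior)])      # posterior mode
```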
In geophysical inverse problems, we often assume that the physical relation f in Eq. (1.47) is linear and that the prior distribution P(m) is Gaussian (Tarantola 2005). These two assumptions are not necessarily required to solve the Bayesian inverse problem, but under these assumptions, the inverse solution can be analytically derived. Indeed, in the Gaussian case, the solution to the Bayesian linear inverse problem is well‐known (Tarantola 2005). If we assume that: (i) the prior distribution of the model is Gaussian, i.e.