p"/>s toward zero, while the
local scale s, with its heavy‐tailed prior
, allow a small number of
and hence
s to be estimated away from zero. While motivated by two different conceptual frameworks, the spike‐and‐slab can be viewed as a subset of global–local priors in which
is chosen as a mixture of delta masses placed at
and
. Continuous shrinkage mitigates the multimodality of spike‐and‐slab by smoothly bridging small and large values of
.
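The mechanics of a global–local prior can be seen in a small simulation. The sketch below uses a half-Cauchy local scale, as in the horseshoe prior; the specific dimensions and the value of $\tau$ are illustrative choices, not prescriptions from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 10_000
tau = 0.01                            # small global scale: shrinks everything
lam = np.abs(rng.standard_cauchy(p))  # heavy-tailed (half-Cauchy) local scales

# beta_j | lam_j, tau ~ N(0, tau^2 * lam_j^2)
beta = rng.normal(0.0, tau * lam)

# Most draws concentrate near zero, but the heavy tail of lam
# lets a small number of coefficients land far from zero.
print(np.mean(np.abs(beta) < 0.05))   # fraction shrunk toward zero
print(np.max(np.abs(beta)))           # the largest escapee
```

The same simulation with a light-tailed (e.g., exponential) prior on $\lambda_j$ would shrink the large coefficients as aggressively as the small ones, which is precisely what the heavy tail avoids.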
On the other hand, the use of continuous shrinkage priors does not address the computational burden, which grows with $n$ and $p$ in modern applications. Sparse regression posteriors under global–local priors are amenable to an effective Gibbs sampler, a popular class of MCMC we describe further in Section 4.1. Under the linear and logistic models, the computational bottleneck of this Gibbs sampler stems from the need for repeated updates of $\beta$ from its conditional distribution
$$\beta \mid \tau, \lambda, \Omega, y \sim \mathcal{N}\left(\Phi^{-1} X^\intercal \Omega y, \; \Phi^{-1}\right), \quad \Phi = X^\intercal \Omega X + \tau^{-2} \Lambda^{-2}, \qquad (4)$$
where $\Omega$ is an additional diagonal-matrix parameter and $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_p)$.5 Sampling from this high-dimensional Gaussian distribution requires $O(np^2 + p^3)$ operations with the standard approach [58]: $O(np^2)$ for computing the term $X^\intercal \Omega X$ and $O(p^3)$ for Cholesky factorization of $\Phi$. While an alternative approach by Bhattacharya et al. [48] provides the complexity of $O(n^2 p)$, the computational cost remains problematic in the big-$n$ and big-$p$ regime at $O(\min\{n^2 p, \, np^2\})$ after choosing the faster of the two.
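The standard approach can be sketched in a few lines of NumPy. This is a minimal illustration with made-up dimensions and $\Omega = I$; the two commented steps are the $O(np^2)$ and $O(p^3)$ bottlenecks discussed above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 200
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
omega = np.ones(n)                        # diagonal of Omega (identity here)
tau = 0.5
lam = 1.0 + rng.uniform(size=p)           # placeholder local scales

# Form Phi = X' Omega X + tau^-2 Lambda^-2: O(n p^2)
Phi = X.T @ (omega[:, None] * X) + np.diag((tau * lam) ** -2)

# Cholesky factorization Phi = L L': O(p^3)
L = np.linalg.cholesky(Phi)

# Mean mu = Phi^-1 X' Omega y, then beta = mu + L^-T z ~ N(mu, Phi^-1)
b = X.T @ (omega * y)
mu = np.linalg.solve(L.T, np.linalg.solve(L, b))
beta = mu + np.linalg.solve(L.T, rng.standard_normal(p))
```

The draw is exact: since $\mathrm{Cov}(L^{-\intercal} z) = (L L^\intercal)^{-1} = \Phi^{-1}$, the output `beta` has precisely the conditional distribution in Equation (4).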
3.1.2 Conjugate gradient sampler for structured high‐dimensional Gaussians
The conjugate gradient (CG) sampler of Nishimura and Suchard [57] combined with their prior-preconditioning technique overcomes this seemingly inevitable growth of the computational cost. Their algorithm is based on a novel application of the CG method [59, 60], which belongs to a family of iterative methods in numerical linear algebra. Despite its first appearance in 1952, CG received little attention for the next few decades, only making its way into major software packages such as MATLAB in the 1990s [61]. With its ability to solve a large and structured linear system $\Phi x = b$ via a small number of matrix–vector multiplications $v \to \Phi v$, without ever explicitly inverting $\Phi$, however, CG has since emerged as an essential and prototypical algorithm for modern scientific computing [62, 63].
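The "matrix-vector multiplications only" property is what makes CG attractive for structured matrices like $\Phi = X^\intercal \Omega X + \tau^{-2}\Lambda^{-2}$: the product $\Phi v$ costs $O(np)$ and never requires forming the $p \times p$ matrix. A sketch using SciPy's CG solver, with hypothetical dimensions and a placeholder diagonal term:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(2)
n, p = 100, 500
X = rng.normal(size=(n, p))
d = 1.0 + rng.uniform(size=p)      # placeholder for the tau^-2 lam^-2 diagonal

# Matrix-vector product v -> Phi v for Phi = X'X + diag(d),
# computed in O(np) without ever forming the p x p matrix Phi.
def matvec(v):
    return X.T @ (X @ v) + d * v

Phi_op = LinearOperator((p, p), matvec=matvec)
b = rng.normal(size=p)
x, info = cg(Phi_op, b)            # info == 0 signals convergence
```

Each CG iteration calls `matvec` once, so the total cost is (number of iterations) $\times O(np)$; the number of iterations is where the choice of preconditioner, discussed next, becomes critical.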
Despite its earlier rise to prominence in other fields, CG had not found practical applications in Bayesian computation until rather recently [57, 64]. We can offer at least two explanations for this. First, being an algorithm for solving a deterministic linear system, it is not obvious how CG would be relevant to Monte Carlo simulation, such as sampling from $\mathcal{N}(\mu, \Phi^{-1})$; ostensibly, such a task requires computing a "square root" $L$ of the precision matrix so that $L L^\intercal = \Phi$, yielding $\beta = \mu + L^{-\intercal} z$ for $z \sim \mathcal{N}(0, I_p)$. Second, unlike direct linear algebra methods, iterative methods such as CG have a variable computational cost that depends critically on the user's choice of a preconditioner and thus cannot be used as a "black-box" algorithm.6 In particular, this novel application of CG to Bayesian computation is a reminder that powerful ideas from other computationally intensive fields may remain untapped by the statistical computing community; knowledge transfers will likely be facilitated by having more researchers working at intersections of different fields.
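The first obstacle dissolves once one notices that a Gaussian with the structured precision of Equation (4) can be sampled without any square root of $\Phi$: a noise vector $\eta$ with covariance exactly $\Phi$ is cheap to generate term by term, and solving $\Phi \beta = X^\intercal \Omega y + \eta$ then yields a draw from the target. The sketch below illustrates this perturbation construction with CG as the solver; it uses made-up dimensions, sets $\Omega = I$, and omits the prior preconditioning that is central to the actual algorithm of Nishimura and Suchard [57].

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(3)
n, p = 100, 300
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
omega = np.ones(n)                       # Omega = I for illustration
tau = 0.5
lam = 1.0 + rng.uniform(size=p)          # placeholder local scales
prec_diag = (tau * lam) ** -2            # diagonal of tau^-2 Lambda^-2

def matvec(v):                           # v -> Phi v, Phi = X' Omega X + tau^-2 Lambda^-2
    return X.T @ (omega * (X @ v)) + prec_diag * v

# eta = X' Omega^(1/2) z1 + tau^-1 Lambda^-1 z2 has covariance exactly Phi,
# so beta = Phi^-1 (X' Omega y + eta) is a draw from N(Phi^-1 X' Omega y, Phi^-1).
eta = X.T @ (np.sqrt(omega) * rng.standard_normal(n)) \
    + np.sqrt(prec_diag) * rng.standard_normal(p)
rhs = X.T @ (omega * y) + eta
beta, info = cg(LinearOperator((p, p), matvec=matvec), rhs)
```

No Cholesky factor of $\Phi$ ever appears: the only "square roots" needed are those of the diagonal matrices $\Omega$ and $\tau^{-2}\Lambda^{-2}$, which are trivial.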
Nishimura and Suchard [57] turn CG into a viable algorithm for Bayesian sparse regression problems