52 Nott, D.J. and Kohn, R. (2005) Adaptive sampling for Bayesian variable selection. Biometrika, 92, 747–763.
53 Ghosh, J. and Clyde, M.A. (2011) Rao–Blackwellization for Bayesian variable selection and model averaging in linear and binary regression: a novel data augmentation approach. J. Am. Stat. Assoc., 106, 1041–1052.
54 Carvalho, C.M., Polson, N.G., and Scott, J.G. (2010) The horseshoe estimator for sparse signals. Biometrika, 97, 465–480.
55 Polson, N.G. and Scott, J.G. (2010) Shrink globally, act locally: sparse Bayesian regularization and prediction. Bayesian Stat., 9, 501–538.
56 Polson, N.G., Scott, J.G., and Windle, J. (2013) Bayesian inference for logistic models using Pólya–Gamma latent variables. J. Am. Stat. Assoc., 108, 1339–1349.
57 Nishimura, A. and Suchard, M.A. (2018) Prior-preconditioned conjugate gradient for accelerated Gibbs sampling in "large n & large p" sparse Bayesian logistic regression models. arXiv:1810.12437.
58 Rue, H. and Held, L. (2005) Gaussian Markov Random Fields: Theory and Applications, CRC Press.
59 Hestenes, M.R. and Stiefel, E. (1952) Methods of conjugate gradients for solving linear systems. J. Res. Nat. Bur. Stand., 49, 409–436.
60 Lanczos, C. (1952) Solution of systems of linear equations by minimized iterations. J. Res. Nat. Bur. Stand., 49, 33–53.
61 Van der Vorst, H.A. (2003) Iterative Krylov Methods for Large Linear Systems, vol. 13, Cambridge University Press.
62 Cipra, B.A. (2000) The best of the 20th century: editors name top 10 algorithms. SIAM News, 33, 1–2.
63 Dongarra, J., Heroux, M.A., and Luszczek, P. (2016) High-performance conjugate-gradient benchmark: a new metric for ranking high-performance computing systems. Int. J. High Perform. Comput. Appl., 30, 3–10.
64 Zhang, L., Datta, A., and Banerjee, S. (2019) Practical Bayesian modeling and inference for massive spatial data sets on modest computing environments. Stat. Anal. Data Min., 12, 197–209.
65 Golub, G.H. and Van Loan, C.F. (2012) Matrix Computations, vol. 3, Johns Hopkins University Press.
66 Pybus, O.G., Tatem, A.J., and Lemey, P. (2015) Virus evolution and transmission in an ever more connected world. Proc. R. Soc. B: Biol. Sci., 282, 20142878.
67 Bloom, D.E., Black, S., and Rappuoli, R. (2017) Emerging infectious diseases: a proactive approach. Proc. Natl. Acad. Sci. U.S.A., 114, 4055–4059.
68 Pybus, O.G., Suchard, M.A., Lemey, P. et al. (2012) Unifying the spatial epidemiology and molecular evolution of emerging epidemics. Proc. Natl. Acad. Sci. U.S.A., 109, 15066–15071.
69 Nunes, M.R., Palacios, G., Faria, N.R. et al. (2014) Air travel is associated with intracontinental spread of dengue virus serotypes 1–3 in Brazil. PLoS Negl. Trop. Dis., 8, e2769.
70 Bletsa, M., Suchard, M.A., Ji, X. et al. (2019) Divergence dating using mixed effects clock modelling: an application to HIV-1. Virus Evol., 5, vez036.
71 Dudas, G., Carvalho, L.M., Bedford, T. et al. (2017) Virus genomes reveal factors that spread and sustained the Ebola epidemic. Nature, 544, 309–315.
72 Elbe, S. and Buckland-Merrett, G. (2017) Data, disease and diplomacy: GISAID's innovative contribution to global health. Glob. Chall., 1, 33–46.
73 Ji, X., Zhang, Z., Holbrook, A. et al. (2020) Gradients do grow on trees: a linear-time O(N)-dimensional gradient for statistical phylogenetics. Mol. Biol. Evol., 37, 3047–3060.
74 Baum, L. (1972) An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process. Inequalities, 3, 1–8.
75 Suchard, M.A., Lemey, P., Baele, G. et al. (2018) Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol., 4, vey016.
76 Gentle, J.E., Härdle, W.K., and Mori, Y. (eds) (2012) How computational statistics became the backbone of modern data science, in Handbook of Computational Statistics, Springer, pp. 3–16.
77 Lunn, D., Spiegelhalter, D., Thomas, A., and Best, N. (2009) The BUGS project: evolution, critique and future directions. Stat. Med., 28, 3049–3067.
78 Bergstra, J., Breuleux, O., Bastien, F. et al. (2010) Theano: A CPU and GPU Math Expression Compiler. Proceedings of the Python for Scientific Computing Conference (SciPy), Oral Presentation.
79 Rumelhart, D.E., Hinton, G.E., and Williams, R.J. (1986) Learning representations by back-propagating errors. Nature, 323, 533–536.
80 Neal, R.M. (1996) Bayesian Learning for Neural Networks, Springer-Verlag.
81 Gelman, A. (2014) Petascale Hierarchical Modeling Via Parallel Execution. U.S. Department of Energy. Report No: DE-SC0002099.
82 Hoffman, M.D. and Gelman, A. (2014) The no-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res., 15, 1593–1623.
83 Stan Development Team (2018) Stan Modeling Language Users Guide and Reference Manual. Version 2.18.0.
84 Livingstone, S. and Zanella, G. (2019) On the robustness of gradient-based MCMC algorithms. arXiv:1908.11812.
85 Mangoubi, O., Pillai, N.S., and Smith, A. (2018) Does Hamiltonian Monte Carlo mix faster than a random walk on multimodal densities? arXiv:1808.03230.
86 Livingstone, S., Faulkner, M.F., and Roberts, G.O. (2019) Kinetic energy choice in Hamiltonian/hybrid Monte Carlo. Biometrika, 106, 303–319.
87 Dinh, V., Bilge, A., Zhang, C., and Matsen IV, F.A. (2017) Probabilistic Path Hamiltonian Monte Carlo. Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 1009–1018.
88 Nishimura, A., Dunson, D.B., and Lu, J. (2020) Discontinuous Hamiltonian Monte Carlo for discrete parameters and discontinuous likelihoods. Biometrika, 107, 365–380.
89 Geman, S. and Geman, D. (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell., PAMI-6, 721–741.
90 Gelfand, A.E. and Smith, A.F. (1990) Sampling-based approaches to calculating marginal densities. J. Am. Stat. Assoc., 85, 398–409.
91 Monnahan, C.C., Thorson, J.T., and Branch, T.A. (2017) Faster estimation of Bayesian models in ecology using Hamiltonian Monte Carlo. Methods Ecol. Evol., 8, 339–348.
92 Zhang, Z., Nishimura, A. et al. (2020) Large-scale inference of correlation among mixed-type biological traits with phylogenetic multivariate probit models. Ann. Appl. Stat.
93 Dempster, A.P., Laird, N.M., and Rubin, D.B. (1977) Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc., Ser. B, 39, 1–22.
94 Jordan, M.I., Ghahramani, Z., Jaakkola, T.S., and Saul, L.K. (1999) An introduction to variational methods for graphical models. Mach. Learn., 37, 183–233.
95 Wei, G.C. and Tanner, M.A. (1990) A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. J. Am. Stat. Assoc., 85, 699–704.
96 Ranganath, R., Gerrish, S., and Blei, D.M. (2014) Black Box Variational Inference. Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics.
97 Dagum, L. and Menon, R. (1998) OpenMP: an industry standard API for shared-memory programming. IEEE Comput. Sci. Eng., 5, 46–55.
98 Warne, D.J., Sisson, S.A., and Drovandi, C. (2019) Acceleration of expensive computations in Bayesian statistics using vector operations. arXiv:1902.09046.
99 Bergstra, J., Bastien, F., Breuleux, O. et al. (2011) Theano: Deep Learning on GPUs with Python. NIPS 2011, BigLearning Workshop, Granada, Spain, vol. 3, pp. 1–48. Citeseer.
100 Nielsen, M.A. and Chuang, I.L. (2010) Quantum Computation and Quantum Information, Cambridge University Press.