We acknowledge the School of Energy Resources of the University of Wyoming and the Nielson Energy Fellowship, the sponsors of the Stanford Center for Earth Resources Forecasting, the Stanford Rock Physics and Borehole Geophysics Project, and the Dean of the Stanford School of Earth, Energy, and Environmental Sciences, Professor Steve Graham, for their continued support.
We thank all the co‐authors of our previous publications, especially the MSc and PhD students who contributed to the recent developments in our research. This book would not have been possible without the contribution of numerous students, colleagues, and friends who have constantly inspired and motivated us. Finally, we thank our families for their love, encouragement, and support.
1 Review of Probability and Statistics
Concepts and methods from statistics and probability are commonly used in geophysical studies to describe the uncertainty in the data, model variables, and model predictions. Statistics and probability are two branches of mathematics that are often used together in applied science to estimate parameters and predict the most probable outcome of a physical model as well as its uncertainty. Statistical methods aim to build numerical models for variables whose values are uncertain (e.g. seismic velocities or porosity in the subsurface) from measurements of observable data (e.g. measurements of rock properties in core samples and boreholes). Probability is then used to make predictions about unknown events (e.g. porosity value at a new location) based on the statistical models for uncertain variables. In reservoir modeling, for example, we can use statistics to create multiple reservoir models of porosity and water saturation using direct measurements at the well locations and indirect measurements provided by geophysical data, and then apply probability concepts and tools to make predictions about the total volume of hydrocarbon or water in the reservoir. The predictions are generally expressed in the form of a probability distribution or a set of statistical estimators such as the most‐likely value and its variability. For example, the total fluid volume can be described by a Gaussian distribution that is completely defined by two parameters, the mean and the variance, which represent the most‐likely value and the uncertainty of the property prediction, respectively. Probability and statistics have a vast literature (Papoulis and Pillai 2002), and it is not our intent here to provide a comprehensive review. Our goal in this chapter is to review some basic concepts and establish the notation and terminology that will be used in the following chapters.
1.1 Introduction to Probability and Statistics
The basic concept that differentiates statistics and probability from other branches of mathematics is the notion of the random variable. A random variable is a mathematical variable such that the outcome is unknown but the likelihood of each of the possible outcomes is known. For example, the value of the P‐wave velocity at a given location in the reservoir might be unknown owing to the lack of direct measurements; however, the available data might suggest that velocity is likely to be between 2 and 6 km/s with an expected value of 4 km/s. We model our lack of knowledge about the P‐wave velocity by describing it as a random variable. The expected value is the mean of the random variable and the lower and upper limits of the confidence interval represent the range of its variability. Any decision‐making process involving random variables in the subsurface should account for the uncertainty in the predictions, because the predicted value, for example the mean of the random variable, is not necessarily the true value of the property and its accuracy depends on the uncertainty of the measurements, the approximations in the physical models, and the initial assumptions. All these concepts can be formalized using statistics and probability definitions.
In probability and statistics, we view a problem involving random variables as an experiment (i.e. a statistical experiment) where the variable of interest can take different possible values and we aim to predict the outcome of the experiment. We can formulate the main notions of probability using set theory. In Kolmogorov's formulation (Papoulis and Pillai 2002), sets are interpreted as events and probability is a mathematical measure on a class of sets. The sample space S is the collection of all the possible outcomes of the experiment. In reservoir modeling studies, an example of sample space could be the set of all possible reservoir models of porosity generated using a geostatistical method (Chapter 3). An event E is a subset of the sample space. For example, an event E could represent all the reservoir models with average porosity less than 0.20.
If the sample space is large enough, we can use a frequency‐based approach to estimate the probability of an event E. In this setting, assuming that all outcomes are equally likely, we can define the probability of an event E as the number of favorable outcomes divided by the total number of outcomes. In other words, the probability of E is the cardinality of E (i.e. the number of elements in the set E) divided by the cardinality of the sample space S (i.e. the number of elements in the set S). In our example, the probability of a reservoir model having an average porosity less than 0.20 can be computed as the number of models with average porosity less than 0.20, divided by the total number of reservoir models. For instance, if the sample space includes 1000 geostatistical models of porosity and the event E includes 230 models, then the probability P(E) that a reservoir model has average porosity less than 0.20 is P(E) = 230/1000 = 0.23.
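The frequency‐based calculation above can be illustrated with a minimal Python sketch. The function name event_probability and the synthetic list of average porosities are assumptions made for this illustration only; in practice, the values would come from an ensemble of geostatistical models.

```python
def event_probability(avg_porosities, threshold=0.20):
    """Return the fraction of models with average porosity below threshold.

    This is the cardinality of the event E divided by the cardinality
    of the sample space S, assuming equally likely models.
    """
    favorable = sum(1 for phi in avg_porosities if phi < threshold)
    return favorable / len(avg_porosities)


# Hypothetical ensemble (illustrative values, not real data):
# 230 models with average porosity below 0.20 and 770 models at or above it.
models = [0.15] * 230 + [0.25] * 770

p_e = event_probability(models)
print(f"P(E) = {p_e:.2f}")
```

With this synthetic ensemble, the estimate matches the value P(E) = 0.23 computed in the text.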
In general, there are two main interpretations of probability: the frequentist approach and the Bayesian approach. The frequentist approach is based on the concept of randomness and this interpretation is related to experiments dealing with aleatory uncertainty owing to natural variations. Statistical events associated with tossing a coin or rolling a die can be described using the frequentist approach, since the outcomes of these events can be investigated by repeating the same experiment several times and studying the frequency of the outcomes. The Bayesian approach focuses on the concept of uncertainty and this interpretation is common for epistemic uncertainty owing to the lack of knowledge. Statistical events associated with porosity or P‐wave velocity in the subsurface are often described using the Bayesian approach, because it is not possible to collect enough data or have a large number of controlled identical experiments in geology.
In geophysical modeling, we often quantify uncertainty using different statistics and probability tools, such as probability distributions, statistical estimators, and geostatistical realizations. For example, the uncertainty associated with the prediction of porosity and fluid saturation from seismic data can be represented by the joint probability distribution of porosity and fluid saturation at each location in the reservoir, by a set of statistical estimators such as the mean, the maximum a posteriori estimate, the confidence interval, and the variance, or by an ensemble of multiple realizations obtained by sampling the probability distribution. In general, it is always possible to build the most‐likely model of the properties of interest from these statistics and probability tools and present the solution in a deterministic form. For example, we can compute the most‐likely value of the probability distribution of porosity and fluid saturation at each location in the reservoir. However, subsurface models are often highly uncertain owing to the lack of direct measurements, the limited quality and resolution of the available geophysical data, the approximations in the physical models, and the natural variability and heterogeneity of subsurface rocks. Therefore, the uncertainty of the predictions should always be considered in any decision‐making process associated with subsurface models.
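As a rough illustration of how an ensemble of realizations can be summarized by statistical estimators, the following Python sketch computes the mean, the variance, and an empirical interval between the 5th and 95th percentiles for porosity at a single location. The samples are drawn from an assumed Gaussian distribution with mean 0.20 and standard deviation 0.03; these values, the sample size, and the variable names are hypothetical and chosen only for this example.

```python
import random
import statistics

# Hypothetical ensemble: 1000 porosity samples at one reservoir location,
# drawn from an assumed Gaussian distribution (illustrative values only).
rng = random.Random(0)  # fixed seed for reproducibility
porosity_samples = [rng.gauss(0.20, 0.03) for _ in range(1000)]

# Point estimate and measure of spread
mean_phi = statistics.mean(porosity_samples)
var_phi = statistics.variance(porosity_samples)  # sample variance

# Empirical 5th and 95th percentiles: with n=20, statistics.quantiles
# returns 19 cut points at 5%, 10%, ..., 95%.
cuts = statistics.quantiles(porosity_samples, n=20)
p05, p95 = cuts[0], cuts[-1]

print(f"mean = {mean_phi:.3f}, variance = {var_phi:.5f}")
print(f"90% empirical interval = [{p05:.3f}, {p95:.3f}]")
```

The mean gives a deterministic summary of the ensemble, whereas the variance and the percentile interval describe the uncertainty that should accompany it in any decision‐making process.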
In this chapter, we review the main concepts of probability and statistics. These results are used in the following chapters to build mathematical methodologies for reservoir modeling, such as geostatistical simulations and inverse methods.
1.2 Probability
In this review, E represents a generic event and P(E) represents its probability. For example, E might