Figure 1.1. Buildup of the kernel estimate of the PDF. Normal kernel functions (thick dashed lines) are centered at the data points (triangles on the abscissa). The PDF estimate (thin solid line) is the sum of the kernel functions. With constant bandwidth (a), all kernel functions have the same width; with variable bandwidth (b), the kernels are wider in regions of sparse data. For a color version of this figure, see www.iste.co.uk/limnios/statistical.zip
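The construction shown in Figure 1.1 can be sketched in Python as below. This is a minimal illustration assuming the standard normal kernel. The per-point bandwidths in the hypothetical helper `adaptive_bandwidths` follow Silverman's (1986) adaptive-kernel idea: a constant-bandwidth pilot estimate, with kernels widened where the pilot is low and sensitivity `alpha = 0.5`. This is our assumption and may not be the scheme used to draw panel (b). All function names are hypothetical.

```python
import numpy as np


def normal_kernel(t):
    """Standard normal kernel K(t)."""
    return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)


def kde_pdf(x, data, h):
    """Kernel PDF estimate at points x.

    h is either a scalar (constant bandwidth, panel a) or an array with
    one bandwidth per data point (variable bandwidth, panel b).
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]   # shape (m, 1)
    xi = np.asarray(data, dtype=float)[None, :]              # shape (1, n)
    hi = np.broadcast_to(np.asarray(h, dtype=float), xi.shape)
    # One unit-area kernel per data point; averaging gives the 1/n weights.
    return (normal_kernel((x - xi) / hi) / hi).mean(axis=1)


def adaptive_bandwidths(data, h, alpha=0.5):
    """Hypothetical helper (assumed Silverman-style adaptive scheme):
    widen kernels where a constant-bandwidth pilot estimate is low."""
    pilot = kde_pdf(data, data, h)             # pilot density at each x_i
    g = np.exp(np.mean(np.log(pilot)))         # geometric mean of pilot values
    return h * (pilot / g) ** (-alpha)


data = np.array([0.0, 0.1, 0.2, 0.3, 5.0])
grid = np.linspace(-10.0, 15.0, 25001)
f_const = kde_pdf(grid, data, 0.5)                               # panel (a) style
f_var = kde_pdf(grid, data, adaptive_bandwidths(data, 0.5))      # panel (b) style
```

In this toy sample the isolated point at 5.0 receives a wider kernel than the points in the cluster near 0, which reproduces the qualitative behavior described in the caption.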
The final shape of the density estimate depends strongly on the value of h, as the example in Figure 1.2 demonstrates. In the applications of kernel estimators presented here, the bandwidth, h, is selected as the minimizer of the mean integrated square error (MISE):

$$\mathrm{MISE}(h) = E\left[\int \left(\hat{f}(x) - f(x)\right)^2 dx\right]$$
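where f is the actual (unknown) PDF, $\hat{f}$ is its kernel estimate and E[•] denotes the expected value. Starting from MISE, Silverman (1986) derived a simplified score function of the form:

$$M_1(h) = \frac{1}{n^2 h}\sum_{i=1}^{n}\sum_{j=1}^{n} K^*\left(\frac{x_i - x_j}{h}\right) + \frac{2}{nh}\,K(0) \qquad [1.8]$$

where K(•) is the kernel function, and $K^*(t) = K^{(2)}(t) - 2K(t)$, with $K^{(2)}(t) = \int K(t-u)\,K(u)\,du$ the convolution of the kernel with itself.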
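As a check on the score function [1.8], it can be evaluated directly for a given sample. The sketch below assumes the normal kernel. For this kernel, the convolution of the kernel with itself, $K^{(2)}$, is the N(0, 2) density, because the sum of two independent standard normal variables has variance 2. The function name `m1_score` is hypothetical, and the code illustrates the formula rather than reproducing the authors' implementation.

```python
import numpy as np


def m1_score(h, data):
    """Hypothetical helper: simplified score M1(h), eq. [1.8], normal kernel."""
    x = np.asarray(data, dtype=float)
    n = x.size
    t = (x[:, None] - x[None, :]) / h                      # all (x_i - x_j) / h
    k = np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)         # K(t)
    k2 = np.exp(-0.25 * t**2) / (2.0 * np.sqrt(np.pi))     # K^(2)(t): N(0, 2) density
    k_star = k2 - 2.0 * k                                  # K*(t) = K^(2)(t) - 2K(t)
    k0 = 1.0 / np.sqrt(2.0 * np.pi)                        # K(0)
    return k_star.sum() / (n**2 * h) + 2.0 * k0 / (n * h)


# Usage on synthetic data: scan a bandwidth grid and keep the smallest score.
rng = np.random.default_rng(0)
data = rng.normal(size=100)
hs = np.geomspace(0.05, 2.0, 200)
scores = np.array([m1_score(h, data) for h in hs])
h_grid_best = hs[int(np.argmin(scores))]
```

The double sum costs O(n²) memory and time, which is adequate for catalogs of a few thousand events.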
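Kijko et al. (2001) showed that for the normal kernel function, [1.2], the score function becomes:

$$M_1(h) = \frac{1}{2\sqrt{\pi}\,n^2 h}\left\{\sum_{i=1}^{n}\sum_{j=1}^{n}\left[\exp\left(-\frac{(x_i-x_j)^2}{4h^2}\right) - 2\sqrt{2}\,\exp\left(-\frac{(x_i-x_j)^2}{2h^2}\right)\right] + 2\sqrt{2}\,n\right\} \qquad [1.9]$$

and the bandwidth that minimizes M1(h) is the root of the equation:

$$\sum_{i=1}^{n}\sum_{j=1}^{n}\left[\left(\frac{(x_i-x_j)^2}{2h^2}-1\right)\exp\left(-\frac{(x_i-x_j)^2}{4h^2}\right) - 2\sqrt{2}\left(\frac{(x_i-x_j)^2}{h^2}-1\right)\exp\left(-\frac{(x_i-x_j)^2}{2h^2}\right)\right] - 2\sqrt{2}\,n = 0$$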
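The bandwidth search can be sketched as follows. We assume the normal-kernel form of M1(h), obtained from [1.8] with K(t) the standard normal density. The function `stationarity` computes the numerator of dM1/dh, which equals that numerator divided by the positive factor $2\sqrt{\pi}n^2h^2$, so the two always have the same sign. The sketch locates the minimum on a logarithmic grid and then refines it by bisection on that sign change. The grid range (0.001 to 1 times the sample range) and all function names are our assumptions.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def m1_normal(h, data):
    """Closed-form M1(h) for the standard normal kernel."""
    x = np.asarray(data, dtype=float)
    n = x.size
    d2 = (x[:, None] - x[None, :]) ** 2
    s = np.sum(np.exp(-d2 / (4 * h**2)) - 2 * SQRT2 * np.exp(-d2 / (2 * h**2)))
    return (s + 2 * SQRT2 * n) / (2 * np.sqrt(np.pi) * n**2 * h)


def stationarity(h, data):
    """Numerator of dM1/dh; has the same sign as the derivative."""
    x = np.asarray(data, dtype=float)
    n = x.size
    u = (x[:, None] - x[None, :]) ** 2 / h**2
    s = np.sum((u / 2 - 1) * np.exp(-u / 4) - 2 * SQRT2 * (u - 1) * np.exp(-u / 2))
    return s - 2 * SQRT2 * n


def select_bandwidth(data, n_grid=400, n_bisect=60):
    """Hypothetical helper: grid search for min M1, refined by bisection."""
    x = np.asarray(data, dtype=float)
    spread = x.max() - x.min()
    grid = np.geomspace(1e-3 * spread, spread, n_grid)   # assumed search range
    i = int(np.argmin([m1_normal(h, x) for h in grid]))
    if i in (0, n_grid - 1):
        return grid[i]                                   # minimum at edge: no bracket
    lo, hi = grid[i - 1], grid[i + 1]
    if stationarity(lo, x) > 0 or stationarity(hi, x) < 0:
        return grid[i]                                   # no sign change: keep grid value
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if stationarity(mid, x) < 0:
            lo = mid                                     # M1 still decreasing here
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = np.random.default_rng(0)
data = rng.normal(size=200)
h_opt = select_bandwidth(data)
```

The grid step guards against the fact that the score can have several local minima. Bisection then only has to find a single sign change inside a narrow bracket.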
When the random variable X, whose distribution functions are to be estimated, is defined over a finite or semi-finite interval, or its density drops sharply to zero outside such an interval, the estimation of the distribution functions is modified following Silverman (1986). Suppose that either $X \in [x_*, x^*]$, or the PDF, $f_X(x)$, is discontinuous at $x_*$ and $x^*$ and $f_X(x) = 0$ for $x < x_*$ and $x > x^*$. To obtain the kernel estimates of the PDF and CDF, the original data sample, $\{x_i\}$, i = 1, ..., n, is mirrored symmetrically around $x_*$ and $x^*$, resulting in the sample $\{2x_* - x_i,\ x_i,\ 2x^* - x_i\}$, i = 1, ..., n. Based on this sample, a density
is estimated and the desired estimates of PDF and CDF are:
Figure 1.2. The dependence of the density estimate on the bandwidth, h. The actual distribution was a mixture of two normal distributions, 0.7 × NORM(0,1) + 0.3 × NORM(2,2). The exact PDF is shown in (a). The estimates were built from 500-element samples drawn from the actual distribution, using the normal kernel and constant bandwidths: (b) h = 0.1, (c) h = 0.25 and (d) h = 0.5. For a color version of this figure, see www.iste.co.uk/limnios/statistical.zip
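The experiment behind Figure 1.2 can be approximated numerically. We average the integrated squared error over repeated samples, which gives a Monte Carlo estimate of the MISE criterion for each bandwidth. This is a sketch under stated assumptions. NORM(2,2) is read as mean 2 and standard deviation 2, the integration range [-8, 12] and the number of repetitions are our choices, and all function names are hypothetical.

```python
import numpy as np


def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def mixture_pdf(x):
    # Assumption: NORM(2,2) is read as mean 2 and standard deviation 2.
    return 0.7 * normal_pdf(x, 0.0, 1.0) + 0.3 * normal_pdf(x, 2.0, 2.0)


def sample_mixture(n, rng):
    first = rng.random(n) < 0.7                  # component label of each draw
    return np.where(first, rng.normal(0.0, 1.0, n), rng.normal(2.0, 2.0, n))


def kde_pdf(x, data, h):
    t = (x[:, None] - data[None, :]) / h
    return np.exp(-0.5 * t**2).sum(axis=1) / (data.size * h * np.sqrt(2.0 * np.pi))


def mise_estimate(h, n=500, n_rep=20, seed=0):
    """Average integrated squared error over n_rep samples (Monte Carlo MISE)."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(-8.0, 12.0, 4001)         # assumed integration range
    dx = grid[1] - grid[0]
    true = mixture_pdf(grid)
    ise = [np.sum((kde_pdf(grid, sample_mixture(n, rng), h) - true) ** 2) * dx
           for _ in range(n_rep)]
    return float(np.mean(ise))


mise = {h: mise_estimate(h) for h in (0.1, 0.25, 0.5)}
```

Because every call reuses the same seed, all three bandwidths are compared on identical samples. This reduces the noise in the comparison between them.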
When the interval is semi-finite, that is, $X \in [x_*, \infty)$ or $X \in (-\infty, x^*]$, the original sample is mirrored around $x_*$ or $x^*$, respectively, and the sample $\{2x_* - x_i,\ x_i\}$ or $\{x_i,\ 2x^* - x_i\}$ is used to estimate the density. The desired estimates of PDF and CDF are:

In connection with the above-mentioned problems with probabilistic functional models of earthquake parameters, the flexibility of kernel density estimation makes it well suited to modeling the probability distributions of such parameters. In particular, the kernel estimation method responds well to the needs of PSHA.
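The mirroring procedure for finite and semi-finite intervals can be sketched as follows. We fit a normal-kernel estimate to the mirrored sample, restrict it to the interval and renormalize it so that it integrates to one there. That renormalization is our assumption about how the final PDF and CDF are formed, not a reproduction of the chapter's formulas. The function names are hypothetical, and `None` marks an unbounded side.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])


def kde_pdf(x, data, h):
    t = (np.atleast_1d(np.asarray(x, dtype=float))[:, None] - data[None, :]) / h
    return np.exp(-0.5 * t**2).mean(axis=1) / (h * math.sqrt(2.0 * math.pi))


def kde_cdf(x, data, h):
    t = (np.atleast_1d(np.asarray(x, dtype=float))[:, None] - data[None, :]) / h
    return (0.5 * (1.0 + _erf(t / math.sqrt(2.0)))).mean(axis=1)   # mean of Phi


def mirrored_sample(data, lower=None, upper=None):
    """Reflect the sample around each finite bound (None means unbounded)."""
    x = np.asarray(data, dtype=float)
    parts = [x]
    if lower is not None:
        parts.append(2.0 * lower - x)
    if upper is not None:
        parts.append(2.0 * upper - x)
    return np.concatenate(parts)


def bounded_kde(data, h, lower=None, upper=None):
    """PDF and CDF on [lower, upper]; assumed renormalization of the mirrored fit."""
    y = mirrored_sample(data, lower, upper)
    c_lo = kde_cdf(lower, y, h)[0] if lower is not None else 0.0
    c_hi = kde_cdf(upper, y, h)[0] if upper is not None else 1.0
    mass = c_hi - c_lo                      # mirrored-estimate mass inside interval

    def pdf(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        inside = np.ones(x.shape, dtype=bool)
        if lower is not None:
            inside &= x >= lower
        if upper is not None:
            inside &= x <= upper
        return np.where(inside, kde_pdf(x, y, h) / mass, 0.0)

    def cdf(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        return np.clip((kde_cdf(x, y, h) - c_lo) / mass, 0.0, 1.0)

    return pdf, cdf


rng = np.random.default_rng(0)
excess = rng.exponential(0.5, 300)          # synthetic data supported on [0, inf)
pdf, cdf = bounded_kde(excess, 0.1, lower=0.0)
```

For a doubly bounded variable, pass both `lower` and `upper`. The CDF then equals exactly 0 at $x_*$ and exactly 1 at $x^*$.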
The mathematical model in PSHA can be formulated, for example, as the following expression for the probability that the ground motion amplitude parameter, amp, at the point (x0, y0) will exceed the value a(x0, y0) during D time units:
Pr[amp(x0, y0) ≥ a(x0, y0), D] =
[1.13]
where r(x0, y0) is the epicentral distance of an earthquake from the receiving point (x0, y0), M is the earthquake magnitude, N(D) is the number of earthquakes in D, Pr[amp(x0, y0) ≥ a(x0, y0)|M, r] is the probability that amp will exceed a due to an earthquake of magnitude M at distance r from the receiving point, fr is the PDF of epicentral distance, f(M|N(D) ≠ 0) is the probability density of earthquake magnitude, M, conditional upon earthquake occurrence in D, and ℳ is its domain. For a given receiving point, fr is directly determined by the two-dimensional probability distribution of earthquake epicenters, fxy(x, y). The functions fxy(x, y) and f(M|N(D) ≠ 0) represent the properties of the seismic source (source effect), whereas Pr[amp(x0, y0) ≥ a(x0, y0)|M, r] represents the properties of vibration transmission from the source to the receiving point (path and site effects).
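Because fr is fixed by fxy once the receiving point is chosen, one simple numerical route, assumed here for illustration, is Monte Carlo simulation. We draw epicenters from a kernel estimate of fxy and record their distances to (x0, y0). The sketch assumes a product normal kernel with a single bandwidth `h` for both coordinates, which may differ from the chapter's own two-dimensional estimator. The names `sample_epicenters` and `distance_cdf` are hypothetical.

```python
import numpy as np


def sample_epicenters(xy, h, size, rng):
    """Draw points from an assumed 2-D product-normal kernel estimate of f_xy."""
    idx = rng.integers(0, len(xy), size=size)            # pick a kernel centre
    return xy[idx] + rng.normal(0.0, h, size=(size, 2))  # add kernel noise


def distance_cdf(xy, h, x0, y0, r_values, size=100_000, seed=0):
    """Monte Carlo estimate of the CDF of epicentral distance to (x0, y0)."""
    rng = np.random.default_rng(seed)
    pts = sample_epicenters(np.asarray(xy, dtype=float), h, size, rng)
    r = np.sort(np.hypot(pts[:, 0] - x0, pts[:, 1] - y0))
    return np.searchsorted(r, r_values, side="right") / size   # share with r <= value


# Hypothetical epicenter coordinates (e.g. in km) and receiving point (1, 1).
epicenters = np.array([[0.0, 0.0], [3.0, 1.0], [4.0, -2.0]])
F = distance_cdf(epicenters, 0.5, 1.0, 1.0, np.array([1.0, 2.0, 5.0]))
```

Differencing the simulated CDF, or applying a one-dimensional kernel estimate to the simulated distances, would give an approximation of fr itself.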
The conditional probability density of source magnitude, f(M|N(D) ≠ 0), can be evaluated from the probability mass function (PMF) of the number of earthquakes in D time units, Pr[N(D) = n], and from the unconditional PDF, fM, and CDF, FM, of magnitude:
Under the assumption