Informatics and Machine Learning. Stephen Winters-Hilt.

About this work:

Author:	Stephen Winters-Hilt
Publisher:	John Wiley & Sons Limited
Genre:	Mathematics
ISBN:	9781119716761



$$p(x_i \mid y_j) = \frac{p(y_j \mid x_i)\, p(x_i)}{\sum_{k=1}^{L} p(y_j \mid x_k)\, p(x_k)}$$

Bayes' Rule provides an update rule for probability distributions in response to observed information. Terminology:

p(x_i) is referred to as the “prior distribution on X” in this context.

p(x_i ∣ y_j) is referred to as the “posterior distribution on X given Y.”
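As a minimal sketch of this update, the Python below applies Bayes' Rule to a small discrete example. The two hypotheses (a fair coin and a coin biased toward heads), their prior probabilities, and the function name `posterior` are all hypothetical, chosen only for illustration. The repeated updates additionally assume that the observations are conditionally independent given X.

```python
def posterior(prior, likelihood, j):
    """Return [p(x_i | y_j) for each i] via Bayes' Rule.

    prior[i]         -- p(x_i), the prior distribution on X
    likelihood[i][j] -- p(y_j | x_i)
    """
    numerators = [likelihood[i][j] * prior[i] for i in range(len(prior))]
    evidence = sum(numerators)  # sum over i of p(y_j | x_i) p(x_i), i.e. p(y_j)
    if evidence == 0:
        raise ValueError("p(y_j) = 0, so the posterior is undefined")
    return [num / evidence for num in numerators]


# Hypothetical example: X in {fair, biased}, Y in {tails (j=0), heads (j=1)}.
prior = [0.9, 0.1]
likelihood = [
    [0.5, 0.5],  # fair coin:   p(tails), p(heads)
    [0.2, 0.8],  # biased coin: p(tails), p(heads)
]

after_one_head = posterior(prior, likelihood, j=1)

# Sequential updating: each posterior becomes the prior for the next
# observation (assumes observations are conditionally independent given X).
belief = prior
for j in [1, 1, 1]:  # three heads in a row
    belief = posterior(belief, likelihood, j)

print("after one head:   ", [round(b, 3) for b in after_one_head])
print("after three heads:", [round(b, 3) for b in belief])
```

Each observed head shifts probability mass toward the biased coin, but the strong prior on the fair coin is only gradually overcome.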

2.5.3 Estimation Based on Maximal Conditional Probabilities

There are two ways to form an estimate from a conditional probability. The first, maximum a posteriori (MAP) estimation, chooses the outcome x that maximizes the posterior probability p(x ∣ y_j). The second, maximum likelihood (ML) estimation, chooses the x that maximizes the probability of the observation conditioned on x, p(y_j ∣ x), which in this context is called the “likelihood.”

MAP Estimate:

Provides an estimate of r.v. X given that Y = y_j in terms of the posterior probability:

$$\hat{X}_{\mathrm{MAP}} = \underset{x \in X}{\operatorname{argmax}}\; p(x \mid y_j)$$

ML Estimate:

Provides an estimate of r.v. X given that Y = y_j by maximizing the likelihood:

$$\hat{X}_{\mathrm{ML}} = \underset{x \in X}{\operatorname{argmax}}\; p(y_j \mid x)$$
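To make the contrast between the two estimates concrete, here is a minimal Python sketch for a discrete X. The helper names `map_estimate` and `ml_estimate` and the prior and likelihood numbers are hypothetical. Because the denominator of Bayes' Rule does not depend on x, the MAP sketch maximizes p(y_j ∣ x) p(x) directly instead of the full posterior.

```python
def map_estimate(prior, likelihood, j):
    """Index i maximizing p(x_i | y_j), computed via p(y_j | x_i) p(x_i)."""
    scores = [likelihood[i][j] * prior[i] for i in range(len(prior))]
    return max(range(len(scores)), key=lambda i: scores[i])


def ml_estimate(likelihood, j):
    """Index i maximizing the likelihood p(y_j | x_i); the prior is ignored."""
    return max(range(len(likelihood)), key=lambda i: likelihood[i][j])


labels = ["fair", "biased"]
likelihood = [
    [0.5, 0.5],  # fair coin:   p(tails), p(heads)
    [0.2, 0.8],  # biased coin: p(tails), p(heads)
]
skewed_prior = [0.9, 0.1]   # hypothetical: biased coins are rare
uniform_prior = [0.5, 0.5]

heads = 1
print("ML:", labels[ml_estimate(likelihood, heads)])
print("MAP, skewed prior:", labels[map_estimate(skewed_prior, likelihood, heads)])
print("MAP, uniform prior:", labels[map_estimate(uniform_prior, likelihood, heads)])
```

With a uniform prior the two estimates coincide, since p(x) is then a constant factor; a strongly non-uniform prior can pull the MAP estimate away from the ML estimate.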

2.6 Emergent Distributions and Series

In this section we consider an r.v., X, using specific examples in which its outcomes are fully enumerated (such as the outcomes 0 or 1 of a coin flip). We then consider a series of observations of X, which leads to the law of large numbers (LLN). The structure that emerges from a series of observations of an r.v. is often described in terms of probability distributions, the most famous being the Gaussian distribution (a.k.a. the normal distribution, or bell curve).

2.6.1 The Law of Large Numbers (LLN)

The LLN will now be derived in the classic “weak” form. The “strong” form is derived in the modern mathematical context of Martingales in Appendix C.1.

Let X_k, for k = 1, 2, …, be independent, identically distributed (iid) copies of X, where X takes values in a real-number “alphabet.” Let μ = E(X) and σ² = Var(X), and denote the sample mean by

$$\bar{x}_N = \frac{1}{N}\sum_{k=1}^{N} X_k$$

By linearity of expectation,

$$E(\bar{x}_N) = \mu$$

and, since the X_k are independent, their variances add:

$$\mathrm{Var}(\bar{x}_N) = \frac{1}{N^2}\sum_{k=1}^{N}\mathrm{Var}(X_k) = \frac{\sigma^2}{N}$$
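A quick Monte Carlo check of E(x̄_N) = μ and Var(x̄_N) = σ²/N is sketched below in Python. Taking X to be a fair six-sided die, the sample size N = 25, the number of repetitions, and the seed are all arbitrary assumptions for illustration.

```python
import random
import statistics

faces = [1, 2, 3, 4, 5, 6]          # X: a fair die (hypothetical choice)
mu = sum(faces) / len(faces)        # E(X) = 3.5
sigma2 = sum((f - mu) ** 2 for f in faces) / len(faces)  # Var(X) = 35/12

N = 25            # number of iid copies X_1..X_N averaged per sample mean
REPEATS = 10_000  # number of independent sample means drawn
rng = random.Random(42)

# Each entry is one realization of xbar_N = (1/N) * (sum of N die rolls).
means = [sum(rng.choice(faces) for _ in range(N)) / N for _ in range(REPEATS)]

print("mean of xbar_N:    ", statistics.fmean(means), " vs mu =", mu)
print("variance of xbar_N:", statistics.pvariance(means), " vs sigma^2/N =", sigma2 / N)
```

The empirical variance of the sample means should sit close to σ²/N, shrinking in proportion to 1/N if N is increased.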

From Chebyshev's inequality:

$$P(|\bar{x}_N - \mu| > k) \le \frac{\mathrm{Var}(\bar{x}_N)}{k^2} = \frac{\sigma^2}{N k^2}$$

Letting N → ∞ gives the (weak) LLN:

If the X_k, for k = 1, 2, …, are iid copies of X, where X takes values in a finite real alphabet, with μ = E(X) and σ² = Var(X), then

$$P(|\bar{x}_N - \mu| > k) \to 0 \quad \text{as } N \to \infty, \text{ for any } k > 0.$$

Thus, the arithmetic mean of a sequence of iid r.v.s converges to their common expectation. The weak form gives convergence “in probability,” while the strong form gives convergence “with probability one.”
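The Chebyshev bound and its N → ∞ limit can be illustrated with a small simulation. The sketch below is an illustration under stated assumptions, not a proof: X is taken to be a fair coin coded as 0 or 1 (so μ = 0.5 and σ² = 0.25), the threshold is k = 0.05, and the trial counts and seed are arbitrary. It estimates P(|x̄_N − μ| > k) by repeated sampling and compares the estimate with σ²/(N k²), capped at 1.

```python
import random


def sample_mean(n, p, rng):
    """Mean of n iid Bernoulli(p) draws (coin flips coded as 0 or 1)."""
    return sum(1 if rng.random() < p else 0 for _ in range(n)) / n


def deviation_frequency(n, p, k, trials, rng):
    """Fraction of trials with |xbar_N - mu| > k; estimates P(|xbar_N - mu| > k)."""
    mu = p  # E(X) for a Bernoulli(p) r.v.
    hits = sum(1 for _ in range(trials) if abs(sample_mean(n, p, rng) - mu) > k)
    return hits / trials


def chebyshev_bound(n, p, k):
    """Chebyshev bound sigma^2 / (N k^2) with sigma^2 = p(1 - p), capped at 1."""
    return min(1.0, p * (1 - p) / (n * k * k))


rng = random.Random(0)
p, k, trials = 0.5, 0.05, 1000
results = {}
for n in (10, 100, 1000):
    results[n] = (deviation_frequency(n, p, k, trials, rng), chebyshev_bound(n, p, k))
    print(f"N={n:5d}  empirical={results[n][0]:.3f}  Chebyshev bound={results[n][1]:.3f}")
```

The empirical frequency falls as N grows and stays below the bound. Chebyshev's inequality is loose here, but it is enough to force the deviation probability to zero as N → ∞.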

2.6.2 Distributions

2.6.2.1 The Geometric Distribution (Emergent Via Maxent)

Here we consider the probability of first observing an event on the kth try, when the event occurs independently on each try with probability p. If the event is first seen on the kth try, then the first (k − 1) tries were nonevents (each with probability 1 − p) and the final try is an event (with probability p). Multiplying these probabilities gives the classic formula for the geometric distribution:

$$P(X = k) = (1 - p)^{k-1}\, p, \qquad k = 1, 2, \ldots$$


Figure 2.3 The Geometric distribution, P(X = k) = (1 − p)^(k−1) p, with p = 0.8.

As for normalization, i.e., whether the probabilities of all outcomes sum to one, we have:

$$\text{Total probability} = \sum_{k=1}^{\infty} (1 - p)^{k-1} p = p\left[1 + (1 - p) + (1 - p)^2 + (1 - p)^3 + \cdots\right] = p \cdot \frac{1}{1 - (1 - p)} = 1$$

The geometric series converges because 0 < p ≤ 1, so that |1 − p| < 1; the total probability therefore sums to one with no further normalization needed. Figure 2.3 shows the geometric distribution for the case p = 0.8.
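The normalization argument can be checked numerically. In this minimal Python sketch (function names are hypothetical; p = 0.8 matches Figure 2.3), the truncated sum over k = 1..K equals 1 − (1 − p)^K, which approaches 1 as K grows. A simulation that counts tries until the first event, assuming independent tries, reproduces the formula.

```python
import random
from collections import Counter


def geometric_pmf(k, p):
    """P(X = k) = (1 - p)^(k - 1) * p: first event on try k, for k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p if k >= 1 else 0.0


def first_event_try(p, rng):
    """Count independent tries until the first event (each try succeeds w.p. p)."""
    k = 1
    while rng.random() >= p:  # nonevent, probability 1 - p
        k += 1
    return k


p = 0.8
for K in (1, 2, 5, 10):
    partial = sum(geometric_pmf(k, p) for k in range(1, K + 1))
    print(f"K={K:2d}  partial sum={partial:.10f}  1-(1-p)^K={1 - (1 - p) ** K:.10f}")

rng = random.Random(1)
draws = 100_000
counts = Counter(first_event_try(p, rng) for _ in range(draws))
for k in range(1, 5):
    print(f"k={k}  simulated={counts[k] / draws:.4f}  formula={geometric_pmf(k, p):.4f}")
```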

2.6.2.2 The Gaussian (aka Normal) Distribution (Emergent Via LLN Relation and Maxent)

$$N_x(\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
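As a numerical sanity check on the Gaussian density, the Python sketch below integrates it with the trapezoidal rule over μ ± 10σ. It assumes the standard normalized form, in which the exponential is divided by √(2πσ²); the bare exponential factor on its own integrates to √(2πσ²) rather than 1, which the sketch also shows. The values of μ and σ² and the helper names are arbitrary illustrations.

```python
import math


def gaussian_unnormalized(x, mu, sigma2):
    """The exponential factor exp(-(x - mu)^2 / (2 sigma^2))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma2))


def gaussian_pdf(x, mu, sigma2):
    """Normalized density: the exponential factor divided by sqrt(2 pi sigma^2)."""
    return gaussian_unnormalized(x, mu, sigma2) / math.sqrt(2 * math.pi * sigma2)


def trapezoid(f, a, b, n=20_000):
    """Composite trapezoidal rule for the integral of f over [a, b] with n panels."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))


mu, sigma2 = 1.0, 4.0
sigma = math.sqrt(sigma2)
a, b = mu - 10 * sigma, mu + 10 * sigma

area_pdf = trapezoid(lambda x: gaussian_pdf(x, mu, sigma2), a, b)
area_raw = trapezoid(lambda x: gaussian_unnormalized(x, mu, sigma2), a, b)
print("integral of normalized density:", area_pdf)
print("integral of bare exponential:  ", area_raw,
      "vs sqrt(2 pi sigma^2) =", math.sqrt(2 * math.pi * sigma2))
```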
