Informatics and Machine Learning. Stephen Winters-Hilt.

Author: Stephen Winters-Hilt
Publisher: John Wiley & Sons Limited
Genre: Mathematics
ISBN: 9781119716761
\[
p(x_i \mid y_j) = \frac{p(y_j \mid x_i)\, p(x_i)}{\sum_{i=1}^{L} p(y_j \mid x_i)\, p(x_i)}
\]

      Bayes' Rule provides an update rule for probability distributions in response to observed information. Terminology:

       p(x_i) is referred to as the “prior distribution on X” in this context.

       p(x_i ∣ y_j) is referred to as the “posterior distribution on X given Y.”
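
      The update from prior to posterior can be sketched numerically. The following is a minimal illustration, not code from the book: the function name `posterior` and the two-coin numbers are assumptions chosen only to show Bayes' Rule at work for a single observation y_j.

```python
def posterior(prior, likelihood):
    """Return p(x_i | y_j) for each i, given p(x_i) and p(y_j | x_i)."""
    # Numerator of Bayes' Rule for each outcome x_i.
    joint = [lik * pri for lik, pri in zip(likelihood, prior)]
    # Denominator: total probability of the observation y_j.
    evidence = sum(joint)
    if evidence == 0:
        raise ValueError("observation has zero probability under the prior")
    return [j / evidence for j in joint]


# Hypothetical example: x_1 = fair coin, x_2 = biased coin, y_j = "heads".
prior = [0.5, 0.5]         # p(x_i)
likelihood = [0.5, 0.9]    # p(heads | x_i)
post = posterior(prior, likelihood)
print(post)
```

      With these assumed numbers the posterior is roughly (0.357, 0.643): observing heads shifts belief toward the biased coin, and the posterior still sums to 1.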

      2.5.3 Estimation Based on Maximal Conditional Probabilities

      There are two ways to form an estimate in a conditional problem. The first seeks the outcome that maximizes the posterior probability (maximum a posteriori [MAP]). The second seeks the choice of conditioning that maximizes the conditional probability of the observation, referred to as a “likelihood” in this context (maximum likelihood [ML]).

       MAP Estimate:

      Provides an estimate of r.v. X given that Y = y_j in terms of the posterior probability:

\[
\hat{X}_{\mathrm{MAP}} = \arg\max_{x \in X} \, p(x \mid y_j)
\]

       ML Estimate:

      Provides an estimate of r.v. X given that Y = y_j in terms of the maximum likelihood:

\[
\hat{X}_{\mathrm{ML}} = \arg\max_{x \in X} \, p(y_j \mid x)
\]
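
      The two estimates can disagree when the prior is far from uniform. The sketch below is an illustration under assumed numbers (a common fair coin and a rare biased coin, with heads observed); the function names `ml_estimate` and `map_estimate` are hypothetical and not from the book.

```python
def ml_estimate(outcomes, likelihood):
    """argmax over x of p(y_j | x)."""
    return max(outcomes, key=lambda x: likelihood[x])


def map_estimate(outcomes, prior, likelihood):
    """argmax over x of p(x | y_j).

    The evidence p(y_j) does not depend on x, so maximizing
    p(y_j | x) * p(x) selects the same x as maximizing the posterior.
    """
    return max(outcomes, key=lambda x: likelihood[x] * prior[x])


# Hypothetical numbers, having observed y_j = "heads".
outcomes = ["fair", "biased"]
prior = {"fair": 0.9, "biased": 0.1}         # p(x)
likelihood = {"fair": 0.5, "biased": 0.9}    # p(heads | x)

x_ml = ml_estimate(outcomes, likelihood)
x_map = map_estimate(outcomes, prior, likelihood)
print(x_ml, x_map)
```

      Here ML selects "biased" (0.9 > 0.5), while MAP selects "fair" (0.5 × 0.9 = 0.45 > 0.9 × 0.1 = 0.09), because the prior outweighs the likelihood advantage. With a uniform prior the two estimates coincide.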

      In this section we consider a r.v., X, with specific examples where its outcomes are fully enumerated (such as the 0 or 1 outcomes corresponding to a coin flip). We review a series of observations of the r.v., X, to arrive at the law of large numbers (LLN). The structure that emerges from a series of observations of a r.v. is often described in terms of probability distributions, the most famous being the Gaussian distribution (a.k.a. the Normal distribution, or bell curve).

      2.6.1 The Law of Large Numbers (LLN)

      The LLN will now be derived in the classic “weak” form. The “strong” form is derived in the modern mathematical context of Martingales in Appendix C.1.

      Let X_k be independent, identically distributed (iid) copies of X, and let the “alphabet” of X be the real numbers. Let μ = E(X), σ^2 = Var(X), and denote

\[
\bar{x}_N = \frac{1}{N} \sum_{k=1}^{N} X_k
\]

\[
E(\bar{x}_N) = \mu
\]

\[
\mathrm{Var}(\bar{x}_N) = \frac{1}{N^2} \sum_{k=1}^{N} \mathrm{Var}(X_k) = \frac{1}{N} \sigma^2
\]

      From Chebyshev:

\[
P(|\bar{x}_N - \mu| > k) \le \frac{\mathrm{Var}(\bar{x}_N)}{k^2} = \frac{1}{N k^2} \sigma^2
\]

      As N → ∞ we obtain the LLN (weak):

      If X_k are iid copies of X, for k = 1, 2, …, and X has a real and finite alphabet, and μ = E(X), σ^2 = Var(X), then P(|x̄_N − μ| > k) → 0 for any k > 0. Thus, the arithmetic mean of a sequence of iid r.v.s converges to their common expectation. The weak form has convergence “in probability,” while the strong form has convergence “with probability one.”
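
      The weak LLN and the Chebyshev bound can be checked empirically. The following is a small simulation sketch, not from the book: the fair-coin model, the threshold k = 0.0625, the seed, and the number of trials are all assumptions made for illustration. It estimates how often the sample mean deviates from μ by more than k, and compares that frequency with σ^2/(N k^2).

```python
import random


def sample_mean(n, p, rng):
    """Mean of n iid Bernoulli(p) draws (1 with probability p, else 0)."""
    return sum(1 if rng.random() < p else 0 for _ in range(n)) / n


def deviation_frequency(n, p, k, trials, rng):
    """Fraction of trials in which |sample mean - mu| > k."""
    mu = p  # expectation of a Bernoulli(p) variable
    hits = sum(abs(sample_mean(n, p, rng) - mu) > k for _ in range(trials))
    return hits / trials


rng = random.Random(0)          # fixed seed so the run is repeatable
p, k, trials = 0.5, 0.0625, 500
variance = p * (1 - p)          # Var(X) for a Bernoulli(p) variable

results = {}
for n in (10, 100, 1000):
    freq = deviation_frequency(n, p, k, trials, rng)
    # Chebyshev bound sigma^2 / (N k^2), capped at 1 since it bounds a probability.
    bound = min(1.0, variance / (n * k * k))
    results[n] = (freq, bound)
    print(n, freq, bound)
```

      The observed deviation frequency should shrink as N grows and stay below the Chebyshev bound, which is typically loose: it guarantees only a rate of 1/N, while the actual tail probability falls off much faster for this example.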

      2.6.2 Distributions

      2.6.2.1 The Geometric Distribution (Emergent Via Maxent)

      Here we consider the probability of first seeing an event on the k-th try, when the probability of seeing that event on each try is “p.” If we see the event for the first time on try k, then the first (k − 1) tries were nonevents (each with probability (1 − p)), and the final observation occurs with probability p. This gives the classic formula for the geometric distribution:

\[
P(X = k) = (1 - p)^{(k-1)} p
\]

      [Figure: The Geometric distribution, P(X = k) = (1 − p)^(k−1) p, with p = 0.8.]

      Total Probability:

\[
\sum_{k=1}^{\infty} (1-p)^{(k-1)} p = p\left[1 + (1-p) + (1-p)^2 + (1-p)^3 + \dots\right] = p\left[\frac{1}{1-(1-p)}\right] = 1
\]
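
      The formula and its normalization can be checked numerically. This is a minimal sketch using p = 0.8, the value quoted for the figure; the function name `geometric_pmf` and the 50-term cutoff of the sum are assumptions made for illustration.

```python
def geometric_pmf(k, p):
    """P(X = k) = (1 - p)**(k - 1) * p for k = 1, 2, 3, ..."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return (1 - p) ** (k - 1) * p


p = 0.8
# First few probabilities: each is (1 - p) times the one before it.
pmf = [geometric_pmf(k, p) for k in range(1, 6)]
# Partial sum of the series; the neglected tail is (1 - p)**50.
partial_sum = sum(geometric_pmf(k, p) for k in range(1, 51))
print(pmf)
print(partial_sum)
```

      The partial sum over the first n terms equals 1 − (1 − p)^n, so for p = 0.8 it is indistinguishable from 1 in floating point well before 50 terms.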

      2.6.2.2 The Gaussian (aka Normal) Distribution (Emergent Via LLN Relation and Maxent)

\[
N_x(\mu, \sigma^2) = \exp\left(-(x-\mu)^2 / (2\sigma^2)\right) \Big/ \sqrt{2\pi\sigma^2}
\]
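
      As a quick numerical check, the sketch below evaluates the Gaussian density and integrates it by the midpoint rule. It assumes the standard normalization factor 1/√(2πσ^2), which is not visible in the formula as extracted here; the helper names `gaussian_pdf` and `integrate`, and the parameters μ = 1, σ^2 = 4, are illustrative assumptions.

```python
import math


def gaussian_pdf(x, mu, sigma2):
    """Normal density with mean mu and variance sigma2 (sigma2 > 0).

    Assumes the standard normalization 1 / sqrt(2 * pi * sigma2).
    """
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def integrate(f, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


mu, sigma2 = 1.0, 4.0                  # hypothetical parameters
sigma = math.sqrt(sigma2)
# Integrate over mu +/- 10 sigma; the mass outside this range is negligible.
total = integrate(lambda x: gaussian_pdf(x, mu, sigma2),
                  mu - 10 * sigma, mu + 10 * sigma)
peak = gaussian_pdf(mu, mu, sigma2)    # maximum of the density, at x = mu
print(total, peak)
```

      The integral should come out very close to 1, confirming that the assumed normalization makes the expression a probability density, and the peak value at x = μ is 1/√(2πσ^2).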

