The T2-statistic is obtained from (3.21) as T2 = 19.71, and the right-hand side of (3.22) at α = 0.05 gives the critical value F0 = 8.13. Since the observed value of T2 exceeds the critical value F0, we reject the null hypothesis H0 and conclude that the mean vector of the three side temperatures of the defective billets is significantly different from the nominal mean vector. In addition, the p-value is 0.0004 < α = 0.05, which further confirms that H0 should be rejected.
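As an illustration of this check (a sketch under my own conventions, not code from the book), the function below computes a one-sample Hotelling T2-statistic in the spirit of (3.21) and the corresponding F-based critical value and p-value in the spirit of (3.22); the name hotelling_t2_test and the use of numpy/scipy are assumptions for the example, and the billet data themselves are not reproduced here.

import numpy as np
from scipy import stats

def hotelling_t2_test(X, mu0, alpha=0.05):
    # X: (n, p) array of observations; mu0: length-p nominal mean vector.
    X = np.asarray(X, dtype=float)
    mu0 = np.asarray(mu0, dtype=float)
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)               # sample covariance matrix
    d = xbar - mu0
    t2 = n * d @ np.linalg.solve(S, d)        # T2 = n (xbar - mu0)' S^{-1} (xbar - mu0)
    # Under H0, (n - p) / (p (n - 1)) * T2 follows an F(p, n - p) distribution.
    f_crit = stats.f.ppf(1 - alpha, p, n - p)
    t2_crit = p * (n - 1) / (n - p) * f_crit  # critical value on the T2 scale
    p_value = stats.f.sf((n - p) / (p * (n - 1)) * t2, p, n - p)
    return t2, t2_crit, p_value

Comparing the returned t2 with t2_crit (or the p-value with α) reproduces the decision rule used above.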
3.5 Bayesian Inference for Normal Distribution
Let D = {x1, x2,…, xn} denote the observed data set. In maximum likelihood estimation, the distribution parameters are treated as fixed but unknown quantities, and the estimation errors are assessed by considering the distribution of the estimates over the possible data sets D that might have been observed. By contrast, in Bayesian inference, we treat the observed data set D as the only data set, and the uncertainty in the parameters is characterized through a probability distribution over the parameters.
In this subsection, we focus on Bayesian inference for the normal distribution when the mean μ is unknown and the covariance matrix Σ is assumed to be known. Bayesian inference is based on Bayes’ theorem. In general, Bayes’ theorem gives the conditional probability of an event A given that an event B occurs:
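A standard statement of this identity, which presumably matches the display referenced here, is
\[
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}, \qquad P(B) > 0 .
\]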
Applying Bayes’ theorem for Bayesian inference of μ, we have
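In terms of the prior g(μ), the likelihood f(D|μ), and the normalizing constant p(D) described below, (3.25) presumably takes the form
\[
f(\mu \mid D) = \frac{f(D \mid \mu)\, g(\mu)}{p(D)} ,
\]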
where g(μ) is the prior distribution of μ, which is the distribution before observing the data, and f(μ|D) is called the posterior distribution, which is the distribution after we have observed D. The function f(D|μ) on the right-hand side of (3.25) is the density function for the observed data set D. When viewed as a function of the unknown parameter μ, f(D|μ) is exactly the likelihood function of μ. Therefore, Bayes’ theorem can be stated in words as
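In symbols, this statement presumably reads
\[
\text{posterior} \propto \text{likelihood} \times \text{prior}, \qquad \text{i.e.,} \quad f(\mu \mid D) \propto f(D \mid \mu)\, g(\mu),
\]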
where ∝ stands for “is proportional to”. Note that the denominator p(D) on the right-hand side of (3.25) is a constant that does not depend on the parameter μ. It plays a normalizing role, ensuring that the left-hand side is a valid probability density function that integrates to one. Integrating the right-hand side of (3.25) with respect to μ and setting the result equal to one, it is easy to see that
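Carrying out this step, the normalizing constant is presumably
\[
p(D) = \int f(D \mid \mu)\, g(\mu)\, d\mu .
\]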
A point estimate of μ can be obtained by maximizing the posterior distribution. This approach is called maximum a posteriori (MAP) estimation. The MAP estimate of μ can be written as
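Writing the estimate generically as μ_MAP (a placeholder symbol), (3.27) presumably reads
\[
\hat{\mu}_{\mathrm{MAP}} = \arg\max_{\mu} f(\mu \mid D) = \arg\max_{\mu} f(D \mid \mu)\, g(\mu),
\]
where the second equality holds because the normalizing constant p(D) does not depend on μ.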
From (3.27), it can be seen that the MAP estimate is closely related to the MLE: without the prior g(μ), the MAP estimate is the same as the MLE. Hence, if the prior is a uniform distribution, the MAP estimate and the MLE are equivalent. Following this argument, if the prior distribution has a relatively flat shape, we expect the MAP estimate and the MLE to be similar.
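To spell out the uniform-prior case: if g(μ) is constant over the parameter space, then
\[
\arg\max_{\mu} f(D \mid \mu)\, g(\mu) = \arg\max_{\mu} f(D \mid \mu),
\]
so the MAP estimate coincides with the MLE.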
We first consider a simple case where the data follow a univariate normal distribution with unknown mean μ and known variance σ2. The likelihood function based on a random sample of independent observations D = {x1, x2,…, xn} is given by
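For independent observations from N(μ, σ2), this likelihood is presumably
\[
f(D \mid \mu) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\!\left( -\frac{(x_{i}-\mu)^{2}}{2\sigma^{2}} \right)
= \left(2\pi\sigma^{2}\right)^{-n/2} \exp\!\left( -\frac{1}{2\sigma^{2}} \sum_{i=1}^{n} (x_{i}-\mu)^{2} \right).
\]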
Based on (3.26), we have
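Applying the proportionality (3.26) with this likelihood, the posterior presumably satisfies
\[
f(\mu \mid D) \propto g(\mu)\, \exp\!\left( -\frac{1}{2\sigma^{2}} \sum_{i=1}^{n} (x_{i}-\mu)^{2} \right),
\]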
where g(μ) is the probability density function of the prior distribution. We choose a normal distribution N(μ0, σ02) as the prior for μ. This prior is a conjugate prior because the resulting posterior distribution is also normal. By completing the square in the exponent of the product of the likelihood and the prior, the posterior distribution can be obtained as
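Writing the posterior mean and variance as μn and σn2 (placeholder symbols for the quantities defined next), the posterior presumably has the normal form
\[
f(\mu \mid D) = \frac{1}{\sqrt{2\pi\sigma_{n}^{2}}} \exp\!\left( -\frac{(\mu-\mu_{n})^{2}}{2\sigma_{n}^{2}} \right),
\]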
where