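and |Σ| = 1 − ρ². Substituting Σ⁻¹ and |Σ| in (3.8), we have
\[
f(x_1, x_2) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left\{ -\frac{x_1^2 - 2\rho x_1 x_2 + x_2^2}{2(1-\rho^2)} \right\} \tag{3.9}
\]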
From (3.9), if ρ = 0, the joint density can be written as f(x1, x2) = f(x1)f(x2), where f(x) is the univariate normal density given in (3.7) with μ = 0 and σ = 1. So in this case X1 and X2 are independent. This result holds for the general multivariate normal distribution, as discussed later in this section.
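The factorization at ρ = 0 can be checked numerically. The following is a minimal sketch, assuming that (3.9) is the standard bivariate normal density with zero means, unit variances, and correlation ρ, and that (3.8) is the usual p-variate normal density. The function names are illustrative and do not come from the text.

```python
import math
import numpy as np

def bivariate_density(x1, x2, rho):
    """Bivariate normal density with zero means, unit variances, correlation rho."""
    det = 1.0 - rho**2                       # |Sigma| for Sigma = [[1, rho], [rho, 1]]
    quad = (x1**2 - 2.0 * rho * x1 * x2 + x2**2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def univariate_density(x):
    """Univariate normal density with mu = 0 and sigma = 1."""
    return math.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi)

def general_density(x, mu, Sigma):
    """p-variate normal density computed from the inverse and determinant of Sigma."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    diff = x - mu
    quad = diff @ np.linalg.inv(Sigma) @ diff
    norm_const = math.sqrt((2.0 * math.pi) ** x.size * np.linalg.det(Sigma))
    return math.exp(-0.5 * quad) / norm_const

# With rho = 0 the joint density equals the product of the two marginal densities.
joint = bivariate_density(0.3, -1.2, rho=0.0)
product = univariate_density(0.3) * univariate_density(-1.2)
print(abs(joint - product) < 1e-12)
```

For ρ ≠ 0 the two sides differ, and `general_density` can be used to confirm that the closed form agrees with direct substitution of Σ⁻¹ and |Σ|.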
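Solving the characteristic equation |Σ − λI| = 0 gives the two eigenvalues of Σ, λ1 = 1 + ρ and λ2 = 1 − ρ. Based on Σv = λv, the corresponding normalized eigenvectors can be obtained as
\[
v_1 = \begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}, \qquad
v_2 = \begin{pmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{pmatrix}
\]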
So, for ρ > 0, the major axis of the elliptical contour of constant density lies along the line x1 = x2, and the minor axis is orthogonal to the major axis. The larger the correlation coefficient ρ, the more elongated the ellipse. As an example, two bivariate normal distributions with ρ = 0 and ρ = 0.75 are shown in Figure 3.1(a) and Figure 3.1(b), respectively. Notice how the presence of correlation causes the probability distribution to concentrate along the line x1 = x2. When ρ = 0, the two eigenvalues are equal and the constant-density contour is a circle, as shown in Figure 3.2(a). For ρ = 0.75, the constant-density contour is an ellipse, as shown in Figure 3.2(b).
Figure 3.1 Two bivariate normal distributions, (a) ρ = 0 (b) ρ = 0.75
Figure 3.2 Contour plots for the distributions in Figure 3.1
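The eigenvalues and eigenvectors quoted for the bivariate example can be verified numerically. The sketch below uses NumPy's symmetric eigensolver. The helper name `correlation_eigen` is illustrative, and the sign of each returned eigenvector is arbitrary, so only the direction of each axis is meaningful.

```python
import numpy as np

def correlation_eigen(rho):
    """Eigenvalues (largest first) and unit eigenvectors of [[1, rho], [rho, 1]]."""
    Sigma = np.array([[1.0, rho], [rho, 1.0]])
    eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # reorder so the largest comes first
    return eigvals[order], eigvecs[:, order]   # eigenvectors are the columns

eigvals, eigvecs = correlation_eigen(0.75)
print(eigvals)         # eigenvalues 1 + rho and 1 - rho
print(eigvecs[:, 0])   # direction of the major axis, along x1 = x2 up to sign

# The semi-axis lengths of a constant-density contour are proportional to the
# square roots of the eigenvalues, so this ratio measures the elongation.
axis_ratio = np.sqrt(eigvals[0] / eigvals[1])
print(axis_ratio)
```

As ρ approaches 1, λ2 = 1 − ρ approaches 0 and the ratio grows without bound, which is the elongation visible in the contour plots.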
Properties of the Multivariate Normal Distribution
We list some of the most useful properties of the multivariate normal distribution. These properties make it convenient to manipulate normal distributions, which is one of the reasons for the popularity of the normal distribution. Suppose the random vector X follows a p-dimensional normal distribution Np(μ,Σ).
Normality of linear combinations of the variables in X. Let c be a vector of constants. From (3.3) and (3.4), we have E(cᵀX) = cᵀμ and var(cᵀX) = cᵀΣc. This is true for any random vector X. When X follows a multivariate normal distribution, we have the additional property that cᵀX also follows a (univariate) normal distribution. That is, if X ∼ Np(μ, Σ), then cᵀX ∼ N(cᵀμ, cᵀΣc). More generally, if C is a q × p matrix of constants, CX still follows a multivariate normal distribution. From (3.1) and (3.2), we have E(CX) = Cμ and cov(CX) = CΣCᵀ. So CX ∼ Nq(Cμ, CΣCᵀ).
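The mean and covariance of CX can be computed directly and compared with a simulation. The following is a sketch with made-up values of μ, Σ, and C chosen only for illustration. The function name `linear_transform_params` is hypothetical.

```python
import numpy as np

def linear_transform_params(C, mu, Sigma):
    """Return the mean C mu and covariance C Sigma C^T of CX."""
    C = np.atleast_2d(np.asarray(C, dtype=float))   # a single vector c becomes a 1 x p matrix
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    return C @ mu, C @ Sigma @ C.T

# Illustrative parameters (assumed, not taken from the text); Sigma is positive definite.
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.6, 0.0],
                  [0.6, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
C = np.array([[1.0, 1.0, 0.0],     # X1 + X2
              [0.0, 1.0, -1.0]])   # X2 - X3

mean_CX, cov_CX = linear_transform_params(C, mu, Sigma)

# Monte Carlo check: transform simulated draws of X and compare sample moments.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu, Sigma, size=200_000)
transformed = samples @ C.T
sample_mean = transformed.mean(axis=0)
sample_cov = np.cov(transformed, rowvar=False)

print(mean_CX, sample_mean)
print(cov_CX)
print(sample_cov)
```

The simulation only confirms the first two moments. Normality of CX itself is the distributional property stated above, which could be examined further with a normal probability plot of each transformed coordinate.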
Normality of subvectors. Let X1 = (X1, X2,…, Xq) be the subvector of the first q elements of X and X2 = (Xq+1, Xq+2,…, Xp) be the subvector of the remaining p − q elements of X. From (3.5) and (3.6), μ and Σ can be partitioned as
\[
\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad
\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \tag{3.10}
\]
where μi and Σii are the mean vector and covariance matrix of Xi, for i = 1, 2, and Σ12 = Σ21ᵀ contains the covariances between the elements of X1 and the elements of X2. If X follows a multivariate normal distribution, we have the additional property that both X1 and X2 follow a multivariate normal distribution. That is, if X ∼ Np(μ, Σ), then X1 ∼ Nq(μ1, Σ11) and X2 ∼ Np−q(μ2, Σ22). A special case of this property is that each element of X also follows a (univariate) normal distribution. That is, if X ∼ Np(μ, Σ), then Xj ∼ N(μj, σjj), j = 1, 2,…, p. The converse of this result is not true: if each element of a random vector X follows a univariate normal distribution, X may not follow a multivariate normal distribution.
Zero covariance implies independence. If X ∼ Np(μ, Σ) and X is partitioned into the subvectors X1 and X2, the mean vector and covariance matrix of X can be partitioned as in (3.10). The subvectors X1 and X2 are independent if and only if Σ12 = 0. In particular, any two elements Xi and Xj of X are independent if and only if σij = cov(Xi, Xj) = 0. Note that if Xi and Xj are independent but do not follow a joint normal distribution, we still have cov(Xi, Xj) = 0. However, without joint normality the converse is not necessarily true: if cov(Xi, Xj) = 0, Xi and Xj may not be independent.
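A simulated counterexample helps show why joint normality matters here. The construction below is a standard one that does not appear in the text: X1 is standard normal and X2 = S·X1, where S is an independent random sign. Each variable is then marginally N(0, 1) and their covariance is zero, yet they are clearly dependent and not jointly normal.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

x1 = rng.standard_normal(n)
sign = rng.choice([-1.0, 1.0], size=n)   # random sign, independent of x1
x2 = sign * x1                           # marginally N(0, 1) by symmetry

# The covariance is (approximately) zero in the sample ...
sample_cov = np.cov(x1, x2)[0, 1]

# ... but the variables are dependent: |x2| always equals |x1|.
corr_abs = np.corrcoef(np.abs(x1), np.abs(x2))[0, 1]

# They are also not jointly normal: x1 + x2 is exactly zero whenever the sign
# is -1, which a normally distributed linear combination could not be.
frac_zero_sum = np.mean(x1 + x2 == 0.0)

print(sample_cov, corr_abs, frac_zero_sum)
```

The same example illustrates that normal marginal distributions do not by themselves imply a multivariate normal joint distribution.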
Conditional distributions are normal. Suppose X ∼ Np(μ, Σ) and the mean vector and covariance matrix of X are partitioned as in (3.10). If X1 and X2 are not independent, we have Σ12 ≠ 0, and the conditional distribution of X1 given X2 = x2 is multivariate normal with mean vector and covariance matrix
\[
E(X_1 \mid X_2 = x_2) = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2) \tag{3.11}
\]
\[
\operatorname{cov}(X_1 \mid X_2 = x_2) = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \tag{3.12}
\]
Note that the mean vector of the conditional distribution is a linear function of x2, but the covariance matrix of the conditional distribution does not depend on x2. If X1 and X2 are independent, the conditional distribution of X1 given X2 = x2 is simply Nq(μ1, Σ11), the unconditional distribution of X1.
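The conditional mean and covariance can be computed from the partitioned parameters. The sketch below assumes the standard forms μ1 + Σ12Σ22⁻¹(x2 − μ2) and Σ11 − Σ12Σ22⁻¹Σ21 for (3.11) and (3.12), and assumes that Σ22 is nonsingular. The function name `conditional_mvn` is illustrative.

```python
import numpy as np

def conditional_mvn(mu, Sigma, q, x2):
    """Mean and covariance of X1 | X2 = x2, where X1 holds the first q elements of X."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    x2 = np.asarray(x2, dtype=float)

    # Partition mu and Sigma into blocks for X1 and X2.
    mu1, mu2 = mu[:q], mu[q:]
    S11, S12 = Sigma[:q, :q], Sigma[:q, q:]
    S21, S22 = Sigma[q:, :q], Sigma[q:, q:]

    # W = inv(S22) @ S21, obtained by solving a linear system instead of inverting.
    # Because Sigma is symmetric, W.T equals S12 @ inv(S22).
    W = np.linalg.solve(S22, S21)

    cond_mean = mu1 + W.T @ (x2 - mu2)
    cond_cov = S11 - S12 @ W
    return cond_mean, cond_cov

# Bivariate example with zero means, unit variances, and rho = 0.75.
rho = 0.75
cond_mean, cond_cov = conditional_mvn([0.0, 0.0], [[1.0, rho], [rho, 1.0]], q=1, x2=[2.0])
print(cond_mean)   # rho * x2
print(cond_cov)    # 1 - rho**2
```

In this bivariate case the conditional mean reduces to ρ·x2 and the conditional variance to 1 − ρ², so a stronger correlation both shifts the conditional mean further and narrows the conditional distribution.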
3.3 Maximum Likelihood Estimation for Multivariate Normal Distributions
If the population distribution is assumed to be multivariate normal with mean vector μ and covariance matrix Σ, the parameters μ and Σ can be estimated from a random sample of n observations x1, x2,…, xn. A commonly used method for parameter estimation is maximum likelihood estimation (MLE), and the estimated parameter values are called the maximum likelihood estimates. The idea of maximum likelihood estimation is to find the μ and Σ that maximize the joint density of the observations, viewed as a function of the parameters; this function is called the likelihood function. For a multivariate normal distribution, the likelihood function is