Industrial Data Analytics for Diagnosis and Prognosis. Yong Chen.

Information about the work:

Author:	Yong Chen
Publisher:	John Wiley & Sons Limited
Genre:	Mathematics
ISBN:	9781119666301


$s_{jk} = \frac{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)}{n-1}.$ (2.5)

The diagonal elements of S, sjj, j = 1,…,p, are the sample variances of the individual variables. Setting k = j in (2.5) shows that sjj is equal to sj², the sample variance of the jth variable, so both notations sjj and sj² represent the sample variance of xj. It is also clear from (2.5) that sjk = skj, so the sample covariance matrix S is symmetric. The sample covariance matrix S can also be written in terms of the observation vectors xi as

$\mathbf{S} = \frac{1}{n-1} \sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T.$ (2.6)
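As a rough illustration of how (2.5) and (2.6) relate, the sketch below builds S both ways and compares the results. The data values and the function names `cov_elementwise` and `cov_outer` are made up for this example and do not come from Table 2.1.

```python
# Hypothetical data: n = 4 observations of p = 3 variables (not from Table 2.1).
X = [
    [2.0, 1.0, 5.0],
    [4.0, 3.0, 4.0],
    [6.0, 2.0, 9.0],
    [8.0, 6.0, 6.0],
]
n, p = len(X), len(X[0])

# Sample mean vector: one mean per variable (column).
xbar = [sum(row[j] for row in X) / n for j in range(p)]


def cov_elementwise(X, xbar):
    """Build S entry by entry, following (2.5)."""
    n, p = len(X), len(xbar)
    return [
        [sum((X[i][j] - xbar[j]) * (X[i][k] - xbar[k]) for i in range(n)) / (n - 1)
         for k in range(p)]
        for j in range(p)
    ]


def cov_outer(X, xbar):
    """Build S as a scaled sum of outer products, following (2.6)."""
    n, p = len(X), len(xbar)
    S = [[0.0] * p for _ in range(p)]
    for row in X:
        d = [row[j] - xbar[j] for j in range(p)]  # deviation vector x_i - xbar
        for j in range(p):
            for k in range(p):
                S[j][k] += d[j] * d[k] / (n - 1)
    return S


S1 = cov_elementwise(X, xbar)
S2 = cov_outer(X, xbar)

for row in S1:
    print([round(v, 4) for v in row])
```

Both constructions should agree up to floating-point rounding, and the resulting matrix is symmetric with the sample variances on its diagonal.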

Similarly, we define the sample correlation matrix as

$\mathbf{R} = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{pmatrix}.$

The (j, k)th element of R is the sample correlation of the jth and kth variables:

$r_{jk} = \frac{s_{jk}}{s_j s_k}.$

The sample correlation between a variable and itself is equal to 1. So the diagonal elements of a sample correlation matrix are all equal to 1. The sample correlation matrix R is obviously symmetric since rjk = rkj.

Example 2.4 Consider the data set in Table 2.1. In Example 2.2, we found that x̄₁ = 2479.5 and x̄₂ = 170.35. Similarly, we can obtain x̄₃ = 65.41. So the mean vector of x = (x₁ x₂ x₃)ᵀ is given by

$\bar{\mathbf{x}} = (2479.5 \quad 170.35 \quad 65.41)^T.$

In Example 2.2, we calculated the sample variances, sample covariance, and sample correlation of x₁ and x₂. Similarly, we can obtain the sample variance of x₃ and its sample covariance and correlation with the other two variables as

$s_3^2 = 3.71, \quad s_{13} = 820.8, \quad s_{23} = 15.56, \quad r_{13} = 0.832, \quad r_{23} = 0.881.$

Note that while s₂₃ is much smaller than s₁₃, r₂₃ is greater than r₁₃, which indicates that the linear association between x₂ and x₃ is stronger than that between x₁ and x₃. This shows that the magnitude of a covariance alone does not indicate how strong the relationship between two variables is, because it depends on the scales of the variables. Combining all the sample variance, covariance, and correlation information, the sample covariance matrix and sample correlation matrix of x = (x₁ x₂ x₃)ᵀ can be written as

$\mathbf{S} = \begin{pmatrix} 262829.2 & 4316.8 & 820.8 \\ 4316.8 & 84.07 & 15.56 \\ 820.8 & 15.56 & 3.71 \end{pmatrix}, \quad \mathbf{R} = \begin{pmatrix} 1 & 0.918 & 0.832 \\ 0.918 & 1 & 0.881 \\ 0.832 & 0.881 & 1 \end{pmatrix}.$
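The link between S and R in this example can be checked with a short sketch that applies rjk = sjk / (sj sk) to the covariance matrix above. The helper name `cov_to_corr` is an assumption made for illustration. Because the entries of S are themselves rounded, the recomputed correlations may differ from the printed ones in the third decimal place.

```python
import math

# Sample covariance matrix S from Example 2.4 (entries as printed, i.e. rounded).
S = [
    [262829.2, 4316.8, 820.8],
    [4316.8, 84.07, 15.56],
    [820.8, 15.56, 3.71],
]


def cov_to_corr(S):
    """Return R with r_jk = s_jk / (s_j * s_k), where s_j = sqrt(s_jj)."""
    p = len(S)
    sd = [math.sqrt(S[j][j]) for j in range(p)]  # sample standard deviations
    return [[S[j][k] / (sd[j] * sd[k]) for k in range(p)] for j in range(p)]


R = cov_to_corr(S)
for row in R:
    print([round(r, 3) for r in row])
```

The output also reproduces the point made in the example: the covariance s₂₃ is far smaller than s₁₃, yet the correlation r₂₃ is the larger of the two.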

2.2.3 Linear Combination of Variables

We are often interested in some linear combinations of the variables x₁, x₂,…, xp. For example, for the auto_spec data set, two of the variables are city.mpg and highway.mpg. If you expect that 60% of a car's mileage is on highways and 40% is on local roads, then the average MPG of the car can be estimated as 0.6 × highway.mpg + 0.4 × city.mpg, which is a linear combination of city.mpg and highway.mpg. In general, let c₁, c₂,…, cp be constants and consider the linear combination of the variables x₁, x₂,…, xp given by

$z = c_1 x_1 + c_2 x_2 + \cdots + c_p x_p.$

For each observation of the data set, the corresponding value of the variable z can be found by

$z_i = c_1 x_{i1} + c_2 x_{i2} + \cdots + c_p x_{ip} = \mathbf{c}^T \mathbf{x}_i, \quad i = 1, \ldots, n,$

where cᵀ = (c₁ c₂ … cp). It can be seen that the sample mean of z is

$\bar{z} = \mathbf{c}^T \bar{\mathbf{x}}.$ (2.7)
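To make the mileage example concrete, the sketch below computes zi = cᵀxi for a few cars and checks that the sample mean of z matches cᵀx̄. The MPG values are invented for illustration and are not taken from the auto_spec data set. The helper name `dot` is also an assumption.

```python
# Hypothetical (city.mpg, highway.mpg) pairs for three cars; not from auto_spec.
cars = [(21.0, 27.0), (19.0, 26.0), (24.0, 30.0)]

# Weights from the text: 40% local (city) driving, 60% highway driving.
c = (0.4, 0.6)


def dot(c, x):
    """Return the linear combination c^T x for one observation x."""
    return sum(cj * xj for cj, xj in zip(c, x))


n = len(cars)

# Value of the combined variable z for every observation.
z = [dot(c, x) for x in cars]

# Sample mean of z computed directly from the z values ...
zbar = sum(z) / n

# ... and computed from the mean vector of the original variables.
xbar = [sum(x[j] for x in cars) / n for j in range(len(c))]
zbar_from_xbar = dot(c, xbar)

print(z, zbar, zbar_from_xbar)
```

The two means agree up to floating-point rounding, so the mean of a linear combination can be obtained from the mean vector without forming z for each observation.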

The sample variance of z can be found as

$s_z^2 = \frac{\sum_{i=1}^{n} (z_i - \bar{z})^2}{n-1} = \mathbf{c}^T \mathbf{S} \mathbf{c}.$ (2.8)
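One way to see why the last equality holds is to substitute zi = cᵀxi and z̄ = cᵀx̄ into the definition of the sample variance and then apply (2.6). The chain of steps below is a sketch of that argument rather than a quotation from the book:

$s_z^2 = \frac{1}{n-1} \sum_{i=1}^{n} (\mathbf{c}^T \mathbf{x}_i - \mathbf{c}^T \bar{\mathbf{x}})^2 = \frac{1}{n-1} \sum_{i=1}^{n} \mathbf{c}^T (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T \mathbf{c} = \mathbf{c}^T \left[ \frac{1}{n-1} \sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T \right] \mathbf{c} = \mathbf{c}^T \mathbf{S} \mathbf{c}.$

The second step uses the fact that the scalar cᵀ(xi − x̄) equals its own transpose, so its square can be written as cᵀ(xi − x̄)(xi − x̄)ᵀc.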

Because a sample variance is always non-negative, (2.8) implies that cᵀSc ≥ 0 for any c ∈ ℝᵖ. Therefore, the sample covariance matrix S is always a positive semidefinite matrix.
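The identity in (2.8) and the resulting non-negativity can be checked numerically. The sketch below uses a small made-up data set (not Table 2.1) and a few arbitrary coefficient vectors. The function names `direct_variance` and `quadratic_form` are assumptions for this illustration.

```python
# Hypothetical data: n = 4 observations of p = 3 variables (not from Table 2.1).
X = [[2.0, 1.0, 5.0], [4.0, 3.0, 4.0], [6.0, 2.0, 9.0], [8.0, 6.0, 6.0]]
n, p = len(X), len(X[0])

# Sample mean vector and sample covariance matrix, entry by entry as in (2.5).
xbar = [sum(row[j] for row in X) / n for j in range(p)]
S = [[sum((row[j] - xbar[j]) * (row[k] - xbar[k]) for row in X) / (n - 1)
      for k in range(p)] for j in range(p)]


def direct_variance(c):
    """Sample variance of z_i = c^T x_i, computed from the z values themselves."""
    z = [sum(cj * xj for cj, xj in zip(c, row)) for row in X]
    zbar = sum(z) / n
    return sum((zi - zbar) ** 2 for zi in z) / (n - 1)


def quadratic_form(c):
    """c^T S c, the right-hand side of (2.8)."""
    return sum(c[j] * S[j][k] * c[k] for j in range(p) for k in range(p))


# A few arbitrary coefficient vectors, including one with a negative weight.
coefficient_vectors = ([1.0, 0.0, 0.0], [0.4, 0.6, 0.0], [1.0, -2.0, 0.5])
for c in coefficient_vectors:
    print(c, direct_variance(c), quadratic_form(c))
```

For each coefficient vector the two numbers should agree up to rounding and be non-negative. A handful of test vectors illustrates the positive semidefinite property but does not prove it. The proof is the argument in the text.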

In general, if we have q linear combinations of x₁, x₂,…, xp defined
