Data Science in Theory and Practice. Maria Cristina Mariani.

Book information:

Author: Maria Cristina Mariani
Publisher: John Wiley & Sons Limited
Genre: Mathematics
ISBN: 9781119674733


The sample correlation coefficient is a measure of the linear association between two variables and does not depend on the units of measurement; that is, when the sample correlation coefficient is constructed, the units of measurement cancel out. The sample correlation matrix is analogous to the covariance matrix, with correlations in place of covariances:

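$$\mathbf{R} = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{pmatrix} \qquad (3.8)$$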
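The claim that the units of measurement cancel out can be checked with a short derivation. Here $x_{ji}$ denotes the $j$th observation on variable $i$, and we use the sample analogue $r_{ik} = s_{ik}/(\sqrt{s_{ii}}\sqrt{s_{kk}})$ of the population formula. Suppose variable $i$ is converted to new units by $x_{ji}' = c_i x_{ji} + d_i$ with $c_i > 0$ (for example, dollars to cents has $c_i = 100$, $d_i = 0$), and similarly $x_{jk}' = c_k x_{jk} + d_k$ with $c_k > 0$:

```latex
\begin{aligned}
x_{ji}' - \bar{x}_i' &= c_i\,(x_{ji} - \bar{x}_i), \qquad x_{jk}' - \bar{x}_k' = c_k\,(x_{jk} - \bar{x}_k),\\
s_{ik}' &= c_i c_k\, s_{ik}, \qquad s_{ii}' = c_i^2\, s_{ii}, \qquad s_{kk}' = c_k^2\, s_{kk},\\
r_{ik}' &= \frac{c_i c_k\, s_{ik}}{c_i\sqrt{s_{ii}}\; c_k\sqrt{s_{kk}}}
        = \frac{s_{ik}}{\sqrt{s_{ii}}\,\sqrt{s_{kk}}} = r_{ik}.
\end{aligned}
```

The shifts $d_i$, $d_k$ disappear once deviations from the mean are taken, and the scale factors cancel between numerator and denominator. If a scale factor were negative, $\sqrt{s_{ii}'} = |c_i|\sqrt{s_{ii}}$ and only the sign of $r_{ik}$ would flip.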

The population correlation matrix, analogous to (3.8), is defined as follows:

$$\boldsymbol{\rho} = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & 1 & \cdots & \rho_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & 1 \end{pmatrix} \qquad (3.9)$$

where

$$\rho_{ik} = \frac{\sigma_{ik}}{\sqrt{\sigma_{ii}}\,\sqrt{\sigma_{kk}}}.$$

Although the sample correlation always has the same sign as the sample covariance, the correlation is easier to interpret because its magnitude is bounded: it lies in the closed interval $-1 \le r \le 1$. To summarize, the sample correlation has the following properties:

1. The value of the sample correlation must lie between $-1$ and $1$ inclusive. A value of $r = 1$ indicates a perfect positive linear relationship, and $r = -1$ indicates a perfect inverse (negative) linear relationship.

2. The sample correlation measures the strength of the linear association between two variables. If $r = 0$, there is no linear association between the two variables. Otherwise, the sign of $r$ indicates the direction of the association. If $r$ is positive, then as one variable gets larger, the other tends to get larger. If $r$ is negative, then as one variable gets larger, the other tends to get smaller (often called an “inverse” correlation). A larger value of $|r|$ implies a stronger linear association. In the extreme case $r = -1$, the two variables move in exactly opposite directions: if one variable increases, the other decreases by the same amount in standardized units (and vice versa).
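The properties listed above can be checked numerically. The following is a minimal Python sketch using only the standard library; the function name `sample_correlation` and the data are hypothetical, chosen only for illustration. It computes $r$ from centered sums (the $1/(n-1)$ factors in the covariance and the variances cancel), then checks the bound, invariance to a change of units, and the values $\pm 1$ for exact linear relationships.

```python
import math

def sample_correlation(x, y):
    """Sample correlation r = s_xy / (sqrt(s_xx) * sqrt(s_yy)).

    The 1/(n-1) factors in the covariance and variances cancel,
    so centered sums of products are enough.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length n >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))

# Hypothetical data, used only to illustrate the properties.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r = sample_correlation(x, y)
print(round(r, 4))  # 0.8  (positive: larger x tends to go with larger y)

# Property 1: r lies in [-1, 1].
print(-1.0 <= r <= 1.0)  # True

# Changing units (e.g., dollars -> cents) leaves r unchanged.
x_cents = [100 * v for v in x]
print(math.isclose(sample_correlation(x_cents, y), r))  # True

# Exact linear relationships give r = 1 (increasing) or r = -1 (decreasing).
print(round(sample_correlation(x, [3 * v + 1 for v in x]), 10))   # 1.0
print(round(sample_correlation(x, [-2 * v + 7 for v in x]), 10))  # -1.0
```

Note that $r$ is undefined when either variable is constant (its variance is zero); the sketch would raise `ZeroDivisionError` in that case.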

Example 3.4 Consider the following data matrix introduced in Example 3.1:

$$\mathbf{X} = \begin{pmatrix} 48 & 3 \\ 22 & 1 \\ 50 & 2 \end{pmatrix}.$$

Each receipt yields a pair of measurements: total dollar sales ($x_1$) and number of movies sold ($x_2$). We find the sample correlation matrix $\mathbf{R}$ as follows:

$$\begin{aligned} r_{12} &= \frac{s_{12}}{\sqrt{s_{11}}\,\sqrt{s_{22}}} \\ &= \frac{13}{\sqrt{244}\,\sqrt{1}} = 0.8322, \\ r_{21} &= r_{12}. \end{aligned}$$

Therefore,

$$\mathbf{R} = \begin{pmatrix} 1 & 0.832 \\ 0.832 & 1 \end{pmatrix}.$$

In this example, we observe that the variables $x_1$ and $x_2$ are strongly positively correlated, since $r_{12} = 0.832$. This implies that receipts with higher dollar sales ($x_1$) tend to have a larger number of movies sold ($x_2$).
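The arithmetic of Example 3.4 can be reproduced with a short Python sketch (standard library only). It assumes the sample covariance uses the $1/(n-1)$ divisor, which is what gives $s_{11} = 244$, $s_{12} = 13$, and $s_{22} = 1$ for these data; the helper names `covariance_matrix` and `correlation_matrix` are illustrative, not taken from the text.

```python
import math

# Data matrix from Example 3.4: column 1 = total dollar sales (x1),
# column 2 = number of movies sold (x2); one row per receipt.
X = [
    [48.0, 3.0],
    [22.0, 1.0],
    [50.0, 2.0],
]

def covariance_matrix(rows):
    """Sample covariance matrix S with the 1/(n-1) divisor."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[k] for row in rows) / n for k in range(p)]
    return [
        [
            sum((row[i] - means[i]) * (row[k] - means[k]) for row in rows) / (n - 1)
            for k in range(p)
        ]
        for i in range(p)
    ]

def correlation_matrix(S):
    """R[i][k] = s_ik / (sqrt(s_ii) * sqrt(s_kk)); the diagonal entries are 1."""
    p = len(S)
    return [
        [S[i][k] / (math.sqrt(S[i][i]) * math.sqrt(S[k][k])) for k in range(p)]
        for i in range(p)
    ]

S = covariance_matrix(X)
R = correlation_matrix(S)
print(S)                                          # [[244.0, 13.0], [13.0, 1.0]]
print([[round(v, 4) for v in row] for row in R])  # [[1.0, 0.8322], [0.8322, 1.0]]
```

Rounded to three decimals, the off-diagonal entry is the 0.832 shown in $\mathbf{R}$ above.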

3.6 Linear Combinations of Variables

We are often interested in linear combinations of the variables $x_1, x_2, \ldots, x_p$. In this section, we investigate the means, variances, and covariances of such linear combinations.

Let $a_1, a_2, \ldots, a_p$ be constants and consider the linear combination of the elements of the vector $\mathbf{X}$,

$$z = a_1 x_1 + a_2 x_2 + \cdots + a_p x_p = \mathbf{a}^{T}\mathbf{X}, \qquad (3.10)$$

where $\mathbf{a}^{T} = (a_1, a_2, \ldots, a_p)$. If the same coefficient vector $\mathbf{a}$ is applied to each observation $\mathbf{x}_i$ in a sample, we have

$$z_i = a_1 x_{i1} + a_2 x_{i2} + \cdots + a_p x_{ip} = \mathbf{a}^{T}\mathbf{x}_i, \quad i = 1, 2, \ldots, n. \qquad (3.11)$$

For example, if $i = 1$, we have

$$\begin{aligned} z_1 &= \mathbf{a}^{T}\mathbf{x}_1 \\ &= (a_1, a_2, \ldots, a_p)\begin{pmatrix} x_{11} \\ x_{12} \\ \vdots \\ x_{1p} \end{pmatrix} = a_1 x_{11} + a_2 x_{12} + \cdots + a_p x_{1p}. \end{aligned}$$
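To make (3.11) concrete, the Python sketch below applies a coefficient vector to every row of the Example 3.4 data matrix. The vector $\mathbf{a} = (1, -10)$ is hypothetical and not from the text, as is the helper name `linear_combination`. As a numerical check it also compares the sample mean and sample variance of the $z_i$ with $\mathbf{a}^{T}\bar{\mathbf{x}}$ and $\mathbf{a}^{T}\mathbf{S}\mathbf{a}$ (with $\mathbf{S}$ the sample covariance matrix, $1/(n-1)$ divisor assumed); these are standard identities for linear combinations of the kind this section goes on to study.

```python
# Data matrix from Example 3.4 (rows = receipts; columns = x1 dollar sales, x2 movies sold).
X = [[48.0, 3.0], [22.0, 1.0], [50.0, 2.0]]

# Hypothetical coefficient vector a = (a1, a2); it is NOT taken from the text.
a = [1.0, -10.0]

def linear_combination(a, x):
    """Return z = a^T x = a_1 x_1 + ... + a_p x_p, as in (3.10) and (3.11)."""
    if len(a) != len(x):
        raise ValueError("a and x must have the same length p")
    return sum(a_k * x_k for a_k, x_k in zip(a, x))

n, p = len(X), len(a)

# (3.11): apply the same a to every observation x_i.
z = [linear_combination(a, row) for row in X]
print(z)  # [18.0, 12.0, 30.0]

# Sample mean of z versus a^T x-bar.
x_bar = [sum(row[k] for row in X) / n for k in range(p)]
z_bar = sum(z) / n
print(z_bar, linear_combination(a, x_bar))  # 20.0 20.0

# Sample variance of z (1/(n-1) divisor) versus the quadratic form a^T S a.
S = [[sum((row[i] - x_bar[i]) * (row[k] - x_bar[k]) for row in X) / (n - 1)
      for k in range(p)] for i in range(p)]
var_z = sum((v - z_bar) ** 2 for v in z) / (n - 1)
quad = sum(a[i] * S[i][k] * a[k] for i in range(p) for k in range(p))
print(var_z, quad)  # 84.0 84.0
```

Computing the quadratic form directly from $\mathbf{S}$ avoids forming the $z_i$ at all, which is convenient when many coefficient vectors are compared on the same data.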

3.6.1 Linear Combinations of Sample Means

The sample mean of