Data Science in Theory and Practice. Maria Cristina Mariani.

Information about the work:

Author:	Maria Cristina Mariani
Publisher:	John Wiley & Sons Limited
Genre:	Mathematics
ISBN:	9781119674733


\[
\Sigma =
\begin{pmatrix}
\sigma_{1,1} & \sigma_{1,2} & \cdots & \sigma_{1,p} \\
\sigma_{2,1} & \sigma_{2,2} & \cdots & \sigma_{2,p} \\
\vdots & \vdots & & \vdots \\
\sigma_{p,1} & \sigma_{p,2} & \cdots & \sigma_{p,p}
\end{pmatrix}.
\]

Just like the sample covariance case defined in (3.1), the diagonal elements $\sigma_{jj} = \sigma_j^2$ are the population variances of the variables in $\mathbf{X}$, and the off-diagonal elements $\sigma_{ik}$ are the population covariances of all possible pairs of variables, i.e. $(X_i, X_k)$ for $i \neq k$.

The notation $\Sigma$ for the covariance matrix is widely used and seems natural because $\Sigma$ is the uppercase version of $\sigma$.

Example 3.3 Consider the following data matrix introduced in Example 3.1:

\[
\mathbf{X} =
\begin{pmatrix}
48 & 3 \\
22 & 1 \\
50 & 2
\end{pmatrix}.
\]

Each receipt yields a pair of measurements: total dollar sales and number of movies sold. Since there are three receipts, we have a total of three observations on each variable. We find the sample variances and covariance that make up $\mathbf{S}_n$ as follows:

\[
\begin{aligned}
s_{11} &= \frac{1}{2}\sum_{j=1}^{3}\left(x_{j1}-\bar{x}_1\right)^2 \\
       &= \frac{1}{2}\left((48-40)^2+(22-40)^2+(50-40)^2\right) = 244, \\
s_{22} &= \frac{1}{2}\sum_{j=1}^{3}\left(x_{j2}-\bar{x}_2\right)^2 \\
       &= \frac{1}{2}\left((3-2)^2+(1-2)^2+(2-2)^2\right) = 1, \\
s_{12} &= \frac{1}{2}\sum_{j=1}^{3}\left(x_{j1}-\bar{x}_1\right)\left(x_{j2}-\bar{x}_2\right) \\
       &= \frac{1}{2}\left((48-40)(3-2)+(22-40)(1-2)+(50-40)(2-2)\right) = 13, \\
s_{21} &= s_{12}.
\end{aligned}
\]

Therefore,

\[
\mathbf{S}_n =
\begin{pmatrix}
244 & 13 \\
13 & 1
\end{pmatrix}.
\]
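The arithmetic in Example 3.3 can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the variable names (`X`, `x_bar`, `D`, `S`) are illustrative choices and do not come from the text.

```python
import numpy as np

# Data matrix from Example 3.3: rows are receipts, columns are
# (total dollar sales, number of movies sold).
X = np.array([[48.0, 3.0],
              [22.0, 1.0],
              [50.0, 2.0]])

n = X.shape[0]
x_bar = X.mean(axis=0)   # column means: 40 and 2
D = X - x_bar            # deviations from the column means

# Divisor n - 1 = 2, matching the factor of one half used in the example.
S = D.T @ D / (n - 1)
print(S)                 # entries 244, 13, 13, 1

# Cross-check against NumPy's estimator (columns treated as variables,
# default divisor n - 1).
assert np.allclose(S, np.cov(X, rowvar=False))
```

Writing the computation as `D.T @ D` produces all variances and covariances at once, so the symmetry $s_{21} = s_{12}$ holds by construction.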

3.5 Correlation Matrices

A correlation matrix is a table showing correlation coefficients between variables. Correlation is a statistical technique that can show whether and how strongly pairs of variables are related. The sample correlation between the $i$th and $k$th variables is defined as

\[
r_{ik} = \frac{s_{ik}}{\sqrt{s_{ii}}\,\sqrt{s_{kk}}}, \tag{3.6}
\]

where

\[
\begin{aligned}
s_{ik} &= \frac{1}{n-1}\sum_{j=1}^{n}\left(x_{ji}-\bar{x}_i\right)\left(x_{jk}-\bar{x}_k\right), \quad i = 1, 2, \ldots, p \text{ and } k = 1, 2, \ldots, p, \\
s_{ii} &= \frac{1}{n-1}\sum_{j=1}^{n}\left(x_{ji}-\bar{x}_i\right)^2, \quad i = 1, 2, \ldots, p, \\
s_{kk} &= \frac{1}{n-1}\sum_{j=1}^{n}\left(x_{jk}-\bar{x}_k\right)^2, \quad k = 1, 2, \ldots, p.
\end{aligned}
\]
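As an illustration of definition (3.6), the sketch below converts the covariance matrix found in Example 3.3 into a correlation matrix. It assumes NumPy; the names `std` and `R` are hypothetical and chosen only for this example.

```python
import numpy as np

# Sample covariance matrix obtained in Example 3.3.
S = np.array([[244.0, 13.0],
              [13.0, 1.0]])

# Equation (3.6): r_ik = s_ik / (sqrt(s_ii) * sqrt(s_kk)).
std = np.sqrt(np.diag(S))     # square roots of the diagonal entries
R = S / np.outer(std, std)    # entry (i, k) is divided by std[i] * std[k]

print(np.round(R, 3))         # off-diagonal entries are about 0.832
```

For these data the correlation between dollar sales and number of movies sold is $13/\sqrt{244} \approx 0.832$, and the diagonal entries equal 1.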

Substituting $s_{ik}$, $s_{ii}$, and $s_{kk}$ into (3.6) and canceling the common factor $1/(n-1)$, we obtain

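\[
r_{ik} = \frac{\sum_{j=1}^{n}\left(x_{ji}-\bar{x}_i\right)\left(x_{jk}-\bar{x}_k\right)}{\sqrt{\sum_{j=1}^{n}\left(x_{ji}-\bar{x}_i\right)^2}\,\sqrt{\sum_{j=1}^{n}\left(x_{jk}-\bar{x}_k\right)^2}} \tag{3.7}
\]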

for $i = 1, 2, \ldots, p$ and $k = 1, 2, \ldots, p$. We note that the sample correlation is symmetric since $r_{ik} = r_{ki}$ for all $i$ and $k$.
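The cancelled form of the correlation, in which the $1/(n-1)$ factors drop out and only sums of squared deviations and cross-products remain, can be computed directly from the raw data. The sketch below assumes NumPy; `sample_correlation` is a hypothetical helper written for this illustration, not a function from the text or from any library.

```python
import numpy as np

def sample_correlation(X):
    """Correlation matrix from raw data; rows are observations."""
    D = X - X.mean(axis=0)             # centered data
    cross = D.T @ D                    # sums of cross-products, no 1/(n-1)
    norms = np.sqrt(np.diag(cross))    # root sums of squared deviations
    return cross / np.outer(norms, norms)

# Data matrix from Example 3.3.
X = np.array([[48.0, 3.0],
              [22.0, 1.0],
              [50.0, 2.0]])

R = sample_correlation(X)

assert np.allclose(R, R.T)                           # r_ik = r_ki
assert np.allclose(np.diag(R), 1.0)                  # r_ii = 1
assert np.allclose(R, np.corrcoef(X, rowvar=False))  # agrees with NumPy
```

Because the divisor cancels, the result is the same whether the covariances are computed with $n-1$ or with $n$.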

