Data Science in Theory and Practice. Maria Cristina Mariani.

Author: Maria Cristina Mariani
Publisher: John Wiley & Sons Limited
Genre: Mathematics
ISBN: 9781119674733

      Example 3.2 Consider the following data matrix introduced in Example 3.1:

\[
\mathbf{X} = \begin{pmatrix} 48 & 3 \\ 22 & 1 \\ 50 & 2 \end{pmatrix}.
\]

      Each receipt yields a pair of measurements: total dollar sales and number of movies sold. To find the sample mean \(\bar{x}\), we calculate the average of each column as follows:

\[
\begin{aligned}
\bar{x}_1 &= \frac{1}{3}\sum_{j=1}^{3} x_{j1} = \frac{1}{3}(48 + 22 + 50) = 40, \\
\bar{x}_2 &= \frac{1}{3}\sum_{j=1}^{3} x_{j2} = \frac{1}{3}(3 + 1 + 2) = 2.
\end{aligned}
\]

      Therefore,

\[
\bar{\mathbf{X}} = \begin{pmatrix} \bar{x}_1 \\ \bar{x}_2 \end{pmatrix} = \begin{pmatrix} 40 \\ 2 \end{pmatrix}.
\]

      This implies that, on average, a receipt totals $40.00 for two movies. Therefore, the average price of a movie is $20.00.
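      The column averages in Example 3.2 can be reproduced with a few lines of Python. This is only a minimal sketch using the standard library; the names `X`, `column_means`, `x_bar`, and `price_per_movie` are illustrative and do not come from the text.

```python
# Data matrix from Example 3.2: one row per receipt,
# columns are (total dollar sales, number of movies sold).
X = [
    [48, 3],
    [22, 1],
    [50, 2],
]


def column_means(rows):
    """Return the sample mean of each column of a list-of-rows matrix."""
    n = len(rows)
    # zip(*rows) transposes the matrix, so each `col` is one variable.
    return [sum(col) / n for col in zip(*rows)]


x_bar = column_means(X)
print(x_bar)  # [40.0, 2.0]

# Average sales divided by average number of movies sold.
price_per_movie = x_bar[0] / x_bar[1]
print(price_per_movie)  # 20.0
```

      Note that the $20.00 figure is the ratio of the two column means, which is not in general the same as the average of the per-receipt prices.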

      The variance–covariance matrix (or simply the covariance matrix) of a random vector \(\mathbf{X}\) is given by

\[
\operatorname{Cov}(\mathbf{X}) = E\left[ \left(\mathbf{X} - E(\mathbf{X})\right) \left(\mathbf{X} - E(\mathbf{X})\right)^{T} \right],
\]

      where \(E(\mathbf{X})\) is the mean vector.

      In (3.1), the covariance matrix consists of the variances of the variables along the main diagonal and the covariances between each pair of variables in the other matrix positions. The sample covariance of the \(i\)th and \(k\)th variables, \(s_{ik}\), is calculated using the \(i\)th and \(k\)th columns of \(\mathbf{X}\):

\[
s_{ik} = \frac{1}{n-1} \sum_{j=1}^{n} (x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k), \quad i = 1, 2, \ldots, p, \quad k = 1, 2, \ldots, p, \tag{3.2}
\]

      where \(n\) is the number of measurements.

      For example, if \(i = 1\), \(k = 2\):

\[
s_{12} = \frac{1}{n-1} \sum_{j=1}^{n} (x_{j1} - \bar{x}_1)(x_{j2} - \bar{x}_2) \tag{3.3}
\]

      and if \(i = 1\), \(k = 1\):

\[
s_{11} = \frac{1}{n-1} \sum_{j=1}^{n} (x_{j1} - \bar{x}_1)^2 \tag{3.4}
\]

      we have the sample variance.
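      As a numerical check, formula (3.2) can be applied to the data matrix of Example 3.2. The sketch below is one possible implementation in plain Python, not the book's own code; the function names are hypothetical, and the code uses 0-based column indices, so the text's \(s_{12}\) corresponds to `S[0][1]`.

```python
def sample_covariance(rows, i, k):
    """Sample covariance of columns i and k (0-based), following (3.2)."""
    n = len(rows)
    mean_i = sum(r[i] for r in rows) / n
    mean_k = sum(r[k] for r in rows) / n
    # Sum of products of deviations, divided by n - 1.
    total = sum((r[i] - mean_i) * (r[k] - mean_k) for r in rows)
    return total / (n - 1)


def sample_covariance_matrix(rows):
    """Build the p x p matrix whose (i, k) entry is s_ik."""
    p = len(rows[0])
    return [[sample_covariance(rows, i, k) for k in range(p)]
            for i in range(p)]


# Data matrix from Example 3.2 (dollar sales, number of movies sold).
X = [[48, 3], [22, 1], [50, 2]]

S = sample_covariance_matrix(X)
print(S)  # [[244.0, 13.0], [13.0, 1.0]]
```

      Here the diagonal entries 244 and 1 are the sample variances of dollar sales and of movies sold, and the off-diagonal entry 13 is their sample covariance.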

      The sample covariance measures the linear association between the \(i\)th and \(k\)th variables. It reduces to the sample variance when \(i = k\), as observed in (3.4). We note that the sample covariance matrix (3.1) is symmetric, i.e. \(s_{ik} = s_{ki}\) for all \(i\) and \(k\), because of its definition. Other names used for the covariance matrix are variance matrix, variance–covariance matrix, and dispersion matrix. In finance, the concept of covariance is applied in portfolio theory: diversification reduces risk by choosing assets that do not have a high positive covariance with each other.
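      The diversification remark can be illustrated with a small sketch. It relies on the standard identity \(\operatorname{Var}(wA + (1-w)B) = w^2 s_{AA} + (1-w)^2 s_{BB} + 2w(1-w)s_{AB}\) for a two-asset portfolio with weight \(w\) on asset \(A\). The return series below are made-up numbers chosen only for illustration, and the function names are hypothetical.

```python
def sample_cov(a, b):
    """Sample covariance of two equal-length series (divisor n - 1)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def portfolio_variance(a, b, w):
    """Sample variance of the portfolio w*a + (1 - w)*b."""
    return (w ** 2 * sample_cov(a, a)
            + (1 - w) ** 2 * sample_cov(b, b)
            + 2 * w * (1 - w) * sample_cov(a, b))


# Hypothetical period returns (in percent); not data from the text.
asset_a = [1.0, 3.0, 2.0, 4.0]
moves_with_a = [2.0, 4.0, 3.0, 5.0]     # positive covariance with asset_a
moves_against_a = [5.0, 3.0, 4.0, 2.0]  # negative covariance with asset_a

# Symmetry: s_ik equals s_ki.
assert sample_cov(asset_a, moves_with_a) == sample_cov(moves_with_a, asset_a)

var_same = portfolio_variance(asset_a, moves_with_a, 0.5)
var_opposite = portfolio_variance(asset_a, moves_against_a, 0.5)

# The pairing with negative covariance yields the less variable portfolio.
print(var_same > var_opposite)  # True
```

      Both candidate assets have the same individual variance in this toy example, so the difference in portfolio variance comes entirely from the covariance term.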

      If \(\mathbf{X}\) is a random vector taking on any possible value in a multivariate population, the population covariance matrix is defined as