Statistics and Probability with Applications for Engineers and Scientists Using MINITAB, R and JMP. Bhisham C. Gupta.

About this book:

Author:	Bhisham C. Gupta
Publisher:	John Wiley & Sons Limited
Genre:	Mathematics
ISBN:	9781119516620


that the data points are concentrated around the straight line within a narrow band. The upward trend indicates a positive association between the two variables, while the width of the band indicates the strength of the association, which in this case is quite strong. As the association between the two variables gets stronger and stronger, the band enclosing the plotted points becomes narrower and narrower. A downward trend indicates a negative association between the two variables.
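The link between the width of the band and the strength of the association can be checked numerically. The R sketch below is a hypothetical illustration, not data from the text. It scatters points around the line y = x using a fixed, made-up noise pattern, which is built from ‘sin()’ and has its linear trend removed with ‘resid(lm(...))’ so that it is uncorrelated with x. It then widens the band by scaling that pattern and recomputes r with ‘cor()’.

```r
# Hypothetical illustration: x and the noise pattern are made up, not data from the text
x <- 1:20
noise <- sin(1:20)                 # arbitrary deterministic "scatter" pattern
noise <- resid(lm(noise ~ x))      # remove any linear trend so the noise is uncorrelated with x
band_width <- c(0.5, 1, 2, 4, 8)   # multiplier controlling how wide the band around the line is

# Upward trend: points scattered around y = x with bands of increasing width
r_values <- sapply(band_width, function(k) cor(x, x + k * noise))
print(round(r_values, 3))          # r moves away from 1 as the band widens

# Downward trend: the same scatter around a decreasing line gives a negative r
r_negative <- cor(x, -x + 2 * noise)
print(round(r_negative, 3))
```

Each larger multiplier gives a smaller r, and changing the slope of the line from upward to downward changes the sign of r.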

A numerical measure of the linear association between two numerical variables is the Pearson correlation coefficient, named after the English statistician Karl Pearson (1857–1936). Note that a correlation coefficient does not measure causation; correlation and causation are different concepts. Causation leads to correlation, but correlation does not necessarily imply causation. The correlation coefficient between two numerical variables in a set of sample data is usually denoted by r, and the correlation coefficient for population data is denoted by the Greek letter ρ (rho). The correlation coefficient r based on n pairs of (X, Y), say $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, is defined as

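$$ r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} \tag{2.9.1} $$

where $\bar{x}$ and $\bar{y}$ are the sample means of the x-values and the y-values.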

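Equivalently, in a form that is convenient for hand computation,

$$ r = \frac{\sum_{i=1}^{n}x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n}x_i\right)\left(\sum_{i=1}^{n}y_i\right)}{\sqrt{\sum_{i=1}^{n}x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n}x_i\right)^2}\,\sqrt{\sum_{i=1}^{n}y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n}y_i\right)^2}} \tag{2.9.2} $$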
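The definitional form (2.9.1) can be turned into a shortcut that avoids computing each deviation by expanding the sum of cross-products. The following is a sketch of this standard algebraic identity, using $\sum_{i=1}^{n} x_i = n\bar{x}$ and $\sum_{i=1}^{n} y_i = n\bar{y}$:

$$
\begin{aligned}
\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})
  &= \sum_{i=1}^{n}x_i y_i - \bar{x}\sum_{i=1}^{n}y_i - \bar{y}\sum_{i=1}^{n}x_i + n\bar{x}\bar{y} \\
  &= \sum_{i=1}^{n}x_i y_i - n\bar{x}\bar{y} \\
  &= \sum_{i=1}^{n}x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n}x_i\right)\left(\sum_{i=1}^{n}y_i\right)
\end{aligned}
$$

Setting $y_i = x_i$ in the same identity gives $\sum_{i=1}^{n}(x_i-\bar{x})^2 = \sum_{i=1}^{n}x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n}x_i\right)^2$, and the analogous result holds for the y-values. These are the sums of squares that appear in the denominator.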

Table 2.9.1 Cholesterol levels and systolic BP of 10 randomly selected US males.

Subject	1	2	3	4	5	6	7	8	9	10
Cholesterol (x)	195	180	220	160	200	220	200	183	139	155
Systolic BP (y)	130	128	138	122	140	148	142	127	116	123

[Figure: scatterplot of systolic blood pressure (y) versus cholesterol level (x), showing 10 points and an upward-sloping fitted line.]

Figure 2.9.1 MINITAB printout of scatter plot for the data in Table 2.9.1.

The correlation coefficient is a dimensionless measure that can attain any value in the interval $[-1, 1]$. As the strength of the linear association between the two variables grows, the absolute value of r approaches 1. Thus, when there is a perfect association between the two variables, $r = 1$ or $r = -1$, depending on whether the association is positive or negative. In other words, $r = 1$ if the two variables move in the same direction, and $r = -1$ if they move in opposite directions.
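These properties can be checked in R. The following is a minimal sketch. The vector x and both straight lines in the first part are hypothetical values chosen only to produce exact linear relationships. The second part reuses the data of Table 2.9.1 to show two things: a positive linear change of scale, such as a change of units, leaves r unchanged, which is consistent with r being dimensionless, and reversing the direction of one variable flips the sign of r.

```r
# Hypothetical x values and two exact straight-line relationships
x <- c(2, 4, 5, 7, 9, 12)
y_up   <- 3 + 2 * x      # y increases exactly with x
y_down <- 10 - 0.5 * x   # y decreases exactly with x
r_up   <- cor(x, y_up)   # 1, up to floating-point rounding
r_down <- cor(x, y_down) # -1, up to floating-point rounding

# Table 2.9.1 data: a positive linear change of scale leaves r unchanged,
# while reversing the direction of one variable only flips the sign of r
chol <- c(195, 180, 220, 160, 200, 220, 200, 183, 139, 155)
bp   <- c(130, 128, 138, 122, 140, 148, 142, 127, 116, 123)
r_original <- cor(chol, bp)
r_rescaled <- cor(10 * chol + 5, bp)   # arbitrary positive rescaling
r_flipped  <- cor(-chol, bp)

print(round(c(r_up, r_down, r_original, r_rescaled, r_flipped), 4))
```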

Perfect association means that if we know the value of one variable, then the value of the other variable can be determined without any error. The other special case is $r = 0$. This does not mean that there is no association between the two variables, only that there is no linear association between them. As a rule of thumb, the linear association is weak, moderate, or strong when the absolute value of r is less than 0.3, between 0.3 and 0.7, or greater than 0.7, respectively. For instance, computing r from (2.9.1) for the data in Table 2.9.1 gives $r = 0.924$. Hence, we can conclude that the linear association between the two variables X and Y is strong.
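The calculation above can be reproduced step by step. The R sketch below evaluates (2.9.1) directly for the data in Table 2.9.1, checks the result against the built-in ‘cor()’, and applies the rule-of-thumb labels. The helper name strength() is hypothetical. Assigning the boundary values 0.3 and 0.7 to "moderate" is an assumption, because the text does not say which category the boundaries belong to. The last lines use made-up points on the parabola v = u² to show a perfect but nonlinear association that has r = 0.

```r
# Data from Table 2.9.1
x <- c(195, 180, 220, 160, 200, 220, 200, 183, 139, 155)  # cholesterol level
y <- c(130, 128, 138, 122, 140, 148, 142, 127, 116, 123)  # systolic BP

# Pieces of (2.9.1): cross-products and squares of deviations from the means
s_xy <- sum((x - mean(x)) * (y - mean(y)))
s_xx <- sum((x - mean(x))^2)
s_yy <- sum((y - mean(y))^2)
print(round(c(s_xy, s_xx, s_yy), 1))   # 2307.2, 6669.6, 934.4

r <- s_xy / sqrt(s_xx * s_yy)
print(round(r, 4))                     # 0.9242, the same value as cor(x, y)

# Hypothetical helper for the rule of thumb in the text;
# treating exactly 0.3 and 0.7 as "moderate" is an assumption
strength <- function(r) {
  a <- abs(r)
  if (a < 0.3) "weak" else if (a <= 0.7) "moderate" else "strong"
}
print(strength(r))                     # "strong"

# Made-up points on a parabola: v is determined exactly by u, yet r = 0
u <- -3:3
v <- u^2
r_quad <- cor(u, v)
print(r_quad)                          # 0
```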

MINITAB:

1 Enter the pairs of data in columns C1 and C2. Label the columns X and Y.

2 From the Menu bar, select Graph > Scatterplot. A dialog box appears on the screen. In this dialog box, select With Regression and click OK. A second dialog box then appears. In this dialog box, under the X and Y variables, enter the columns in which you have placed the data. Use the desired options and click OK. The scatter plot shown in Figure 2.9.1 appears in the Session window.

3 To calculate the correlation coefficient, select Stat > Basic Statistics > Correlation from the Menu bar. Then enter the variables C1 and C2 in the dialog box.

USING R

We can use the built-in ‘plot()’ function in R to generate scatter plots. The extra arguments ‘pch’ and ‘cex’ specify the plotting symbol and the size of the symbol, respectively. The function ‘abline()’ adds the trend line to the scatter plot, and the function ‘cor()’ calculates the Pearson correlation coefficient. The whole task can be completed by running the following R code in the R Console window.

```r
x = c(195,180,220,160,200,220,200,183,139,155)
y = c(130,128,138,122,140,148,142,127,116,123)

#To plot the data in a scatter plot
plot(x, y, pch = 20, cex = 2,
     main = 'Scatterplot for Cholesterol Level and Systolic Blood Pressure Data',
     xlab = 'Cholesterol Level', ylab = 'Systolic Blood Pressure')

#To add a trend line
abline(lm(y ~ x), col = 'red')

#To calculate the Pearson correlation coefficient
cor(x, y)
#R output
#[1] 0.9242063
```

The resulting R scatter plot for the data in Table 2.9.1 is essentially the same as the MINITAB printout in Figure 2.9.1.

PRACTICE PROBLEMS FOR SECTION 2.9

1 The following data give the heights (cm) and weights (lb) of 10 male undergraduate students:

Heights (cm)	170	167	172	171	165	170	168	172	175	172
Weights (lb)	182	172	179	172	174	179	188	168	185	169

Draw a scatter plot for these data. By observing this scatter plot, do you expect the correlation between heights and weights to be
