Statistics and Probability with Applications for Engineers and Scientists Using MINITAB, R and JMP. Bhisham C. Gupta.

Author: Bhisham C. Gupta
Publisher: John Wiley & Sons Limited
Genre: Mathematics
ISBN: 9781119516620

3, 8, 5, 6, 10, 17, 19, 20, 3, 2, 11

      Solution: In the data set of this example, each value occurs once except 3, which occurs twice. Thus, the mode for this set is

\[ \text{Mode} = 3 \]

      Example 2.5.7 (Data set with no mode) Find the mode for the following data set:

       1, 7, 19, 23, 11, 12, 1, 12, 19, 7, 11, 23

      Solution: Note that in this data set, each value occurs twice. Thus, this data set does not have any mode.

      Example 2.5.8 (Tri‐modal data set) Find the mode for the following data set:

       5, 7, 12, 13, 14, 21, 7, 21, 23, 26, 5

      Solution: In this data set, values 5, 7, and 21 occur twice, and the rest of the values occur only once. Thus, in this example, there are three modes, that is,

\[ \text{Modes} = 5,\ 7,\ \text{and}\ 21 \]
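
The three mode examples above (one mode, no mode, three modes) can be checked with a short Python sketch. The function name `find_modes` is hypothetical, and the "no mode" rule it applies is an assumption taken from Example 2.5.7: a data set with several distinct values that all occur equally often is treated as having no mode.

```python
from collections import Counter

def find_modes(data):
    """Return the modes of `data` as a sorted list (empty list = no mode).

    Assumed convention (following Example 2.5.7): if the data set has more
    than one distinct value and all of them occur equally often, there is
    no mode.
    """
    counts = Counter(data)          # value -> number of occurrences
    if not counts:
        return []
    highest = max(counts.values())
    # Every distinct value shares the same frequency: report no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)

one_mode = find_modes([3, 8, 5, 6, 10, 17, 19, 20, 3, 2, 11])
no_mode = find_modes([1, 7, 19, 23, 11, 12, 1, 12, 19, 7, 11, 23])
three_modes = find_modes([5, 7, 12, 13, 14, 21, 7, 21, 23, 26, 5])
print(one_mode, no_mode, three_modes)  # [3] [] [5, 7, 21]
```

Python's built-in `statistics.multimode` follows a different convention: for the data of Example 2.5.7 it would return all six distinct values rather than an empty list, which is why the sketch applies the textbook rule explicitly.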

A left-skewed distribution curve with mean < median < mode (left), a right-skewed distribution curve with mode < median < mean (middle), and a symmetric (bell-shaped) distribution curve with mean = median = mode (right).

      Definition 2.5.3

      A data set is symmetric when the values in the data set that lie equidistant from the mean, on either side, occur with equal frequency.

      Definition 2.5.4

      A data set is left‐skewed when values in the data set that are greater than the median occur with relatively higher frequency than those values that are smaller than the median. The values smaller than the median are scattered to the left far from the median.

      Definition 2.5.5

      A data set is right‐skewed when values in the data set that are smaller than the median occur with relatively higher frequency than those values that are greater than the median. The values greater than the median are scattered to the right far from the median.
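
The ordering of mean and median described for the skewed and symmetric curves suggests a quick numerical check. The sketch below is only a rule of thumb, not the definitions themselves: it guesses the direction of skew by comparing the mean with the median. The function name `skew_hint` and all three data sets are hypothetical, invented purely for illustration.

```python
import math
from statistics import mean, median

def skew_hint(data):
    """Guess the skew direction by comparing the mean with the median.

    Rule of thumb only: mean > median hints at a long right tail,
    mean < median at a long left tail.
    """
    m, med = mean(data), median(data)
    if math.isclose(m, med):
        return "roughly symmetric"
    return "right-skewed" if m > med else "left-skewed"

# Hypothetical data sets, made up for this illustration.
long_right_tail = [1, 2, 2, 3, 3, 3, 4, 10, 25]       # a few large values
long_left_tail = [1, 15, 22, 23, 23, 24, 24, 24, 25]  # a few small values
balanced = [1, 2, 3, 4, 5]

for name, data in [("long_right_tail", long_right_tail),
                   ("long_left_tail", long_left_tail),
                   ("balanced", balanced)]:
    print(name, mean(data), median(data), skew_hint(data))
```

In the first data set the few large values pull the mean above the median, and in the second the few small values pull it below, which matches the orderings shown for the right-skewed and left-skewed curves.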

      2.5.2 Measures of Dispersion


      Range

      The range of a data set is the easiest measure of dispersion to calculate. Range is defined as

      \[ \text{Range} = \text{Largest value} - \text{Smallest value} \tag{2.5.5} \]

      The range is not an efficient measure of dispersion because it takes into account only the largest and the smallest values and none of the remaining observations. For example, if a data set has 100 distinct observations, the range uses only two of them and ignores the remaining 98. As a rule of thumb, the range is considered a reasonably good measure of dispersion for data sets containing 10 or fewer observations, but not for data sets containing more than 10 observations.

      Example 2.5.9 (Tensile strength) The following data give the tensile strength (in psi) of a sample of a certain material submitted for inspection. Find the range for this data set:

       8538.24, 8450.16, 8494.27, 8317.34, 8443.99, 8368.04, 8368.94, 8424.41, 8427.34, 8517.64

      Solution: The largest and the smallest values in the data set are 8538.24 and 8317.34, respectively. Therefore, the range for this data set is

\[ \text{Range} = 8538.24 - 8317.34 = 220.90 \text{ psi} \]
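
The range calculation of Example 2.5.9 can be reproduced with a minimal Python sketch. The names `tensile_strength_psi` and `data_range` are hypothetical; the data are the ten values listed in the example.

```python
# Tensile strength values (psi) from Example 2.5.9.
tensile_strength_psi = [
    8538.24, 8450.16, 8494.27, 8317.34, 8443.99,
    8368.04, 8368.94, 8424.41, 8427.34, 8517.64,
]

def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

r = data_range(tensile_strength_psi)
# Format to two decimals to hide floating-point rounding noise.
print(f"Range = {r:.2f} psi")  # Range = 220.90 psi
```

Only `max(values)` and `min(values)` enter the result, which is the limitation the text points out: the other eight observations have no influence on the range.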

      Variance

      For example, if the values in a data set are $x_1, x_2, \ldots, x_n$, and the sample average is $\bar{x}$, then $x_1 - \bar{x}, x_2 - \bar{x}, \ldots, x_n - \bar{x}$ are the deviations from the sample average. It is then natural to find the sum of these deviations and to argue that if this sum is large, the values differ too much from each other, but if this sum is small, they do not differ from each other too much. Unfortunately, this argument does not hold, since, as is easily proved, the sum of these deviations is always zero, no matter how much the values in the data set differ: $\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = 0$. In other words, the positive and negative deviations cancel exactly. To avoid this cancellation, we can square the deviations and then take their sum. The variance is then the average value of the squared deviations from $\bar{x}$. If the data set represents a population, then the deviations are taken from the population mean $\mu$. Thus, the population