2.7 Measures of Relative Position
This section introduces measures of relative position, which divide the data into percentages and so help locate any data value within the whole data set. The commonly used measures of relative position are percentiles and quartiles: percentiles divide the data into one hundred parts, such that each part contains at most 1% of the data, and quartiles divide the data into four parts, such that each part contains at most 25% of the data. From the quartiles we can derive another measure, called the interquartile range (IQR), which gives the range of the middle 50% of the data values. It is obtained by first arranging the data in ascending order and then trimming 25% of the data values from each of the lower and upper ends. More generally, a quantile is a value that divides a distribution or an ordered sample such that a specified proportion of the observations fall below that value. Percentiles and quartiles are specific kinds of quantiles.
2.7.1 Percentiles
Percentiles divide the data into one hundred equal parts, each containing at most 1% of the data; the percentiles themselves are numbered from 1 to 99. For example, the median of a data set is the 50th percentile, which divides the data into two equal parts so that at most 50% of the data fall below the median and at most 50% of the data fall above it. The procedure for determining the percentiles is similar to the procedure used for determining the median. We compute the percentiles as follows:
Step 1. Write the data values in ascending order and rank them from 1 to \(n\).
Step 2. Find the rank of the \(p\)th percentile (\(p = 1, 2, \ldots, 99\)), which is given by
\[ \text{Rank of the } p\text{th percentile} = p \times \frac{n+1}{100} \tag{2.7.1} \]
Step 3. Find the data value that corresponds to the rank of the \(p\)th percentile.
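The three steps above can be sketched in Python. This is a minimal illustration rather than a definitive implementation: it assumes the rank rule p(n + 1)/100 and, when the rank is not a whole number, linear interpolation between the two neighboring ranked values, as in the worked example that follows. The function name `percentile` and the sample data are hypothetical, and non-empty input is assumed.

```python
import math

def percentile(data, p):
    """Return the p-th percentile (0 < p < 100) using the rank p(n + 1)/100."""
    values = sorted(data)              # Step 1: ascending order, ranks 1..n
    n = len(values)
    rank = p * (n + 1) / 100           # Step 2: rank of the p-th percentile
    lower = math.floor(rank)           # integer part of the rank
    frac = rank - lower                # fractional part of the rank
    if lower < 1:                      # rank falls below the smallest value
        return values[0]
    if lower >= n:                     # rank falls at or beyond the largest value
        return values[-1]
    # Step 3: value at rank `lower`, moved `frac` of the way to the next value
    return values[lower - 1] + frac * (values[lower] - values[lower - 1])

sample = [40, 10, 30, 20]              # hypothetical data, not from the text
print(percentile(sample, 50))          # 25.0, which is also the median
```

Clamping the rank to the smallest and largest values is a design choice of this sketch; the text does not say how ranks outside 1 to n should be handled.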
We illustrate this procedure with the following example.
Example 2.7.1 (Engineers' salaries) The following data give the salaries (in thousands of dollars) of 15 engineers in a corporation:
62 48 52 63 85 51 95 76 72 51 69 73 58 55 54
(a) Find the 70th percentile for these data.
(b) Find the percentile corresponding to the salary of $60,000.
Solution: (a) We proceed as follows:
Step 1. Write the data values in ascending order and rank them from 1 to 15.
Salaries: 48, 51, 51, 52, 54, 55, 58, 62, 63, 69, 72, 73, 76, 85, 95
Ranks: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
Step 2. Find the rank of the 70th percentile, which from (2.7.1) is given by
\[ 70 \times \frac{15+1}{100} = 11.2 \]
Step 3. Find the data values that correspond to the ranks 11 and 12, which in this example are 72 and 73, respectively. Since the rank 11.2 lies two-tenths of the way from rank 11 to rank 12, the 70th percentile is given by
\[ 72 + (73 - 72)(0.2) = 72.2 \]
Thus, the 70th percentile of the salary data is $72,200. That is, at most 70% of the engineers are making less than $72,200 and at most 30% of the engineers are making more than $72,200.
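(b) Now we want to find the percentile \(p\) corresponding to a given value \(x\). This can be done by using the formula
\[ p = \frac{\text{number of data values} \le x}{n+1} \times 100 \tag{2.7.2} \]
Thus, the percentile corresponding to the salary of $60,000 is
\[ p = \frac{7}{15+1} \times 100 = 43.75 \approx 44 \]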
Hence, the engineer who makes a salary of $60,000 is at the 44th percentile. In other words, at most 44% of the engineers are making less than $60,000, or at most 56% are making more than $60,000.
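The calculation in part (b) can be checked with a short Python sketch. It assumes the percentile of a value x is computed as 100 × (number of data values ≤ x)/(n + 1), which reproduces the 44th percentile found above. The function name `percentile_of_value` is hypothetical.

```python
def percentile_of_value(data, x):
    """Percentile of x, assumed to be 100 * (count of values <= x) / (n + 1)."""
    count = sum(1 for v in data if v <= x)   # how many values do not exceed x
    return 100 * count / (len(data) + 1)

# Salaries (in thousands of dollars) from Example 2.7.1
salaries = [62, 48, 52, 63, 85, 51, 95, 76, 72, 51, 69, 73, 58, 55, 54]

p = percentile_of_value(salaries, 60)
print(p, round(p))   # 43.75 44
```

Seven of the 15 salaries are at or below 60, so the function returns 7/16 × 100 = 43.75, which rounds to 44.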
2.7.2 Quartiles
In the previous discussion, we considered the percentiles that divide the data into 100 equal parts. Some of the percentiles have special importance, such as the 25th, 50th, and 75th percentiles. These percentiles are also known as the first, second, and third quartiles (denoted by \(Q_1\), \(Q_2\), and \(Q_3\), respectively).
Figure 2.7.1 Quartiles and percentiles.
2.7.3 Interquartile Range (IQR)
Often we are more interested in the middle 50% of a population. A measure of dispersion for the middle 50% of the population or sample data is known as the IQR. It is obtained by trimming 25% of the values from the bottom and 25% of the values from the top. This is equivalent to finding the spread between the first quartile and the third quartile, so the IQR is defined as
\[ \text{IQR} = Q_3 - Q_1 \tag{2.7.3} \]
Example 2.7.2 (Engineers' salaries) Find the IQR for the salary data in Example 2.7.1:
Salaries: 48, 51, 51, 52, 54, 55, 58, 62, 63, 69, 72, 73, 76, 85, 95
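Solution: In order to find the IQR, we need to find the quartiles \(Q_1\) and \(Q_3\), that is, the 25th and 75th percentiles. From (2.7.1), their ranks are
\[ \text{Rank of } Q_1 = 25 \times \frac{15+1}{100} = 4, \qquad \text{Rank of } Q_3 = 75 \times \frac{15+1}{100} = 12 \]
Consulting Step 3 of Example 2.7.1, we find that \(Q_1 = 52\) and \(Q_3 = 73\). Therefore,
\[ \text{IQR} = Q_3 - Q_1 = 73 - 52 = 21 \]
that is, the middle 50% of the salaries span a range of $21,000.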
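The quartiles and the IQR for the salary data can also be checked with Python's standard library. This sketch assumes Python 3.8 or later, where `statistics.quantiles` is available. Its `"exclusive"` method places the i-th quartile at position i(n + 1)/4 in the sorted data, which agrees with the rank rule used in this section.

```python
import statistics

# Salaries (in thousands of dollars) from Example 2.7.1
salaries = [48, 51, 51, 52, 54, 55, 58, 62, 63, 69, 72, 73, 76, 85, 95]

# n=4 requests the three cut points that split the data into quartiles
q1, q2, q3 = statistics.quantiles(salaries, n=4, method="exclusive")
iqr = q3 - q1

print(q1, q2, q3, iqr)   # 52.0 62.0 73.0 21.0
```

The middle value, 62, is the second quartile and coincides with the median of the 15 salaries.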
Notes:
1 The IQR gives an estimate of the range of the middle 50% of the population.
2 The IQR is potentially a more meaningful measure of dispersion than the range as it is not affected by the extreme values that may be present in the data. By trimming 25% of the data from the bottom and 25% from the top, we eliminate the extreme values that may be present in the data set. Thus, the IQR is often used as a measure of comparison between two or more data sets on similar studies.
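The second note can be illustrated with a small experiment. In this sketch the largest salary in Example 2.7.1 is replaced by a hypothetical extreme value of 950, which is not part of the original data, and the range and the IQR are compared before and after. The helper names `data_range` and `iqr` are assumptions made for illustration.

```python
import statistics

def data_range(data):
    """Range: largest value minus smallest value."""
    return max(data) - min(data)

def iqr(data):
    """IQR = Q3 - Q1, with quartiles located at ranks i(n + 1)/4."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return q3 - q1

# Salaries (in thousands of dollars) from Example 2.7.1
salaries = [48, 51, 51, 52, 54, 55, 58, 62, 63, 69, 72, 73, 76, 85, 95]

# Hypothetical data set: the largest salary (95) replaced by an extreme value
with_outlier = salaries[:-1] + [950]

print(data_range(salaries), iqr(salaries))           # 47 21.0
print(data_range(with_outlier), iqr(with_outlier))   # 902 21.0
```

The range jumps from 47 to 902, while the IQR stays at 21 because the extreme value lies in the trimmed top 25% of the data.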
2.7.4 Coefficient of Variation
The