1 at most 50% of the values in the set are less than the median, and
2 at most 50% of the values in the set are greater than the median.
We now turn our attention to the stem‐and‐leaf plot, invented by John Tukey. This plot is a graphical tool used to display quantitative data. Each data value is split into two parts: the part with the leading digits is called the stem, and the rest is called the leaf. Thus, for example, the data value 5.15 is divided into two parts, with 5 for the stem and 15 for the leaf.
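The split described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_stem_leaf` and its parameter `leaf_places` are hypothetical, and the sketch assumes nonnegative values supplied as text so that no rounding error enters the split.

```python
from decimal import Decimal

def split_stem_leaf(value, leaf_places=2):
    """Split a nonnegative decimal value (given as text) into (stem, leaf).

    leaf_places is a hypothetical parameter: the number of decimal
    places that are moved into the leaf.
    """
    d = Decimal(value)
    stem = int(d)                                # leading digits
    leaf = int((d - stem) * 10 ** leaf_places)   # remaining digits
    return stem, leaf

print(split_stem_leaf("5.15"))   # (5, 15)

# For whole numbers with a one-digit leaf, divmod performs the same split.
print(divmod(73, 10))            # (7, 3)
```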
The stem‐and‐leaf plot is a powerful tool for summarizing quantitative data, and it has several advantages over both the frequency distribution table and the frequency histogram. One major advantage over the frequency distribution table is that the original data cannot be retrieved from a frequency distribution table, whereas they can easily be retrieved from a stem‐and‐leaf plot. In other words, a stem‐and‐leaf plot entails no loss of information, which is not true of a frequency distribution table. We illustrate the construction of the stem‐and‐leaf plot with the following example.
Example 2.4.7 (Spare parts supply) A manufacturing company has been awarded a huge contract by the Defense Department to supply spare parts. In order to provide these parts on schedule, the company needs to hire a large number of new workers. To estimate how many workers to hire, representatives of the Human Resources Department decided to take a random sample of 80 workers and find the number of parts each worker produces per week. The data collected are given in Table 2.4.5. Prepare a stem‐and‐leaf diagram for these data.
Table 2.4.5 Number of parts produced per week by each worker.
73 | 70 | 68 | 79 | 84 | 85 | 77 | 75 | 61 | 69 | 74 | 80 | 83 | 82 | 86 | 87 | 78 | 81 | 68 | 71 |
74 | 73 | 69 | 68 | 87 | 85 | 86 | 87 | 89 | 90 | 92 | 71 | 93 | 67 | 66 | 65 | 68 | 73 | 72 | 83 |
76 | 74 | 89 | 86 | 91 | 92 | 65 | 64 | 62 | 67 | 63 | 69 | 73 | 69 | 71 | 76 | 77 | 84 | 83 | 85 |
81 | 87 | 93 | 92 | 81 | 80 | 70 | 63 | 65 | 62 | 69 | 74 | 76 | 83 | 85 | 91 | 89 | 90 | 85 | 82 |
Solution: The stem‐and‐leaf plot for the data in Table 2.4.5 is shown in Figure 2.4.13.
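As a rough sketch of how such a plot can be built by hand or by machine, the following Python code groups the values of Table 2.4.5 by their tens digit (the stem) and lists the units digits (the leaves) in increasing order. The helper name `stem_and_leaf` is hypothetical, and the layout it prints is a plain text approximation, not a reproduction of the figure.

```python
from collections import defaultdict

# Data from Table 2.4.5 (number of parts produced per week by each worker).
parts = [
    73, 70, 68, 79, 84, 85, 77, 75, 61, 69, 74, 80, 83, 82, 86, 87, 78, 81, 68, 71,
    74, 73, 69, 68, 87, 85, 86, 87, 89, 90, 92, 71, 93, 67, 66, 65, 68, 73, 72, 83,
    76, 74, 89, 86, 91, 92, 65, 64, 62, 67, 63, 69, 73, 69, 71, 76, 77, 84, 83, 85,
    81, 87, 93, 92, 81, 80, 70, 63, 65, 62, 69, 74, 76, 83, 85, 91, 89, 90, 85, 82,
]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (units digits)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(parts)
for stem, leaves in plot.items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")

# Recombining stems and leaves recovers every original observation.
recovered = sorted(10 * stem + leaf for stem, leaves in plot.items() for leaf in leaves)
print(recovered == sorted(parts))   # True
```

The four stems 6, 7, 8, and 9 carry 21, 22, 28, and 9 leaves, respectively, and the final check confirms that the plot retains all 80 original values.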
The first column in Figure 2.4.13 gives the cumulative frequency, counted from the top and from the bottom of the column and stopping at the stem just before the one that contains the median. The number in parentheses marks the stem that contains the median of the data and gives the frequency of that stem.
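The rule for the first column can be sketched as follows. The function name `depth_column` is hypothetical, and the sketch makes a simplifying assumption: the median stem is taken to be the one holding the lower of the two middle observations, which is adequate here because both middle observations (the 40th and 41st of 80) lie on the same stem. The counts 21, 22, 28, and 9 are the numbers of leaves on stems 6, 7, 8, and 9, tallied from Table 2.4.5.

```python
from itertools import accumulate

def depth_column(counts):
    """Return first-column labels for per-stem leaf counts listed top to bottom."""
    n = sum(counts)
    from_top = list(accumulate(counts))
    from_bottom = list(accumulate(reversed(counts)))[::-1]
    middle = (n + 1) // 2   # position of the (lower) middle observation
    median_row = next(i for i, c in enumerate(from_top) if c >= middle)

    labels = []
    for i, count in enumerate(counts):
        if i < median_row:
            labels.append(str(from_top[i]))      # cumulative count from the top
        elif i == median_row:
            labels.append(f"({count})")          # frequency of the median stem
        else:
            labels.append(str(from_bottom[i]))   # cumulative count from the bottom
    return labels

print(depth_column([21, 22, 28, 9]))   # ['21', '(22)', '37', '9']
```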
Figure 2.4.13 Stem‐and‐leaf plot for the data in Example 2.4.7 with increment 10.
Figure 2.4.14 Stem‐and‐leaf plot for the data in Example 2.4.7 with increment 5.
Carefully examining the stem‐and‐leaf plot in Figure 2.4.13, we note that the data are clustered together; each stem has many leaves. This situation is the same as when we have too few classes in a frequency distribution table. Thus, having too many leaves on the stems makes the stem‐and‐leaf diagram less informative. This problem can be resolved by splitting each stem into two, five, or more stems, depending on the size of the data set. Figure 2.4.14 shows the stem‐and‐leaf plot obtained when each stem is split into two stems.
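Splitting each stem into two amounts to grouping the values in intervals of width 5 instead of 10. A minimal sketch, assuming the data of Table 2.4.5 and a hypothetical helper `split_stem_plot`, is given below; each row is keyed by the lower bound of its interval (60, 65, 70, and so on).

```python
# Data from Table 2.4.5 (number of parts produced per week by each worker).
parts = [
    73, 70, 68, 79, 84, 85, 77, 75, 61, 69, 74, 80, 83, 82, 86, 87, 78, 81, 68, 71,
    74, 73, 69, 68, 87, 85, 86, 87, 89, 90, 92, 71, 93, 67, 66, 65, 68, 73, 72, 83,
    76, 74, 89, 86, 91, 92, 65, 64, 62, 67, 63, 69, 73, 69, 71, 76, 77, 84, 83, 85,
    81, 87, 93, 92, 81, 80, 70, 63, 65, 62, 69, 74, 76, 83, 85, 91, 89, 90, 85, 82,
]

def split_stem_plot(values, increment=5):
    """Group sorted values into rows of the given width; store units digits as leaves."""
    rows = {}
    for value in sorted(values):
        lower = (value // increment) * increment   # lower bound of the row
        rows.setdefault(lower, []).append(value % 10)
    return rows

rows = split_stem_plot(parts, increment=5)
for lower, leaves in rows.items():
    # The printed stem is still the tens digit; each stem now appears twice,
    # except the last, which only has values from 90 to 94.
    print(f"{lower // 10} | {''.join(str(leaf) for leaf in leaves)}")

print([len(leaves) for leaves in rows.values()])   # [6, 15, 14, 8, 13, 15, 9]
```

With increment 5, no row carries more than 15 leaves, compared with as many as 28 on a single stem when the increment is 10.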
The first column in the above stem‐and‐leaf plots counts, from the top and from the bottom, the number of workers who produced up to and beyond a certain number of parts. For example, in Figure 2.4.14, the entry in the third row from the top indicates that 35 workers produced fewer than 75 parts/wk, whereas the entry in the third row from the bottom indicates that 37 workers produced at least 80 parts/wk. The number within parentheses gives the number of observations on that stem and indicates that the middle value, or median, of the data falls on that stem. Furthermore, the stem‐and‐leaf plots in Figure