After tallying the data, we find that of the 110 companies, 28 belong in the first category, 26 in the second category, 20 in the third category, 16 in the fourth category, and 20 in the last category. Thus, a frequency distribution table for the data in Table 2.3.1 is as shown in Table 2.3.2.
Table 2.3.2 Frequency distribution for the data in Table 2.3.1.
Categories | Tally | Frequency or count | Cumulative frequency | Percentage | Cumulative percentage |
1 | ///// ///// ///// ///// ///// /// | 28 | 28 | 25.45 | 25.45 |
2 | ///// ///// ///// ///// ///// / | 26 | 54 | 23.64 | 49.09 |
3 | ///// ///// ///// ///// | 20 | 74 | 18.18 | 67.27 |
4 | ///// ///// ///// / | 16 | 90 | 14.55 | 81.82 |
5 | ///// ///// ///// ///// | 20 | 110 | 18.18 | 100.00 |
Total | | 110 | | 100.00 | |
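As a quick check of how the last two columns are obtained: each percentage is the category frequency divided by the total of 110 and multiplied by 100, and each cumulative percentage is the cumulative frequency treated the same way. The worked case below uses category 2; the symbols f_2 (frequency), F_2 (cumulative frequency), and n (total count) are notation introduced here for illustration and do not appear in the text.

```latex
\[
\text{Percentage}_2 = \frac{f_2}{n}\times 100 = \frac{26}{110}\times 100 \approx 23.64,
\qquad
\text{Cumulative percentage}_2 = \frac{F_2}{n}\times 100 = \frac{28+26}{110}\times 100 \approx 49.09
\]
```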
Interestingly, we can put technology to work on the data in Table 2.3.1 to produce Table 2.3.2.
Example 2.3.2 (Industrial revenue) Using MINITAB and R, construct a frequency distribution table for the data in Table 2.3.1.
Solution:
MINITAB
1 Enter the data in column C1 of the Worksheet Window and name it Categories.
2 From the Menu bar, select Stat > Tables > Tally Individual Variables.
3 In the dialog box that appears, enter C1 in the box under Variables.
4 Check all the boxes under Display and click OK.
5 The frequency distribution table as shown below appears in the Session window.
This frequency distribution table may also be obtained by using R as follows:
USING R
R has a built-in ‘table()’ function that can be used to obtain the basic frequency distribution of categorical data. To get the cumulative frequencies, we can apply the built-in ‘cumsum()’ function to the tabulated frequencies. Then, using the ‘cbind()’ function, we combine the frequencies, cumulative frequencies, and cumulative percentages into the final distribution table, with the categories appearing as row labels. In addition, we can use the ‘colnames()’ function to name the columns of the final table as needed. The task can be completed by running the following R code in the R Console window.
#Assign given data to the variable data
data = c(4,3,5,3,4,1,2,3,4,3,1,5,3,4,2,1,1,4,5,3,2,5,2,5,2,1,2,3,3,2,
         1,5,3,2,1,1,2,1,2,4,5,3,5,1,3,1,2,1,4,1,4,5,4,1,1,2,4,1,4,1,2,4,3,4,1,
         4,1,4,1,2,1,5,3,1,5,2,1,2,3,1,2,2,1,1,2,1,5,3,2,5,5,2,5,3,5,2,3,2,3,5,
         2,3,5,5,2,3,2,5,1,4)
#To get frequencies
data.freq = table(data)
#To combine necessary columns
freq.dist = cbind(data.freq, cumsum(data.freq), 100*cumsum(data.freq)/sum(data.freq))
#To name the table columns
colnames(freq.dist) = c('Frequency','Cum.Frequency','Cum Percentage')
freq.dist
#R output
 | Frequency | Cum.Frequency | Cum Percentage |
1 | 28.00 | 28.00 | 25.45 |
2 | 26.00 | 54.00 | 49.09 |
3 | 20.00 | 74.00 | 67.27 |
4 | 16.00 | 90.00 | 81.82 |
5 | 20.00 | 110.00 | 100.00 |
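The R output above has no Percentage column, although Table 2.3.2 reports one. The following is a minimal sketch of one way to add it, not the text's own code. To keep the block self-contained, it rebuilds a stand-in for `data` from the category counts in Table 2.3.2 rather than retyping the 110 observations; the names `freq`, `pct`, and `full.dist` are illustrative assumptions.

```r
# Stand-in for the vector 'data' used earlier: same category counts as
# Table 2.3.2 (28, 26, 20, 16, 20), though not in the original order.
data = rep(1:5, times = c(28, 26, 20, 16, 20))

# Frequencies and percentages for each category
freq = table(data)
pct = 100 * prop.table(freq)

# Combine the four numeric columns and round to two decimals
full.dist = round(cbind(freq, cumsum(freq), pct, cumsum(pct)), 2)

# Name the columns to mirror Table 2.3.2
colnames(full.dist) = c('Frequency', 'Cum.Frequency', 'Percentage', 'Cum.Percentage')
full.dist
```

With the original `data` vector in place of the stand-in, the same lines should reproduce the numeric columns of Table 2.3.2, since `table()` depends only on how often each category occurs and not on the order of the observations.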
Note that a quantitative data set sometimes consists of only a few distinct observations that occur repeatedly. Data of this kind are usually summarized in the same manner as categorical data, with the distinct observations serving as the categories. We illustrate this scenario with the following example.
Example 2.3.3 (Hospital data) The following data show the number of coronary artery bypass graft surgeries performed at a hospital in a 24‐hour period for each of the last 50 days. Bypass surgeries are usually performed when a patient has multiple blockages or when the left main coronary artery is blocked. Construct a frequency distribution table for these data.
1 | 2 | 1 | 5 | 4 | 2 | 3 | 1 | 5 | 4 | 3 | 4 | 6 | 2 |
|