Quantitative Data
Quantitative (or numerical) data is the numbers: data that can be measured and aggregated.
● Brent Spiner's date of birth is Wednesday, February 2, 1949.
● He is 5 ft 9 in (175 cm) tall.
● He made 177 appearances in episodes of Star Trek.
● Data's positronic brain is capable of 60 trillion operations per second.
You'll have noticed that date of birth appears in both the ordinal and quantitative data types. Time is unusual in that it can be both. In Chapter 31, we look in detail at how the way you treat time influences your choice of visualization type.
Other types of quantitative measures include sales, profit, exam scores, pageviews, and number of patients in a hospital.
Quantitative data can be expressed in two ways: as discrete or continuous data. Discrete data is presented at predefined, exact points – there's no “in between.” For example, Brent Spiner appeared in 177 episodes of Star Trek; he couldn't have appeared in 177.5 episodes. Continuous data allows for the “in between,” as there is an infinite number of possible intermediate values. For example, Brent Spiner grew to a height of 5 ft 9 in but at one point in his life he was 4 ft 7.5 in tall.
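The distinction can be sketched in a few lines of Python. This is only an illustration: the helper names (feet_inches_to_cm, is_whole_count) are invented for this example, and the only figures used are the ones quoted above.

```python
CM_PER_INCH = 2.54  # exact, by definition of the inch


def feet_inches_to_cm(feet, inches=0.0):
    """Convert a height in feet and inches to centimeters."""
    return (feet * 12 + inches) * CM_PER_INCH


def is_whole_count(value):
    """Return True when a value sits on an exact, whole-number point."""
    return float(value).is_integer()


# Discrete: a count of episodes only makes sense as a whole number.
episodes = 177

# Continuous: height can take any intermediate value on the way up.
adult_height_cm = feet_inches_to_cm(5, 9)
earlier_height_cm = feet_inches_to_cm(4, 7.5)

print(is_whole_count(episodes))  # the episode count is a valid discrete value
print(is_whole_count(177.5))     # 177.5 episodes is not
print(earlier_height_cm < adult_height_cm)
```

Storing the count as an integer and the heights as floating-point numbers mirrors the idea that discrete data has no "in between," while continuous data does.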
Encoding Data in Charts
We've now looked at preattentive attributes and the three types of data. It's time to see how to combine that knowledge into building charts. Let's look at some charts and see how they encode the different types of data. Sticking with Star Trek, Figure 1.12 shows the IMDB.com ratings of every episode of Star Trek: The Next Generation.
Figure 1.12 Every episode of Star Trek: The Next Generation rated.
Source: IMDB.com
Table 1.3 lists each piece of data in the chart, what type it is, and how it has been encoded.
Table 1.3 Data used in Figure 1.12.
Let's look at a few more charts to see how preattentive features have been used. Figure 1.13 is from The Economist. Look at each chart and see if you can work out which types of data are being graphed and how they are being encoded.
Figure 1.13 “A terrible record” from The Economist, July 2016.
Source: START, University of Maryland. The Economist, http://tabsoft.co/2agK3if
Table 1.4 shows how each data type is encoded.
Table 1.4 Data used in the bar chart in Figure 1.13.
Let's look at another example. Figure 1.14 was part of the Makeover Monday project run by Andy Cotgreave and Andy Kriebel throughout 2016. This entry was by Dan Harrison. It takes data on malaria deaths from the World Health Organization. Table 1.5 describes the data used in the chart.
Table 1.5 Data used in the bar chart in Figure 1.14.
Figure 1.14 Deaths from malaria, 2000–2014.
Source: World Health Organization. Chart part of the Makeover Monday project
How did you do? As you progress through the book, stop and analyze some of the views in the scenarios: Think about which data types are being used and how they have been encoded.
Color
Color is one of the most important things to understand in data visualization, and it is frequently misused. You should not use color just to spice up a boring visualization. In fact, many great data visualizations don't use color at all and are still informative and beautiful.
In Figure 1.15, we see Shine Pulikathara's visualization that won the 2015 Tableau Iron Viz competition. Notice his simple use of color.
Figure 1.15 Winning visualization by Shine Pulikathara during the 2015 Tableau Iron Viz competition.
Source: Used with permission from Shine Pulikathara.
Color should be used purposefully. For example, color can be used to draw the attention of the reader, highlight a portion of data, or distinguish between different categories.
Use of Color
Color should be used in data visualization in three primary ways: sequential, diverging, and categorical.
In addition, there is often a need to highlight data or alert the reader to something important. Figure 1.16 offers an example of each of these color schemes.
Figure 1.16 Use of color in data visualization.
Sequential color is the use of a single color from light to dark. An example is encoding the total amount of sales by state in blue, where the darker blue shows higher sales and a lighter blue shows lower sales. Figure 1.17 shows the unemployment rate by state using a sequential color scheme.
Figure 1.17 Unemployment rate by state using a sequential color scheme.
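As a rough sketch of how a sequential scheme works, the Python below maps a value onto a single-hue ramp by interpolating between a light and a dark blue. The function name, the two RGB endpoints, and the sample value range are all assumptions made for illustration; they are not the palette used in Figure 1.17, and charting tools ship their own tuned palettes.

```python
LIGHT_BLUE = (222, 235, 247)  # assumed light end of the ramp
DARK_BLUE = (8, 48, 107)      # assumed dark end of the ramp


def sequential_color(value, vmin, vmax, light=LIGHT_BLUE, dark=DARK_BLUE):
    """Return a hex color between light and dark for value in [vmin, vmax]."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    # Clamp so out-of-range values take the color of the nearest end.
    clamped = min(max(value, vmin), vmax)
    t = (clamped - vmin) / (vmax - vmin)  # 0.0 = lightest, 1.0 = darkest
    rgb = tuple(round(lo + (hi - lo) * t) for lo, hi in zip(light, dark))
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Hypothetical range: rates running from 2 to 10 (percent).
print(sequential_color(2, 2, 10))   # lightest blue
print(sequential_color(10, 2, 10))  # darkest blue
```

Because only lightness changes, readers can rank values at a glance: darker always means more.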
Diverging color is used to show a range diverging from a midpoint. This scheme can be used in the same manner as the sequential color scheme, but it can encode two different ranges of a measure (positive and negative) or a range of a measure between two categories. An example is the degree to which electorates may vote Democratic or Republican in each state, as shown in Figure 1.18.
Figure 1.18 Degree of Democratic (blue) versus Republican (red) voter sentiment in each state.
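A diverging scheme can be sketched as two sequential ramps that meet at a neutral midpoint. In the Python below, the function name, the RGB values, and the -1 to +1 "margin" scale are assumptions chosen for illustration; they are not taken from Figure 1.18.

```python
RED = (178, 24, 43)        # assumed color for the low end of the scale
NEUTRAL = (247, 247, 247)  # assumed near-white midpoint
BLUE = (33, 102, 172)      # assumed color for the high end of the scale


def _blend(start, end, t):
    """Linearly interpolate between two RGB tuples; t runs from 0 to 1."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def diverging_color(value, vmin, midpoint, vmax,
                    low=RED, neutral=NEUTRAL, high=BLUE):
    """Return a hex color that grows stronger as value moves off midpoint."""
    if not vmin < midpoint < vmax:
        raise ValueError("expected vmin < midpoint < vmax")
    clamped = min(max(value, vmin), vmax)
    if clamped < midpoint:
        t = (midpoint - clamped) / (midpoint - vmin)
        rgb = _blend(neutral, low, t)
    else:
        t = (clamped - midpoint) / (vmax - midpoint)
        rgb = _blend(neutral, high, t)
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Hypothetical margin: -1 is fully one side, +1 fully the other, 0 is even.
print(diverging_color(-1, -1, 0, 1))  # strongest red
print(diverging_color(0, -1, 0, 1))   # neutral midpoint
print(diverging_color(1, -1, 0, 1))   # strongest blue
```

The midpoint is passed in explicitly because it is a design decision: it might be zero, an average, or a target, and the choice changes which values appear neutral.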