1.1 THE POWER OF VISUALIZATION ON LINKED DATA
On any kind of data, visualization enables serendipity and exploration. On Linked Data (LD), it allows users to start understanding previously unknown data, to build a mental picture of the dataset, or to dig into specific portions of the source. Moreover, visualization is probably the only way to enable users without technical skills to grasp the meaning of the content of LD sources. Domain experts, too, can take advantage of a visual exploration of a dataset, which reduces the time they need to understand it.
Following the idea of Tim Berners-Lee, each resource should have a unique name in the form of an HTTP URI. This means that reality can be mirrored on the Web: every resource in the world can have its digital alter ego. Moreover, the pillar of Linked Data is that resources should be connected to other resources.
The simplest form of relationship is a personal one: John is a friend of Martin, Martin is the son of Peter, and therefore John is indirectly connected to Peter. The same idea extends to every field: biology, sociology, and art are only a few of the areas in which LD can be deployed. LD has the power to express virtually anything. But how can everything be visualized? One possible choice is to learn Semantic Web technologies, write SPARQL queries, and then analyze the results. Besides the difficulty of writing SPARQL queries, this approach works only when the results are few, since the amount of information that can be displayed on a screen is limited. The other possible approach is to exploit the power of visualization.
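The John, Martin, and Peter example can be written as subject-predicate-object triples, and the indirect connection between John and Peter then becomes a path in a graph. The following is a minimal sketch, assuming hypothetical IRIs under `http://example.org/` and hypothetical property names `friendOf` and `sonOf`; the breadth-first search simply follows outgoing links until it reaches the target resource.

```python
from collections import deque

# Hypothetical namespace and properties, for illustration only
EX = "http://example.org/"

TRIPLES = [
    (EX + "John", EX + "friendOf", EX + "Martin"),
    (EX + "Martin", EX + "sonOf", EX + "Peter"),
]


def find_path(triples, start, goal):
    """Breadth-first search over outgoing links.

    Returns the list of triples leading from start to goal,
    or None if goal cannot be reached.
    """
    # Index outgoing edges by subject
    out = {}
    for s, p, o in triples:
        out.setdefault(s, []).append((p, o))

    # For each visited node, remember the triple used to reach it
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back from goal to start, then reverse
            path = []
            while previous[node] is not None:
                triple = previous[node]
                path.append(triple)
                node = triple[0]
            return list(reversed(path))
        for p, o in out.get(node, []):
            if o not in previous:
                previous[o] = (node, p, o)
                queue.append(o)
    return None


for s, p, o in find_path(TRIPLES, EX + "John", EX + "Peter"):
    print(s, p, o)
```

Following links in only one direction is a design choice of this sketch; treating links as undirected would also connect Peter back to John.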
The first tests on graphic visualization date back to 1890, when Herman Hollerith revolutionized the world of data analysis with a creative and innovative idea: he used punch cards to collect and analyze the U.S. census data. Using punch cards saved two years and five million dollars over the manual tabulation techniques used in the previous census, while enabling a more thorough analysis of the data [Blodgett and Schultz, 1969]. We currently face an analogous development in the field of LD. Since 2006, many researchers have developed original solutions to the task of LD visualization, and different tools and visualization layouts are now available.
Listing 1.1: Query for extracting relations between classes
For example, how can a user understand the content of the Wikipathways dataset? To find out what the dataset contains, he/she could formulate a SPARQL query, similar to Listing 1.1, that extracts the classes and the relations among them, and then analyze the results, as shown in Figure 1.1.
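The text of Listing 1.1 is not reproduced in this section, so the query below is only a guess at what such a query might look like, not the original listing. It is paired with a plain-Python evaluation of the same graph pattern over an in-memory list of triples, so the idea can be tried without a SPARQL endpoint. The resources and names `ex:Pathway`, `ex:Gene`, and `ex:hasPart` are hypothetical and are not taken from the Wikipathways vocabulary.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Illustrative SPARQL query (an assumption, not the original Listing 1.1):
# for every typed subject linked to a typed object, report the two classes,
# the linking property, and how many solutions instantiate that combination.
CLASS_RELATIONS_QUERY = """
SELECT ?c1 ?p ?c2 (COUNT(*) AS ?n)
WHERE {
  ?s a ?c1 .
  ?s ?p ?o .
  ?o a ?c2 .
  FILTER (?p != <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>)
}
GROUP BY ?c1 ?p ?c2
"""


def class_relations(triples):
    """Evaluate the same pattern as CLASS_RELATIONS_QUERY in plain Python."""
    # Collect the classes of every typed resource
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    # Count (subject class, property, object class) combinations
    counts = Counter()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue  # mirrors the FILTER in the query
        for c1 in types.get(s, ()):
            for c2 in types.get(o, ()):
                counts[(c1, p, c2)] += 1
    return counts


# Hypothetical sample data; the literal title has no class, so it is ignored
EXAMPLE = [
    ("ex:pw1", RDF_TYPE, "ex:Pathway"),
    ("ex:g1", RDF_TYPE, "ex:Gene"),
    ("ex:g2", RDF_TYPE, "ex:Gene"),
    ("ex:pw1", "ex:hasPart", "ex:g1"),
    ("ex:pw1", "ex:hasPart", "ex:g2"),
    ("ex:pw1", "ex:title", '"Example pathway"'),
]
print(class_relations(EXAMPLE))
```

Even on this tiny sample, the output is a flat table of class pairs; on a real dataset the number of rows grows quickly, which is exactly the limitation the text points out.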
Adopting a graphical visualization, instead, can greatly simplify the analysis of the results. For example, the same information can be obtained through one of the visualizations provided by the tool H-BOLD (Figure 1.2). As the figure shows, when the same information is displayed as a graph, it is much easier to understand the connections and paths among the classes.
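How H-BOLD renders its schema graph is not described here. As a swapped-in illustration of the general idea, the sketch below turns a mapping from (class, property, class) to instance counts into Graphviz DOT text: each class becomes a node and each relation becomes a labelled edge. DOT is chosen only because it is a simple, widely known graph notation; the class and property names are hypothetical.

```python
def quote(text):
    """Wrap text in a DOT quoted string, escaping embedded double quotes."""
    return '"' + text.replace('"', '\\"') + '"'


def to_dot(relations):
    """Render {(class1, property, class2): count} as a Graphviz DOT digraph:
    one node per class, one labelled edge per (class, property, class) key."""
    lines = ["digraph schema {"]
    classes = sorted({c for (c1, _, c2) in relations for c in (c1, c2)})
    for c in classes:
        lines.append(f"  {quote(c)};")
    for (c1, p, c2), n in sorted(relations.items()):
        label = quote(f"{p} ({n})")
        lines.append(f"  {quote(c1)} -> {quote(c2)} [label={label}];")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical schema summary: two pathway-to-gene links via ex:hasPart
print(to_dot({("ex:Pathway", "ex:hasPart", "ex:Gene"): 2}))
```

The resulting text can be saved to a file and rendered with the Graphviz command-line tool, for example `dot -Tsvg schema.dot -o schema.svg`.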
Figure 1.1: Results of the query in Listing 1.1.
Figure 1.2: H-BOLD schema visualization of the Wikipathways dataset.
A crucial and impressive aspect of LD is that information is interlinked across different sources. Therefore, starting from the URI of a resource, it is possible to display not only the information that describes the resource within the dataset, but also information from external datasets. Figure 1.3 depicts how an LD visualization tool can create a collage of information from disparate sources. In that example, LodView has been used to gather all the information about London. Starting from the URI of the resource London in DBpedia (http://dbpedia.org/resource/London), the tool looks at all the outgoing links and displays all the labels and pictures associated with the URIs of these links. The result is an overview of images and information related to London from different data sources.
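LodView's actual implementation is not described in this section. The sketch below only illustrates the idea on an in-memory list of triples: starting from a resource, it follows every outgoing link whose object is an IRI and collects that resource's `rdfs:label` and `foaf:depiction` values. It assumes that the data use these two widely adopted terms for labels and pictures, and it replaces the HTTP dereferencing a real tool would perform with a local list. Apart from the London URI, all resources are hypothetical, not actual DBpedia content.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_DEPICTION = "http://xmlns.com/foaf/0.1/depiction"


def is_iri(term):
    # Assumption: terms are plain strings; IRIs start with http(s)://
    return term.startswith(("http://", "https://"))


def collage(triples, resource):
    """For each IRI reached through one outgoing link of `resource`,
    collect its labels and pictures."""
    # Outgoing links to IRIs, deduplicated while keeping their order
    neighbours = list(dict.fromkeys(
        o for s, p, o in triples if s == resource and is_iri(o)))
    result = {}
    for n in neighbours:
        entry = {"labels": [], "pictures": []}
        for s, p, o in triples:
            if s != n:
                continue
            if p == RDFS_LABEL:
                entry["labels"].append(o)
            elif p == FOAF_DEPICTION:
                entry["pictures"].append(o)
        result[n] = entry
    return result


LONDON = "http://dbpedia.org/resource/London"
EX = "http://example.org/"  # hypothetical resources, not real DBpedia data
SAMPLE = [
    (LONDON, EX + "river", EX + "Thames"),
    (LONDON, EX + "landmark", EX + "TowerBridge"),
    (LONDON, RDFS_LABEL, "London"),
    (EX + "Thames", RDFS_LABEL, "River Thames"),
    (EX + "Thames", FOAF_DEPICTION, EX + "thames.jpg"),
    (EX + "TowerBridge", RDFS_LABEL, "Tower Bridge"),
]

for iri, info in collage(SAMPLE, LONDON).items():
    print(iri, info)
```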
Figure 1.3: LodView visualization of London.
1.2 THE WEB OF LINKED, OPEN, AND SEMANTIC DATA
Tim Berners-Lee had a grand vision for the Internet when he began development of the World Wide Web in 1989 [Gillmor, 2004, Chapter 2]. He envisioned a read/write Web. However, what emerged in the 1990s was an essentially read-only Web, the so-called Web 1.0. Users’ interactions with the Web were limited to searching for and reading information. The lack of active interaction between users and the Web led, in 1999, to the birth of the Web 2.0. For the first time, ordinary users were able to write and share information with everyone. This era introduced blogs, social media such as Twitter and Facebook, and video-streaming platforms such as YouTube.
Over time, users started to upload textual and multimedia content at an incredibly high rate and, as a consequence, more and more people started to use the Web for many different purposes. The growing volume of web pages and requests required Web applications to find new ways of handling documents: machines needed to understand what data they were handling. The main idea was to provide documents with context in a machine-readable format. This new revolution, the Web 3.0, is called the Semantic Web or Web of Data.
With the advent of the Semantic Web, users started to publish content together with metadata, i.e., additional data that provide context about the main data in a machine-understandable way. These machine-readable descriptions enable content managers to add meaning to the content. In this way, a machine can process knowledge itself, instead of text, using processes similar to human deductive reasoning and inference, thereby obtaining more meaningful results and helping computers perform automated information gathering and research. Making data understandable to machines, however, requires a shared, common data structure. To this end, the W3C proposed RDF (Resource Description Framework), a language that expresses data as subject-predicate-object statements called triples.
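To make the idea of a common data structure concrete, the sketch below writes two metadata statements as RDF triples in the N-Triples syntax, where each line holds a subject, a predicate, and an object followed by a period, with IRIs in angle brackets and literals in double quotes. The Dublin Core `title` and `creator` properties are real vocabulary terms, while the resource IRIs are hypothetical. The literal escaping covers only backslashes, double quotes, and line breaks, not the full N-Triples grammar.

```python
def term(value, literal=False):
    """Format an IRI as <...> or a plain string literal as "..."."""
    if not literal:
        return f"<{value}>"
    # Minimal escaping for string literals (not the complete N-Triples rules)
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject IRI, predicate IRI, object, object_is_literal) tuples."""
    return "\n".join(
        f"{term(s)} {term(p)} {term(o, literal=is_lit)} ."
        for s, p, o, is_lit in triples
    )


EX = "http://example.org/"  # hypothetical namespace
DOC = [
    (EX + "article42", "http://purl.org/dc/terms/title", "Linked Data visualization", True),
    (EX + "article42", "http://purl.org/dc/terms/creator", EX + "JohnDoe", False),
]
print(to_ntriples(DOC))
```

Because every statement has the same three-part shape, data from different publishers can be merged simply by concatenating their triples.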
The Semantic Web also allows links to be created among data on the Web, so that a person or a machine can explore the Web of Data. With Linked Data, when you have some of it, you can find other, related, data. Like the Web of hypertext, the Web of Data is built from documents on the Web; unlike it, however, the links between arbitrary things are described in RDF, and URIs identify any kind of object or concept.
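The "when you have some of it, you can find other, related, data" principle can be sketched as a bounded breadth-first traversal: dereference a URI, read the triples it returns, and queue every IRI that appears as an object. In this sketch, real HTTP dereferencing is replaced by a hypothetical `fetch` function backed by a dictionary so that it runs offline; the two datasets and their IRIs are hypothetical, while `owl:sameAs`, `foaf:knows`, and `foaf:name` are real vocabulary terms. A real crawler would also need content negotiation, an RDF parser, and rate limits.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def follow_your_nose(start: str,
                     fetch: Callable[[str], Iterable[Triple]],
                     max_depth: int = 2) -> List[Triple]:
    """Collect triples reachable from `start` by following object IRIs,
    dereferencing each IRI at most once and stopping after `max_depth` hops."""
    seen = {start}
    queue = deque([(start, 0)])
    collected: List[Triple] = []
    while queue:
        iri, depth = queue.popleft()
        for s, p, o in fetch(iri):
            collected.append((s, p, o))
            # Assumption: objects starting with http(s):// are IRIs; others are literals
            if depth < max_depth and o.startswith(("http://", "https://")) and o not in seen:
                seen.add(o)
                queue.append((o, depth + 1))
    return collected


A = "http://a.example.org/"  # hypothetical dataset A
B = "http://b.example.org/"  # hypothetical dataset B
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

WEB = {
    A + "alice": [(A + "alice", OWL_SAME_AS, B + "alice")],
    B + "alice": [(B + "alice", FOAF_NAME, "Alice"), (B + "alice", FOAF_KNOWS, B + "bob")],
    B + "bob": [(B + "bob", FOAF_NAME, "Bob")],
}


def fetch(iri: str) -> List[Triple]:
    # Hypothetical stand-in for HTTP dereferencing plus RDF parsing
    return WEB.get(iri, [])


for triple in follow_your_nose(A + "alice", fetch):
    print(triple)
```

Starting from dataset A, the traversal crosses into dataset B through the `owl:sameAs` link and discovers Bob, who is never mentioned in A.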
Connecting your own data to other information already present on the Web has at least two important consequences. The first is the possibility of adding even more information and providing a wider context; the second is the creation of a global network of LD, the Giant Global Graph.
Alongside the rise of the Semantic Web, the Web shifted from a page-oriented Web to a data-oriented Web (Figure 1.4). Web users started to publish data online, and governments saw in open data a way of involving citizens in the governance of their cities.
The volume of data is growing exponentially everywhere. Each minute, 149,513 emails are sent, 3.3 million Facebook posts are created, 65,972 Instagram photos are uploaded, 448,800 tweets are posted, and 500 hours of YouTube videos are uploaded. The tremendous increase in data fueled by the Internet of Things (the continuous growth in the number of connected devices, sensors, and smartphones) has contributed to the rise of a “data-driven” era. Moreover, predictions state that by 2020 every person will generate 1.7 megabytes of data every second.
Each sector is affected by this