Alongside the development of OWL, a large number of vocabularies have been developed. To name just a few: VoID²² (Vocabulary of Interlinked Datasets) contains terms for describing the metadata of a dataset; FOAF²³ (Friend of a Friend) operates in the social network domain and contains terms for describing people and their relations; SKOS²⁴ (Simple Knowledge Organization System) is used for sharing and linking knowledge organization systems such as thesauri or taxonomies; and the RDF Data Cube Vocabulary²⁵ can be used for publishing multi-dimensional data such as statistics.
The SPARQL²⁶ Protocol and RDF Query Language (SPARQL) is a W3C recommendation and has been the standard query language for RDF data since 2008. SPARQL is one of the key technologies of the Semantic Web and is used to retrieve and manipulate RDF data from the knowledge graphs available on the Web. The evaluation of SPARQL queries is based on graph pattern matching. Graph patterns are templates consisting of a series of triple patterns, i.e., triples in which any position may be a variable, that the SPARQL engine tries to match against the triples in the store.
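The idea of graph pattern matching can be illustrated with a minimal Python sketch. It is not how a real SPARQL engine is implemented. It is only a naive nested-loop matcher, under the assumptions that terms are plain strings, that any term starting with "?" is a variable, and that the `ex:` data is invented for illustration.

```python
# Naive sketch of basic graph pattern matching (illustrative only).
# Assumption: terms are strings and a leading "?" marks a variable.

def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None      # constant term does not match
    return result


def match_bgp(patterns, store):
    """Return every binding that satisfies all triple patterns together."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in store
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return solutions


# Hypothetical example data.
store = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:name", "Bob"),
]

# "Whom does Alice know, and what is that person's name?"
patterns = [
    ("ex:alice", "foaf:knows", "?friend"),
    ("?friend", "foaf:name", "?name"),
]

solutions = match_bgp(patterns, store)
print(solutions)  # [{'?friend': 'ex:bob', '?name': 'Bob'}]
```

The shared variable `?friend` is what joins the two triple patterns: only bindings that satisfy both patterns at once survive.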
SPARQL allows four query forms: SELECT, ASK, CONSTRUCT, and DESCRIBE. The SELECT query form returns a solution sequence, i.e., a sequence of variables and their bindings. The ASK query form returns a Boolean value (true or false), indicating whether a query pattern matches or not. The CONSTRUCT query form returns an RDF graph structured according to the graph template of the query. Finally, the DESCRIBE query form returns an RDF graph that provides a “description” of the matching resources. Thus, depending on the query form, the result of a SPARQL query may be an RDF graph, a SPARQL solution sequence, or a Boolean value.
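The difference between the result types can be sketched in Python by starting from the same set of solutions and deriving each kind of result from it. This is a simplified illustration: the solutions are hard-coded stand-ins for the output of pattern matching, `ex:knownBy` is a made-up property, and DESCRIBE is left out because what counts as a "description" is up to the query service.

```python
# Sketch: one set of solutions, three different result types.
# `solutions` stands in for the output of graph pattern matching.
solutions = [
    {"?person": "ex:alice", "?friend": "ex:bob"},
    {"?person": "ex:bob", "?friend": "ex:carol"},
]


def select(solutions, variables):
    """SELECT: a sequence of bindings restricted to the projected variables."""
    return [{v: s[v] for v in variables if v in s} for s in solutions]


def ask(solutions):
    """ASK: a Boolean telling whether the pattern matched at all."""
    return len(solutions) > 0


def construct(solutions, template):
    """CONSTRUCT: a graph built by instantiating the template per solution.

    Simplification: every template variable is assumed to be bound.
    """
    graph = set()
    for s in solutions:
        for triple in template:
            graph.add(tuple(s.get(term, term) for term in triple))
    return graph


selected = select(solutions, ["?friend"])
matched = ask(solutions)
# Template that inverts the relation; ex:knownBy is hypothetical.
inverted = construct(solutions, [("?friend", "ex:knownBy", "?person")])
```

Here `selected` is a list of bindings, `matched` is a Boolean, and `inverted` is a set of triples, mirroring the three kinds of result named above.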
Unfortunately, this first version of SPARQL had several gaps, including the lack of support for data management operators. In 2013, therefore, the W3C SPARQL Working Group published SPARQL 1.1²⁷, which extended the original SPARQL query language in several respects. Specifically, SPARQL 1.1 introduced features for manipulating the content of the store, together with support for nested queries and aggregation functions.
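As a rough analogy in Python, the two kinds of additions can be pictured as follows. The store is modeled as a plain set of tuples and the `ex:` data is invented. The update steps loosely correspond to the INSERT DATA and DELETE DATA operations of SPARQL 1.1 Update, and the counting step to a COUNT aggregate with GROUP BY.

```python
from collections import Counter

# Hypothetical store, modeled as a set of (subject, predicate, object) tuples.
store = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:knows", "ex:carol"),
}

# Update, in the spirit of INSERT DATA / DELETE DATA: the store itself changes.
store.add(("ex:carol", "foaf:knows", "ex:alice"))
store.discard(("ex:bob", "foaf:knows", "ex:carol"))

# Aggregation, in the spirit of COUNT(?friend) ... GROUP BY ?person.
friend_counts = Counter(s for s, p, o in store if p == "foaf:knows")
```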
Finally, triples need to be stored in a triplestore. Different approaches have been proposed over the years. Monolithic triplestores keep all the triples in a single table. They are easy to implement and cope well with a huge number of properties, but they require a well-designed indexing system and several self-joins during queries. A slightly lighter variant of monolithic storage associates each URI and literal with a numerical identifier. This results in two tables: one holds the mapping between URIs/literals and numbers, and the other contains the triples in numerical form. Property tables are triplestores that create a table for each class, so that tuples with the same characteristics are grouped together. This layout resembles the structure of a relational database and queries require fewer joins, but the tables potentially contain a high number of NULL values. Vertically partitioned triplestores create a two-column table for every property of the dataset. Each table contains the subject, in the first column, and the object, in the second column, of the triples with that specific predicate. This layout performs well when the number of properties is low; otherwise it is particularly expensive in computational terms. Hexastores are structures that create an index for each possible ordering of the triple elements in order to enable efficient processing, at the cost of six times the disk space required for storing the data.
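Three of these layouts can be sketched with in-memory Python structures. This is a toy illustration under the assumption of a handful of invented triples. Real triplestores use on-disk tables and far more compact indexes, and property tables are omitted because they would need class information that this toy data lacks.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical example data.
triples = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", "Bob"),
]

# 1. Dictionary-encoded triple table: one table maps each URI/literal to a
#    number, the other stores the triples as tuples of those numbers.
term_ids = {}
encoded = [
    tuple(term_ids.setdefault(term, len(term_ids)) for term in triple)
    for triple in triples
]

# 2. Vertical partitioning: one two-column (subject, object) table per property.
by_property = defaultdict(list)
for s, p, o in triples:
    by_property[p].append((s, o))

# 3. Hexastore: one nested index for each of the six orderings of (s, p, o).
hexastore = {}
for order in permutations("spo"):
    index = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        parts = {"s": s, "p": p, "o": o}
        first, second, third = (parts[k] for k in order)
        index[first][second].add(third)
    hexastore["".join(order)] = index

# "Who has the name Bob?" is a direct lookup in the predicate-object-subject index.
print(hexastore["pos"]["foaf:name"]["Bob"])  # {'ex:bob'}
```

The trade-off described above is visible even at this scale: the hexastore answers any lookup by direct indexing, but every triple is stored six times.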
Quadstores are the natural evolution of triplestores. The main difference is that quadstores store tuples of four elements: Subject, Predicate, Object, and Graph, where the fourth element identifies the graph to which the triple belongs.
Furthermore, the data contained in these structures (both triplestores and quadstores) tend to be very atomic, since the nodes in the graph are primitive data types such as strings, integers, dates, etc., and the relations connect data of that kind. Graph databases instead model the graph in an object-oriented fashion. The nodes are not simple primitive values but instances. Generally, each instance has properties that describe it (datatype properties) and properties that relate it to other objects (object properties); the datatype properties are grouped together to form a sort of description of the instance, while the object properties are treated as the arcs that connect different instances. Therefore, in graph databases, the nodes are not simple strings but full objects with a multitude of datatype properties. Some popular graph databases are Neo4j²⁸ and Amazon Neptune.²⁹
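The contrast between the two models can be made concrete with a small Python sketch that regroups a list of triples into object-like nodes and arcs. It is only an illustration: the data is invented, the set of object properties is declared by hand (an assumption of this sketch, since a real system would derive it from a schema or from the type of each value), and it does not reflect the internal storage format of Neo4j or Amazon Neptune.

```python
# Hypothetical example data in triple form.
triples = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:age", 30),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", "Bob"),
]

# Assumption: object properties are listed explicitly for this sketch.
OBJECT_PROPERTIES = {"foaf:knows"}

nodes = {}  # node id -> dict of datatype properties (the node's "description")
edges = []  # (source id, label, target id) arcs between instances

for s, p, o in triples:
    nodes.setdefault(s, {})
    if p in OBJECT_PROPERTIES:
        nodes.setdefault(o, {})   # make sure the target instance exists
        edges.append((s, p, o))   # object property becomes an arc
    else:
        nodes[s][p] = o           # datatype property becomes a node attribute

print(nodes["ex:alice"])  # {'foaf:name': 'Alice', 'foaf:age': 30}
print(edges)              # [('ex:alice', 'foaf:knows', 'ex:bob')]
```

Four triples collapse into two nodes and one arc: the literal values no longer appear as separate nodes but live inside the instance they describe.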
1.8 CONCLUSIONS
In this chapter, we have introduced the history of the Web, from a Web of interlinked documents to the Web of Data. Exploiting the meaning of words can boost the advent of intelligent agents, so we explored the fundamentals that gave birth to the Semantic Web, ranging from RDF to Linked Data, the SPARQL query language, and storage technologies. Moreover, we have reported some statistics collected by different Open Data agencies worldwide about the size, value, and impact that Open and Linked Data are estimated to reach in the global economic market.