The tutorial was initially motivated by the Ontology 101 (Noy and McGuinness, 2001) paper, which was based on an expansion of a pedagogical example and ontology that McGuinness provided for wine and food pairing as an introduction to conceptual modeling along with a methodology for working with description logics (Brachman et al., 1991a). It was also influenced by a number of later papers such as Nardi and Brachman’s introductory chapter in the DL Handbook (Baader et al., 2003), which described how to build an ontology starting from scratch. None of the existing references, however, really discussed the more holistic approach we take, including how to capture requirements, develop terminology and definitions, or iteratively refine the terms, definitions, and relationships between them with subject matter experts through the development process. There were other resources that described use case development or terminology work, several of which we reference, but they did not touch on the nuances needed specifically for ontology design. There were many references for performing some of these tasks related to data modeling, but none covered developing an ontology using a data model as a starting point, what distinguished one from the other, or why that mattered. Nothing we found addressed requirements and methodologies for selecting ontologies that might be reused as a part of a new development activity, which is essential today. Nor did anything provide a comprehensive, end-to-end view of the ontology development, deployment, and maintenance lifecycle.
In 2015, we extended the tutorial to a full 13-week graduate course, which we teach together at Rensselaer Polytechnic Institute (RPI), where Dr. McGuinness is a constellation chair and professor of computer and cognitive science. We needed a reference we could use for that course as well as for the professional training that we often provide as consultants. That increased our motivation to put this together, although business commitments and health challenges slowed us down a bit. The content included in this initial edition reflects the original tutorial and the first five lectures of our Ontology Engineering course at RPI. It covers the background, requirements gathering, terminology development, and initial conceptual modeling aspects of the overall ontology engineering lifecycle. Although most of our work leverages the World Wide Web Consortium (W3C) Resource Description Framework (RDF), Web Ontology Language (OWL), SPARQL, and other Semantic Web standards, we’ve steered away from presenting many technical, and especially syntactic, details of those languages, aside from illustrating specific points. Other references we cite, especially some publications in this series as well as the Semantic Web for the Working Ontologist (Allemang and Hendler, 2011), cover those topics well. We have also intentionally limited our coverage of description logic as the underlying technology, because many resources on that topic already exist. The examples we’ve given come from a small number of use cases that are representative of what we see in many of our projects, but that tend to be more accessible to our students than some of the more technical, domain-specific ontologies we develop on a regular basis.
This book is written primarily for an advanced undergraduate or beginning graduate student, or anyone interested in developing enterprise data systems using knowledge representation and semantic technologies. It is not directed at a seasoned practitioner in an enterprise per se, but such a person should find it useful to fill in gaps with respect to background knowledge, methodology, and best practices in knowledge representation.
We purposefully pay more attention to history, research, and fundamentals than a book targeted for a corporate audience would do. Readers should have a basic understanding of software engineering principles, such as knowing the difference between programs and data, the basics of data management, the differences between a data dictionary and data schema, and the basics of querying a database. We also assume that readers have heard of representation formats including XML and have some idea of what systems design and architecture entail. Our goal is to introduce the discipline of ontology engineering, which relates to all of these things but is a unique discipline in its own right. We will outline the basic steps involved in any ontology engineering project, along with how to avoid a number of common pitfalls, what kinds of tools are useful at each step, and how to structure the work towards a successful outcome.
Readers may consider reading the entire book as a part of their exploration of knowledge engineering generally, or may choose to read individual chapters that, for the most part, are relatively self-contained. For example, many have already used Chapter 3 along with the use case template provided in our class and book materials. Others have found the terminology chapter and related template useful for establishing common vocabularies, enterprise glossaries, and other artifacts independently of the modeling activities that follow.
Our intent is to continue adding chapters and appendices in subsequent editions, both to support our teaching activities and in response to feedback from students and colleagues. We plan to incorporate our experience in ontology engineering over the entire development lifecycle as well as cover patterns specific to certain kinds of applications. Any feedback on what we have presented here or on areas for potential expansion, as we revise and augment the content for future audiences, would be greatly appreciated.
CHAPTER 1
Foundations
Ontologies have become increasingly important as the use of knowledge graphs, machine learning, and natural language processing (NLP) has grown and the amount of data generated on a daily basis has exploded. Many ontology projects have failed, however, due at least in part to a lack of discipline in the development process. This book is designed to address that problem by outlining a proven methodology for the work of ontology engineering based on the combined experience of the authors, our colleagues, and students. Our intent for this chapter is to provide a very basic introduction to knowledge representation and a bit of context for the material that follows in subsequent chapters.
1.1 BACKGROUND AND DEFINITIONS
Most mature engineering fields have some set of authoritative definitions that practitioners know and depend on. Having common definitions makes it much easier to talk about the discipline and allows us to communicate with one another precisely and reliably about our work. Knowing the language makes you part of the club.
We hear many overlapping and sometimes confusing definitions for “ontology,” partly because the knowledge representation (KR) field is still maturing from a commercial perspective, and partly because of its cross-disciplinary nature. Many professional ontologists have background and training in fields including formal philosophy, cognitive science, computational linguistics, data and information architecture, software engineering, artificial intelligence, or library science. As commercial awareness about linked data and ontologies has increased, people knowledgeable about a domain but not trained in any of these areas have started to build ontologies for use in applications as well. Typically, these individuals are domain experts looking for solutions to something their IT departments haven’t delivered, or they are enterprise architects who have run into brick walls attempting to use more traditional technologies to address tough problems. The result is that there is not as much consensus about what people mean when they talk about ontologies as one might think, and people often talk past one another without realizing that they are doing so.
There are a number of well-known definitions and quotes in the knowledge representation field that practitioners often cite, and we list a few here to provide common grounding:
“An ontology is a specification of a conceptualization.” (Gruber, 1993)
This definition is one of the earliest and most cited definitions for ontology