A knowledge graph typically contains both the ontology and related data. In practice, we have found that it is important to keep the ontology and data as separate resources, especially during development. The practice of maintaining them separately but combining them in knowledge graphs and/or applications makes them easier to maintain. Once established, ontologies tend to evolve slowly, whereas the data on which applications depend may be highly volatile. Data for well-known code sets, which might change less frequently than some data sets, can be managed in the form of “OWL ontologies,” but, even in these cases, the individuals should be separate from the ontology defining them to aid in testing, debugging, and integration with other code sets. These data resources are not ontologies in their own right, although they might be identified with their own namespace, etc.
Most inference engines require in-memory deductive databases for efficient reasoning (including commercially available reasoners). The knowledge base may be implemented in a physical, external database, such as a triple store, graph database, or relational database, but reasoning is typically done on a subset (partition) of that knowledge base in memory.
1.6 REASONING, TRUTH MAINTENANCE, AND NEGATION
Reasoning is the mechanism by which the logical assertions made in an ontology and related knowledge base are evaluated by an inference engine. For the purposes of this discussion, a logical assertion is simply an explicit statement that declares that a certain premise is true. A collection of logical assertions, taken together, form a logical theory. A consistent theory is one that does not contain any logical contradictions. This means that there is at least one interpretation of the theory in which all of the assertions are provably true. Reasoning is used to check for contradictions in a collection of assertions. It can also provide a way of finding information that is implicit in what has been stated. In classical logic, the validity of a particular conclusion is retained even if new information is asserted in the knowledge base. This may change if some of the prior knowledge, or preconditions, are actually hypothetical assumptions that are invalidated by the new information. The same idea applies for arbitrary actions—new information can make preconditions invalid.
Reasoners work by using the rules of inference to look for the “deductive closure” of the information they are given. They take the explicit statements and the rules of inference and apply those rules to the explicit statements until there are no more inferences they can make. In other words, they find any information that is implicit among the explicit statements. For example, from the following statement about flowering plants, if it has been asserted that x is a flowering plant, then a reasoner can infer that x has a bloom y, and that y has a characteristic which includes a bloom color z:
(forall ((x FloweringPlant))
(exists ((y Bloom)(z BloomColor))(and (hasPart x y)(hasCharacteristic y z))) )
During the reasoning process, the reasoner looks for additional information that it can infer and checks to see if what it believes is consistent. Additionally, since it is generally applying rules of inference, it also checks to make sure it is not in an infinite loop. When some kind of logical inconsistency is uncovered, then the reasoner must determine, from a given invalid statement, whether or not others are also invalid. The process associated with tracking the threads that support determining which statements are invalid is called truth maintenance. Understanding the impact of how truth maintenance is handled is extremely important when evaluating the appropriateness of a particular inference engine for a given task.
If all new information asserted in a knowledge base is monotonic, then all prior conclusions will, by definition, remain valid. Complications can arise, however, if new information negates a prior statement. “Non-monotonic” logical systems are logics in which the introduction of new axioms can invalidate old theorems (McDermott and Doyle, 1980). What is important to understand when selecting an inference engine is whether or not you need to be able to invalidate previous assertions, and if so, how the conflict detection and resolution is handled. Some questions to consider include the following.
• What happens if conclusive information to prove the assumption is not available?
• The assumption cannot be proven?
• The assumption is not provable using certain methods?
• The assumption is not provable in a fixed amount of time?
The answers to these questions can result in different approaches to negation and differing interpretations by non-monotonic reasoners. Solutions include chronological and “intelligent” backtracking algorithms, heuristics, circumscription algorithms, justification or assumption-based retraction, depending on the reasoner and methods used for truth maintenance.
Two of the most common reasoning methods are forward and backward chaining. Both leverage “if-then” rules, for example, “If it is raining, then the ground is wet.” In the forward chaining process, the reasoner attempts to match the “if” portion (or antecedent) of the rule and when a match is found, it asserts the “then” portion (or the consequent) of the rule. Thus, if the reasoner has found the statement “it is raining” in the knowledge base, it can apply the rule above to deduce that “The ground is wet.” Forward chaining is viewed as data driven and can be used to draw all of the conclusions one can deduce from an initial state and a set of inference rules if a reasoner executes all of the rules whose antecedents are matched in the knowledge base.
Backward chaining works in the other direction. It is often viewed as goal directed. Suppose that the goal is to determine whether or not the ground is wet. A backward chaining approach would look to see if the statement, “the ground is wet,” matches any of the consequents of the rules, and if so, determine if the antecedent is in the knowledge base currently, or if there is a way to deduce the antecedent of the rule. Thus, if a backward reasoner was trying to determine if the ground was wet and it had the rule above, it would look to see if it had been told that it is raining or if it could infer (using other rules) that it is raining.
Another type of reasoning, called tableau (sometimes tableaux) reasoning, is based on a technique that checks for satisfiability of a finite set of formulas. The semantic tableaux was introduced in 1950s for classical logic and was adopted as the reasoning paradigm in description logics starting in the late 1990s. The tableau method is a formal proof procedure that uses a refutation approach—it begins from an opposing point of view. Thus, when the reasoner is trying to prove that something is true, it begins with an assertion that it is false and attempts to establish whether this is satisfiable. In our running example, if it is trying to prove that the ground is wet, it will assert that it is NOT the case that the ground is wet, and then work to determine if there is an inconsistency. While this may be counterintuitive, in that the reasoner proposes the opposite of what it is trying to prove, this method has proven to be very efficient for description logic processing in particular, and most description logic-based systems today use tableau reasoning.
Yet another family of reasoning, called logic programming, begins with a set of sentences in a particular form. Rules are written as clauses such as H :- B1, … Bn. One reads this as H or the “head” of the rule is true if B1 through Bn are true. B1-Bn is called the body. There are a number of logic programming languages in use today, including Prolog, Answer Set Programming, and Datalog.
1.7 EXPLANATIONS AND PROOF
When a reasoner draws a particular conclusion, many users and applications want to understand why. Primary motivating factors for requiring support for explanations in the reasoners include interoperability, reuse, trust, and debugging in general. Understanding the provenance of the information (i.e., where it came from and when) and results (e.g., what sources were used to produce the result, what part