Setting up the environment and loading of input data as described above takes 3.91 s in XSB, and 14.9 s in LogicBlox.
Query 1.1 The first query asks for the grand-advisors of David Scott Warren.
XSB. Recall the rules and query in XSB:
Note that a call to q1
will be made with its argument as a free variable, and therefore the following calls will be made in the body of the rule for q1
: (i) a call to person
with the second argument bound due to the first subgoal, since that argument is a constant; (ii) a call to adv
with the second argument bound due to the second subgoal, since D
will be bound by the first subgoal; (iii) another call to adv
with the second argument bound due to the third subgoal, since X
will be bound by the second subgoal; and (iv) another call to person
with the first argument bound due to the final subgoal, since Y
will be bound by the third subgoal.
The rules and query may be analyzed for each relevant predicate to find binding patterns for predicates, as shown above. For optimal performance, it is critical to have indexes corresponding to each binding pattern for the IDB predicates. As noted, XSB provides indexes on the first argument by default. For each non-default index, we write the indexing directives as follows:
The query is answered in 261 ms without the index, and in 238 ms with the index, 9.6% faster. The effect is not so dramatic, since only one subgoal binds the second argument of person
. We will see more dramatic effects in other queries.
LogicBlox. Recall that in LogicBlox, we do not have control over the indexing mechanism. Running the first query takes 1400 ms. However, LogicBlox does not take the query into account, and therefore will infer facts for the adv
predicate for every possible pair, but the only relevant ones are David Scott Warren’s advisors, and their advisors in turn. To avoid the extra computation, we can perform demand transformation, which yields the following rules:
The demand
predicate stores David Scott Warren, and his advisors, so that the new rules only infer facts of the adv
predicate that are relevant to our query. This version takes 944 ms, 33% faster than the original rules. This example illustrates that demand transformation is important, even for a small query.
Query 1.2 This query finds the people who got their degrees from the same university as their advisor.
XSB. Running the rules and query as given in XSB results in quite a long execution time, 49.2 min. However, if we analyze the rules for binding patterns, and therefore add the following indexing directive, which indexes the dissertation
predicate on its first, second, and the combination of second and fifth arguments:
the running time is reduced to 511 ms, almost 5800 times faster than the original! The effect here is more dramatic than in the previous query because of repeated calls to dissertation
with a different binding pattern than only the first argument being bound.
LogicBlox. The original rules and query take 2.14 s to run. Performing demand transformation in the original rules does not result in a new program, since in this order of subgoals, there is no IDB predicate with bound arguments in a left-to-right evaluation. However, we can change the order of subgoals to obtain the following rule for q2
, to effect a different demand for adv
without changing the semantics or performance of the given rule.
The demand for adv can be defined as
We can see that this join is quite expensive, joining every pair of people and filtering with respect to their universities. Therefore, demand transformation results in slower performance. Our experiments confirm this behavior, since this set of rules takes 77.9 s, 36 times slower.
Query 1.3 This query finds the people who worked in a different area than their advisors.
XSB. This query is similar in essence to Query 2, and without indexing the query takes 47.4 minutes to run. With the correct indexing directives, it takes 325 ms, 8750 times faster.
LogicBlox. Since Queries 2 and 3 are very similar, we again get the best performance with the initial rules, since the demand transformation after subgoal reordering results in extra work. The original rules and query take 1.74 s to run. If the adv
predicate is moved to a later position and demand transformation is applied, the resulting rules and query take 74.4 s, 43 times slower.
Queries 1.4/5 These queries find the academic ancestors of David Scott Warren, and their count, respectively. The count is only a small overhead for each system, therefore we have grouped the queries together.
XSB. For all the prior queries, it can be shown that, based on the rules and the nature of the data (the genealogy information does not contain cycles), this query would terminate even without tabling. However, even in the absence of repeated subqueries, recursion could lead to exponential time complexity. As we discussed, the solution to this problem in top-down systems is tabling. To enable tabling, we write the following directive with the rules for ancestry:
Recall that our query will generate an initial subquery for anc
with the second argument bound (to David Scott Warren). However, adding the tabling directive does not seem to help, since for every person, this set of rules will generate a query: Is this person an ancestor of David Scott Warren?
This strategy results in a lot of unnecessary computation, and XSB quickly runs out of memory; tabling did not help us get the answers here. Can we, however, reduce the number of queries, and therefore evaluate the query efficiently? The answer was already provided in Section 1.5.2.1, where we showed that left-recursive transitive closure is much more efficient than right-recursive, if tabling is used. Indeed, if we reorder the goals in the second rule as follows:
we will only ever ask: Who is an ancestor of David Scott Warren? And, voilà, we get the answers quickly, in 3.19 s. This example shows that tabling and subgoal ordering are essential to optimizing queries for top-down evaluation. Getting the count for Query 5 only takes a marginal 0.2% longer. The performance can be further improved by indexing the tabled predicate adv
on its second argument, since it is always called with the second argument bound, with the directive:
This change improves the runtime to 181 ms, 17 times faster than without the index.
Apart from indexing and reordering, we can change the recursion to obtain another left-recursive version of the rules above:
However, this version runs slower at 11.3 s, since it immediately creates a subquery with both arguments free (neither X
nor Z
are bound in the head), therefore