LogicBlox. Since basic bottom-up evaluation computes the ancestry for every possible pair of people, it is extremely inefficient not to take the query into account. LogicBlox takes 79.1 s to get the answers to the query, in contrast to XSB answering the query in a fraction of a second.
The time complexity of the evaluation for this query in LogicBlox is quadratic (with a polylog factor) in the size of the genealogy graph. We can reduce that time to linear, and observe a benefit that brings the performance to acceptable levels, if we perform demand transformation after reordering the goals as we did for XSB. These changes yield the following rules:
After this transformation, the query takes only 1.14 s in LogicBlox, 70 times faster than the original rules. Adding the count for Query 5 results in a marginal increase of only 0.03%.
Summary. We have illustrated the behavior of the two systems for each of our queries. In particular, specifying indexes and tabling for XSB, and demand transformation for LogicBlox have been shown to dramatically improve the running time. Note that these improvements are not constant-time, but rather asymptotic speedups. We have also shown that even when the systems theoretically should have similar asymptotic behavior, differences in implementation can result in significant variations in actual running times.
1.9 Conclusions
Datalog was a product of converging trends in databases, logic programming, and artificial intelligence with the goal of being able to support more advanced (than conventional databases) reasoning while being amenable to optimization and alternative evaluation strategies needed for larger data sets. After the initial surge in enthusiasm, the field entered a period of reduced interest and activity for a number of reasons. Among the most important ones was the blow-back due to “over-promise.” In the 1980s and early 1990s, the technology was simply not ready for production deployment, and it is not coincidental that this period of decline had a big overlap with the so-called “AI Winter.” Back then, the necessary theory and technology was yet to be developed and the pace was not keeping up with the expectations. On top of it, computers were severely underpowered compared to today and simply could not support the computational demands of either AI or logic programming. Another important factor was that the chosen application-area target—databases—was not optimal: DBMSs are demanding to implement, the query language is just one component of a DBMS out of many, and special-purpose solutions—albeit clunky—were available for applications requiring recursion (such as a bill of materials).
By the late 1990s, however, the situation started to change: computers became hundreds of times more powerful and the technology finally started to catch up with the needs. Importantly, that technology came both from the “ivory tower”—academia—as well as from application-driven research and practitioners. Moreover, people discovered that the application base for Datalog is much broader than just data management. Datalog has shown advantages in areas that involve complex, mutually recursive reasoning (program analysis) and benefits for distributed, asynchronous evaluation (declarative networking). Datalog was likewise recognized as an important reasoning tool for the Semantic Web as well as a design and prototyping tool. It is often used as an intermediate form that is easy to map to because it supports analysis and optimization, and can be translated to a variety of representations and evaluation techniques. Finally, Datalog simply took its place in the diverse family of programming languages.
The original idea about Datalog as a query language for DBMS turned out to be not that outlandish either: companies such as LogicBlox and Datomic have built DBMSs based on Datalog, and they include schemas, concurrency control and most of the usual accouterments of database systems. It just took more time.
In summary, do not expect Datalog (or any other language) to replace other technologies going forward, but it is clear that it has established itself as a viable option in commercially relevant applications. One should expect that there will be additional areas where Datalog will show advantages, such as data analytics.
Acknowledgments
The authors wish to thank Jeff Ullman, Raghu Ramakrishnan, Molham Aref, Moshe Vardi, Joe Hellerstein, and Serge Abiteboul for their comments on the ups and downs of Datalog. TJ Green helped early on with the structure of this chapter. Georg Gottlob and Bob Kowalski read this material and shared with us invaluable comments and critique, as did several anonymous reviewers. Special thanks is due Annie Liu for, without her gentle but persistent push and thoughtful comments, this chapter would not have been completed.
References
S. Abiteboul and C. Beeri. Oct. 1995. The power of languages for the manipulation of complex values. The VLDB Journal, 4(4):727–794. DOI: 10.1007/BF01354881. 34
S. Abiteboul and S. Grumbach. Sept. 1987. COL: A logic-based language for complex objects. In Advanced in Database Programming Languages: Proc. of the Workshop on Database Programming Languages, pp. 253–276. DOI: 10.1016/0022-0000(93)90021-N. 33
S. Abiteboul and P. C. Kanellakis. May/Jun. 1989. Object identity as a query language primitive. In ACM SIGMOD Conference on Management of Data, pp. 159–173. DOI: 0.1145/66926.66941. 33
S. Abiteboul and V. Vianu. 1990. Procedural languages for database queries and updates. Journal of Computer and System Sciences, 41(2):181–229. DOI: 10.1016/0022-0000(90)90036-K. 43
S. Abiteboul, R. Hull, and V. Vianu. 1995. Foundations of Databases. Addison-Wesley. 76
S. Abiteboul, P. Buneman, and D. Suciu. 2000. Data on the Web: From Relations to Semistructured Data and XML. Morgan Kaufmann, San Francisco, CA. 34, 36
S. Abiteboul, Z. Abrams, S. Haar, and T. Milo. 2005. Diagnosis of asynchronous discrete event systems: Datalog to the rescue! In Proc. of the Twenty-fourth ACM SIGMOD-SIGACTSIGART Symposium on Principles of Database Systems, PODS ’05, pp. 358–367. ACM. DOI: 10.1145/1065167.1065214. 82
S. Abiteboul, M. Bienvenu, A. Galland, and E. Antoine. 2011. A rule-based language for web data management. In Proc. of the Thirtieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS ’11, pp. 293–304. ACM. DOI: 10.1145/1989284.1989320. 84
F. Afrati and S. S. Cosmadakis. 1989. Expressiveness of restricted recursive queries. In Proc. of the Twenty-first Annual ACM Symposium on Theory of Computing, STOC ’89, pp. 113–126. ACM. DOI: 10.1145/73007.73018. 77
F. Afrati, C. H. Papadimitriou, G. Papageorgiou, A. Roussou, Y. Sagiv, and J. D. Ullman. 1986. Convergence of sideways query evaluation. In ACM Symposium on Principles of Database Systems, pp. 24–30. DOI: 10.1145/6012.15400.. 42
R. Agrawal, R. Cochrane, and B. Lindsay. 1991. On maintaining priorities in a production rule system. In International Conference on Very Large Data Bases, pp. 479–487. Morgan Kaufmann, San Francisco, CA. 42
A. V. Aho and J. D. Ullman. 1979. Universality of data retrieval languages. In Proc. of the 6th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, POPL ’79, pp. 110–119. ACM. DOI: 10.1145/567752.567763. 13, 50, 77
H. Aït-Kaci and R. Nasr. 1986. LOGIN: A logic programming language with builtin inheritance. Journal of Logic Programming, 3: 185–215. DOI: 10.1016/0743-1066(86)90013-0. 33, 34
H. Aït-Kaci and A. Podelski. Aug. 1993. Towards a meaning of LIFE. Journal of Logic Programming, 16: 195–234. DOI: 10.1016/0743-1066(93)90043-G. 33, 34
J. J. Alferes, A. Brogi, J. A. Leite, and L. M. Pereira. Sept. 2002. Evolving logic programs. In Proc. of Logics in Artificial