Formal semantics and high performance in declarative machine learning using Datalog

J Wang, J Wu, M Li, J Gu, A Das, C Zaniolo - The VLDB Journal, 2021 - Springer
Abstract
With an escalating arms race to adopt machine learning (ML) in diverse application domains, there is an urgent need to support declarative machine learning over distributed data platforms. Toward this goal, a new framework is needed in which users can specify ML tasks with programming decoupled from the underlying algorithmic and system concerns. In this paper, we argue that declarative abstractions based on Datalog are a natural fit for machine learning, and we propose a purely declarative ML framework with a Datalog query interface. We show that using aggregates in recursive Datalog programs allows ML applications to be expressed concisely while providing a strictly declarative formal semantics. This is achieved by introducing simple conditions under which the semantics of a recursive program is guaranteed to be equivalent to that of its aggregate-stratified counterpart. We further provide specialized compilation and planning techniques for semi-naive fixpoint computation in the presence of aggregates, as well as optimization strategies that are effective on diverse recursive programs and distributed data platforms. To test and demonstrate these research advances, we have developed a powerful and user-friendly system on top of Apache Spark. Extensive evaluations on large-scale datasets show that this approach achieves promising performance gains while improving programming flexibility and easing the development and deployment of ML applications.
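The abstract's central idea, that an aggregate placed inside a recursive rule can have the same meaning as the aggregate-stratified program, is easiest to see on the classic shortest-path query. What follows is a minimal plain-Python sketch, not the paper's Spark-based system. The function names and the Datalog-style rules in the comments are hypothetical illustrations chosen here. The sketch assumes nonnegative edge costs, a setting in which applying `min` inside the recursion is generally regarded as safe. The first function runs a semi-naive fixpoint that keeps only the current minimum per `(src, dst)` pair. The second derives every path cost first and applies `min` in a later stratum. On an acyclic graph, both produce the same answer.

```python
from collections import defaultdict

# Illustrative program (hypothetical syntax):
#   path(X, Y, min<C>) <- edge(X, Y, C).
#   path(X, Y, min<C>) <- path(X, Z, C1), edge(Z, Y, C2), C = C1 + C2.

def semi_naive_min_paths(edges):
    """Semi-naive fixpoint with min<C> applied inside the recursion.
    Assumes nonnegative costs; returns {(src, dst): minimal cost}."""
    out = defaultdict(list)            # index edges by source for the join
    for s, d, c in edges:
        out[s].append((d, c))
    best = {}
    for s, d, c in edges:              # exit rule
        if (s, d) not in best or c < best[(s, d)]:
            best[(s, d)] = c
    delta = dict(best)                 # facts that changed in the last round
    while delta:
        new_delta = {}
        for (x, z), c1 in delta.items():   # join only the delta with edge
            for y, c2 in out[z]:
                c = c1 + c2
                cur = best.get((x, y))
                if cur is None or c < cur:   # keep only improvements
                    best[(x, y)] = c
                    new_delta[(x, y)] = c
        delta = new_delta
    return best

def stratified_min_paths(edges):
    """Aggregate-stratified evaluation: derive all path costs first, then
    take min in a later stratum. Terminates only on acyclic graphs."""
    paths = {(s, d, c) for s, d, c in edges}
    delta = set(paths)
    while delta:
        new = set()
        for x, z, c1 in delta:
            for s, y, c2 in edges:
                if s == z and (x, y, c1 + c2) not in paths:
                    new.add((x, y, c1 + c2))
        paths |= new
        delta = new
    best = {}
    for x, y, c in paths:              # the min stratum
        if (x, y) not in best or c < best[(x, y)]:
            best[(x, y)] = c
    return best

if __name__ == "__main__":
    dag = [("a", "b", 1), ("b", "c", 2), ("a", "c", 5), ("c", "d", 1)]
    print(semi_naive_min_paths(dag) == stratified_min_paths(dag))  # True
```

Pushing `min` into the recursion also lets the first function terminate on cyclic graphs such as `[("a", "b", 1), ("b", "a", 1)]`. Because a fact is re-derived only when its cost strictly improves, evaluation stops once no cheaper path exists. The stratified version would keep generating ever-longer paths on the same input.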