CS*: Approximate Query Processing on Big Data using Scalable Join Correlated Sample Synopsis

F Yu, WC Hou

2019 IEEE International Conference on Big Data (Big Data), 2019 • ieeexplore.ieee.org

Complex join queries are expensive to process on big data. Fast and accurate approximations of join queries with common aggregate functions can bring tremendous benefits in fields such as data management, data mining, and machine learning. State-of-the-art methods mainly generate non-reusable samples at query time, which can be costly for big data applications. In this research, we develop a scalable sample-based synopsis, called Scalable Join Correlated Sample Synopsis (or CS*), which can be pre-computed and does not rely on any index structure. CS* needs to be generated only once and can be used to answer all future queries on the same database. It efficiently maintains join relationships between sampled tuples through a scalable join correlated sampling scheme and a unique numerical value called the join ratio (or JR). We further introduce two novel data structures, the count trace and the join correlated histogram, to optimize the calculation of JR values in MapReduce. For query estimation, multiple unbiased estimators are developed on CS* to provide fast and accurate approximations for join queries with common aggregate functions, acyclic or cyclic join graphs, and dangling tuples. An experimental study on large datasets demonstrates that CS* can be generated efficiently and provides accurate join query estimates with small sampling fractions.
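The abstract does not define the join ratio, the count trace, or the join correlated histogram, so none of them is reproduced here. As a rough illustration of the general idea behind join correlated sampling only, the sketch below uses a common textbook variant: both tables are sampled by hashing the join key, so a sampled tuple never loses its join partners, and a COUNT over the sampled join is scaled by the sampling fraction. This is not the CS* algorithm. Every function name and the example schema are hypothetical, and the estimator is unbiased only under the assumption that the hash behaves like a uniform random function of the key.

```python
import hashlib


def keep(key, fraction):
    """Deterministically decide whether a join-key value is sampled."""
    digest = hashlib.sha256(str(key).encode()).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    return u < fraction


def correlated_sample(rows, key_index, fraction):
    """Keep every row whose join key is selected by the shared hash."""
    return [row for row in rows if keep(row[key_index], fraction)]


def join_count(left, right, left_key, right_key):
    """Exact COUNT(*) of an equi-join, computed with a hash table."""
    counts = {}
    for row in right:
        counts[row[right_key]] = counts.get(row[right_key], 0) + 1
    return sum(counts.get(row[left_key], 0) for row in left)


def estimate_join_count(left_sample, right_sample, left_key, right_key, fraction):
    """Scale the sampled join size by 1 / fraction (assumes fraction > 0)."""
    return join_count(left_sample, right_sample, left_key, right_key) / fraction
```

A possible usage, with a hypothetical two-table schema: sample `orders` and `customers` once with the same `fraction`, store the two samples, and reuse them for later COUNT estimates instead of resampling at query time. Because the same hash decides membership in both samples, the join between the samples is exactly the join restricted to the selected keys.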
