Skew-aware join optimization for array databases

J Duggan, O Papaemmanouil, L Battle, M Stonebraker
Proceedings of the 2015 ACM SIGMOD International Conference on Management of …, 2015 - dl.acm.org
Science applications are accumulating an ever-increasing amount of multidimensional data. Although some of it can be processed in a relational database, much of it is better suited to array-based engines. As such, it is important to optimize the query processing of these systems. This paper focuses on efficient query processing of join operations within an array database. These engines invariably "chunk" their data into multidimensional tiles that they use to efficiently process spatial queries. As such, traditional relational algorithms need to be substantially modified to take advantage of array tiles. Moreover, most n-dimensional science data is unevenly distributed in array space because its underlying observations rarely follow a uniform pattern. It is crucial that the optimization of array joins be skew-aware. In addition, owing to the scale of science applications, their query processing usually spans multiple nodes. This further complicates the planning of array joins.
In this paper, we introduce a join optimization framework that is skew-aware for distributed joins. This optimization consists of two phases. In the first, a logical planner selects the query's algorithm (e.g., merge join), the granularity of its tiles, and the reorganization operations needed to align the data. The second phase implements this logical plan by assigning tiles to cluster nodes using an analytical cost model. Our experimental results, on both synthetic and real-world data, demonstrate that this optimization framework speeds up array joins by up to 2.5X in comparison to the baseline.
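To make the second phase concrete, the following is a minimal sketch of skew-aware tile placement. It is not the paper's cost model, which the abstract does not specify. It assumes each tile's join cost can be approximated by a single number (here a hypothetical cell count) and swaps in a simple greedy heuristic: place the most expensive tiles first, each on the currently least-loaded node. The names `assign_tiles`, `node_loads`, and the tile labels are illustrative only.

```python
import heapq


def assign_tiles(tile_costs, num_nodes):
    """Greedy placement: most expensive tile first, onto the least-loaded node.

    tile_costs maps a tile id to an estimated join cost (an assumed stand-in
    for whatever an analytical cost model would produce).
    Returns a dict mapping tile id -> node index.
    """
    # Min-heap of (current load, node index); all nodes start empty.
    heap = [(0, node) for node in range(num_nodes)]
    heapq.heapify(heap)

    assignment = {}
    # Sort by descending cost; break ties by tile id for determinism.
    for tile, cost in sorted(tile_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, node = heapq.heappop(heap)
        assignment[tile] = node
        heapq.heappush(heap, (load + cost, node))
    return assignment


def node_loads(tile_costs, assignment, num_nodes):
    """Sum the estimated cost placed on each node."""
    loads = [0] * num_nodes
    for tile, node in assignment.items():
        loads[node] += tile_costs[tile]
    return loads


# Skewed example: one dense tile and four sparse ones, on two nodes.
costs = {"t0": 100, "t1": 10, "t2": 10, "t3": 10, "t4": 10}
placement = assign_tiles(costs, num_nodes=2)
loads = node_loads(costs, placement, num_nodes=2)
```

With these made-up costs, the dense tile gets a node to itself and the sparse tiles share the other, so the busiest node carries 100 units. A skew-oblivious round-robin placement of the same tiles would put 120 units on one node.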
ACM Digital Library