A high throughput B+ tree for SIMD architectures

W Zhang, Z Yan, Y Lin, C Zhao… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
W Zhang, Z Yan, Y Lin, C Zhao, L Peng
IEEE Transactions on Parallel and Distributed Systems, 2019ieeexplore.ieee.org
B+ tree is one of the most important data structures and has been widely used in different
fields. With the increase of concurrent queries and data-scale in storage, designing an
efficient B+ tree structure has become critical. Due to abundant computation resources,
SIMD architectures provide potential opportunities to achieve high query throughput for B+
tree. However, prior methods cannot achieve satisfactory performance results due to low
resource utilization and poor memory performance. In this paper, we first identify the gaps …
B+tree is one of the most important data structures and has been widely used in different fields. With the increase of concurrent queries and data-scale in storage, designing an efficient B+tree structure has become critical. Due to abundant computation resources, SIMD architectures provide potential opportunities to achieve high query throughput for B+tree. However, prior methods cannot achieve satisfactory performance results due to low resource utilization and poor memory performance. In this paper, we first identify the gaps between B+tree and SIMD architectures. Concurrent B+tree queries involve many global memory accesses and different divergences, which mismatch with SIMD architecture features. Based on this observation, we propose Harmonia, a novel B+tree structure to bridge the gaps. In Harmonia, a B+tree structure is divided into a key region and a child region. The key region stores the nodeswith its keys in a breadth-first order. The child region is organized as a prefix-sum array, which only stores each node's first child index in the key region. Since the prefix-sum child region is small and the children's index can be retrieved through index computations, most of it can be stored in on-chip caches, which can achieve good cache locality. To make it more efficient, Harmonia also includes two optimizations: partially-sorted aggregation and narrowed thread-group traversal, which can mitigate memory and execution divergence and improve resource utilization. Evaluations on a 28-core INTEL CPU show that Harmonia can achieve up to 207 million queries per second, which is about 1.7X faster than that of CPU-based HB+Tree, a recent state-of-the-art solution. And on a Volta TITAN V GPU, it can achieve up to 3.6 billion queries per second, which is about 3.4X faster than that of GPU-based HB+Tree.
ieeexplore.ieee.org
以上显示的是最相近的搜索结果。 查看全部搜索结果