Skew-oblivious data routing for data intensive applications on FPGAs with HLS

X Chen, H Tan, Y Chen, B He… - 2021 58th ACM/IEEE Design Automation Conference (DAC), 2021 - ieeexplore.ieee.org
FPGAs have emerged as computing infrastructure for accelerating datacenter applications, and high-level synthesis (HLS) tools have been proposed to ease FPGA programming. Even with HLS, irregular data-intensive applications still require explicit optimizations. A common one is to instantiate multiple processing elements (PEs), each owning a private BRAM-based buffer, so that multiple data items can be processed per cycle. Data routing, which dynamically dispatches each data item to its designated PE, avoids the buffer replication that statically assigning data to PEs requires, and hence saves BRAM. However, on skewed datasets the resulting workload imbalance among PEs severely degrades performance. In this paper, we propose a skew-oblivious data routing architecture that allocates secondary PEs and schedules them at run-time to share the workload of overloaded PEs. We further integrate the proposed architecture into a framework called Ditto to minimize the development effort for applications that require skew handling. We evaluate Ditto on five commonly used applications: histogram building, data partitioning, PageRank, heavy hitter detection, and HyperLogLog. The results demonstrate that the generated implementations are robust to skewed datasets and outperform state-of-the-art designs in both throughput and BRAM usage efficiency.
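To make the skew problem concrete, here is a small cycle-level software model of histogram building. It is a sketch under stated assumptions and is not the paper's Ditto architecture. It hypothetically assumes that keys are hash-partitioned to PEs by `key % num_primary`, that `width` items enter per cycle, and that every PE consumes at most one item per cycle. In each cycle, every secondary PE helps whichever primary PE has the longest input queue and accumulates into its own private partial buffer, and all partial buffers are merged at the end. Python `Counter`s stand in for private BRAM buffers, and buffer capacity and routing-network latency are ignored. The names `simulate`, `num_primary`, `num_secondary` and `width` are illustrative only.

```python
from collections import Counter, deque


def simulate(keys, num_primary, num_secondary, width):
    """Cycle-level model of data routing with run-time secondary PEs.

    Hypothetical assumptions: keys are hash-partitioned by key % num_primary,
    `width` items enter per cycle, and every PE consumes at most one item
    per cycle. Returns (cycles, merged histogram).
    """
    queues = [deque() for _ in range(num_primary)]  # per-PE input FIFOs
    # Private partial buffers (stand-ins for BRAM), one per PE.
    primary_bufs = [Counter() for _ in range(num_primary)]
    secondary_bufs = [Counter() for _ in range(num_secondary)]
    pos, cycles = 0, 0
    while pos < len(keys) or any(queues):
        # 1. Route this cycle's incoming items to their owner PE's queue.
        for k in keys[pos:pos + width]:
            queues[k % num_primary].append(k)
        pos += width
        # 2. Each secondary PE takes one item from the longest queue,
        #    i.e. it helps whichever primary PE is currently most overloaded.
        for s in range(num_secondary):
            target = max(range(num_primary), key=lambda p: len(queues[p]))
            if queues[target]:
                secondary_bufs[s][queues[target].popleft()] += 1
        # 3. Each primary PE consumes one item from its own queue.
        for p in range(num_primary):
            if queues[p]:
                primary_bufs[p][queues[p].popleft()] += 1
        cycles += 1
    # Merge partial results so the final histogram stays exact.
    hist = Counter()
    for buf in primary_bufs + secondary_bufs:
        hist.update(buf)
    return cycles, hist


skewed = [0] * 64          # every item maps to PE 0
uniform = list(range(64))  # items spread evenly over 4 PEs
print(simulate(skewed, 4, 0, 4)[0])   # 64
print(simulate(skewed, 4, 3, 4)[0])   # 16
print(simulate(uniform, 4, 0, 4)[0])  # 16
```

In this toy setting, when all 64 items share one key, the single owning PE serializes the work and the run takes 64 cycles. Letting three secondary PEs drain its queue brings the run back to the 16 cycles needed just to stream the input at four items per cycle. A uniform input takes 16 cycles either way. Merging the partial buffers keeps the result exact, at the cost of extra buffers for the secondary PEs.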