An instruction roofline model for GPUs

F Chern, B Hechtman, A Davis, R Guo… - Advances in …, 2022 - proceedings.neurips.cc

This paper presents a novel nearest neighbor search algorithm achieving TPU (Google
Tensor Processing Unit) peak performance, outperforming state-of-the-art GPU algorithms …

Cited by 21 · Related articles · All 5 versions

[PDF] vldb.org

GPU Database Systems Characterization and Optimization

J Cao, R Sen, M Interlandi, J Arulraj, H Kim - Proceedings of the VLDB …, 2023 - dl.acm.org

GPUs offer massive parallelism and high-bandwidth memory access, making them an
attractive option for accelerating data analytics in database systems. However, while modern …

Cited by 8 · Related articles · All 2 versions

[PDF] arxiv.org

Logan: High-performance GPU-based X-drop long-read alignment

A Zeni, G Guidi, M Ellis, N Ding… - 2020 IEEE …, 2020 - ieeexplore.ieee.org

Pairwise sequence alignment is one of the most computationally intensive kernels in
genomic data analysis, accounting for more than 90% of the runtime for key bioinformatics …

Cited by 58 · Related articles · All 13 versions

[PDF] escholarship.org

A comprehensive methodology to optimize FPGA designs via the roofline model

M Siracusa, E Del Sozzo, M Rabozzi… - IEEE Transactions …, 2021 - ieeexplore.ieee.org

With reconfigurable fabrics delivering increasing performance over the years, Field-
Programmable Gate Arrays (FPGAs) are becoming an appealing solution for next …

Cited by 32 · Related articles · All 4 versions

Hybrid, scalable, trace-driven performance modeling of GPGPUs

Y Arafa, AH Badawy, A ElWazir, A Barai… - Proceedings of the …, 2021 - dl.acm.org

In this paper, we present PPT-GPU, a scalable performance prediction toolkit for GPUs. PPT-
GPU achieves scalability through a hybrid high-level modeling approach where some …

Cited by 27 · Related articles · All 4 versions

[HTML] nih.gov

Timemory: modular performance analysis for HPC

JR Madsen, MG Awan, H Brunie, J Deslippe… - … Conference, ISC High …, 2020 - Springer

HPC has undergone a significant transition toward heterogeneous architectures. This
transition has introduced several issues in code migration to support multiple frameworks for …

Cited by 34 · Related articles · All 7 versions

Fast HBM access with FPGAs: Analysis, architectures, and applications

P Holzinger, D Reiser, T Hahn… - 2021 IEEE …, 2021 - ieeexplore.ieee.org

Over the past few decades, the gap between rapidly increasing computational power and
almost stagnating memory bandwidth has steadily worsened. Recently, 3D die-stacking in …

Cited by 25 · Related articles · All 3 versions

[HTML] sciencedirect.com

[HTML] GPU-optimized approaches to molecular docking-based virtual screening in drug discovery: A comparative analysis

E Vitali, F Ficarelli, M Bisson, D Gadioli… - Journal of Parallel and …, 2024 - Elsevier

Finding a novel drug is a very long and complex procedure. Using computer simulations, it is
possible to accelerate the preliminary phases by performing a virtual screening that filters a …

Cited by 21 · Related articles · All 9 versions

A CAD-based methodology to optimize HLS code via the roofline model

M Siracusa, L Di Tucci, M Rabozzi, S Williams… - Proceedings of the 39th …, 2020 - dl.acm.org

The intrinsic complexity of modern computing systems requires structured methods for
analyzing and optimizing application performance. In this context, the Roofline model …

Cited by 24 · Related articles · All 3 versions

[PDF] arxiv.org

Hierarchical roofline performance analysis for deep learning applications

C Yang, Y Wang, T Kurth, S Farrell… - … Computing: Proceedings of …, 2021 - Springer

This paper presents a practical methodology for collecting performance data necessary to
conduct hierarchical Roofline analysis on NVIDIA GPUs. It discusses the extension of the …

Cited by 25 · Related articles · All 5 versions
