Instruction roofline: An insightful visual performance model for GPUs

M Haseeb, N Ding, J Deslippe… - … and productivity in HPC …, 2021 - ieeexplore.ieee.org

Traditional scientific simulations have, for quite some time, dominated the workloads of
high-performance computing infrastructures across the world. With recent advancement in data …

Cited by 22 · Related articles · All 2 versions


Dynamic stashing quantization for efficient transformer training

G Yang, D Lo, R Mullins, Y Zhao - arXiv preprint arXiv:2303.05295, 2023 - arxiv.org

Large Language Models (LLMs) have demonstrated impressive performance on a range of
Natural Language Processing (NLP) tasks. Unfortunately, the immense amount of …

Cited by 8 · Related articles · All 6 versions

A comparison of two performance portability metrics

A Marowka - Concurrency and Computation: Practice and …, 2023 - Wiley Online Library

The rise in the demand for new performance portability frameworks for heterogeneous
computing systems has brought with it a number of proposals of workable metrics for …

Cited by 5 · Related articles


GPU-acceleration of the distributed-memory database peptide search of mass spectrometry data

M Haseeb, F Saeed - Scientific Reports, 2023 - nature.com

Database peptide search is the primary computational technique for identifying peptides
from the mass spectrometry (MS) data. Graphical Processing Units (GPU) computing is now …

Cited by 1 · Related articles · All 9 versions

Shifting Between Compute and Memory Bounds: A Compression-Enabled Roofline Model

R Naraparaju, T Zhao, Y Hu, D Zhao… - SC24-W: Workshops …, 2024 - ieeexplore.ieee.org

In the evolving landscape of high-performance computing, especially to fight the end of
Moore's Law and Dennard's Scaling, the ability to shift between compute-bound and …

Cited by 1 · Related articles · All 3 versions


[HTML] Starlight: A kernel optimizer for GPU processing

A Zeni, E Del Sozzo, E D'Arnese, D Conficconi… - Journal of Parallel and …, 2024 - Elsevier

Over the past few years, GPUs have found widespread adoption in many scientific domains,
offering notable performance and energy efficiency advantages compared to CPUs …

Cited by 1 · Related articles · All 5 versions
