Academic resource search

Optimization techniques for GPU programming

P Hijma, S Heldens, A Sclocco… - ACM Computing …, 2023 - dl.acm.org

In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …

Cited by: 65 | Related articles | All 3 versions

[PDF] arxiv.org

FlashAttention-3: Fast and accurate attention with asynchrony and low-precision

J Shah, G Bikshandi, Y Zhang, V Thakkar… - arXiv preprint arXiv …, 2024 - arxiv.org

Attention, as a core layer of the ubiquitous Transformer architecture, is the bottleneck for
large language models and long-context applications. FlashAttention elaborated an …

Cited by: 44 | Related articles | All 4 versions

[PDF] wiley.com Full View

Review of data science trends and issues in porous media research with a focus on image‐based techniques

A Rabbani, AM Fernando, R Shams… - Water Resources …, 2021 - Wiley Online Library

Data science as a flourishing interdisciplinary domain of computer and mathematical
sciences is playing an important role in guiding the porous material research streams. In the …

Cited by: 26 | Related articles | All 5 versions

[PDF] ubc.ca

GPUWattch: Enabling energy optimizations in GPGPUs

J Leng, T Hetherington, A ElTantawy, S Gilani… - ACM SIGARCH …, 2013 - dl.acm.org

General-purpose GPUs (GPGPUs) are becoming prevalent in mainstream computing, and
performance per watt has emerged as a more crucial evaluation metric than peak …

Cited by: 785 | Related articles | All 21 versions

[PDF] cmu.edu

OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance

A Jog, O Kayiran, N Chidambaram Nachiappan… - ACM SIGPLAN …, 2013 - dl.acm.org

Emerging GPGPU architectures, along with programming models like CUDA and OpenCL,
offer a cost-effective platform for many applications by providing high thread level …

Cited by: 387 | Related articles | All 21 versions

[PDF] arxiv.org

LightLDA: Big topic models on modest computer clusters

J Yuan, F Gao, Q Ho, W Dai, J Wei, X Zheng… - Proceedings of the 24th …, 2015 - dl.acm.org

When building large-scale machine learning (ML) programs, such as massive topic models
or deep neural networks with up to trillions of parameters and training examples, one usually …

Cited by: 242 | Related articles | All 10 versions

[PDF] psu.edu

Hardware acceleration of database operations

J Casper, K Olukotun - Proceedings of the 2014 ACM/SIGDA …, 2014 - dl.acm.org

As the amount of memory in database systems grows, entire database tables, or even
databases, are able to fit in the system's memory, making in-memory database operations …

Cited by: 230 | Related articles | All 11 versions

[PDF] psu.edu

Orchestrated scheduling and prefetching for GPGPUs

A Jog, O Kayiran, AK Mishra, MT Kandemir… - Proceedings of the 40th …, 2013 - dl.acm.org

In this paper, we present techniques that coordinate the thread scheduling and prefetching
decisions in a General Purpose Graphics Processing Unit (GPGPU) architecture to better …

Cited by: 258 | Related articles | All 19 versions

A case for core-assisted bottleneck acceleration in GPUs: enabling flexible data compression with assist warps

N Vijaykumar, G Pekhimenko, A Jog… - ACM SIGARCH …, 2015 - dl.acm.org

Modern Graphics Processing Units (GPUs) are well provisioned to support the concurrent
execution of thousands of threads. Unfortunately, different bottlenecks during execution and …

Cited by: 134 | Related articles | All 6 versions

[PDF] archive.org

Scalable kernel fusion for memory-bound GPU applications

M Wahib, N Maruyama - SC'14: Proceedings of the …, 2014 - ieeexplore.ieee.org

GPU implementations of HPC applications relying on finite difference methods can include
tens of kernels that are memory-bound. Kernel fusion can improve performance by reducing …

Cited by: 125 | Related articles | All 7 versions
