This paper introduces wav2letter++, a fast open-source deep learning speech recognition framework. wav2letter++ is written entirely in C++, and uses the ArrayFire tensor library for …
T Besard, C Foket, B De Sutter - IEEE Transactions on Parallel …, 2018 - ieeexplore.ieee.org
GPUs and other accelerators are popular devices for accelerating compute-intensive, parallelizable applications. However, programming these devices is a difficult task. Writing …
Deep learning has had remarkable success in robotic perception, but its data-centric nature hampers generalization to ever-changing environments. By contrast, physics …
In-memory databases require careful tuning and many engineering tricks to achieve good performance. Such database performance engineering is hard: a plethora of data and …
As the computational requirements for machine learning systems and the size and complexity of machine learning frameworks increase, essential framework innovation has …
A common operation in many data analytics workloads is to find the top-k items, i.e., the largest or smallest items according to some sort order (implemented via LIMIT or …
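The top-k operation described in this snippet can be sketched in a few lines; this is an illustrative stand-in for the database operator (the function name and signature are assumptions, not from the paper), using a heap so only k items are kept in order:

```python
import heapq

def top_k(items, k, key=None, largest=True):
    """Return the k largest (or smallest) items under a sort order,
    as LIMIT k ... ORDER BY would in SQL. Uses a bounded heap, so it
    avoids fully sorting the input."""
    if largest:
        return heapq.nlargest(k, items, key=key)
    return heapq.nsmallest(k, items, key=key)

rows = [("a", 5), ("b", 9), ("c", 1), ("d", 7)]
best = top_k(rows, 2, key=lambda r: r[1])  # [("b", 9), ("d", 7)]
```

`heapq.nlargest`/`nsmallest` return results already sorted by the key, which matches the ORDER BY semantics the snippet alludes to.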
We demonstrate a high-performance vendor-agnostic method for massively parallel solving of ensembles of ordinary differential equations (ODEs) and stochastic differential equations …
G Kim, M Lee, J Jeong, J Kim - 2014 47th Annual IEEE/ACM …, 2014 - ieeexplore.ieee.org
GPUs are being widely used to accelerate different workloads, and multi-GPU systems can provide higher performance with multiple discrete GPUs interconnected. However …
Training large neural networks on big datasets requires significant computational resources and time. Transfer learning reduces training time by pre-training a base model on one …
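The pre-train-then-fine-tune idea in this snippet can be illustrated with a toy model (all names and the linear model here are illustrative assumptions, not from the paper): fit y = w*x + b on a source task, then adapt to a target task while the base weight w stays frozen.

```python
def mse_grads(params, xs, ys):
    """Gradients of mean squared error for the toy model y = w*x + b."""
    w, b = params
    n = len(xs)
    gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return [gw, gb]

def train(params, xs, ys, frozen=(), lr=0.05, steps=500):
    """Plain gradient descent that skips updates for frozen parameter
    indices -- the mechanism behind fine-tuning only part of a model."""
    for _ in range(steps):
        grads = mse_grads(params, xs, ys)
        params = [p if i in frozen else p - lr * g
                  for i, (p, g) in enumerate(zip(params, grads))]
    return params

xs = [0.0, 1.0, 2.0, 3.0]
base = train([0.0, 0.0], xs, [2 * x for x in xs])             # source task: y = 2x
tuned = train(base, xs, [2 * x + 1 for x in xs], frozen={0})  # target task: y = 2x + 1
```

Fine-tuning touches only the bias, so it converges with far fewer effective parameters than training from scratch, which is the time saving the snippet refers to.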