A performance analysis framework for identifying potential benefits in GPGPU applications

RA Bridges, N Imam, TM Mintz - ACM Computing Surveys (CSUR), 2016 - dl.acm.org

Modern graphics processing units (GPUs) have complex architectures that admit
exceptional performance and energy efficiency for high-throughput applications. Although …

Cited by 155 Related articles All 4 versions

[PDF] whiterose.ac.uk

End-to-end deep learning of optimization heuristics

C Cummins, P Petoumenos, Z Wang… - 2017 26th …, 2017 - ieeexplore.ieee.org

Accurate automatic optimization heuristics are necessary for dealing with the complexity and
diversity of modern hardware and software. Machine learning is a proven technique for …

Cited by 259 Related articles All 10 versions

[PDF] arxiv.org

A survey on agent-based simulation using hardware accelerators

J Xiao, P Andelfinger, D Eckhoff, W Cai… - ACM Computing Surveys …, 2019 - dl.acm.org

Due to decelerating gains in single-core CPU performance, computationally expensive
simulations are increasingly executed on highly parallel hardware platforms. Agent-based …

Cited by 53 Related articles All 10 versions

[PDF] computermachines.org

GPGPU performance and power estimation using machine learning

G Wu, JL Greathouse, A Lyashevsky… - 2015 IEEE 21st …, 2015 - ieeexplore.ieee.org

Graphics Processing Units (GPUs) have numerous configuration and design options,
including core frequency, number of parallel compute units (CUs), and available memory …

Cited by 275 Related articles All 9 versions

[PDF] psu.edu

A simplified and accurate model of power-performance efficiency on emergent GPU architectures

S Song, C Su, B Rountree… - 2013 IEEE 27th …, 2013 - ieeexplore.ieee.org

Emergent heterogeneous systems must be optimized for both power and performance at
exascale. Massive parallelism combined with complex memory hierarchies form a barrier to …

Cited by 216 Related articles All 6 versions

[PDF] iitd.ac.in

Demystifying tensorrt: Characterizing neural network inference engine on nvidia edge devices

O Shafi, C Rai, R Sen… - 2021 IEEE International …, 2021 - ieeexplore.ieee.org

Edge devices are seeing tremendous growth in sensing and computational capabilities.
Running state-of-the-art deep neural network (NN) based data processing on multi-core …

Cited by 52 Related articles All 3 versions

[PDF] arxiv.org

Kernelet: High-throughput GPU kernel executions with dynamic slicing and scheduling

J Zhong, B He - IEEE Transactions on Parallel and Distributed …, 2013 - ieeexplore.ieee.org

Graphics processors, or GPUs, have recently been widely used as accelerators in shared
environments such as clusters and clouds. In such shared environments, many kernels are …

Cited by 173 Related articles All 11 versions

[PDF] escholarship.org

[BOOK][B] Understanding latency hiding on GPUs

V Volkov - 2016 - search.proquest.com

Modern commodity processors such as GPUs may execute up to about a thousand of
physical threads per chip to better utilize their numerous execution units and hide execution …

Cited by 124 Related articles All 4 versions

[PDF] acm.org

Automated smartnic offloading insights for network functions

Y Qiu, J Xing, KF Hsu, Q Kang, M Liu… - Proceedings of the …, 2021 - dl.acm.org

The gap between CPU and networking speeds has motivated the development of
SmartNICs for NF (network functions) offloading. However, offloading performance is …

Cited by 51 Related articles All 10 versions

[PDF] researchgate.net

A performance analysis framework for optimizing OpenCL applications on FPGAs

Z Wang, B He, W Zhang, S Jiang - 2016 IEEE International …, 2016 - ieeexplore.ieee.org

Recently, FPGA vendors such as Altera and Xilinx have released OpenCL SDK for
programming FPGAs. However, the architecture of FPGA is significantly different from that of …

Cited by 117 Related articles All 7 versions
