Unifying primary cache, scratch, and register file memories in a throughput processor

S Mittal, JS Vetter - ACM Computing Surveys (CSUR), 2014 - dl.acm.org

Recent years have witnessed phenomenal growth in the computational capabilities and
applications of GPUs. However, this trend has also led to a dramatic increase in their power …

Cited by 294 · Related articles · All 13 versions

[PDF] thecvf.com

Squeezedet: Unified, small, low power fully convolutional neural networks for real-time object detection for autonomous driving

B Wu, F Iandola, PH Jin… - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com

Object detection is a crucial task for autonomous driving. In addition to requiring high
accuracy to ensure safety, object detection for autonomous driving also requires real-time …

Cited by 784 · Related articles · All 11 versions

[PDF] academia.edu

Ultra-performance Pascal GPU and NVLink interconnect

D Foley, J Danskin - IEEE Micro, 2017 - ieeexplore.ieee.org

This article introduces Nvidia's high-performance Pascal GPU. GP100 features in-package
high-bandwidth memory, support for efficient FP16 operations, unified memory, and …

Cited by 286 · Related articles · All 6 versions

[PDF] ieee.org

A survey of techniques for architecting and managing GPU register file

S Mittal - IEEE Transactions on Parallel and Distributed …, 2016 - ieeexplore.ieee.org

To support their massively-multithreaded architecture, GPUs use very large register file (RF)
which has a capacity higher than even L1 and L2 caches. In total contrast, traditional CPUs …

Cited by 57 · Related articles · All 8 versions

[PDF] github.io

Adaptive cache management for energy-efficient GPU computing

X Chen, LW Chang, CI Rodrigues, J Lv… - 2014 47th Annual …, 2014 - ieeexplore.ieee.org

With the SIMT execution model, GPUs can hide memory latency through massive
multithreading for many applications that have regular memory access patterns. To support …

Cited by 211 · Related articles · All 16 versions

[PDF] utexas.edu

Scaling the power wall: a path to exascale

O Villa, DR Johnson, M O'Connor… - SC'14: Proceedings …, 2014 - ieeexplore.ieee.org

Modern scientific discovery is driven by an insatiable demand for computing performance.
The HPC community is targeting development of supercomputers able to sustain 1 ExaFlops …

Cited by 187 · Related articles · All 15 versions

[PDF] psu.edu

Coordinated static and dynamic cache bypassing for GPUs

X Xie, Y Liang, Y Wang, G Sun… - 2015 IEEE 21st …, 2015 - ieeexplore.ieee.org

The massive parallel architecture enables graphics processing units (GPUs) to boost
performance for a wide range of applications. Initially, GPUs only employ scratchpad …

Cited by 167 · Related articles · All 10 versions

A case for core-assisted bottleneck acceleration in GPUs: enabling flexible data compression with assist warps

N Vijaykumar, G Pekhimenko, A Jog… - ACM SIGARCH …, 2015 - dl.acm.org

Modern Graphics Processing Units (GPUs) are well provisioned to support the concurrent
execution of thousands of threads. Unfortunately, different bottlenecks during execution and …

Cited by 134 · Related articles · All 6 versions

[PDF] psu.edu

Mascar: Speeding up GPU warps by reducing memory pitstops

A Sethia, DA Jamshidi, S Mahlke - 2015 IEEE 21st International …, 2015 - ieeexplore.ieee.org

With the prevalence of GPUs as throughput engines for data parallel workloads, the
landscape of GPU computing is changing significantly. Non-graphics workloads with high …

Cited by 110 · Related articles · All 7 versions

[PDF] toronto.edu

Zorua: A holistic approach to resource virtualization in GPUs

N Vijaykumar, K Hsieh, G Pekhimenko… - 2016 49th Annual …, 2016 - ieeexplore.ieee.org

This paper introduces a new resource virtualization framework, Zorua, that decouples the
programmer-specified resource usage of a GPU application from the actual allocation in the …

Cited by 84 · Related articles · All 27 versions
