A stall-aware warp scheduling for dynamically optimizing thread-level parallelism in GPGPUs

Y Yu, W Xiao, X He, H Guo, Y Wang… - Proceedings of the 29th …, 2015 - dl.acm.org
General-Purpose Graphics Processing Units (GPGPU) have been widely used in high
performance computing as application accelerators due to their massive parallelism and …

Streaming architectures and technology trends

J Owens - ACM SIGGRAPH 2005 Courses, 2005 - dl.acm.org
Modern technology allows the designers of today's processors to incorporate enormous
computation resources into their latest chips. The challenge for these architects is to …

[PDF] Compiler techniques for scalable performance of stream programs on multicore architectures

MI Gordon - 2010 - Citeseer
Given the ubiquity of multicore processors, there is an acute need to enable the
development of scalable parallel applications without unduly burdening programmers …

[BOOK][B] Performance analysis and tuning for general purpose graphics processing units (GPGPU)

H Kim, R Vuduc, S Baghsorkhi - 2012 - books.google.com
General-purpose graphics processing units (GPGPU) have emerged as an important class
of shared memory parallel processing architectures, with widespread deployment in every …

Juggler: a dependence-aware task-based execution framework for GPUs

ME Belviranli, S Lee, JS Vetter, LN Bhuyan - Proceedings of the 23rd …, 2018 - dl.acm.org
Scientific applications with single instruction, multiple data (SIMD) computations show
considerable performance improvements when run on today's graphics processing units …

Efficient handling of stream buffers in GPU stream-based computing platform

S Yamagiwa, M Arai, K Wada - Proceedings of 2011 IEEE …, 2011 - ieeexplore.ieee.org
GPU-based computing has become one of the popular high performance computing fields.
The field is called GPGPU. This paper presents design and implementation of a uniform …

Improving the efficiency of GPGPU work-queue through data awareness

L Huang, Y Lü, L Shen, Z Wang - ACM Transactions on Architecture and …, 2017 - dl.acm.org
The architecture and programming model of current GPGPUs are best suited for applications
that are dominated by structured control and data flows across large regular datasets …

GPU-accelerated high-throughput online stream data processing

Z Chen, J Xu, J Tang, KA Kwiat… - … Transactions on Big …, 2016 - ieeexplore.ieee.org
The Single Instruction Multiple Data (SIMD) architecture of Graphics Processing Units (GPUs)
makes them perfect for parallel processing of big data. In this paper, we present the design …

Many-thread aware instruction-level parallelism: Architecting shader cores for GPU computing

P Xiang, Y Yang, M Mantor, N Rubin… - Proceedings of the 21st …, 2012 - dl.acm.org
The design philosophy of many-core architectures such as graphics processing units
(GPUs) is to exploit thread-level parallelism (TLP) to achieve high throughput. Compared to …

[CITATION][C] Computing speeds soar with parallel processing

H Falk - Computer Design, 1988 - dl.acm.org