Improving GPGPU energy-efficiency through concurrent kernel execution and DVFS

Q Jiao, M Lu, HP Huynh, T Mitra - 2015 IEEE/ACM International …, 2015 - ieeexplore.ieee.org
Current generation GPUs can accelerate high-performance, compute-intensive applications
by exploiting massive thread-level parallelism. The high performance, however, comes at …

Roofline-aware DVFS for GPUs

C Nugteren, GJ van den Braak… - … of International Workshop …, 2014 - dl.acm.org
Graphics processing units (GPUs) are becoming increasingly popular for compute
workloads, mainly because of their large number of processing elements and high …

vkpolybench: A crossplatform Vulkan Compute port of the PolyBench/GPU benchmark suite

N Capodieci, R Cavicchioli - SoftwareX, 2021 - Elsevier
PolyBench is a well-known set of benchmarks characterized by embarrassingly parallel
kernels able to run on Graphics Processing Units (GPUs). While PolyBench GPU kernels …

Evaluation of autoparallelization toolkits for commodity GPUs

D Williams, V Codreanu, P Yang, B Liu, F Dong… - Parallel Processing and …, 2014 - Springer
In this paper we evaluate the performance of the OpenACC and Mint toolkits against C and
CUDA implementations of the standard PolyBench test suite. Our analysis reveals that …

Improving GPU performance: reducing memory conflicts and latency

GJW van den Braak - 2015 - research.tue.nl
Modern day life is unimaginable without all the ICT technology we use every day, like
computers, tablets, smart phones, digital cameras, etc. All this technology uses an enormous …

Improving the programmability of GPU architectures

C Nugteren - 2014 - research.tue.nl
In front of you lies the pinnacle of a PhD-student's hard work: the thesis. However, hidden
from the reader's eyes is the path that has led to this result. How was the subject chosen …

Scaling application properties to exascale

G Mariani, A Anghel, R Jongerius… - Proceedings of the 12th …, 2015 - dl.acm.org
Exascale computing systems will execute computationally intensive tasks on unprecedented
amounts of data. Tuning the design of such systems for a specific application or for an …

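A survey of thread scheduling optimization methods for general-purpose graphics processing units

何炎祥, 张军, 沈凡凡, 江南, 李清安, 刘子骏 - 计算机学报 (Chinese Journal of Computers), 2016 - cjc.ict.ac.cn
Abstract: As the parallel computing capability of general-purpose graphics processing units (GPGPUs) keeps growing, their range of applications is becoming ever broader.
However, irregular computing tasks make it difficult to fully utilize GPGPU resources, so their performance has not reached its maximum …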

Memory Request Priority Based Warp Scheduling for GPUs

J Zhang, Y He, F Shen, Q Li… - Chinese Journal of …, 2018 - Wiley Online Library
High performance of GPGPU comes from its super massive multithreading, which makes it
more and more widely used especially in the field of throughput-oriented. Data locality is one …

Accelerating a movie recommender system using VirtualCL on a heterogeneous GPU cluster

A Bhatnagar - 2015 - pure.tue.nl
Present day market offers a large number of movies which overwhelm people with choices.
In order to quickly navigate through all the possible movies and find the interesting ones, the …