A quantitative roofline model for GPU kernel performance estimation using micro-benchmarks and hardware metric profiling

E Konstantinidis, Y Cotronis - Journal of Parallel and Distributed Computing, 2017 - Elsevier
Typically, the execution time of a kernel on a GPU is a difficult to predict measure as it
depends on a wide range of factors. Performance can be limited by either memory transfer …

GPGPU performance estimation with core and memory frequency scaling

Q Wang, X Chu - IEEE Transactions on Parallel and Distributed …, 2020 - ieeexplore.ieee.org
Contemporary graphics processing units (GPUs) support dynamic voltage and frequency
scaling to balance computational performance and energy consumption. However, accurate …

Energy-aware non-preemptive task scheduling with deadline constraint in DVFS-enabled heterogeneous clusters

Q Wang, X Mei, H Liu, YW Leung, Z Li… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Energy conservation of large data centers for high performance computing workloads, such
as deep learning with Big Data, is of critical significance, where cutting down a few percent …

Energy-efficient run-time mapping and thread partitioning of concurrent OpenCL applications on CPU-GPU MPSoCs

AK Singh, A Prakash, KR Basireddy… - ACM Transactions on …, 2017 - dl.acm.org
Heterogeneous Multi-Processor Systems-on-Chips (MPSoCs) containing CPU and GPU
cores are typically required to execute applications concurrently. However, as will be shown …

Energy-aware task scheduling with deadline constraint in DVFS-enabled heterogeneous clusters

X Mei, Q Wang, X Chu, H Liu, YW Leung… - arXiv preprint arXiv …, 2021 - arxiv.org
Energy conservation of large data centers for high-performance computing workloads, such
as deep learning with big data, is of critical significance, where cutting down a few percent of …

Collaborative adaptation for energy-efficient heterogeneous mobile SoCs

AK Singh, KR Basireddy, A Prakash… - IEEE Transactions …, 2019 - ieeexplore.ieee.org
Heterogeneous Mobile System-on-Chips (SoCs) containing CPU and GPU cores are
becoming prevalent in embedded computing, and they need to execute applications …

A statistic approach for power analysis of integrated GPU

Q Wang, N Li, L Shen, Z Wang - Soft Computing, 2019 - Springer
As datasets grow, high performance computing has gradually become an important tool for
artificial intelligence, particularly due to the powerful and efficient parallel computing …

Metric selection for GPU kernel classification

SK Shekofteh, H Noori, M Naghibzadeh… - ACM Transactions on …, 2019 - dl.acm.org
Graphics Processing Units (GPUs) are vastly used for running massively parallel programs.
GPU kernels exhibit different behavior at runtime and can usually be classified in a simple …

Resource scheduling of information platform for general grid computing framework

M You, W Luo, M He - International Journal of Web and Grid …, 2020 - inderscienceonline.com
The corresponding concepts and calculation methods are very in line with the requirements
of information platform resource scheduling. Based on this, this paper discusses the related …

SDAM: a combined stack distance-analytical modeling approach to estimate memory performance in GPUs

M Kiani, A Rajabzadeh - The Journal of Supercomputing, 2021 - Springer
Graphics processing units (GPUs) are powerful in performing data-parallel applications.
Such applications most often rely on the GPU's memory hierarchy to deliver high …