MGPUSim: Enabling multi-GPU performance modeling and optimization

Y Sun, T Baruah, SA Mojumder, S Dong… - Proceedings of the 46th …, 2019 - dl.acm.org
The rapidly growing popularity and scale of data-parallel workloads demand a
corresponding increase in raw computational power of Graphics Processing Units (GPUs) …

BlockMaestro: Enabling programmer-transparent task-based execution in GPU systems

AA Abdolrashidi, HA Esfeden… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
As modern GPU workloads grow in size and complexity, there is an ever-increasing demand
for GPU computational power. Emerging workloads contain hundreds or thousands of GPU …

CPElide: Efficient Multi-Chiplet GPU Implicit Synchronization

P Dalmia, RS Kumar, MD Sinclair - 2024 57th IEEE/ACM …, 2024 - ieeexplore.ieee.org
Chiplets are transforming computer system designs, allowing system designers to combine
heterogeneous computing resources at unprecedented scales. Breaking larger, monolithic …

Mozart: Taming taxes and composing accelerators with shared-memory

V Suresh, B Mishra, Y Jing, Z Zhu, N Jin… - Proceedings of the …, 2024 - dl.acm.org
Resource-constrained system-on-chips (SoCs) are increasingly heterogeneous with
specialized accelerators for various tasks. Acceleration taxes due to control and data …

5G emerging technology and affected industries: Quick survey

S Sarraf - American Scientific Research Journal for Engineering …, 2019 - researchgate.net
The fifth generation of cellular mobile communication called 5G will be publicly available in
the near future to connect more than 8.4 billion devices. 5G has been designed to improve …

M³x: Autonomous Accelerators via Context-Enabled Fast-Path Communication

N Asmussen, M Roitzsch, H Härtig - 2019 USENIX Annual Technical …, 2019 - usenix.org
Performance and efficiency requirements are driving a trend towards specialized
accelerators in both datacenters and embedded devices. In order to cut down …

Global Optimizations & Lightweight Dynamic Logic for Concurrency

S Pati, S Aga, N Jayasena, MD Sinclair - arXiv preprint arXiv:2409.02227, 2024 - arxiv.org
Modern accelerators like GPUs are increasingly executing independent operations
concurrently to improve the device's compute utilization. However, effectively harnessing it …

EDGE: Event-driven GPU execution

TH Hetherington, M Lubeznov, D Shah… - 2019 28th …, 2019 - ieeexplore.ieee.org
GPUs are known to benefit structured applications with ample parallelism, such as deep
learning in a datacenter. Recently, GPUs have shown promise for irregular streaming …

Design considerations for GPU‐aware collective communications in MPI

I Faraji, A Afsahi - Concurrency and Computation: Practice and …, 2018 - Wiley Online Library
GPU accelerators have established themselves in the state‐of‐the‐art clusters by offering
high performance and energy efficiency. In such systems, efficient inter‐process GPU …

NUMA-Aware Queue Scheduler for Multi-Chiplet GPUs

N Surawar - 2024 - minds.wisconsin.edu
Chiplet-based architectures have recently emerged as a technique to improve yields and
enable continued performance scaling. However, the increased modularity and scalability …