[HTML][HTML] A survey on hardware accelerators: Taxonomy, trends, challenges, and perspectives

B Peccerillo, M Mannino, A Mondelli… - Journal of Systems …, 2022 - Elsevier
In recent years, the limits of the multicore approach emerged in the so-called “dark silicon”
issue and diminishing returns of an ever-increasing core count. Hardware manufacturers …

Gemmini: Enabling systematic deep-learning architecture evaluation via full-stack integration

H Genc, S Kim, A Amid, A Haj-Ali, V Iyer… - 2021 58th ACM/IEEE …, 2021 - ieeexplore.ieee.org
DNN accelerators are often developed and evaluated in isolation without considering the
cross-stack, system-level effects in real-world environments. This makes it difficult to …

Processors, methods, and systems with a configurable spatial accelerator

KE Fleming, KD Glossop, SC Steely Jr, J Tang… - US Patent …, 2020 - Google Patents
2017-08-09 Assigned to INTEL CORPORATION reassignment INTEL CORPORATION
ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors …

Mosaic: a GPU memory manager with application-transparent support for multiple page sizes

R Ausavarungnirun, J Landgraf, V Miller… - Proceedings of the 50th …, 2017 - dl.acm.org
Contemporary discrete GPUs support rich memory management features such as virtual
memory and demand paging. These features simplify GPU programming by providing a …

Telekine: Secure computing with cloud {GPUs}

T Hunt, Z Jia, V Miller, A Szekely, Y Hu… - … USENIX Symposium on …, 2020 - usenix.org
GPUs have become ubiquitous in the cloud due to the dramatic performance gains they
enable in domains such as machine learning and computer vision. However, offloading …

Thunderclap: Exploring vulnerabilities in operating system IOMMU protection via DMA from untrustworthy peripherals

AT Markettos, C Rothwell, BF Gutstein, A Pearce… - 2019 - repository.cam.ac.uk
Thunderclap: Exploring Vulnerabilities in Operating System IOMMU Protection via DMA
from Untrustworthy Peripherals Page 1 Thunderclap: Exploring Vulnerabilities in Operating …

Batch-aware unified memory management in GPUs for irregular workloads

H Kim, J Sim, P Gera, R Hadidi, H Kim - Proceedings of the Twenty-Fifth …, 2020 - dl.acm.org
While unified virtual memory and demand paging in modern GPUs provide convenient
abstractions to programmers for working with large-scale applications, they come at a …

A framework for memory oversubscription management in graphics processing units

C Li, R Ausavarungnirun, CJ Rossbach… - Proceedings of the …, 2019 - dl.acm.org
Modern discrete GPUs support unified memory and demand paging. Automatic
management of data movement between CPU memory and GPU memory dramatically …

Mask: Redesigning the gpu memory hierarchy to support multi-application concurrency

R Ausavarungnirun, V Miller, J Landgraf… - ACM SIGPLAN …, 2018 - dl.acm.org
Graphics Processing Units (GPUs) exploit large amounts of threadlevel parallelism to
provide high instruction throughput and to efficiently hide long-latency stalls. The resulting …

G10: Enabling an efficient unified gpu memory and storage architecture with smart tensor migrations

H Zhang, Y Zhou, Y Xue, Y Liu, J Huang - … of the 56th Annual IEEE/ACM …, 2023 - dl.acm.org
To break the GPU memory wall for scaling deep learning workloads, a variety of architecture
and system techniques have been proposed recently. Their typical approaches include …