A survey of techniques for optimizing deep learning on GPUs

S Mittal, S Vaishay - Journal of Systems Architecture, 2019 - Elsevier
The rise of deep learning (DL) has been fuelled by improvements in accelerators. Due to
its unique features, the GPU remains the most widely used accelerator for DL …

A review on prognostics methods for engineering systems

J Guo, Z Li, M Li - IEEE Transactions on Reliability, 2019 - ieeexplore.ieee.org
Due to the advancements in sensing technologies and computational capabilities, system
health assessment and prognostics have been extensively investigated in the literature …

XVDPU: A High-Performance CNN Accelerator on the Versal Platform Powered by the AI Engine

X Jia, Y Zhang, G Liu, X Yang, T Zhang… - ACM Transactions on …, 2024 - dl.acm.org
Today, convolutional neural networks (CNNs) are widely used in computer vision
applications. However, the trends toward higher accuracy and higher resolution generate larger …

Training energy-efficient deep spiking neural networks with single-spike hybrid input encoding

G Datta, S Kundu, PA Beerel - 2021 International Joint …, 2021 - ieeexplore.ieee.org
Spiking Neural Networks (SNNs) have emerged as an attractive alternative to traditional
deep learning frameworks, since they provide higher computational efficiency in event …
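As an aside on the encoding the title names: below is a minimal CUDA sketch of time-to-first-spike (TTFS) coding, one common single-spike input scheme. The paper's hybrid encoding additionally feeds a direct analog input; the kernel name, layout, and linear intensity-to-latency map here are illustrative assumptions, not the authors' code.

__global__ void ttfs_encode(const float* pixels,  // intensities normalized to [0, 1]
                            int* spike_step,      // time step of the single spike
                            int T, int n)         // T = number of simulation steps
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Brighter inputs fire earlier: intensity 1.0 -> step 0,
        // intensity near 0 -> step T-1; exactly zero never fires (-1).
        spike_step[i] = (pixels[i] > 0.f)
                      ? (int)((1.f - pixels[i]) * (T - 1))
                      : -1;
    }
}

Because each input emits at most one spike, downstream layers accumulate far fewer synaptic operations than under rate coding, which is where the energy saving comes from.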

Characterizing and demystifying the implicit convolution algorithm on commercial matrix-multiplication accelerators

Y Zhou, M Yang, C Guo, J Leng, Y Liang… - 2021 IEEE …, 2021 - ieeexplore.ieee.org
Many of today's deep neural network accelerators, e.g., Google's TPU and NVIDIA's Tensor
Cores, are built around accelerating general matrix multiplication (i.e., GEMM). However …
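For context, "implicit" convolution maps the convolution onto GEMM without ever materializing the im2col matrix: the kernel computes input addresses on the fly while walking the GEMM reduction dimension. The following CUDA sketch shows the idea under simplifying assumptions (stride 1, no padding, no tiling, one thread per output element); it is an illustration, not the paper's kernel or any vendor implementation.

// View conv as GEMM C[M,N] = A[M,K] * B[K,N] with
//   M = Cout, N = N_batch * OH * OW, K = Cin * KH * KW,
// where B is the im2col matrix that is never built explicitly.
__global__ void implicit_gemm_conv(const float* w,  // weights  [Cout, Cin, KH, KW]
                                   const float* x,  // input    [N, Cin, H, W]
                                   float* y,        // output   [N, Cout, OH, OW]
                                   int N, int Cin, int H, int W,
                                   int Cout, int KH, int KW, int OH, int OW)
{
    int m = blockIdx.y * blockDim.y + threadIdx.y;  // GEMM row -> output channel
    int n = blockIdx.x * blockDim.x + threadIdx.x;  // GEMM col -> (image, oh, ow)
    if (m >= Cout || n >= N * OH * OW) return;

    int img = n / (OH * OW);
    int oh  = (n / OW) % OH;
    int ow  = n % OW;

    float acc = 0.f;
    for (int k = 0; k < Cin * KH * KW; ++k) {       // walk the virtual im2col column
        int c  = k / (KH * KW);
        int kh = (k / KW) % KH;
        int kw = k % KW;
        acc += w[((m * Cin + c) * KH + kh) * KW + kw]
             * x[((img * Cin + c) * H + (oh + kh)) * W + (ow + kw)];
    }
    y[((img * Cout + m) * OH + oh) * OW + ow] = acc;
}

Production versions tile this loop through shared memory and feed the fragments to Tensor Core MMA instructions, which is the style of mapping the paper characterizes.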

AI in medical physics: guidelines for publication

I El Naqa, JM Boone, SH Benedict… - Medical …, 2021 - Wiley Online Library
The Abstract is intended to provide a concise summary of the study and its scientific findings.
For AI/ML applications in medical physics, a problem statement and rationale for utilizing …

Demystifying BERT: System design implications

S Pati, S Aga, N Jayasena… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Transfer learning in natural language processing (NLP) uses increasingly large models that
tackle challenging problems. Consequently, these applications are driving the requirements …

Buddy compression: Enabling larger memory for deep learning and HPC workloads on GPUs

E Choukse, MB Sullivan, M O'Connor… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org
GPUs accelerate high-throughput applications, which require orders-of-magnitude higher
memory bandwidth than traditional CPU-only systems. However, the capacity of such high …

FusionStitching: Boosting memory-intensive computations for deep learning workloads

Z Zheng, P Zhao, G Long, F Zhu, K Zhu, W Zhao… - arXiv preprint arXiv …, 2020 - arxiv.org
We show in this work that memory-intensive computations can result in severe performance
problems due to off-chip memory access and CPU-GPU context switch overheads in a wide …
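The overheads named in the snippet are what kernel fusion removes: without fusion, every intermediate tensor makes a round trip through off-chip memory and each op pays a separate launch. A toy CUDA illustration of the general idea follows (hypothetical kernels; FusionStitching's actual contribution is fusing much larger memory-intensive subgraphs than this):

// Unfused: intermediate t is written to and re-read from global memory,
// and the host pays two kernel launches.
__global__ void scale(const float* x, float* t, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) t[i] = a * x[i];
}
__global__ void add_relu(const float* t, const float* b, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = fmaxf(t[i] + b[i], 0.f);
}

// Fused: the intermediate lives in a register, so the off-chip traffic for t
// and one kernel launch disappear entirely.
__global__ void scale_add_relu(const float* x, const float* b, float* y,
                               float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = fmaxf(a * x[i] + b[i], 0.f);
}

For a bandwidth-bound element-wise chain like this, the fused kernel moves three arrays' worth of data instead of five, and the speedup tracks the saved memory traffic.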

ALCOP: Automatic load-compute pipelining in deep learning compiler for AI-GPUs

G Huang, Y Bai, L Liu, Y Wang, B Yu… - … of Machine Learning …, 2023 - proceedings.mlsys.org
Pipelining between data loading and computation is a critical tensor program optimization
for GPUs. In order to unleash the high performance of the latest GPUs, we must perform a …
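The usual hand-written form of this pipelining is double buffering in shared memory: while the current tile is consumed, the next one is prefetched from global memory. A minimal CUDA sketch of the pattern follows (illustrative kernel, single block, one element per thread; ALCOP's point is to generate such pipelines, including multi-stage variants, automatically):

#define TILE 256

// Assumes one block of TILE threads and ntiles * TILE input elements.
__global__ void pipelined_sum(const float* x, float* out, int ntiles)
{
    __shared__ float buf[2][TILE];                 // two buffers: compute + prefetch
    int tid = threadIdx.x;

    buf[0][tid] = x[tid];                          // prologue: load tile 0
    __syncthreads();

    float acc = 0.f;
    for (int t = 0; t < ntiles; ++t) {
        int cur = t & 1, nxt = cur ^ 1;
        if (t + 1 < ntiles)                        // issue loads for tile t+1 ...
            buf[nxt][tid] = x[(t + 1) * TILE + tid];
        acc += 0.5f * (buf[cur][tid]               // ... while computing on tile t
                     + buf[cur][(tid + 1) % TILE]);
        __syncthreads();                           // prefetch complete; cur reusable
    }
    out[tid] = acc;
}

On Ampere and later GPUs the prefetch would use cp.async, so loads bypass registers and genuinely overlap with the math; that asynchronous, multi-stage version is what the compiler has to orchestrate.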