A survey of techniques for optimizing deep learning on GPUs

S Mittal, S Vaishay - Journal of Systems Architecture, 2019 - Elsevier
The rise of deep learning (DL) has been fuelled by improvements in accelerators. Due to
its unique features, the GPU remains the most widely used accelerator for DL …

A review on prognostics methods for engineering systems

J Guo, Z Li, M Li - IEEE Transactions on Reliability, 2019 - ieeexplore.ieee.org
Due to the advancements in sensing technologies and computational capabilities, system
health assessment and prognostics have been extensively investigated in the literature …

XVDPU: A High-Performance CNN Accelerator on the Versal Platform Powered by the AI Engine

X Jia, Y Zhang, G Liu, X Yang, T Zhang… - ACM Transactions on …, 2024 - dl.acm.org
Today, convolutional neural networks (CNNs) are widely used in computer vision
applications. However, the trends toward higher accuracy and higher resolution generate larger …

Training energy-efficient deep spiking neural networks with single-spike hybrid input encoding

G Datta, S Kundu, PA Beerel - 2021 International Joint …, 2021 - ieeexplore.ieee.org
Spiking Neural Networks (SNNs) have emerged as an attractive alternative to traditional
deep learning frameworks, since they provide higher computational efficiency in event …
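As an aside on the encoding the title names: below is a minimal CUDA sketch of time-to-first-spike (TTFS) coding, one common single-spike input scheme. The paper's hybrid encoding additionally feeds a direct analog input; the kernel name, layout, and linear intensity-to-latency map here are illustrative assumptions, not the authors' code.

__global__ void ttfs_encode(const float* pixels,  // intensities normalized to [0, 1]
                            int* spike_step,      // time step of the single spike
                            int T, int n)         // T = number of simulation steps
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Brighter inputs fire earlier: intensity 1.0 -> step 0,
        // intensity near 0 -> step T-1; exactly zero never fires (-1).
        spike_step[i] = (pixels[i] > 0.f)
                      ? (int)((1.f - pixels[i]) * (T - 1))
                      : -1;
    }
}

Because each input emits at most one spike, downstream layers accumulate far fewer synaptic operations than under rate coding, which is where the energy saving comes from.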

Characterizing and demystifying the implicit convolution algorithm on commercial matrix-multiplication accelerators

Y Zhou, M Yang, C Guo, J Leng, Y Liang… - 2021 IEEE …, 2021 - ieeexplore.ieee.org
Many of today's deep neural network accelerators, e.g., Google's TPU and NVIDIA's Tensor
Cores, are built around accelerating general matrix multiplication (i.e., GEMM). However …
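For context, "implicit" convolution maps the convolution onto GEMM without ever materializing the im2col matrix: the kernel computes input addresses on the fly while walking the GEMM reduction dimension. The following CUDA sketch shows the idea under simplifying assumptions (stride 1, no padding, no tiling, one thread per output element); it is an illustration, not the paper's kernel or any vendor implementation.

// View conv as GEMM C[M,N] = A[M,K] * B[K,N] with
//   M = Cout, N = N_batch * OH * OW, K = Cin * KH * KW,
// where B is the im2col matrix that is never built explicitly.
__global__ void implicit_gemm_conv(const float* w,  // weights  [Cout, Cin, KH, KW]
                                   const float* x,  // input    [N, Cin, H, W]
                                   float* y,        // output   [N, Cout, OH, OW]
                                   int N, int Cin, int H, int W,
                                   int Cout, int KH, int KW, int OH, int OW)
{
    int m = blockIdx.y * blockDim.y + threadIdx.y;  // GEMM row -> output channel
    int n = blockIdx.x * blockDim.x + threadIdx.x;  // GEMM col -> (image, oh, ow)
    if (m >= Cout || n >= N * OH * OW) return;

    int img = n / (OH * OW);
    int oh  = (n / OW) % OH;
    int ow  = n % OW;

    float acc = 0.f;
    for (int k = 0; k < Cin * KH * KW; ++k) {       // walk the virtual im2col column
        int c  = k / (KH * KW);
        int kh = (k / KW) % KH;
        int kw = k % KW;
        acc += w[((m * Cin + c) * KH + kh) * KW + kw]
             * x[((img * Cin + c) * H + (oh + kh)) * W + (ow + kw)];
    }
    y[((img * Cout + m) * OH + oh) * OW + ow] = acc;
}

Production versions tile this loop through shared memory and feed the fragments to Tensor Core MMA instructions, which is the style of mapping the paper characterizes.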

AI in medical physics: guidelines for publication

I El Naqa, JM Boone, SH Benedict… - Medical …, 2021 - Wiley Online Library
The Abstract is intended to provide a concise summary of the study and its scientific findings.
For AI/ML applications in medical physics, a problem statement and rationale for utilizing …

Demystifying BERT: System design implications

S Pati, S Aga, N Jayasena… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Transfer learning in natural language processing (NLP) uses increasingly large models that
tackle challenging problems. Consequently, these applications are driving the requirements …

Buddy compression: Enabling larger memory for deep learning and HPC workloads on GPUs

E Choukse, MB Sullivan, M O'Connor… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org
GPUs accelerate high-throughput applications, which require orders-of-magnitude higher
memory bandwidth than traditional CPU-only systems. However, the capacity of such high …

FusionStitching: Boosting memory-intensive computations for deep learning workloads

Z Zheng, P Zhao, G Long, F Zhu, K Zhu, W Zhao… - arXiv preprint arXiv …, 2020 - arxiv.org
We show in this work that memory-intensive computations can result in severe performance
problems due to off-chip memory access and CPU-GPU context switch overheads in a wide …
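The overheads named in the snippet are what kernel fusion removes: without fusion, every intermediate tensor makes a round trip through off-chip memory and each op pays a separate launch. A toy CUDA illustration of the general idea follows (hypothetical kernels; FusionStitching's actual contribution is fusing much larger memory-intensive subgraphs than this):

// Unfused: intermediate t is written to and re-read from global memory,
// and the host pays two kernel launches.
__global__ void scale(const float* x, float* t, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) t[i] = a * x[i];
}
__global__ void add_relu(const float* t, const float* b, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = fmaxf(t[i] + b[i], 0.f);
}

// Fused: the intermediate lives in a register, so the off-chip traffic for t
// and one kernel launch disappear entirely.
__global__ void scale_add_relu(const float* x, const float* b, float* y,
                               float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = fmaxf(a * x[i] + b[i], 0.f);
}

For a bandwidth-bound element-wise chain like this, the fused kernel moves three arrays' worth of data instead of five, and the speedup tracks the saved memory traffic.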

ALCOP: Automatic load-compute pipelining in deep learning compiler for AI-GPUs

G Huang, Y Bai, L Liu, Y Wang, B Yu… - … of Machine Learning …, 2023 - proceedings.mlsys.org
Pipelining between data loading and computation is a critical tensor program optimization
for GPUs. In order to unleash the high performance of the latest GPUs, we must perform a …
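The usual hand-written form of this pipelining is double buffering in shared memory: while the current tile is consumed, the next one is prefetched from global memory. A minimal CUDA sketch of the pattern follows (illustrative kernel, single block, one element per thread; ALCOP's point is to generate such pipelines, including multi-stage variants, automatically):

#define TILE 256

// Assumes one block of TILE threads and ntiles * TILE input elements.
__global__ void pipelined_sum(const float* x, float* out, int ntiles)
{
    __shared__ float buf[2][TILE];                 // two buffers: compute + prefetch
    int tid = threadIdx.x;

    buf[0][tid] = x[tid];                          // prologue: load tile 0
    __syncthreads();

    float acc = 0.f;
    for (int t = 0; t < ntiles; ++t) {
        int cur = t & 1, nxt = cur ^ 1;
        if (t + 1 < ntiles)                        // issue loads for tile t+1 ...
            buf[nxt][tid] = x[(t + 1) * TILE + tid];
        acc += 0.5f * (buf[cur][tid]               // ... while computing on tile t
                     + buf[cur][(tid + 1) % TILE]);
        __syncthreads();                           // prefetch complete; cur reusable
    }
    out[tid] = acc;
}

On Ampere and later GPUs the prefetch would use cp.async, so loads bypass registers and genuinely overlap with the math; that asynchronous, multi-stage version is what the compiler has to orchestrate.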