FPGA-based accelerators of deep learning networks for learning and classification: A review

A Shawahna, SM Sait, A El-Maleh - IEEE Access, 2018 - ieeexplore.ieee.org
Due to recent advances in digital technologies, and availability of credible data, an area of
artificial intelligence, deep learning, has emerged and has demonstrated its ability and …

The future of FPGA acceleration in datacenters and the cloud

C Bobda, JM Mbongue, P Chow, M Ewais… - ACM Transactions on …, 2022 - dl.acm.org
In this article, we survey existing academic and commercial efforts to provide Field-
Programmable Gate Array (FPGA) acceleration in datacenters and the cloud. The goal is a …

An open-source benchmark suite for microservices and their hardware-software implications for cloud & edge systems

Y Gan, Y Zhang, D Cheng, A Shetty, P Rathi… - Proceedings of the …, 2019 - dl.acm.org
Cloud services have recently started undergoing a major shift from monolithic applications,
to graphs of hundreds or thousands of loosely-coupled microservices. Microservices …

A configurable cloud-scale DNN processor for real-time AI

J Fowers, K Ovtcharov, M Papamichael… - 2018 ACM/IEEE 45th …, 2018 - ieeexplore.ieee.org
Interactive AI-powered services require low-latency evaluation of deep neural network
(DNN) models, aka "real-time AI". The growing demand for computationally expensive …

Azure accelerated networking: SmartNICs in the public cloud

D Firestone, A Putnam, S Mundkur, D Chiou… - … USENIX Symposium on …, 2018 - usenix.org
Modern cloud architectures rely on each server running its own networking stack to
implement policies such as tunneling for virtual networks, security, and load balancing …

LegoOS: A disseminated, distributed OS for hardware resource disaggregation

Y Shan, Y Huang, Y Chen, Y Zhang - 13th USENIX Symposium on …, 2018 - usenix.org
The monolithic server model where a server is the unit of deployment, operation, and failure
is meeting its limits in the face of several recent hardware and application trends. To improve …

ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars

A Shafiee, A Nag, N Muralimanohar… - ACM SIGARCH …, 2016 - dl.acm.org
A number of recent efforts have attempted to design accelerators for popular machine
learning algorithms, such as those involving convolutional and deep neural networks (CNNs …

VTR 8: High-performance CAD and customizable FPGA architecture modelling

KE Murray, O Petelin, S Zhong, JM Wang… - ACM Transactions on …, 2020 - dl.acm.org
Developing Field-programmable Gate Array (FPGA) architectures is challenging due to the
competing requirements of various application domains and changing manufacturing …

Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions

N Vasilache, O Zinenko, T Theodoridis, P Goyal… - arXiv preprint arXiv …, 2018 - arxiv.org
Deep learning models with convolutional and recurrent networks are now ubiquitous and
analyze massive amounts of audio, image, video, text and graph data, with applications in …

NVIDIA Tensor Core programmability, performance & precision

S Markidis, SW Der Chien, E Laure… - 2018 IEEE …, 2018 - ieeexplore.ieee.org
The NVIDIA Volta GPU microarchitecture introduces a specialized unit, called Tensor Core
that performs one matrix-multiply-and-accumulate on 4x4 matrices per clock cycle. The …