An updated survey of efficient hardware architectures for accelerating deep convolutional neural networks

M Capra, B Bussolino, A Marchisio, M Shafique… - Future Internet, 2020 - mdpi.com
Deep Neural Networks (DNNs) are nowadays common practice in most Artificial
Intelligence (AI) applications. Their ability to go beyond human precision has made these …

Efficient hardware architectures for accelerating deep neural networks: Survey

P Dhilleswararao, S Boppu, MS Manikandan… - IEEE …, 2022 - ieeexplore.ieee.org
In the modern era of technology, a paradigm shift has been witnessed in areas
involving applications of Artificial Intelligence (AI), Machine Learning (ML), and Deep …

Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks

T Hoefler, D Alistarh, T Ben-Nun, N Dryden… - Journal of Machine …, 2021 - jmlr.org
The growing energy and performance costs of deep learning have driven the community to
reduce the size of neural networks by selectively pruning components. Similarly to their …
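
As a concrete anchor for the survey's subject, here is a minimal sketch of global magnitude pruning, the simplest member of the family of methods it covers; the function and parameter names are ours, not the survey's.

```python
import numpy as np

def magnitude_prune(w, sparsity=0.9):
    """Zero out the smallest-magnitude fraction of weights (global
    magnitude pruning). Illustrative sketch, not the survey's code."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    # the k-th smallest absolute value becomes the pruning threshold
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w)

w = np.random.randn(256, 256)
print((magnitude_prune(w, 0.9) == 0).mean())  # ~0.9 of weights pruned
```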

SpAtten: Efficient sparse attention architecture with cascade token and head pruning

H Wang, Z Zhang, S Han - 2021 IEEE International Symposium …, 2021 - ieeexplore.ieee.org
The attention mechanism is becoming increasingly popular in Natural Language Processing
(NLP) applications, showing superior performance to convolutional and recurrent …
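
An illustrative software sketch of the token-pruning criterion SpAtten builds hardware for: a key token's importance is the attention probability it receives, accumulated over heads and queries. The cascade across layers and the on-chip top-k engine are not modeled here, and the names are ours.

```python
import numpy as np

def token_importance(attn_probs):
    # attn_probs: (heads, queries, keys) softmax outputs; a key token's
    # importance is the attention it receives, summed over heads/queries.
    return attn_probs.sum(axis=(0, 1))

def prune_tokens(attn_probs, keep_ratio=0.5):
    scores = token_importance(attn_probs)
    k = max(1, int(keep_ratio * scores.size))
    return np.sort(np.argsort(scores)[-k:])  # indices of surviving tokens

# (heads=8, queries=64, keys=64); each attention row sums to 1
probs = np.random.dirichlet(np.ones(64), size=(8, 64))
print(prune_tokens(probs, keep_ratio=0.25))
```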

Hardware and software optimizations for accelerating deep neural networks: Survey of current trends, challenges, and the road ahead

M Capra, B Bussolino, A Marchisio, G Masera… - IEEE …, 2020 - ieeexplore.ieee.org
Currently, Machine Learning (ML) is becoming ubiquitous in everyday life. Deep Learning
(DL) is already present in many applications ranging from computer vision for medicine to …

Enable deep learning on mobile devices: Methods, systems, and applications

H Cai, J Lin, Y Lin, Z Liu, H Tang, H Wang… - ACM Transactions on …, 2022 - dl.acm.org
Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial
intelligence (AI), including computer vision, natural language processing, and speech …

AWB-GCN: A graph convolutional network accelerator with runtime workload rebalancing

T Geng, A Li, R Shi, C Wu, T Wang, Y Li… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
Deep learning systems have been successfully applied to Euclidean data such as images,
video, and audio. In many applications, however, information and its relationships are …
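
For context, a minimal GCN layer in the standard Kipf-Welling form (a generic sketch, not AWB-GCN's implementation): the sparse aggregation step is where the load imbalance that AWB-GCN rebalances at runtime arises.

```python
import numpy as np
import scipy.sparse as sp

def gcn_layer(A, X, W):
    """One GCN layer: normalize the adjacency matrix, aggregate
    neighbors (SpMM), then apply the dense weight transform (GEMM)."""
    n = A.shape[0]
    A_hat = A + sp.eye(n)                      # add self-loops
    d = np.asarray(A_hat.sum(axis=1)).ravel()
    D = sp.diags(1.0 / np.sqrt(d))
    A_norm = D @ A_hat @ D                     # D^-1/2 (A+I) D^-1/2
    return np.maximum(A_norm @ (X @ W), 0.0)   # aggregate, transform, ReLU

A = sp.random(100, 100, density=0.05, format="csr")
A = ((A + A.T) > 0).astype(float)              # symmetric, unweighted graph
X = np.random.randn(100, 32)
W = np.random.randn(32, 16)
H = gcn_layer(A, X, W)                         # (100, 16) node embeddings
```

Real-world graphs have power-law degree distributions, so rows of the sparse matrix carry wildly uneven nonzero counts; that is the SpMM workload AWB-GCN redistributes across processing elements.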

OliVe: Accelerating large language models via hardware-friendly outlier-victim pair quantization

C Guo, J Tang, W Hu, J Leng, C Zhang… - Proceedings of the 50th …, 2023 - dl.acm.org
Transformer-based large language models (LLMs) have achieved great success as model
sizes have grown. LLM size grows by 240× every two years, which outpaces the …
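
A toy illustration of the outlier problem OliVe targets, not of its outlier-victim pair encoding: with plain symmetric quantization, a single outlier inflates the per-tensor scale and destroys the precision of typical values.

```python
import numpy as np

def quantize(w, bits=4):
    # Plain symmetric uniform quantization with one per-tensor scale.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, 4096)
w_outlier = w.copy()
w_outlier[0] = 8.0                      # a single outlier value

for name, t in [("no outlier", w), ("with outlier", w_outlier)]:
    err = np.abs(quantize(t) - t).mean()
    print(f"{name}: mean abs INT4 error = {err:.4f}")
# The outlier stretches the scale, so the few INT4 levels span a huge
# range and typical values lose nearly all precision; OliVe's
# outlier-victim pairs handle outliers without sacrificing that scale.
```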

I-GCN: A graph convolutional network accelerator with runtime locality enhancement through islandization

T Geng, C Wu, Y Zhang, C Tan, C Xie, H You… - MICRO-54: 54th annual …, 2021 - dl.acm.org
Graph Convolutional Networks (GCNs) have drawn tremendous attention in the past three
years. Compared with other deep learning modalities, high-performance hardware …

Gemmini: Enabling systematic deep-learning architecture evaluation via full-stack integration

H Genc, S Kim, A Amid, A Haj-Ali, V Iyer… - 2021 58th ACM/IEEE …, 2021 - ieeexplore.ieee.org
DNN accelerators are often developed and evaluated in isolation without considering the
cross-stack, system-level effects in real-world environments. This makes it difficult to …
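
For reference, a toy functional model of the output-stationary systolic dataflow that Gemmini-generated arrays support; Gemmini itself is a Chisel hardware generator, so this Python sketch is purely illustrative.

```python
import numpy as np

def systolic_matmul(A, B):
    """Functional model of an output-stationary systolic array computing
    C = A @ B: each 'cycle' t streams one column of A and one row of B
    through the PE grid, and every PE (i, j) accumulates
    a[i, t] * b[t, j] into its local register."""
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m))
    for t in range(k):                       # one operand wavefront per cycle
        C += np.outer(A[:, t], B[t, :])      # all PEs update in parallel
    return C

A = np.random.randn(8, 16)
B = np.random.randn(16, 4)
assert np.allclose(systolic_matmul(A, B), A @ B)
```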