Full stack optimization of transformer inference: a survey

S Kim, C Hooper, T Wattanawong, M Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advances in state-of-the-art DNN architecture design have been moving toward
Transformer models. These models achieve superior accuracy across a wide range of …

DIANA: An end-to-end hybrid digital and analog neural network SoC for the edge

P Houshmand, GM Sarda, V Jain… - IEEE Journal of Solid …, 2022 - ieeexplore.ieee.org
DIgital-ANAlog (DIANA), a heterogeneous multi-core accelerator, combines a reduced
instruction set computer-five (RISC-V) host processor with an analog in-memory computing …

A full-stack search technique for domain optimized deep learning accelerators

D Zhang, S Huda, E Songhori, K Prabhu, Q Le… - Proceedings of the 27th …, 2022 - dl.acm.org
The rapidly changing deep learning landscape presents a unique opportunity for building
inference accelerators optimized for specific datacenter-scale workloads. We propose Full …

DOSA: Differentiable model-based one-loop search for DNN accelerators

C Hong, Q Huang, G Dinh, M Subedar… - Proceedings of the 56th …, 2023 - dl.acm.org
In the hardware design space exploration process, it is critical to optimize both hardware
parameters and algorithm-to-hardware mappings. Previous work has largely approached …

DeFiNES: Enabling fast exploration of the depth-first scheduling space for DNN accelerators through analytical modeling

L Mei, K Goetschalckx, A Symons… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
DNN workloads can be scheduled onto DNN accelerators in many different ways: from layer-
by-layer scheduling to cross-layer depth-first scheduling (aka layer fusion, or cascaded …

TinyVers: A tiny versatile system-on-chip with state-retentive eMRAM for ML inference at the extreme edge

V Jain, S Giraldo, J De Roose, L Mei… - IEEE Journal of Solid …, 2023 - ieeexplore.ieee.org
Extreme edge devices or Internet-of-Things (IoT) nodes require both ultra-low power (ULP)
always-on (AON) processing and the ability to do on-demand sampling and …

TelaMalloc: Efficient on-chip memory allocation for production machine learning accelerators

M Maas, U Beaugnon, A Chauhan, B Ilbeyi - Proceedings of the 28th …, 2022 - dl.acm.org
Memory buffer allocation for on-chip memories is a major challenge in modern machine
learning systems that target ML accelerators. In interactive systems such as mobile phones …

Leveraging domain information for the efficient automated design of deep learning accelerators

C Sakhuja, Z Shi, C Lin - 2023 IEEE International Symposium …, 2023 - ieeexplore.ieee.org
Deep learning accelerators are important tools for feeding the growing demand for deep
learning applications. The automated design of such accelerators—which is important for …

Benchmarking and modeling of analog and digital SRAM in-memory computing architectures

P Houshmand, J Sun, M Verhelst - arXiv preprint arXiv:2305.18335, 2023 - arxiv.org
In-memory computing is emerging as an efficient hardware paradigm for deep neural
network accelerators at the edge, enabling them to break the memory wall and exploit massive …

Demystifying map space exploration for NPUs

SC Kao, A Parashar, PA Tsai… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Map Space Exploration is the problem of finding optimized mappings of a Deep Neural
Network (DNN) model on an accelerator. It is known to be extremely computationally …