Bucket Getter: A Bucket-based Processing Engine for Low-bit Block Floating Point (BFP) DNNs

YC Lo, RS Liu - Proceedings of the 56th Annual IEEE/ACM …, 2023 - dl.acm.org
Block floating point (BFP), an efficient numerical system for deep neural networks (DNNs),
achieves a good trade-off between dynamic range and hardware costs. Specifically, prior …

Accurate Block Quantization in LLMs with Outliers

N Trukhanov, I Soloveychik - arXiv preprint arXiv:2403.20137, 2024 - arxiv.org
The demand for inference on extremely large scale LLMs has seen enormous growth in the
recent months. It made evident the colossal shortage of dedicated hardware capable of …

TangramFP: Energy-Efficient, Bit-Parallel, Multiply-Accumulate for Deep Neural Networks

Y Yao, X Chen, H Atmer… - 2024 IEEE 36th …, 2024 - ieeexplore.ieee.org
As energy consumption becomes a primary concern for deep learning acceleration, the
need to optimize not only data movement but also compute is becoming important. The …

SLaNC: Static LayerNorm Calibration

M Salmani, N Trukhanov, I Soloveychik - arXiv preprint arXiv:2410.10553, 2024 - arxiv.org
The ever increasing sizes of Large Language Models (LLMs) beyond hundreds of billions of
parameters have generated enormous pressure on the manufacturers of dedicated …