Scaling laws for generative mixed-modal language models

A Aghajanyan, L Yu, A Conneau… - International …, 2023 - proceedings.mlr.press
Generative language models define distributions over sequences of tokens that can
represent essentially any combination of data modalities (e.g., any permutation of image …
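
The snippet gives no formulas; as generic background on how such a law is fit to measurements, here is a minimal sketch assuming a simple single-variable power law with an irreducible-loss offset and synthetic data points, not the paper's mixed-modal formulation (all names and values are illustrative).

```python
# Minimal sketch: fitting a power-law-plus-offset scaling law to synthetic
# (model size, loss) measurements. The functional form and the data are
# illustrative assumptions, not taken from the paper.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(N, E, A, alpha):
    # Irreducible loss E plus a term that decays with model size N.
    return E + A * N ** (-alpha)

N_obs = np.array([1e7, 3e7, 1e8, 3e8, 1e9])
L_obs = scaling_law(N_obs, 2.0, 400.0, 0.30)   # synthetic, noise-free losses

(E, A, alpha), _ = curve_fit(scaling_law, N_obs, L_obs, p0=[2.0, 300.0, 0.25])
print(f"fitted E={E:.2f}, A={A:.1f}, alpha={alpha:.3f}")
```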

Reduced precision floating-point optimization for Deep Neural Network On-Device Learning on microcontrollers

D Nadalini, M Rusci, L Benini, F Conti - Future Generation Computer …, 2023 - Elsevier
Enabling On-Device Learning (ODL) for Ultra-Low-Power Micro-Controller Units
(MCUs) is a key step for post-deployment adaptation and fine-tuning of Deep Neural …

Autosparse: Towards automated sparse training of deep neural networks

A Kundu, NK Mellempudi, DT Vooturi, B Kaul… - arXiv preprint arXiv …, 2023 - arxiv.org
Sparse training is emerging as a promising avenue for reducing the computational cost of
training neural networks. Several recent studies have proposed pruning methods using …
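
AutoSparse's learnable-threshold mechanism is not described in the snippet; the sketch below shows only the generic magnitude-based sparse-training baseline that such pruning methods build on (a fixed binary mask re-applied after every update). The model, sparsity level, and data are illustrative placeholders.

```python
# Minimal sketch of magnitude-based sparse training: build a binary mask per
# weight tensor, then zero out pruned weights after every optimizer step.
# Generic baseline only, not AutoSparse's learnable-threshold method.
import torch
import torch.nn as nn

def build_masks(model: nn.Module, sparsity: float = 0.9) -> dict:
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:  # prune weight matrices, leave biases dense
            k = int(p.numel() * sparsity)
            threshold = p.abs().flatten().kthvalue(k).values
            masks[name] = (p.abs() > threshold).float()
    return masks

def apply_masks(model: nn.Module, masks: dict) -> None:
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
masks = build_masks(model, sparsity=0.9)

x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
for _ in range(10):
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    apply_masks(model, masks)  # re-impose sparsity after each update
```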

Small reals representations for deep learning at the edge: A comparison

M Cococcioni, F Rossi, E Ruffaldi… - Conference on Next …, 2022 - Springer
The pervasiveness of deep neural networks (DNNs) in edge devices enforces new
requirements on information representation. Low precision formats from 16 bits down to 1 or …
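
As a small companion to the comparison the title promises, the sketch below measures the round-trip rounding error of float32, IEEE float16, and bfloat16 in PyTorch; it is illustrative only and omits the sub-16-bit formats the paper also covers.

```python
# Minimal sketch: relative rounding error when the same values are stored in
# 32-bit, 16-bit (IEEE half), and bfloat16 formats.
import torch

x = torch.rand(10_000, dtype=torch.float64) * 100.0 + 1.0  # values in [1, 101)

for dtype in (torch.float32, torch.float16, torch.bfloat16):
    y = x.to(dtype).to(torch.float64)            # round-trip through the format
    rel_err = ((y - x).abs() / x.abs()).max().item()
    fi = torch.finfo(dtype)
    print(f"{str(dtype):16s} bits={fi.bits:2d} eps={fi.eps:.2e} "
          f"max relative error={rel_err:.2e}")
```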

Language Adaptation on a Tight Academic Compute Budget: Tokenizer Swapping Works and Pure bfloat16 Is Enough

K Dobler, G de Melo - arXiv preprint arXiv:2408.15793, 2024 - arxiv.org
We investigate continued pretraining of LLMs for language adaptation on a tight academic
budget: a setting in which only a few GPUs can be used in parallel, for a heavily constrained …
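
A minimal sketch of the "pure bfloat16" setting the title refers to, assuming PyTorch: parameters, activations, and optimizer state all kept in bfloat16, with no fp32 master weights and no autocast. The tiny model, optimizer settings, and random batch are placeholders, and tokenizer swapping is not shown.

```python
# Minimal sketch of "pure bfloat16" training: weights, activations, and
# optimizer state all live in bfloat16; no fp32 master copy, no autocast.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.GELU(), nn.Linear(512, 512))
model = model.to(torch.bfloat16)                      # weights stored in bf16
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)  # state will match bf16 params

x = torch.randn(8, 512, dtype=torch.bfloat16)
target = torch.randn(8, 512, dtype=torch.bfloat16)

for step in range(5):
    loss = nn.functional.mse_loss(model(x), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(step, loss.item())
```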

Adaptive loss scaling for mixed precision training

R Zhao, B Vogel, T Ahmed - arXiv preprint arXiv:1910.12385, 2019 - arxiv.org
Mixed precision training (MPT) is becoming a practical technique to improve the speed and
energy efficiency of training deep neural networks by leveraging the fast hardware support …
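
The paper's contribution is layer-wise adaptive loss scaling; the sketch below shows only the conventional dynamic loss-scaling loop that mixed precision training commonly uses (scale the loss before backward, check the gradients for overflow, unscale, then grow or back off the scale). The elementwise toy model is a placeholder.

```python
# Minimal sketch of conventional dynamic loss scaling for fp16 training.
# Not the paper's layer-wise adaptive method; the tiny model is a placeholder.
import torch

torch.manual_seed(0)
x = torch.randn(256, dtype=torch.float16)
y = 3.0 * x                                          # target weights are 3.0
w = torch.zeros(256, dtype=torch.float16, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.02)

scale, backoff, growth, growth_interval = 2.0**15, 0.5, 2.0, 20
good_steps = 0
for step in range(200):
    opt.zero_grad()
    loss = ((w * x - y) ** 2).sum()
    (loss * scale).backward()                        # scaled backward pass
    if not torch.isfinite(w.grad).all():
        scale *= backoff                             # overflow: back off, skip step
        good_steps = 0
        continue
    w.grad.div_(scale)                               # unscale before the update
    opt.step()
    good_steps += 1
    if good_steps % growth_interval == 0:
        scale *= growth                              # grow after a clean streak

print(f"final loss {loss.item():.3f}, final scale {scale:.0f}")
```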

The hidden power of pure 16-bit floating-point neural networks

J Yun, B Kang, Z Fu - arXiv preprint arXiv:2301.12809, 2023 - arxiv.org
Lowering the precision of neural networks from the prevalent 32-bit precision has long been
considered harmful to performance, despite the gain in space and time. Many works …
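
A small numeric illustration of the classic concern behind that view: naive accumulation in 16-bit floats stalls once the running sum dwarfs each addend. It is illustrative only and does not reproduce the paper's analysis of when pure 16-bit training is safe.

```python
# Naive fp16 accumulation loses small contributions once the running sum grows;
# the same data summed with a 32-bit accumulator stays close to the exact value.
import numpy as np

x = np.full(100_000, 0.0001, dtype=np.float16)

naive_fp16 = np.float16(0.0)
for v in x:                               # sequential fp16 accumulation
    naive_fp16 = np.float16(naive_fp16 + v)

fp32_accum = x.astype(np.float32).sum()   # same data, 32-bit accumulator

print("exact      :", 0.0001 * len(x))    # about 10.0
print("fp16 naive :", float(naive_fp16))  # stalls far below 10
print("fp32 accum :", float(fp32_accum))
```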

[BOOK][B] Number Systems for Deep Neural Network Architectures

In this introductory chapter, we provide an overview of the main topics covered in this book
and the motivations to write it. The importance of efficient number systems for Deep Neural …

MultiPosits: Universal Coding of ℝⁿ

P Lindstrom - Conference on Next Generation Arithmetic, 2022 - Springer
Recently proposed real-number representations like Posits and Elias codes provide
attractive alternatives to IEEE floating point for representing real numbers in science and …
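
As background on the posit format the snippet mentions, here is a minimal decoder for standard posits (sign, regime, exponent, fraction fields; es = 2 as in the 2022 posit standard). It is an illustrative sketch, not the MultiPosits coding scheme itself.

```python
# Minimal sketch: decode an n-bit standard posit (es exponent bits) into a
# Python float. Layout: sign | regime run | terminator | exponent | fraction.
def decode_posit(bits: int, n: int = 8, es: int = 2) -> float:
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("nan")                  # NaR (not a real)

    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & ((1 << n) - 1)      # two's-complement magnitude

    body = format(bits, f"0{n}b")[1:]        # bits after the sign
    run_bit = body[0]
    run_len = len(body) - len(body.lstrip(run_bit))
    k = run_len - 1 if run_bit == "1" else -run_len

    rest = body[run_len + 1:]                # skip the run and its terminator
    exp_bits = rest[:es].ljust(es, "0")      # truncated exponent bits are zero
    e = int(exp_bits, 2)
    frac_bits = rest[es:]
    f = int(frac_bits, 2) / (1 << len(frac_bits)) if frac_bits else 0.0

    useed = 2 ** (2 ** es)                   # 16 when es = 2
    value = (useed ** k) * (2 ** e) * (1.0 + f)
    return -value if sign else value

# A few posit<8,2> examples: 1.0, 16.0, 0.0625, -1.0.
for p in (0b01000000, 0b01100000, 0b00100000, 0b11000000):
    print(f"{p:08b} -> {decode_posit(p)}")
```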

On the challenges in programming mixed-precision deep neural networks

R Zhao, W Luk, C Xiong, X Niu, KH Tsoi - Proceedings of the 4th ACM …, 2020 - dl.acm.org
Deep Neural Networks (DNNs) are resilient to reduced data precision, which motivates
exploiting low-precision data formats for more efficient computation, especially on custom …
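
A minimal sketch of the kind of low-precision format such work exploits: symmetric per-tensor 8-bit weight quantization in NumPy. The scaling scheme and names are common illustrative choices, not the paper's method.

```python
# Quantize a weight tensor to int8 with a single symmetric scale, then
# dequantize and measure the rounding error and the storage saving.
import numpy as np

def quantize_int8(w: np.ndarray):
    scale = np.abs(w).max() / 127.0                  # symmetric per-tensor scale
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs quantization error:", np.abs(w - w_hat).max())
print("storage: fp32", w.nbytes, "bytes -> int8", q.nbytes, "bytes")
```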