Getting ViT in shape: Scaling laws for compute-optimal model design

IM Alabdulmohsin, X Zhai… - Advances in Neural …, 2024 - proceedings.neurips.cc
Scaling laws have been recently employed to derive compute-optimal model size (number
of parameters) for a given compute duration. We advance and refine such methods to infer …

Compute-efficient deep learning: Algorithmic trends and opportunities

BR Bartoldson, B Kailkhura, D Blalock - Journal of Machine Learning …, 2023 - jmlr.org
Although deep learning has made great progress in recent years, the exploding economic
and environmental costs of training neural networks are becoming unsustainable. To …

MIMONets: Multiple-input-multiple-output neural networks exploiting computation in superposition

N Menet, M Hersche, G Karunaratne… - Advances in …, 2023 - proceedings.neurips.cc
With the advent of deep learning, progressively larger neural networks have been designed
to solve complex tasks. We take advantage of these capacity-rich models to lower the cost of …

The Impact of Depth on Compositional Generalization in Transformer Language Models

J Petty, S Steenkiste, I Dasgupta, F Sha… - Proceedings of the …, 2024 - aclanthology.org
To process novel sentences, language models (LMs) must generalize compositionally—
combine familiar elements in new ways. What aspects of a model's structure promote …

A comparative analysis of deep neural network architectures for sentence classification using genetic algorithm

B Rogers, N Noman, S Chalup, P Moscato - Evolutionary Intelligence, 2024 - Springer
Because of the number of different architectures, numerous settings of their
hyper-parameters and disparity among their sizes, it is difficult to equitably compare various deep …

Evolutionary neural architecture search on transformers for RUL prediction

H Mo, G Iacca - Materials and Manufacturing Processes, 2023 - Taylor & Francis
Remaining useful life (RUL) predictions are a key enabler for predictive maintenance.
Data-driven approaches, typically based on deep neural networks (DNNs), have shown success …

Simulating weighted automata over sequences and trees with transformers

M Rizvi-Martel, M Lizaire, C Lacroce… - International …, 2024 - proceedings.mlr.press
Transformers are ubiquitous models in the natural language processing (NLP) community
and have shown impressive empirical successes in the past few years. However, little is …

Simulating Weighted Automata over Sequences and Trees with Transformers

M Rizvi, M Lizaire, C Lacroce, G Rabusseau - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers are ubiquitous models in the natural language processing (NLP) community
and have shown impressive empirical successes in the past few years. However, little is …