Wide attention is the way forward for transformers?

IM Alabdulmohsin, X Zhai… - Advances in Neural …, 2024 - proceedings.neurips.cc

Scaling laws have been recently employed to derive compute-optimal model size (number
of parameters) for a given compute duration. We advance and refine such methods to infer …

Cited by 38 · Related articles · All 5 versions

[PDF] jmlr.org

Compute-efficient deep learning: Algorithmic trends and opportunities

BR Bartoldson, B Kailkhura, D Blalock - Journal of Machine Learning …, 2023 - jmlr.org

Although deep learning has made great progress in recent years, the exploding economic
and environmental costs of training neural networks are becoming unsustainable. To …

Cited by 48 · Related articles · All 4 versions

[PDF] neurips.cc

MIMONets: Multiple-input-multiple-output neural networks exploiting computation in superposition

N Menet, M Hersche, G Karunaratne… - Advances in …, 2023 - proceedings.neurips.cc

With the advent of deep learning, progressively larger neural networks have been designed
to solve complex tasks. We take advantage of these capacity-rich models to lower the cost of …

Cited by 10 · Related articles · All 7 versions

[PDF] aclanthology.org

The Impact of Depth on Compositional Generalization in Transformer Language Models

J Petty, S Steenkiste, I Dasgupta, F Sha… - Proceedings of the …, 2024 - aclanthology.org

To process novel sentences, language models (LMs) must generalize compositionally—
combine familiar elements in new ways. What aspects of a model's structure promote …

Cited by 8 · Related articles

A comparative analysis of deep neural network architectures for sentence classification using genetic algorithm

B Rogers, N Noman, S Chalup, P Moscato - Evolutionary Intelligence, 2024 - Springer

Because of the number of different architectures, numerous settings of their hyperparameters
and disparity among their sizes, it is difficult to equitably compare various deep …

Cited by 8 · Related articles

[PDF] unitn.it

Evolutionary neural architecture search on transformers for RUL prediction

H Mo, G Iacca - Materials and Manufacturing Processes, 2023 - Taylor & Francis

Remaining useful life (RUL) predictions are a key enabler for predictive maintenance.
Data-driven approaches, typically based on deep neural networks (DNNs), have shown success …

Cited by 12 · Related articles · All 2 versions

[PDF] mlr.press

Simulating weighted automata over sequences and trees with transformers

M Rizvi-Martel, M Lizaire, C Lacroce… - International …, 2024 - proceedings.mlr.press

Transformers are ubiquitous models in the natural language processing (NLP) community
and have shown impressive empirical successes in the past few years. However, little is …

[PDF] arxiv.org

Simulating Weighted Automata over Sequences and Trees with Transformers

M Rizvi, M Lizaire, C Lacroce, G Rabusseau - arXiv preprint arXiv …, 2024 - arxiv.org

Transformers are ubiquitous models in the natural language processing (NLP) community
and have shown impressive empirical successes in the past few years. However, little is …
