A comprehensive overview of large language models

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar, et al. - arXiv preprint, 2023 - arxiv.org
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …

Data augmentation: A comprehensive survey of modern approaches

A Mumuni, F Mumuni - Array, 2022 - Elsevier
To ensure good performance, modern machine learning models typically require large
amounts of quality annotated data. Meanwhile, the data collection and annotation processes …
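
As a hedged illustration of the kind of label-preserving transformations such surveys cover, the sketch below applies a random horizontal flip and a random crop to an image array; the function name and parameters are illustrative placeholders, not taken from the survey.

```python
import numpy as np

def augment(image, crop_size=28, rng=None):
    """Apply a random horizontal flip and a random crop to an HxWxC image array."""
    if rng is None:
        rng = np.random.default_rng()
    # Random horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        image = image[:, ::-1, :]
    # Random crop to crop_size x crop_size.
    h, w, _ = image.shape
    top = rng.integers(0, h - crop_size + 1)
    left = rng.integers(0, w - crop_size + 1)
    return image[top:top + crop_size, left:left + crop_size, :]

# Example: augment a dummy 32x32 RGB image.
img = np.zeros((32, 32, 3), dtype=np.float32)
print(augment(img).shape)  # (28, 28, 3)
```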

Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks

T Hoefler, D Alistarh, T Ben-Nun, N Dryden, et al. - Journal of Machine Learning Research, 2021 - jmlr.org
The growing energy and performance costs of deep learning have driven the community to
reduce the size of neural networks by selectively pruning components. Similarly to their …
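
A minimal sketch of the magnitude-pruning idea such work discusses: zero out the fraction of weights with the smallest absolute value. The function name and sparsity level below are assumptions for illustration only.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the `sparsity` fraction of entries with the smallest |w|."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(weights) > threshold
    return weights * mask

w = np.random.randn(256, 256)
pruned = magnitude_prune(w, sparsity=0.9)
print(1.0 - np.count_nonzero(pruned) / pruned.size)  # roughly 0.9 sparsity
```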

A systematic review on overfitting control in shallow and deep neural networks

MM Bejani, M Ghatee - Artificial Intelligence Review, 2021 - Springer
Shallow neural networks process the features directly, while deep networks extract features
automatically during training. Both models suffer from overfitting or poor …
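
To make the usual overfitting controls concrete, here is a hedged PyTorch sketch combining two of the regularizers such reviews discuss, dropout and L2 weight decay; the architecture and hyperparameters are placeholders, not values from the paper.

```python
import torch
import torch.nn as nn

# A small MLP with dropout as an explicit regularization layer.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),      # randomly zeroes activations during training
    nn.Linear(256, 10),
)

# L2 regularization ("weight decay") applied through the optimizer.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)

x = torch.randn(32, 784)
loss = nn.functional.cross_entropy(model(x), torch.randint(0, 10, (32,)))
loss.backward()
optimizer.step()
```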

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath, et al. - IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023 - ieeexplore.ieee.org
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …
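
"Relative" reduction here is a ratio rather than a difference in percentage points; a small worked example with made-up numbers:

```python
# Hypothetical word error rates before and after deep-learning-based models.
wer_before, wer_after = 12.0, 5.4  # percent

relative_reduction = (wer_before - wer_after) / wer_before
print(f"{relative_reduction:.0%} relative WER reduction")  # 55% relative WER reduction
```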

HiPPO: Recurrent memory with optimal polynomial projections

A Gu, T Dao, S Ermon, A Rudra, et al. - Advances in Neural Information Processing Systems, 2020 - proceedings.neurips.cc
A central problem in learning from sequential data is representing cumulative history in an
incremental fashion as more data is processed. We introduce a general framework (HiPPO) …
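
To give a flavor of the framework, the sketch below builds the HiPPO-LegS transition matrix reported in the paper and runs a rough forward-Euler version of the resulting memory update; the unit-step discretization and hyperparameters are simplifications for illustration, not the paper's exact recurrence.

```python
import numpy as np

def hippo_legs_matrices(N):
    """HiPPO-LegS transition matrix A and input vector B."""
    A = np.zeros((N, N))
    for n in range(N):
        for k in range(N):
            if n > k:
                A[n, k] = np.sqrt(2 * n + 1) * np.sqrt(2 * k + 1)
            elif n == k:
                A[n, k] = n + 1
    B = np.sqrt(2 * np.arange(N) + 1.0)
    return A, B

# Rough forward-Euler update of the memory state c for a scalar input stream f:
# dc/dt = -(1/t) A c + (1/t) B f(t)   (simplified unit-step discretization).
N = 16
A, B = hippo_legs_matrices(N)
c = np.zeros(N)
f = np.sin(np.linspace(0, 4 * np.pi, 200))
for t, f_t in enumerate(f, start=1):
    c = c + (1.0 / t) * (-A @ c + B * f_t)
print(c[:4])  # first few coefficients of the compressed history
```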

ProphetNet: Predicting future n-gram for sequence-to-sequence pre-training

W Qi, Y Yan, Y Gong, D Liu, N Duan, J Chen, et al. - arXiv preprint, 2020 - arxiv.org
This paper presents a new sequence-to-sequence pre-training model called ProphetNet,
which introduces a novel self-supervised objective named future n-gram prediction and the …
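
The core of the objective is that, at each position, the model is trained to predict the next n tokens rather than only the next one. Below is a hedged sketch of how such future n-gram targets can be laid out; it is illustrative data preparation, not ProphetNet's actual implementation.

```python
def future_ngram_targets(tokens, n=2, pad=-100):
    """For each position i, the targets are the next n tokens (padded past the end)."""
    targets = []
    for i in range(len(tokens)):
        future = tokens[i + 1:i + 1 + n]
        targets.append(future + [pad] * (n - len(future)))
    return targets

tokens = [101, 7, 42, 9, 102]
for tok, tgt in zip(tokens, future_ngram_targets(tokens, n=2)):
    print(tok, "->", tgt)
# 101 -> [7, 42]
# 7   -> [42, 9]
# ...
```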

DropBlock: A regularization method for convolutional networks

G Ghiasi, TY Lin, QV Le - Advances in Neural Information Processing Systems, 2018 - proceedings.neurips.cc
Deep neural networks often work well when they are over-parameterized and trained with a
massive amount of noise and regularization, such as weight decay and dropout. Although …
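
Where standard dropout zeroes individual activations, DropBlock removes contiguous square regions of a feature map. The rough sketch below illustrates that idea for a single 2-D feature map; the block seeding probability and rescaling are simplified relative to the paper.

```python
import numpy as np

def dropblock(feature_map, block_size=3, drop_prob=0.1, rng=None):
    """Zero out block_size x block_size regions around randomly chosen seed positions."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = feature_map.shape
    mask = np.ones((h, w))
    # Each seeded position wipes out a whole block centered on it.
    seeds = rng.random((h, w)) < drop_prob
    half = block_size // 2
    for i, j in zip(*np.nonzero(seeds)):
        mask[max(0, i - half):i + half + 1, max(0, j - half):j + half + 1] = 0.0
    kept = mask.mean()
    return feature_map * mask / max(kept, 1e-8)  # rescale to preserve expected activation

fm = np.random.randn(16, 16)
out = dropblock(fm, block_size=3, drop_prob=0.05)
```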

An empirical evaluation of generic convolutional and recurrent networks for sequence modeling

S Bai, JZ Kolter, V Koltun - arXiv preprint arXiv:1803.01271, 2018 - arxiv.org
For most deep learning practitioners, sequence modeling is synonymous with recurrent
networks. Yet recent results indicate that convolutional architectures can outperform …
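
The convolutional alternative rests on causal, dilated 1-D convolutions. A minimal hedged sketch of one such layer in PyTorch follows; trimming the extra right-hand padding ("chomping") is what keeps the convolution causal.

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """A dilated 1-D convolution that never looks at future time steps."""
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation          # amount of left context needed
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=self.pad, dilation=dilation)

    def forward(self, x):                                # x: (batch, channels, time)
        out = self.conv(x)
        return out[:, :, :-self.pad] if self.pad else out  # drop the future-looking tail

x = torch.randn(8, 16, 100)
layer = CausalConv1d(channels=16, kernel_size=3, dilation=2)
print(layer(x).shape)  # torch.Size([8, 16, 100])
```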

Neural architecture optimization

R Luo, F Tian, T Qin, E Chen, et al. - Advances in Neural Information Processing Systems, 2018 - proceedings.neurips.cc
Automatic neural architecture design has shown its potential in discovering powerful neural
network architectures. Existing methods, whether based on reinforcement learning or …