A comprehensive overview of large language models

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar, et al. - arXiv preprint, 2023 - arxiv.org
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …

Data augmentation: A comprehensive survey of modern approaches

A Mumuni, F Mumuni - Array, 2022 - Elsevier
To ensure good performance, modern machine learning models typically require large
amounts of quality annotated data. Meanwhile, the data collection and annotation processes …
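
As a hedged illustration of the kind of label-preserving transformations such surveys cover, the sketch below applies a random horizontal flip and a random crop to an image array; the function name and parameters are illustrative placeholders, not taken from the survey.

```python
import numpy as np

def augment(image, crop_size=28, rng=None):
    """Apply a random horizontal flip and a random crop to an HxWxC image array."""
    if rng is None:
        rng = np.random.default_rng()
    # Random horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        image = image[:, ::-1, :]
    # Random crop to crop_size x crop_size.
    h, w, _ = image.shape
    top = rng.integers(0, h - crop_size + 1)
    left = rng.integers(0, w - crop_size + 1)
    return image[top:top + crop_size, left:left + crop_size, :]

# Example: augment a dummy 32x32 RGB image.
img = np.zeros((32, 32, 3), dtype=np.float32)
print(augment(img).shape)  # (28, 28, 3)
```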

Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks

T Hoefler, D Alistarh, T Ben-Nun, N Dryden, et al. - Journal of Machine Learning Research, 2021 - jmlr.org
The growing energy and performance costs of deep learning have driven the community to
reduce the size of neural networks by selectively pruning components. Similarly to their …
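
A minimal sketch of the magnitude-pruning idea such work discusses: zero out the fraction of weights with the smallest absolute value. The function name and sparsity level below are assumptions for illustration only.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the `sparsity` fraction of entries with the smallest |w|."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(weights) > threshold
    return weights * mask

w = np.random.randn(256, 256)
pruned = magnitude_prune(w, sparsity=0.9)
print(1.0 - np.count_nonzero(pruned) / pruned.size)  # roughly 0.9 sparsity
```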

A systematic review on overfitting control in shallow and deep neural networks

MM Bejani, M Ghatee - Artificial Intelligence Review, 2021 - Springer
Shallow neural networks process the features directly, while deep networks extract features
automatically during training. Both models suffer from overfitting or poor …
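
To make the usual overfitting controls concrete, here is a hedged PyTorch sketch combining two of the regularizers such reviews discuss, dropout and L2 weight decay; the architecture and hyperparameters are placeholders, not values from the paper.

```python
import torch
import torch.nn as nn

# A small MLP with dropout as an explicit regularization layer.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),      # randomly zeroes activations during training
    nn.Linear(256, 10),
)

# L2 regularization ("weight decay") applied through the optimizer.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)

x = torch.randn(32, 784)
loss = nn.functional.cross_entropy(model(x), torch.randint(0, 10, (32,)))
loss.backward()
optimizer.step()
```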

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath, et al. - IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023 - ieeexplore.ieee.org
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …
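
"Relative" reduction here is a ratio rather than a difference in percentage points; a small worked example with made-up numbers:

```python
# Hypothetical word error rates before and after deep-learning-based models.
wer_before, wer_after = 12.0, 5.4  # percent

relative_reduction = (wer_before - wer_after) / wer_before
print(f"{relative_reduction:.0%} relative WER reduction")  # 55% relative WER reduction
```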

HiPPO: Recurrent memory with optimal polynomial projections

A Gu, T Dao, S Ermon, A Rudra, et al. - Advances in Neural Information Processing Systems, 2020 - proceedings.neurips.cc
A central problem in learning from sequential data is representing cumulative history in an
incremental fashion as more data is processed. We introduce a general framework (HiPPO) …
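
To give a flavor of the framework, the sketch below builds the HiPPO-LegS transition matrix reported in the paper and runs a rough forward-Euler version of the resulting memory update; the unit-step discretization and hyperparameters are simplifications for illustration, not the paper's exact recurrence.

```python
import numpy as np

def hippo_legs_matrices(N):
    """HiPPO-LegS transition matrix A and input vector B."""
    A = np.zeros((N, N))
    for n in range(N):
        for k in range(N):
            if n > k:
                A[n, k] = np.sqrt(2 * n + 1) * np.sqrt(2 * k + 1)
            elif n == k:
                A[n, k] = n + 1
    B = np.sqrt(2 * np.arange(N) + 1.0)
    return A, B

# Rough forward-Euler update of the memory state c for a scalar input stream f:
# dc/dt = -(1/t) A c + (1/t) B f(t)   (simplified unit-step discretization).
N = 16
A, B = hippo_legs_matrices(N)
c = np.zeros(N)
f = np.sin(np.linspace(0, 4 * np.pi, 200))
for t, f_t in enumerate(f, start=1):
    c = c + (1.0 / t) * (-A @ c + B * f_t)
print(c[:4])  # first few coefficients of the compressed history
```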

ProphetNet: Predicting future n-gram for sequence-to-sequence pre-training

W Qi, Y Yan, Y Gong, D Liu, N Duan, J Chen, et al. - arXiv preprint, 2020 - arxiv.org
This paper presents a new sequence-to-sequence pre-training model called ProphetNet,
which introduces a novel self-supervised objective named future n-gram prediction and the …
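
The core of the objective is that, at each position, the model is trained to predict the next n tokens rather than only the next one. Below is a hedged sketch of how such future n-gram targets can be laid out; it is illustrative data preparation, not ProphetNet's actual implementation.

```python
def future_ngram_targets(tokens, n=2, pad=-100):
    """For each position i, the targets are the next n tokens (padded past the end)."""
    targets = []
    for i in range(len(tokens)):
        future = tokens[i + 1:i + 1 + n]
        targets.append(future + [pad] * (n - len(future)))
    return targets

tokens = [101, 7, 42, 9, 102]
for tok, tgt in zip(tokens, future_ngram_targets(tokens, n=2)):
    print(tok, "->", tgt)
# 101 -> [7, 42]
# 7   -> [42, 9]
# ...
```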

DropBlock: A regularization method for convolutional networks

G Ghiasi, TY Lin, QV Le - Advances in Neural Information Processing Systems, 2018 - proceedings.neurips.cc
Deep neural networks often work well when they are over-parameterized and trained with a
massive amount of noise and regularization, such as weight decay and dropout. Although …
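
Where standard dropout zeroes individual activations, DropBlock removes contiguous square regions of a feature map. The rough sketch below illustrates that idea for a single 2-D feature map; the block seeding probability and rescaling are simplified relative to the paper.

```python
import numpy as np

def dropblock(feature_map, block_size=3, drop_prob=0.1, rng=None):
    """Zero out block_size x block_size regions around randomly chosen seed positions."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = feature_map.shape
    mask = np.ones((h, w))
    # Each seeded position wipes out a whole block centered on it.
    seeds = rng.random((h, w)) < drop_prob
    half = block_size // 2
    for i, j in zip(*np.nonzero(seeds)):
        mask[max(0, i - half):i + half + 1, max(0, j - half):j + half + 1] = 0.0
    kept = mask.mean()
    return feature_map * mask / max(kept, 1e-8)  # rescale to preserve expected activation

fm = np.random.randn(16, 16)
out = dropblock(fm, block_size=3, drop_prob=0.05)
```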

An empirical evaluation of generic convolutional and recurrent networks for sequence modeling

S Bai, JZ Kolter, V Koltun - arXiv preprint arXiv:1803.01271, 2018 - arxiv.org
For most deep learning practitioners, sequence modeling is synonymous with recurrent
networks. Yet recent results indicate that convolutional architectures can outperform …
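
The convolutional alternative rests on causal, dilated 1-D convolutions. A minimal hedged sketch of one such layer in PyTorch follows; trimming the extra right-hand padding ("chomping") is what keeps the convolution causal.

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """A dilated 1-D convolution that never looks at future time steps."""
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation          # amount of left context needed
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=self.pad, dilation=dilation)

    def forward(self, x):                                # x: (batch, channels, time)
        out = self.conv(x)
        return out[:, :, :-self.pad] if self.pad else out  # drop the future-looking tail

x = torch.randn(8, 16, 100)
layer = CausalConv1d(channels=16, kernel_size=3, dilation=2)
print(layer(x).shape)  # torch.Size([8, 16, 100])
```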

Neural architecture optimization

R Luo, F Tian, T Qin, E Chen, et al. - Advances in Neural Information Processing Systems, 2018 - proceedings.neurips.cc
Automatic neural architecture design has shown its potential in discovering powerful neural
network architectures. Existing methods, whether based on reinforcement learning or …