X Hu, L Chu, J Pei, W Liu, J Bian - Knowledge and Information Systems, 2021 - Springer
Model complexity is a fundamental problem in deep learning. In this paper, we conduct a systematic overview of the latest studies on model complexity in deep learning …
P Kidger - arXiv preprint arXiv:2202.02435, 2022 - arxiv.org
The conjoining of dynamical systems and deep learning has become a topic of great interest. In particular, neural differential equations (NDEs) demonstrate that neural networks …
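For concreteness, a minimal neural-ODE sketch is given below. It assumes PyTorch together with the torchdiffeq package; the vector-field network, state dimension, and toy loss are illustrative choices, not the setup used in the cited work.

```python
# A minimal sketch of a neural ODE (assumes PyTorch and torchdiffeq are installed;
# network sizes, data, and loss are illustrative only).
import torch
import torch.nn as nn
from torchdiffeq import odeint


class ODEFunc(nn.Module):
    """Parameterizes the vector field dy/dt = f_theta(t, y)."""

    def __init__(self, dim: int = 2, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim)
        )

    def forward(self, t, y):
        return self.net(y)


func = ODEFunc()
y0 = torch.randn(16, 2)              # batch of initial states
t = torch.linspace(0.0, 1.0, 10)     # integration times
trajectory = odeint(func, y0, t)     # shape: (len(t), 16, 2)
loss = trajectory[-1].pow(2).mean()  # toy loss on the final state
loss.backward()                      # gradients flow through the solver
```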
A Rame, C Dancette, M Cord - International Conference on …, 2022 - proceedings.mlr.press
Learning robust models that generalize well under changes in the data distribution is critical for real-world applications. To this end, there has been a growing surge of interest in learning …
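As a rough illustration of this line of work, the sketch below aligns per-domain gradient statistics of a classifier head. It is a loose simplification in the spirit of gradient matching across training environments, not the exact Fishr objective; the networks, penalty weight, and data interface are assumptions.

```python
# A loose sketch of a domain-generalization penalty that aligns per-domain
# gradients of the classifier head (an illustrative simplification, not the
# cited paper's exact objective).
import torch
import torch.nn as nn
import torch.nn.functional as F

feature_extractor = nn.Sequential(nn.Linear(10, 64), nn.ReLU())
classifier = nn.Linear(64, 2)
params = list(feature_extractor.parameters()) + list(classifier.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
penalty_weight = 1.0  # assumed value; tuned per task in practice


def domain_gradient(x, y):
    """Mean-loss gradient w.r.t. the classifier head for one domain."""
    loss = F.cross_entropy(classifier(feature_extractor(x)), y)
    grads = torch.autograd.grad(loss, classifier.parameters(), create_graph=True)
    return loss, torch.cat([g.flatten() for g in grads])


def training_step(domains):
    # domains: list of (x, y) batches, one per training environment
    losses, grads = zip(*(domain_gradient(x, y) for x, y in domains))
    grads = torch.stack(grads)
    penalty = grads.var(dim=0, unbiased=False).sum()  # spread across domains
    total = torch.stack(losses).mean() + penalty_weight * penalty
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return total.item()
```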
We show that a variety of modern deep learning tasks exhibit a 'double-descent' phenomenon where, as we increase model size, performance first gets worse and …
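The effect can be reproduced in a toy setting: the sketch below sweeps the number of random Fourier features past the interpolation threshold (here 100 training points) and fits a minimum-norm least-squares model, which typically shows the test error rising near the threshold and falling again beyond it. All sizes and noise levels are illustrative assumptions.

```python
# A minimal sketch of the double-descent curve with random Fourier features
# and minimum-norm least squares (numpy only; all sizes are illustrative).
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 100, 1000, 5

X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
w_true = rng.normal(size=d)
y_train = X_train @ w_true + 0.5 * rng.normal(size=n_train)
y_test = X_test @ w_true + 0.5 * rng.normal(size=n_test)


def random_features(X, W, b):
    return np.cos(X @ W + b)


for n_features in [10, 50, 90, 100, 110, 200, 500, 2000]:
    W = rng.normal(size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    Phi_train = random_features(X_train, W, b)
    Phi_test = random_features(X_test, W, b)
    # lstsq returns the minimum-norm solution in the over-parameterized regime
    coef, *_ = np.linalg.lstsq(Phi_train, y_train, rcond=None)
    test_mse = np.mean((Phi_test @ coef - y_test) ** 2)
    print(f"{n_features:5d} features  test MSE = {test_mse:.3f}")
```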
We identify and formalize a fundamental gradient descent phenomenon resulting in a learning proclivity in over-parameterized neural networks. Gradient Starvation arises when …
S Liu, Z Zhu, Q Qu, C You - International Conference on …, 2022 - proceedings.mlr.press
Recently, over-parameterized deep networks, with many more network parameters than training samples, have dominated the performance of modern machine learning …
M Andriushchenko… - Advances in Neural …, 2020 - proceedings.neurips.cc
A recent line of work focused on making adversarial training computationally efficient for deep learning models. In particular, Wong et al. (2020) showed that $\ell_\infty$-adversarial …
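For reference, a single step of plain FGSM ($\ell_\infty$) adversarial training is sketched below. It omits the random initialization and regularization refinements discussed in this line of work; the model, optimizer, and epsilon are placeholders.

```python
# A minimal sketch of one FGSM ("fast") adversarial-training step in PyTorch;
# model, data, and epsilon are placeholders, not the cited papers' setup.
import torch
import torch.nn.functional as F


def fgsm_training_step(model, optimizer, x, y, epsilon=8 / 255):
    # Craft an L-infinity perturbation with one signed-gradient step.
    delta = torch.zeros_like(x, requires_grad=True)
    loss = F.cross_entropy(model(x + delta), y)
    loss.backward()
    delta = epsilon * delta.grad.sign()           # FGSM perturbation
    x_adv = (x + delta).clamp(0.0, 1.0).detach()  # keep inputs in valid range

    # Train on the adversarial examples.
    optimizer.zero_grad()
    adv_loss = F.cross_entropy(model(x_adv), y)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()
```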
Several works have proposed Simplicity Bias (SB)---the tendency of standard training procedures such as Stochastic Gradient Descent (SGD) to find simple models---to justify why …
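A common way to probe this bias is to train on data where a 'simple' and a 'complex' feature block are each fully predictive of the label and then measure how accuracy drops when either block is randomized. The sketch below follows that protocol on a synthetic dataset; the dataset construction, architecture, and training budget are illustrative assumptions, not the cited paper's benchmarks.

```python
# A minimal simplicity-bias probe: the simple block (one linearly separable
# coordinate) and the complex block (an XOR-like pair) each determine the
# label; shuffling a block after training tests how much the model relies on it.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
n = 4000
y = torch.randint(0, 2, (n,))
sign = (2.0 * y - 1.0).unsqueeze(1)

simple = sign + 0.1 * torch.randn(n, 1)              # linearly separable feature
bit = 2 * torch.randint(0, 2, (n, 1)).float() - 1    # random +/-1 bit
complex_feats = torch.cat([bit, bit * sign], dim=1)  # product of columns = label
complex_feats = complex_feats + 0.1 * torch.randn(n, 2)
X = torch.cat([simple, complex_feats], dim=1)

model = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(500):
    opt.zero_grad()
    F.cross_entropy(model(X), y).backward()
    opt.step()


def accuracy_with_block_shuffled(cols):
    X_shuffled = X.clone()
    X_shuffled[:, cols] = X_shuffled[torch.randperm(n)][:, cols]
    return (model(X_shuffled).argmax(1) == y).float().mean().item()


print("simple block shuffled :", accuracy_with_block_shuffled([0]))
print("complex block shuffled:", accuracy_with_block_shuffled([1, 2]))
```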
Pre-trained language models can be fine-tuned to solve diverse NLP tasks, including in few-shot settings. Thus fine-tuning allows the model to quickly pick up task-specific "skills," but …
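A minimal few-shot fine-tuning sketch with the Hugging Face transformers and datasets libraries is given below; the model name, the SST-2 subset of 64 examples, and the hyperparameters are illustrative assumptions rather than the cited paper's configuration.

```python
# A minimal sketch of few-shot fine-tuning of a pretrained language model on a
# small classification subset (model, dataset, and hyperparameters are assumed).
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import load_dataset

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Few-shot setting: keep only a handful of labeled training examples.
dataset = load_dataset("glue", "sst2")
train_subset = dataset["train"].shuffle(seed=0).select(range(64))


def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True,
                     padding="max_length", max_length=128)


train_subset = train_subset.map(tokenize, batched=True)
eval_set = dataset["validation"].map(tokenize, batched=True)

args = TrainingArguments(output_dir="out", num_train_epochs=3,
                         per_device_train_batch_size=8, learning_rate=2e-5)
trainer = Trainer(model=model, args=args,
                  train_dataset=train_subset, eval_dataset=eval_set)
trainer.train()
```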