Shortcut learning in deep neural networks

R Geirhos, JH Jacobsen, C Michaelis… - Nature Machine …, 2020 - nature.com
Deep learning has triggered the current rise of artificial intelligence and is the workhorse of
today's machine intelligence. Numerous success stories have rapidly spread all over …

Model complexity of deep learning: A survey

X Hu, L Chu, J Pei, W Liu, J Bian - Knowledge and Information Systems, 2021 - Springer
Model complexity is a fundamental problem in deep learning. In this paper, we
conduct a systematic overview of the latest studies on model complexity in deep learning …

On neural differential equations

P Kidger - arXiv preprint arXiv:2202.02435, 2022 - arxiv.org
The conjoining of dynamical systems and deep learning has become a topic of great
interest. In particular, neural differential equations (NDEs) demonstrate that neural networks …
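The snippet cuts off before describing the mechanism, but the central idea of a neural ODE, the best-known neural differential equation, is to parameterize the right-hand side of dy/dt = f_theta(t, y) with a network and integrate it, so the solver's output stays differentiable in the network's parameters. Below is a minimal sketch in PyTorch, not taken from the thesis above: it uses a fixed-step Euler integrator for simplicity (practical implementations use adaptive solvers, e.g. from the torchdiffeq library), and the class and function names are illustrative.

```python
import torch
import torch.nn as nn

class VectorField(nn.Module):
    """Parameterizes f_theta(t, y) on the right-hand side of dy/dt = f_theta(t, y)."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, hidden), nn.Tanh(), nn.Linear(hidden, dim))

    def forward(self, t, y):
        # Append the scalar time t as an extra input feature.
        t_col = torch.full((y.shape[0], 1), float(t))
        return self.net(torch.cat([y, t_col], dim=-1))

def odeint_euler(field, y0, t0=0.0, t1=1.0, steps=20):
    """Fixed-step explicit Euler integration of dy/dt = field(t, y); adaptive
    solvers are used in practice, but the idea is the same."""
    y, dt = y0, (t1 - t0) / steps
    for i in range(steps):
        y = y + dt * field(t0 + i * dt, y)
    return y

# Toy usage: push a batch of 2-D states through the learned flow and backpropagate.
field = VectorField(dim=2)
y0 = torch.randn(8, 2)
y1 = odeint_euler(field, y0)      # differentiable w.r.t. the field's parameters
y1.pow(2).mean().backward()
```

Because every solver step is an ordinary differentiable operation, a loss on the final state trains the vector field end to end, which is the property that lets neural networks and dynamical systems be combined in the way the abstract describes.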

Fishr: Invariant gradient variances for out-of-distribution generalization

A Rame, C Dancette, M Cord - International Conference on …, 2022 - proceedings.mlr.press
Learning robust models that generalize well under changes in the data distribution is critical
for real-world applications. To this end, there has been a growing surge of interest in learning …
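The title names the mechanism: Fishr matches the domain-level variances of per-sample gradients so that training domains agree at the gradient level. The sketch below is a simplified illustration, not the authors' reference implementation: it assumes per-sample gradients are taken only w.r.t. a final linear layer, where they have a closed form, and the function names (`per_sample_grads_linear`, `fishr_penalty`) are hypothetical.

```python
import torch
import torch.nn.functional as F

def per_sample_grads_linear(features, logits, targets):
    """Closed-form per-sample gradients of the cross-entropy loss w.r.t. the weights
    of a final linear layer: g_i is the outer product of (softmax(z_i) - onehot(y_i)) and x_i."""
    probs = F.softmax(logits, dim=-1)
    onehot = F.one_hot(targets, num_classes=logits.shape[-1]).float()
    dz = probs - onehot                                           # (batch, classes)
    return torch.einsum('bc,bd->bcd', dz, features).flatten(1)   # (batch, classes * features)

def fishr_penalty(grads_per_domain):
    """Penalize mismatch between per-domain gradient variances; a simplified
    Fishr-style regularizer, not the paper's exact formulation."""
    variances = [g.var(dim=0) for g in grads_per_domain]          # one variance vector per domain
    mean_var = torch.stack(variances).mean(dim=0)
    return torch.stack([((v - mean_var) ** 2).mean() for v in variances]).mean()
```

In practice such a penalty would be added, weighted by a hyperparameter, to the empirical risk averaged over the training domains.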

Deep double descent: Where bigger models and more data hurt

P Nakkiran, G Kaplun, Y Bansal, T Yang… - Journal of Statistical …, 2021 - iopscience.iop.org
We show that a variety of modern deep learning tasks exhibit a 'double-
descent' phenomenon where, as we increase model size, performance first gets worse and …

Gradient starvation: A learning proclivity in neural networks

M Pezeshki, O Kaba, Y Bengio… - Advances in …, 2021 - proceedings.neurips.cc
We identify and formalize a fundamental gradient descent phenomenon resulting in a
learning proclivity in over-parameterized neural networks. Gradient Starvation arises when …

Robust training under label noise by over-parameterization

S Liu, Z Zhu, Q Qu, C You - International Conference on …, 2022 - proceedings.mlr.press
Recently, over-parameterized deep networks, with increasingly more network parameters
than training samples, have dominated the performances of modern machine learning …

Understanding and improving fast adversarial training

M Andriushchenko… - Advances in Neural …, 2020 - proceedings.neurips.cc
A recent line of work focused on making adversarial training computationally efficient for
deep learning models. In particular, Wong et al. (2020) showed that $\ell_\infty$-adversarial …
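The referenced result of Wong et al. (2020) is that single-step FGSM adversarial training with a random start can be competitive with multi-step PGD training at a fraction of the cost; the paper above analyzes when and why this single-step scheme breaks down. A hedged sketch of such a training step, with illustrative hyperparameter values, might look as follows.

```python
import torch
import torch.nn.functional as F

def fgsm_training_step(model, x, y, optimizer, eps=8/255, alpha=10/255):
    """One step of FGSM adversarial training with random initialization, in the
    spirit of Wong et al. (2020); eps and alpha values are illustrative."""
    # Start from a uniformly random point inside the eps-ball.
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    grad = torch.autograd.grad(F.cross_entropy(model(x + delta), y), delta)[0]
    # Single signed-gradient step, then project back onto the eps-ball and [0, 1].
    delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
    x_adv = (x + delta).clamp(0.0, 1.0)
    # Standard parameter update on the adversarially perturbed batch.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Wong et al. attribute much of the method's effectiveness to the random initialization inside the $\ell_\infty$ ball, while the paper above studies the catastrophic-overfitting failure mode of such single-step training.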

The pitfalls of simplicity bias in neural networks

H Shah, K Tamuly, A Raghunathan… - Advances in …, 2020 - proceedings.neurips.cc
Several works have proposed Simplicity Bias (SB), the tendency of standard training
procedures such as Stochastic Gradient Descent (SGD) to find simple models, to justify why …

Task-specific skill localization in fine-tuned language models

A Panigrahi, N Saunshi, H Zhao… - … on Machine Learning, 2023 - proceedings.mlr.press
Pre-trained language models can be fine-tuned to solve diverse NLP tasks, including in few-
shot settings. Thus fine-tuning allows the model to quickly pick up task-specific "skills," but …