Shortcut learning in deep neural networks

R Geirhos, JH Jacobsen, C Michaelis… - Nature Machine …, 2020 - nature.com
Deep learning has triggered the current rise of artificial intelligence and is the workhorse of
today's machine intelligence. Numerous success stories have rapidly spread all over …

Model complexity of deep learning: A survey

X Hu, L Chu, J Pei, W Liu, J Bian - Knowledge and Information Systems, 2021 - Springer
Model complexity is a fundamental problem in deep learning. In this paper, we
conduct a systematic overview of the latest studies on model complexity in deep learning …

On neural differential equations

P Kidger - arXiv preprint arXiv:2202.02435, 2022 - arxiv.org
The conjoining of dynamical systems and deep learning has become a topic of great
interest. In particular, neural differential equations (NDEs) demonstrate that neural networks …
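The snippet cuts off before describing the mechanism, but the central idea of a neural ODE, the best-known neural differential equation, is to parameterize the right-hand side of dy/dt = f_theta(t, y) with a network and integrate it, so the solver's output stays differentiable in the network's parameters. Below is a minimal sketch in PyTorch, not taken from the thesis above: it uses a fixed-step Euler integrator for simplicity (practical implementations use adaptive solvers, e.g. from the torchdiffeq library), and the class and function names are illustrative.

```python
import torch
import torch.nn as nn

class VectorField(nn.Module):
    """Parameterizes f_theta(t, y) on the right-hand side of dy/dt = f_theta(t, y)."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, hidden), nn.Tanh(), nn.Linear(hidden, dim))

    def forward(self, t, y):
        # Append the scalar time t as an extra input feature.
        t_col = torch.full((y.shape[0], 1), float(t))
        return self.net(torch.cat([y, t_col], dim=-1))

def odeint_euler(field, y0, t0=0.0, t1=1.0, steps=20):
    """Fixed-step explicit Euler integration of dy/dt = field(t, y); adaptive
    solvers are used in practice, but the idea is the same."""
    y, dt = y0, (t1 - t0) / steps
    for i in range(steps):
        y = y + dt * field(t0 + i * dt, y)
    return y

# Toy usage: push a batch of 2-D states through the learned flow and backpropagate.
field = VectorField(dim=2)
y0 = torch.randn(8, 2)
y1 = odeint_euler(field, y0)      # differentiable w.r.t. the field's parameters
y1.pow(2).mean().backward()
```

Because every solver step is an ordinary differentiable operation, a loss on the final state trains the vector field end to end, which is the property that lets neural networks and dynamical systems be combined in the way the abstract describes.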

Fishr: Invariant gradient variances for out-of-distribution generalization

A Rame, C Dancette, M Cord - International Conference on …, 2022 - proceedings.mlr.press
Learning robust models that generalize well under changes in the data distribution is critical
for real-world applications. To this end, there has been a growing surge of interest in learning …
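The title names the mechanism: Fishr matches the domain-level variances of per-sample gradients so that training domains agree at the gradient level. The sketch below is a simplified illustration, not the authors' reference implementation: it assumes per-sample gradients are taken only w.r.t. a final linear layer, where they have a closed form, and the function names (`per_sample_grads_linear`, `fishr_penalty`) are hypothetical.

```python
import torch
import torch.nn.functional as F

def per_sample_grads_linear(features, logits, targets):
    """Closed-form per-sample gradients of the cross-entropy loss w.r.t. the weights
    of a final linear layer: g_i is the outer product of (softmax(z_i) - onehot(y_i)) and x_i."""
    probs = F.softmax(logits, dim=-1)
    onehot = F.one_hot(targets, num_classes=logits.shape[-1]).float()
    dz = probs - onehot                                           # (batch, classes)
    return torch.einsum('bc,bd->bcd', dz, features).flatten(1)   # (batch, classes * features)

def fishr_penalty(grads_per_domain):
    """Penalize mismatch between per-domain gradient variances; a simplified
    Fishr-style regularizer, not the paper's exact formulation."""
    variances = [g.var(dim=0) for g in grads_per_domain]          # one variance vector per domain
    mean_var = torch.stack(variances).mean(dim=0)
    return torch.stack([((v - mean_var) ** 2).mean() for v in variances]).mean()
```

In practice such a penalty would be added, weighted by a hyperparameter, to the empirical risk averaged over the training domains.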

Deep double descent: Where bigger models and more data hurt

P Nakkiran, G Kaplun, Y Bansal, T Yang… - Journal of Statistical …, 2021 - iopscience.iop.org
We show that a variety of modern deep learning tasks exhibit a 'double-
descent' phenomenon where, as we increase model size, performance first gets worse and …

Gradient starvation: A learning proclivity in neural networks

M Pezeshki, O Kaba, Y Bengio… - Advances in …, 2021 - proceedings.neurips.cc
We identify and formalize a fundamental gradient descent phenomenon resulting in a
learning proclivity in over-parameterized neural networks. Gradient Starvation arises when …

Robust training under label noise by over-parameterization

S Liu, Z Zhu, Q Qu, C You - International Conference on …, 2022 - proceedings.mlr.press
Recently, over-parameterized deep networks, with increasingly more network parameters
than training samples, have dominated the performances of modern machine learning …

Understanding and improving fast adversarial training

M Andriushchenko… - Advances in Neural …, 2020 - proceedings.neurips.cc
A recent line of work focused on making adversarial training computationally efficient for
deep learning models. In particular, Wong et al. (2020) showed that $\ell_\infty$-adversarial …
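The referenced result of Wong et al. (2020) is that single-step FGSM adversarial training with a random start can be competitive with multi-step PGD training at a fraction of the cost; the paper above analyzes when and why this single-step scheme breaks down. A hedged sketch of such a training step, with illustrative hyperparameter values, might look as follows.

```python
import torch
import torch.nn.functional as F

def fgsm_training_step(model, x, y, optimizer, eps=8/255, alpha=10/255):
    """One step of FGSM adversarial training with random initialization, in the
    spirit of Wong et al. (2020); eps and alpha values are illustrative."""
    # Start from a uniformly random point inside the eps-ball.
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    grad = torch.autograd.grad(F.cross_entropy(model(x + delta), y), delta)[0]
    # Single signed-gradient step, then project back onto the eps-ball and [0, 1].
    delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
    x_adv = (x + delta).clamp(0.0, 1.0)
    # Standard parameter update on the adversarially perturbed batch.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Wong et al. attribute much of the method's effectiveness to the random initialization inside the $\ell_\infty$ ball, while the paper above studies the catastrophic-overfitting failure mode of such single-step training.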

The pitfalls of simplicity bias in neural networks

H Shah, K Tamuly, A Raghunathan… - Advances in …, 2020 - proceedings.neurips.cc
Several works have proposed Simplicity Bias (SB), the tendency of standard training
procedures such as Stochastic Gradient Descent (SGD) to find simple models, to justify why …

Task-specific skill localization in fine-tuned language models

A Panigrahi, N Saunshi, H Zhao… - … on Machine Learning, 2023 - proceedings.mlr.press
Pre-trained language models can be fine-tuned to solve diverse NLP tasks, including in few-
shot settings. Thus fine-tuning allows the model to quickly pick up task-specific "skills," but …