Social physics

M Jusup, P Holme, K Kanazawa, M Takayasu, I Romić… - Physics Reports, 2022 - Elsevier
Recent decades have seen a rise in the use of physics methods to study different societal
phenomena. This development has been due to physicists venturing outside of their …

Merging models with fisher-weighted averaging

MS Matena, CA Raffel - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Averaging the parameters of models that have the same architecture and initialization can
provide a means of combining their respective capabilities. In this paper, we take the …
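The snippet above describes merging models by averaging their parameters. As a rough illustration of the general idea, here is a minimal PyTorch-style sketch of diagonal-Fisher-weighted averaging of two or more state dicts; the helper names are hypothetical and this is not the authors' released code.

```python
import torch

def diagonal_fisher(model, data_loader, loss_fn):
    """Estimate a diagonal Fisher as the mean squared gradient of the loss."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in data_loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(data_loader), 1) for n, f in fisher.items()}

def fisher_weighted_average(state_dicts, fishers, eps=1e-8):
    """Merge same-architecture models: each parameter is averaged,
    weighted by its diagonal Fisher estimate (illustrative helper)."""
    merged = {}
    for name in state_dicts[0]:
        num = sum(f[name] * sd[name] for sd, f in zip(state_dicts, fishers))
        den = sum(f[name] for f in fishers) + eps
        merged[name] = num / den
    return merged
```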

Training neural networks with fixed sparse masks

YL Sung, V Nair, CA Raffel - Advances in Neural …, 2021 - proceedings.neurips.cc
During typical gradient-based training of deep neural networks, all of the model's
parameters are updated at each iteration. Recent work has shown that it is possible to …
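The entry above concerns updating only a fixed sparse subset of parameters during training. The snippet does not spell out the paper's selection criterion, so the sketch below is only a generic recipe (PyTorch-flavored, with parameter importance scored by squared gradients); treat the helpers as illustrative.

```python
import torch

def top_k_mask(grads_sq, k_fraction=0.005):
    """Keep the fraction of parameters with the largest squared gradients;
    everything outside the mask stays frozen for the rest of training."""
    flat = torch.cat([g.flatten() for g in grads_sq.values()])
    k = max(1, int(k_fraction * flat.numel()))
    threshold = torch.topk(flat, k).values.min()
    return {n: (g >= threshold).float() for n, g in grads_sq.items()}

def masked_sgd_step(model, mask, lr=1e-3):
    """Apply a plain SGD update only to the masked (trainable) entries."""
    with torch.no_grad():
        for n, p in model.named_parameters():
            if p.grad is not None:
                p -= lr * mask[n] * p.grad
```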

A general framework for uncertainty estimation in deep learning

A Loquercio, M Segu… - IEEE Robotics and …, 2020 - ieeexplore.ieee.org
Neural network predictions are unreliable when the input sample is out of the training
distribution or corrupted by noise. Being able to detect such failures automatically is …
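The snippet names the goal (detecting unreliable predictions) but not the paper's specific machinery, so the sketch below shows one common, generic ingredient of such frameworks, Monte Carlo dropout, purely as an illustration of turning repeated stochastic forward passes into an uncertainty estimate.

```python
import torch

def mc_dropout_predict(model, x, num_samples=20):
    """Monte Carlo dropout: keep dropout active at inference time and report
    the mean prediction plus its sample variance as a crude uncertainty."""
    model.train()  # enables dropout layers; batch-norm caveats apply
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(num_samples)])
    return samples.mean(dim=0), samples.var(dim=0)
```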

Fast yet effective machine unlearning

AK Tarun, VS Chundawat, M Mandal… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Unlearning the data observed during the training of a machine learning (ML) model is an
important task that can play a pivotal role in fortifying the privacy and security of ML-based …

Mixed-privacy forgetting in deep networks

A Golatkar, A Achille, A Ravichandran… - Proceedings of the …, 2021 - openaccess.thecvf.com
We show that the influence of a subset of the training samples can be removed--or "forgotten"--from the weights of a network trained on large-scale image classification tasks …

Towards provably efficient quantum algorithms for large-scale machine-learning models

J Liu, M Liu, JP Liu, Z Ye, Y Wang, Y Alexeev… - Nature …, 2024 - nature.com
Large machine learning models are revolutionary technologies of artificial intelligence
whose bottlenecks include huge computational expenses, power, and time used both in the …

A diffusion theory for deep learning dynamics: Stochastic gradient descent exponentially favors flat minima

Z Xie, I Sato, M Sugiyama - arXiv preprint arXiv:2002.03495, 2020 - arxiv.org
Stochastic Gradient Descent (SGD) and its variants are mainstream methods for training
deep networks in practice. SGD is known to find a flat minimum that often generalizes well …
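The snippet frames SGD as a diffusion process. A commonly used continuous-time approximation of minibatch SGD (generic notation, not necessarily the paper's exact formulation) is

d\theta_t = -\nabla L(\theta_t)\, dt + \sqrt{\eta / B}\, \Sigma(\theta_t)^{1/2}\, dW_t,

where \eta is the learning rate, B the batch size, \Sigma(\theta) the gradient-noise covariance, and W_t a Wiener process. In this picture, flat minima are favored because the chance of escaping a basin depends on how the noise covariance interacts with the local curvature of L.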

The information bottleneck problem and its applications in machine learning

Z Goldfeld, Y Polyanskiy - IEEE Journal on Selected Areas in …, 2020 - ieeexplore.ieee.org
Inference capabilities of machine learning (ML) systems have skyrocketed in recent years, now
playing a pivotal role in various aspects of society. The goal in statistical learning is to use …
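For reference, the standard information bottleneck objective studied in this line of work is the Lagrangian

\min_{p(t \mid x)} \; I(X;T) - \beta\, I(T;Y),

where T is a compressed representation of the input X, I(\cdot\,;\cdot) denotes mutual information, and \beta trades off compression of X against preservation of information about the target Y.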

Fine-tuning pre-trained language models effectively by optimizing subnetworks adaptively

H Zhang, G Li, J Li, Z Zhang… - Advances in Neural …, 2022 - proceedings.neurips.cc
Large-scale pre-trained language models have achieved impressive results on a wide
range of downstream tasks recently. However, fine-tuning an extremely large-scale pre …