Social physics

M Jusup, P Holme, K Kanazawa, M Takayasu, I Romić… - Physics Reports, 2022 - Elsevier
Recent decades have seen a rise in the use of physics methods to study different societal
phenomena. This development has been due to physicists venturing outside of their …

Merging models with fisher-weighted averaging

MS Matena, CA Raffel - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Averaging the parameters of models that have the same architecture and initialization can
provide a means of combining their respective capabilities. In this paper, we take the …
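The snippet above describes merging models by averaging their parameters. As a rough illustration of the general idea, here is a minimal PyTorch-style sketch of diagonal-Fisher-weighted averaging of two or more state dicts; the helper names are hypothetical and this is not the authors' released code.

```python
import torch

def diagonal_fisher(model, data_loader, loss_fn):
    """Estimate a diagonal Fisher as the mean squared gradient of the loss."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in data_loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(data_loader), 1) for n, f in fisher.items()}

def fisher_weighted_average(state_dicts, fishers, eps=1e-8):
    """Merge same-architecture models: each parameter is averaged,
    weighted by its diagonal Fisher estimate (illustrative helper)."""
    merged = {}
    for name in state_dicts[0]:
        num = sum(f[name] * sd[name] for sd, f in zip(state_dicts, fishers))
        den = sum(f[name] for f in fishers) + eps
        merged[name] = num / den
    return merged
```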

Training neural networks with fixed sparse masks

YL Sung, V Nair, CA Raffel - Advances in Neural …, 2021 - proceedings.neurips.cc
During typical gradient-based training of deep neural networks, all of the model's
parameters are updated at each iteration. Recent work has shown that it is possible to …
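The entry above concerns updating only a fixed sparse subset of parameters during training. The snippet does not spell out the paper's selection criterion, so the sketch below is only a generic recipe (PyTorch-flavored, with parameter importance scored by squared gradients); treat the helpers as illustrative.

```python
import torch

def top_k_mask(grads_sq, k_fraction=0.005):
    """Keep the fraction of parameters with the largest squared gradients;
    everything outside the mask stays frozen for the rest of training."""
    flat = torch.cat([g.flatten() for g in grads_sq.values()])
    k = max(1, int(k_fraction * flat.numel()))
    threshold = torch.topk(flat, k).values.min()
    return {n: (g >= threshold).float() for n, g in grads_sq.items()}

def masked_sgd_step(model, mask, lr=1e-3):
    """Apply a plain SGD update only to the masked (trainable) entries."""
    with torch.no_grad():
        for n, p in model.named_parameters():
            if p.grad is not None:
                p -= lr * mask[n] * p.grad
```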

A general framework for uncertainty estimation in deep learning

A Loquercio, M Segu… - IEEE Robotics and …, 2020 - ieeexplore.ieee.org
Neural network predictions are unreliable when the input sample is out of the training
distribution or corrupted by noise. Being able to detect such failures automatically is …
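The snippet names the goal (detecting unreliable predictions) but not the paper's specific machinery, so the sketch below shows one common, generic ingredient of such frameworks, Monte Carlo dropout, purely as an illustration of turning repeated stochastic forward passes into an uncertainty estimate.

```python
import torch

def mc_dropout_predict(model, x, num_samples=20):
    """Monte Carlo dropout: keep dropout active at inference time and report
    the mean prediction plus its sample variance as a crude uncertainty."""
    model.train()  # enables dropout layers; batch-norm caveats apply
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(num_samples)])
    return samples.mean(dim=0), samples.var(dim=0)
```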

Fast yet effective machine unlearning

AK Tarun, VS Chundawat, M Mandal… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Unlearning the data observed during the training of a machine learning (ML) model is an
important task that can play a pivotal role in fortifying the privacy and security of ML-based …

Mixed-privacy forgetting in deep networks

A Golatkar, A Achille, A Ravichandran… - Proceedings of the …, 2021 - openaccess.thecvf.com
We show that the influence of a subset of the training samples can be removed--or "forgotten"--from the weights of a network trained on large-scale image classification tasks …

Towards provably efficient quantum algorithms for large-scale machine-learning models

J Liu, M Liu, JP Liu, Z Ye, Y Wang, Y Alexeev… - Nature …, 2024 - nature.com
Large machine learning models are revolutionary technologies of artificial intelligence
whose bottlenecks include huge computational expenses, power, and time used both in the …

A diffusion theory for deep learning dynamics: Stochastic gradient descent exponentially favors flat minima

Z Xie, I Sato, M Sugiyama - arXiv preprint arXiv:2002.03495, 2020 - arxiv.org
Stochastic Gradient Descent (SGD) and its variants are mainstream methods for training
deep networks in practice. SGD is known to find a flat minimum that often generalizes well …
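The snippet frames SGD as a diffusion process. A commonly used continuous-time approximation of minibatch SGD (generic notation, not necessarily the paper's exact formulation) is

d\theta_t = -\nabla L(\theta_t)\, dt + \sqrt{\eta / B}\, \Sigma(\theta_t)^{1/2}\, dW_t,

where \eta is the learning rate, B the batch size, \Sigma(\theta) the gradient-noise covariance, and W_t a Wiener process. In this picture, flat minima are favored because the chance of escaping a basin depends on how the noise covariance interacts with the local curvature of L.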

The information bottleneck problem and its applications in machine learning

Z Goldfeld, Y Polyanskiy - IEEE Journal on Selected Areas in …, 2020 - ieeexplore.ieee.org
Inference capabilities of machine learning (ML) systems have skyrocketed in recent years, now
playing a pivotal role in various aspects of society. The goal in statistical learning is to use …
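For reference, the standard information bottleneck objective studied in this line of work is the Lagrangian

\min_{p(t \mid x)} \; I(X;T) - \beta\, I(T;Y),

where T is a compressed representation of the input X, I(\cdot\,;\cdot) denotes mutual information, and \beta trades off compression of X against preservation of information about the target Y.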

Fine-tuning pre-trained language models effectively by optimizing subnetworks adaptively

H Zhang, G Li, J Li, Z Zhang… - Advances in Neural …, 2022 - proceedings.neurips.cc
Large-scale pre-trained language models have achieved impressive results on a wide
range of downstream tasks recently. However, fine-tuning an extremely large-scale pre …