Fishr: Invariant gradient variances for out-of-distribution generalization

A Rame, C Dancette, M Cord - International Conference on …, 2022 - proceedings.mlr.press
Learning robust models that generalize well under changes in the data distribution is critical
for real-world applications. To this end, there has been a growing surge of interest in learning …
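
The penalty proposed here aligns gradient variances across training domains; the snippet below is a minimal numpy sketch of that variance-matching idea, assuming per-sample gradients have already been computed (function and array names are illustrative, not the authors' implementation).

```python
import numpy as np

def fishr_style_penalty(per_sample_grads_by_domain):
    """Gradient-variance-matching penalty in the spirit of Fishr.

    per_sample_grads_by_domain: list of arrays, one per training domain,
    each of shape (n_samples_d, n_params) holding per-sample gradients
    of the loss w.r.t. (a subset of) the model parameters.
    """
    # Variance of each parameter's gradient within each domain.
    domain_vars = [g.var(axis=0) for g in per_sample_grads_by_domain]
    mean_var = np.mean(domain_vars, axis=0)
    # Penalize how far each domain's gradient variance is from the mean variance.
    return float(np.mean([np.sum((v - mean_var) ** 2) for v in domain_vars]))

# Toy usage: two "domains" whose per-sample gradients have different spread.
rng = np.random.default_rng(0)
grads_a = rng.normal(size=(32, 10))
grads_b = rng.normal(scale=2.0, size=(32, 10))
print(fishr_style_penalty([grads_a, grads_b]))
```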

Estimating example difficulty using variance of gradients

C Agarwal, D D'souza… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
In machine learning, a question of great interest is understanding what examples are
challenging for a model to classify. Identifying atypical examples ensures the safe …
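
The Variance-of-Gradients (VoG) score summarizes how much an example's input gradients fluctuate across training checkpoints; a small numpy sketch under that reading (gradient snapshots are assumed to be precomputed, and the exact normalization in the paper may differ).

```python
import numpy as np

def vog_score(grad_snapshots):
    """Variance-of-Gradients style difficulty score for one example.

    grad_snapshots: array of shape (n_checkpoints, n_input_dims) with the
    gradient of the predicted-class score w.r.t. the input, one row per
    saved training checkpoint.
    """
    # Per-dimension variance across checkpoints, averaged over dimensions.
    return float(grad_snapshots.var(axis=0).mean())

rng = np.random.default_rng(1)
easy = rng.normal(scale=0.1, size=(5, 784))   # stable gradients -> low score
hard = rng.normal(scale=1.0, size=(5, 784))   # noisy gradients -> high score
print(vog_score(easy), vog_score(hard))
```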

An experimental study of Byzantine-robust aggregation schemes in federated learning

S Li, ECH Ngai, T Voigt - IEEE Transactions on Big Data, 2023 - ieeexplore.ieee.org
Byzantine-robust federated learning aims at mitigating Byzantine failures during the
federated training process, where malicious participants (known as Byzantine clients) may …
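
One representative family of rules evaluated in such studies is coordinate-wise robust statistics; the sketch below shows a generic coordinate-wise trimmed mean over client updates, as an illustration rather than the paper's specific protocol.

```python
import numpy as np

def trimmed_mean_aggregate(client_updates, trim_k):
    """Coordinate-wise trimmed mean over client model updates.

    client_updates: array of shape (n_clients, n_params).
    trim_k: number of smallest and largest values dropped per coordinate,
            chosen to cover the assumed number of Byzantine clients.
    """
    sorted_updates = np.sort(client_updates, axis=0)
    kept = sorted_updates[trim_k: client_updates.shape[0] - trim_k]
    return kept.mean(axis=0)

rng = np.random.default_rng(2)
honest = rng.normal(size=(8, 4))
malicious = np.full((2, 4), 100.0)            # crude attack: huge updates
updates = np.vstack([honest, malicious])
print(trimmed_mean_aggregate(updates, trim_k=2))
```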

Embarrassingly simple dataset distillation

Y Feng, SR Vedantam, J Kempe - The Twelfth International …, 2023 - openreview.net
Dataset distillation extracts a small set of synthetic training samples from a large dataset with
the goal of achieving competitive performance on test data when trained on this sample. In …
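
Independent of the specific algorithm, dataset distillation is commonly posed as a bilevel optimization problem; a standard formulation (not specific to this paper) is

```latex
\min_{\mathcal{S}} \; \mathcal{L}_{\mathcal{D}}\!\bigl(\theta^{\star}(\mathcal{S})\bigr)
\quad \text{s.t.} \quad
\theta^{\star}(\mathcal{S}) = \arg\min_{\theta} \; \mathcal{L}_{\mathcal{S}}(\theta),
```

where S is the small synthetic set, D the real data, and the inner problem trains a model on S alone.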

On the generalization of models trained with SGD: Information-theoretic bounds and implications

Z Wang, Y Mao - arXiv preprint arXiv:2110.03128, 2021 - arxiv.org
This paper follows up on recent work by Neu et al. (2021) and presents new
information-theoretic upper bounds for the generalization error of machine learning models …
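
For context, bounds in this line of work descend from the mutual-information bound of Xu and Raginsky (2017): for a sigma-sub-Gaussian loss and n training samples,

```latex
\bigl|\mathbb{E}[\operatorname{gen}(W, S)]\bigr| \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(W; S)},
```

where W denotes the learned weights and S the training sample; the cited paper develops tighter, SGD-specific variants of bounds of this type.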

A tale of two long tails

D D'souza, Z Nussbaum, C Agarwal… - arXiv preprint arXiv …, 2021 - arxiv.org
As machine learning models are increasingly employed to assist human decision-makers, it
becomes critical to communicate the uncertainty associated with these model predictions …

Low-variance Forward Gradients using Direct Feedback Alignment and momentum

F Bacho, D Chu - Neural Networks, 2024 - Elsevier
Supervised learning in deep neural networks is commonly performed using error
backpropagation. However, the sequential propagation of errors during the backward pass …
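
The forward-gradient estimator that such methods build on replaces backpropagation with a single directional derivative along a random tangent; below is a minimal numpy sketch of that estimator on a toy function (illustrative only, and unrelated to the paper's DFA-plus-momentum scheme; a finite difference stands in for forward-mode AD).

```python
import numpy as np

def forward_gradient(f, theta, rng, eps=1e-5):
    """Unbiased forward-gradient estimate of grad f(theta).

    Draws a random tangent v, estimates the directional derivative
    f'(theta; v), and returns that scalar times v, whose expectation over
    v ~ N(0, I) is the true gradient.
    """
    v = rng.normal(size=theta.shape)
    directional = (f(theta + eps * v) - f(theta - eps * v)) / (2 * eps)
    return directional * v

# Toy check on f(theta) = ||theta||^2 / 2, whose gradient is theta itself.
f = lambda t: 0.5 * np.dot(t, t)
rng = np.random.default_rng(3)
theta = np.array([1.0, -2.0, 3.0])
estimate = np.mean([forward_gradient(f, theta, rng) for _ in range(5000)], axis=0)
print(estimate)  # close to [1, -2, 3] on average
```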

Simigrad: Fine-grained adaptive batching for large scale training using gradient similarity measurement

H Qin, S Rajbhandari, O Ruwase… - Advances in Neural …, 2021 - proceedings.neurips.cc
Large scale training requires massive parallelism to finish the training within a reasonable
amount of time. To support massive parallelism, large batch training is the key enabler but …
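
The gradient-similarity signal behind this kind of adaptive batching can be illustrated by the cosine similarity between gradients computed on two halves of a batch; the sketch below shows that measurement only, not SimiGrad's implementation or its exact batch-size adaptation policy.

```python
import numpy as np

def half_batch_gradient_similarity(grads):
    """Cosine similarity between the mean gradients of two halves of a batch.

    grads: array of shape (batch_size, n_params) of per-sample gradients.
    Low similarity indicates a noisy batch gradient; high similarity
    indicates the two halves agree on the update direction.
    """
    half = grads.shape[0] // 2
    g1, g2 = grads[:half].mean(axis=0), grads[half:].mean(axis=0)
    return float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2) + 1e-12))

rng = np.random.default_rng(4)
print(half_batch_gradient_similarity(rng.normal(size=(64, 100))))
```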

On the interpretability of regularisation for neural networks through model gradient similarity

V Szolnoky, V Andersson, B Kulcsár… - Advances in Neural …, 2022 - proceedings.neurips.cc
Most complex machine learning and modelling techniques are prone to over-fitting and may
subsequently generalise poorly to future data. Artificial neural networks are no different in …
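
A concrete way to apply this gradient-similarity lens is to track the average pairwise cosine similarity of per-sample gradients during training; the following is a small numpy sketch of such a diagnostic, illustrative rather than the paper's exact metric.

```python
import numpy as np

def mean_pairwise_gradient_similarity(grads):
    """Average pairwise cosine similarity between per-sample gradients.

    grads: array of shape (n_samples, n_params). Higher values mean the
    samples push the parameters in similar directions, which this line of
    work relates to better-generalizing training dynamics.
    """
    normed = grads / (np.linalg.norm(grads, axis=1, keepdims=True) + 1e-12)
    sims = normed @ normed.T
    n = grads.shape[0]
    # Average over off-diagonal pairs only (the diagonal is all ones).
    return float((sims.sum() - n) / (n * (n - 1)))

rng = np.random.default_rng(5)
print(mean_pairwise_gradient_similarity(rng.normal(size=(16, 50))))
```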

Rethinking Adam: A twofold exponential moving average approach

Y Wang, Y Kang, C Qin, H Wang, Y Xu… - arXiv preprint arXiv …, 2021 - arxiv.org
Adaptive gradient methods, e.g., Adam, have achieved tremendous success in
machine learning. Scaling the learning rate element-wise by a certain form of second …
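
For reference, the standard Adam update that such variants modify keeps exponential moving averages of the gradient and of its elementwise square; below is a minimal sketch of one Adam step (the paper's twofold-EMA modification to the second moment is not reproduced here).

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update; returns new parameters and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad             # EMA of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2        # EMA of squared gradients
    m_hat = m / (1 - beta1 ** t)                   # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # element-wise scaling
    return theta, m, v

# Toy usage on a quadratic whose minimizer is [1, -2, 0.5].
theta = np.zeros(3)
m, v = np.zeros(3), np.zeros(3)
for t in range(1, 4):
    grad = theta - np.array([1.0, -2.0, 0.5])
    theta, m, v = adam_step(theta, grad, m, v, t)
print(theta)
```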