Fishr: Invariant gradient variances for out-of-distribution generalization

A Rame, C Dancette, M Cord - International Conference on …, 2022 - proceedings.mlr.press
Learning robust models that generalize well under changes in the data distribution is critical
for real-world applications. To this end, there has been a growing surge of interest in learning …
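
The penalty proposed here aligns gradient variances across training domains; the snippet below is a minimal numpy sketch of that variance-matching idea, assuming per-sample gradients have already been computed (function and array names are illustrative, not the authors' implementation).

```python
import numpy as np

def fishr_style_penalty(per_sample_grads_by_domain):
    """Gradient-variance-matching penalty in the spirit of Fishr.

    per_sample_grads_by_domain: list of arrays, one per training domain,
    each of shape (n_samples_d, n_params) holding per-sample gradients
    of the loss w.r.t. (a subset of) the model parameters.
    """
    # Variance of each parameter's gradient within each domain.
    domain_vars = [g.var(axis=0) for g in per_sample_grads_by_domain]
    mean_var = np.mean(domain_vars, axis=0)
    # Penalize how far each domain's gradient variance is from the mean variance.
    return float(np.mean([np.sum((v - mean_var) ** 2) for v in domain_vars]))

# Toy usage: two "domains" whose per-sample gradients have different spread.
rng = np.random.default_rng(0)
grads_a = rng.normal(size=(32, 10))
grads_b = rng.normal(scale=2.0, size=(32, 10))
print(fishr_style_penalty([grads_a, grads_b]))
```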

Estimating example difficulty using variance of gradients

C Agarwal, D D'souza… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
In machine learning, a question of great interest is understanding what examples are
challenging for a model to classify. Identifying atypical examples ensures the safe …
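
The Variance-of-Gradients (VoG) score summarizes how much an example's input gradients fluctuate across training checkpoints; a small numpy sketch under that reading (gradient snapshots are assumed to be precomputed, and the exact normalization in the paper may differ).

```python
import numpy as np

def vog_score(grad_snapshots):
    """Variance-of-Gradients style difficulty score for one example.

    grad_snapshots: array of shape (n_checkpoints, n_input_dims) with the
    gradient of the predicted-class score w.r.t. the input, one row per
    saved training checkpoint.
    """
    # Per-dimension variance across checkpoints, averaged over dimensions.
    return float(grad_snapshots.var(axis=0).mean())

rng = np.random.default_rng(1)
easy = rng.normal(scale=0.1, size=(5, 784))   # stable gradients -> low score
hard = rng.normal(scale=1.0, size=(5, 784))   # noisy gradients -> high score
print(vog_score(easy), vog_score(hard))
```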

An experimental study of Byzantine-robust aggregation schemes in federated learning

S Li, ECH Ngai, T Voigt - IEEE Transactions on Big Data, 2023 - ieeexplore.ieee.org
Byzantine-robust federated learning aims at mitigating Byzantine failures during the
federated training process, where malicious participants (known as Byzantine clients) may …
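
One representative family of rules evaluated in such studies is coordinate-wise robust statistics; the sketch below shows a generic coordinate-wise trimmed mean over client updates, as an illustration rather than the paper's specific protocol.

```python
import numpy as np

def trimmed_mean_aggregate(client_updates, trim_k):
    """Coordinate-wise trimmed mean over client model updates.

    client_updates: array of shape (n_clients, n_params).
    trim_k: number of smallest and largest values dropped per coordinate,
            chosen to cover the assumed number of Byzantine clients.
    """
    sorted_updates = np.sort(client_updates, axis=0)
    kept = sorted_updates[trim_k: client_updates.shape[0] - trim_k]
    return kept.mean(axis=0)

rng = np.random.default_rng(2)
honest = rng.normal(size=(8, 4))
malicious = np.full((2, 4), 100.0)            # crude attack: huge updates
updates = np.vstack([honest, malicious])
print(trimmed_mean_aggregate(updates, trim_k=2))
```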

Embarrassingly simple dataset distillation

Y Feng, SR Vedantam, J Kempe - The Twelfth International …, 2023 - openreview.net
Dataset distillation extracts a small set of synthetic training samples from a large dataset with
the goal of achieving competitive performance on test data when trained on this sample. In …
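
Independent of the specific algorithm, dataset distillation is commonly posed as a bilevel optimization problem; a standard formulation (not specific to this paper) is

```latex
\min_{\mathcal{S}} \; \mathcal{L}_{\mathcal{D}}\!\bigl(\theta^{\star}(\mathcal{S})\bigr)
\quad \text{s.t.} \quad
\theta^{\star}(\mathcal{S}) = \arg\min_{\theta} \; \mathcal{L}_{\mathcal{S}}(\theta),
```

where S is the small synthetic set, D the real data, and the inner problem trains a model on S alone.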

On the generalization of models trained with SGD: Information-theoretic bounds and implications

Z Wang, Y Mao - arXiv preprint arXiv:2110.03128, 2021 - arxiv.org
This paper follows up on recent work by Neu et al. (2021) and presents new
information-theoretic upper bounds for the generalization error of machine learning models …
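
For context, bounds in this line of work descend from the mutual-information bound of Xu and Raginsky (2017): for a sigma-sub-Gaussian loss and n training samples,

```latex
\bigl|\mathbb{E}[\operatorname{gen}(W, S)]\bigr| \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(W; S)},
```

where W denotes the learned weights and S the training sample; the cited paper develops tighter, SGD-specific variants of bounds of this type.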

A tale of two long tails

D D'souza, Z Nussbaum, C Agarwal… - arXiv preprint arXiv …, 2021 - arxiv.org
As machine learning models are increasingly employed to assist human decision-makers, it
becomes critical to communicate the uncertainty associated with these model predictions …

Low-variance Forward Gradients using Direct Feedback Alignment and momentum

F Bacho, D Chu - Neural Networks, 2024 - Elsevier
Supervised learning in deep neural networks is commonly performed using error
backpropagation. However, the sequential propagation of errors during the backward pass …
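
The forward-gradient estimator that such methods build on replaces backpropagation with a single directional derivative along a random tangent; below is a minimal numpy sketch of that estimator on a toy function (illustrative only, and unrelated to the paper's DFA-plus-momentum scheme; a finite difference stands in for forward-mode AD).

```python
import numpy as np

def forward_gradient(f, theta, rng, eps=1e-5):
    """Unbiased forward-gradient estimate of grad f(theta).

    Draws a random tangent v, estimates the directional derivative
    f'(theta; v), and returns that scalar times v, whose expectation over
    v ~ N(0, I) is the true gradient.
    """
    v = rng.normal(size=theta.shape)
    directional = (f(theta + eps * v) - f(theta - eps * v)) / (2 * eps)
    return directional * v

# Toy check on f(theta) = ||theta||^2 / 2, whose gradient is theta itself.
f = lambda t: 0.5 * np.dot(t, t)
rng = np.random.default_rng(3)
theta = np.array([1.0, -2.0, 3.0])
estimate = np.mean([forward_gradient(f, theta, rng) for _ in range(5000)], axis=0)
print(estimate)  # close to [1, -2, 3] on average
```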

Simigrad: Fine-grained adaptive batching for large scale training using gradient similarity measurement

H Qin, S Rajbhandari, O Ruwase… - Advances in Neural …, 2021 - proceedings.neurips.cc
Large scale training requires massive parallelism to finish the training within a reasonable
amount of time. To support massive parallelism, large batch training is the key enabler but …
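
The gradient-similarity signal behind this kind of adaptive batching can be illustrated by the cosine similarity between gradients computed on two halves of a batch; the sketch below shows that measurement only, not SimiGrad's implementation or its exact batch-size adaptation policy.

```python
import numpy as np

def half_batch_gradient_similarity(grads):
    """Cosine similarity between the mean gradients of two halves of a batch.

    grads: array of shape (batch_size, n_params) of per-sample gradients.
    Low similarity indicates a noisy batch gradient; high similarity
    indicates the two halves agree on the update direction.
    """
    half = grads.shape[0] // 2
    g1, g2 = grads[:half].mean(axis=0), grads[half:].mean(axis=0)
    return float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2) + 1e-12))

rng = np.random.default_rng(4)
print(half_batch_gradient_similarity(rng.normal(size=(64, 100))))
```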

On the interpretability of regularisation for neural networks through model gradient similarity

V Szolnoky, V Andersson, B Kulcsár… - Advances in Neural …, 2022 - proceedings.neurips.cc
Most complex machine learning and modelling techniques are prone to over-fitting and may
subsequently generalise poorly to future data. Artificial neural networks are no different in …
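
A concrete way to apply this gradient-similarity lens is to track the average pairwise cosine similarity of per-sample gradients during training; the following is a small numpy sketch of such a diagnostic, illustrative rather than the paper's exact metric.

```python
import numpy as np

def mean_pairwise_gradient_similarity(grads):
    """Average pairwise cosine similarity between per-sample gradients.

    grads: array of shape (n_samples, n_params). Higher values mean the
    samples push the parameters in similar directions, which this line of
    work relates to better-generalizing training dynamics.
    """
    normed = grads / (np.linalg.norm(grads, axis=1, keepdims=True) + 1e-12)
    sims = normed @ normed.T
    n = grads.shape[0]
    # Average over off-diagonal pairs only (the diagonal is all ones).
    return float((sims.sum() - n) / (n * (n - 1)))

rng = np.random.default_rng(5)
print(mean_pairwise_gradient_similarity(rng.normal(size=(16, 50))))
```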

Rethinking Adam: A twofold exponential moving average approach

Y Wang, Y Kang, C Qin, H Wang, Y Xu… - arXiv preprint arXiv …, 2021 - arxiv.org
Adaptive gradient methods, e.g., Adam, have achieved tremendous success in
machine learning. Scaling the learning rate element-wise by a certain form of second …
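
For reference, the standard Adam update that such variants modify keeps exponential moving averages of the gradient and of its elementwise square; below is a minimal sketch of one Adam step (the paper's twofold-EMA modification to the second moment is not reproduced here).

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update; returns new parameters and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad             # EMA of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2        # EMA of squared gradients
    m_hat = m / (1 - beta1 ** t)                   # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # element-wise scaling
    return theta, m, v

# Toy usage on a quadratic whose minimizer is [1, -2, 0.5].
theta = np.zeros(3)
m, v = np.zeros(3), np.zeros(3)
for t in range(1, 4):
    grad = theta - np.array([1.0, -2.0, 0.5])
    theta, m, v = adam_step(theta, grad, m, v, t)
print(theta)
```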