A primer on Bayesian neural networks: review and debates

J Arbel, K Pitas, M Vladimirova, V Fortuin - arXiv preprint arXiv:2309.16314, 2023 - arxiv.org
Neural networks have achieved remarkable performance across various problem domains,
but their widespread applicability is hindered by inherent limitations such as overconfidence …

Normalization layers are all that sharpness-aware minimization needs

M Mueller, T Vlaar, D Rolnick… - Advances in Neural …, 2024 - proceedings.neurips.cc
Sharpness-aware minimization (SAM) was proposed to reduce sharpness of minima and
has been shown to enhance generalization performance in various settings. In this work we …

Decentralized SGD and average-direction SAM are asymptotically equivalent

T Zhu, F He, K Chen, M Song… - … Conference on Machine …, 2023 - proceedings.mlr.press
Decentralized stochastic gradient descent (D-SGD) allows collaborative learning across massive numbers of devices simultaneously without the control of a central server. However, existing …
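A minimal sketch of the one-round D-SGD pattern this snippet describes, assuming a doubly stochastic mixing matrix `W` and per-node local gradients; the names (`decentralized_sgd_step`, `lr`) and the ring topology are illustrative, not from the paper.

```python
import numpy as np

def decentralized_sgd_step(params, grads, W, lr=0.1):
    """One D-SGD round: every node takes a local gradient step, then
    averages its parameters with its neighbors via the mixing matrix W."""
    updated = params - lr * grads   # local SGD step on every node, shape (n_nodes, dim)
    return W @ updated              # gossip averaging: node i mixes according to row W[i]

# Toy usage: 4 nodes on a ring with 3-dimensional parameters.
W = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])           # doubly stochastic ring topology
params = np.random.randn(4, 3)
grads = np.random.randn(4, 3)                    # stand-in for local minibatch gradients
params = decentralized_sgd_step(params, grads, W)
```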

Practical sharpness-aware minimization cannot converge all the way to optima

D Si, C Yun - Advances in Neural Information Processing …, 2024 - proceedings.neurips.cc
Sharpness-Aware Minimization (SAM) is an optimizer that takes a descent step based on the gradient at a perturbation $y_t = x_t + \rho\frac{\nabla f(x_t)}{\lVert\nabla f …
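A minimal sketch of the ascent-then-descent update this perturbation formula defines, for a generic differentiable objective `f`; the step sizes `rho` and `lr` are illustrative values, not the paper's.

```python
import numpy as np

def sam_step(x, grad_f, rho=0.05, lr=0.1, eps=1e-12):
    """One SAM update: perturb to y = x + rho * grad_f(x) / ||grad_f(x)||,
    then descend using the gradient evaluated at y."""
    g = grad_f(x)
    y = x + rho * g / (np.linalg.norm(g) + eps)  # ascent to the perturbed point y_t
    return x - lr * grad_f(y)                    # descent step with the perturbed gradient

# Toy usage on f(x) = 0.5 * ||x||^2, whose gradient is simply x.
x = np.array([1.0, -2.0])
for _ in range(10):
    x = sam_step(x, grad_f=lambda v: v)
```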

Why does little robustness help? a further step towards understanding adversarial transferability

Y Zhang, S Hu, LY Zhang, J Shi, M Li… - … IEEE Symposium on …, 2024 - ieeexplore.ieee.org
Adversarial examples for deep neural networks (DNNs) are transferable: examples that
successfully fool one white-box surrogate model can also deceive other black-box models …

The interpolating information criterion for overparameterized models

L Hodgkinson, C van der Heide, R Salomone… - arXiv preprint arXiv …, 2023 - arxiv.org
The problem of model selection is considered for the setting of interpolating estimators,
where the number of model parameters exceeds the size of the dataset. Classical …

Variational learning is effective for large deep networks

Y Shen, N Daheim, B Cong, P Nickl… - arXiv preprint arXiv …, 2024 - arxiv.org
We give extensive empirical evidence against the common belief that variational learning is
ineffective for large neural networks. We show that an optimizer called Improved Variational …
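The optimizer's name is truncated in the snippet; as a generic illustration of variational learning over network weights, here is a reparameterized mean-field Gaussian update (KL/prior term omitted for brevity). All names and step sizes are illustrative assumptions, not the paper's method.

```python
import numpy as np

def variational_step(mu, log_sigma, grad_loss, lr=0.01, n_samples=4):
    """One reparameterized update of a mean-field Gaussian over weights:
    sample w = mu + sigma * eps, average loss gradients over samples, and
    move both the mean and the log standard deviation."""
    sigma = np.exp(log_sigma)
    g_mu = np.zeros_like(mu)
    g_ls = np.zeros_like(log_sigma)
    for _ in range(n_samples):
        eps = np.random.randn(*mu.shape)
        g = grad_loss(mu + sigma * eps)        # loss gradient at a sampled weight vector
        g_mu += g / n_samples                  # chain rule: dL/dmu = dL/dw
        g_ls += g * eps * sigma / n_samples    # chain rule: dL/dlog_sigma = dL/dw * eps * sigma
    return mu - lr * g_mu, log_sigma - lr * g_ls

# Toy usage: quadratic loss 0.5 * ||w||^2, whose gradient is w.
mu, log_sigma = np.random.randn(5), np.full(5, -2.0)
for _ in range(200):
    mu, log_sigma = variational_step(mu, log_sigma, grad_loss=lambda w: w)
```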

Enhancing Transformers without Self-supervised Learning: A Loss Landscape Perspective in Sequential Recommendation

V Lai, H Chen, CCM Yeh, M Xu, Y Cai… - Proceedings of the 17th …, 2023 - dl.acm.org
Transformer and its variants are a powerful class of architectures for sequential recommendation, owing to their ability to capture a user's dynamic interests from their past …

Flat seeking Bayesian neural networks

VA Nguyen, TL Vuong, H Phan… - Advances in Neural …, 2024 - proceedings.neurips.cc
Bayesian Neural Networks (BNNs) provide a probabilistic interpretation for deep learning models by imposing a prior distribution over model parameters and inferring a …
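A minimal sketch of the generic BNN recipe this snippet refers to, i.e. averaging predictions over posterior samples of the parameters, not of the paper's flat-seeking posterior; the toy linear "network" and Gaussian samples are placeholders.

```python
import numpy as np

def bnn_predict(x, posterior_samples, forward):
    """Bayesian model averaging: run the network once per sampled parameter
    vector and report the predictive mean and spread across samples."""
    preds = np.stack([forward(x, w) for w in posterior_samples])
    return preds.mean(axis=0), preds.std(axis=0)

# Toy usage: a 1-d "network" f(x; w) = w[0] * x + w[1] with fake posterior samples.
rng = np.random.default_rng(0)
samples = rng.normal(loc=[1.0, 0.5], scale=0.1, size=(50, 2))
mean, std = bnn_predict(np.linspace(-1.0, 1.0, 5), samples,
                        forward=lambda x, w: w[0] * x + w[1])
```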

Optimal transport model distributional robustness

VA Nguyen, T Le, A Bui, TT Do… - Advances in Neural …, 2024 - proceedings.neurips.cc
Distributional robustness is a promising framework for training deep learning models that
are less vulnerable to adversarial examples and data distribution shifts. Previous works …