A primer on Bayesian neural networks: review and debates

J Arbel, K Pitas, M Vladimirova, V Fortuin - arXiv preprint arXiv:2309.16314, 2023 - arxiv.org
Neural networks have achieved remarkable performance across various problem domains,
but their widespread applicability is hindered by inherent limitations such as overconfidence …

Normalization layers are all that sharpness-aware minimization needs

M Mueller, T Vlaar, D Rolnick… - Advances in Neural …, 2024 - proceedings.neurips.cc
Sharpness-aware minimization (SAM) was proposed to reduce sharpness of minima and
has been shown to enhance generalization performance in various settings. In this work we …

Decentralized SGD and average-direction SAM are asymptotically equivalent

T Zhu, F He, K Chen, M Song… - … Conference on Machine …, 2023 - proceedings.mlr.press
Decentralized stochastic gradient descent (D-SGD) allows collaborative learning across massive numbers of devices simultaneously without the control of a central server. However, existing …
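A minimal sketch of the one-round D-SGD pattern this snippet describes, assuming a doubly stochastic mixing matrix `W` and per-node local gradients; the names (`decentralized_sgd_step`, `lr`) and the ring topology are illustrative, not from the paper.

```python
import numpy as np

def decentralized_sgd_step(params, grads, W, lr=0.1):
    """One D-SGD round: every node takes a local gradient step, then
    averages its parameters with its neighbors via the mixing matrix W."""
    updated = params - lr * grads   # local SGD step on every node, shape (n_nodes, dim)
    return W @ updated              # gossip averaging: node i mixes according to row W[i]

# Toy usage: 4 nodes on a ring with 3-dimensional parameters.
W = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])           # doubly stochastic ring topology
params = np.random.randn(4, 3)
grads = np.random.randn(4, 3)                    # stand-in for local minibatch gradients
params = decentralized_sgd_step(params, grads, W)
```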

Practical sharpness-aware minimization cannot converge all the way to optima

D Si, C Yun - Advances in Neural Information Processing …, 2024 - proceedings.neurips.cc
Sharpness-Aware Minimization (SAM) is an optimizer that takes a descent step based on the gradient at a perturbation $y_t = x_t + \rho\frac{\nabla f(x_t)}{\lVert\nabla f …
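A minimal sketch of the ascent-then-descent update this perturbation formula defines, for a generic differentiable objective `f`; the step sizes `rho` and `lr` are illustrative values, not the paper's.

```python
import numpy as np

def sam_step(x, grad_f, rho=0.05, lr=0.1, eps=1e-12):
    """One SAM update: perturb to y = x + rho * grad_f(x) / ||grad_f(x)||,
    then descend using the gradient evaluated at y."""
    g = grad_f(x)
    y = x + rho * g / (np.linalg.norm(g) + eps)  # ascent to the perturbed point y_t
    return x - lr * grad_f(y)                    # descent step with the perturbed gradient

# Toy usage on f(x) = 0.5 * ||x||^2, whose gradient is simply x.
x = np.array([1.0, -2.0])
for _ in range(10):
    x = sam_step(x, grad_f=lambda v: v)
```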

Why does little robustness help? a further step towards understanding adversarial transferability

Y Zhang, S Hu, LY Zhang, J Shi, M Li… - … IEEE Symposium on …, 2024 - ieeexplore.ieee.org
Adversarial examples for deep neural networks (DNNs) are transferable: examples that
successfully fool one white-box surrogate model can also deceive other black-box models …

The interpolating information criterion for overparameterized models

L Hodgkinson, C van der Heide, R Salomone… - arXiv preprint arXiv …, 2023 - arxiv.org
The problem of model selection is considered for the setting of interpolating estimators,
where the number of model parameters exceeds the size of the dataset. Classical …

Variational learning is effective for large deep networks

Y Shen, N Daheim, B Cong, P Nickl… - arXiv preprint arXiv …, 2024 - arxiv.org
We give extensive empirical evidence against the common belief that variational learning is
ineffective for large neural networks. We show that an optimizer called Improved Variational …
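The optimizer's name is truncated in the snippet; as a generic illustration of variational learning over network weights, here is a reparameterized mean-field Gaussian update (KL/prior term omitted for brevity). All names and step sizes are illustrative assumptions, not the paper's method.

```python
import numpy as np

def variational_step(mu, log_sigma, grad_loss, lr=0.01, n_samples=4):
    """One reparameterized update of a mean-field Gaussian over weights:
    sample w = mu + sigma * eps, average loss gradients over samples, and
    move both the mean and the log standard deviation."""
    sigma = np.exp(log_sigma)
    g_mu = np.zeros_like(mu)
    g_ls = np.zeros_like(log_sigma)
    for _ in range(n_samples):
        eps = np.random.randn(*mu.shape)
        g = grad_loss(mu + sigma * eps)        # loss gradient at a sampled weight vector
        g_mu += g / n_samples                  # chain rule: dL/dmu = dL/dw
        g_ls += g * eps * sigma / n_samples    # chain rule: dL/dlog_sigma = dL/dw * eps * sigma
    return mu - lr * g_mu, log_sigma - lr * g_ls

# Toy usage: quadratic loss 0.5 * ||w||^2, whose gradient is w.
mu, log_sigma = np.random.randn(5), np.full(5, -2.0)
for _ in range(200):
    mu, log_sigma = variational_step(mu, log_sigma, grad_loss=lambda w: w)
```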

Enhancing Transformers without Self-supervised Learning: A Loss Landscape Perspective in Sequential Recommendation

V Lai, H Chen, CCM Yeh, M Xu, Y Cai… - Proceedings of the 17th …, 2023 - dl.acm.org
Transformer and its variants are a powerful class of architectures for sequential recommendation, owing to their ability to capture a user's dynamic interests from their past …

Flat seeking Bayesian neural networks

VA Nguyen, TL Vuong, H Phan… - Advances in Neural …, 2024 - proceedings.neurips.cc
Bayesian Neural Networks (BNNs) provide a probabilistic interpretation for deep learning models by imposing a prior distribution over model parameters and inferring a …
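A minimal sketch of the generic BNN recipe this snippet refers to, i.e. averaging predictions over posterior samples of the parameters, not of the paper's flat-seeking posterior; the toy linear "network" and Gaussian samples are placeholders.

```python
import numpy as np

def bnn_predict(x, posterior_samples, forward):
    """Bayesian model averaging: run the network once per sampled parameter
    vector and report the predictive mean and spread across samples."""
    preds = np.stack([forward(x, w) for w in posterior_samples])
    return preds.mean(axis=0), preds.std(axis=0)

# Toy usage: a 1-d "network" f(x; w) = w[0] * x + w[1] with fake posterior samples.
rng = np.random.default_rng(0)
samples = rng.normal(loc=[1.0, 0.5], scale=0.1, size=(50, 2))
mean, std = bnn_predict(np.linspace(-1.0, 1.0, 5), samples,
                        forward=lambda x, w: w[0] * x + w[1])
```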

Optimal transport model distributional robustness

VA Nguyen, T Le, A Bui, TT Do… - Advances in Neural …, 2024 - proceedings.neurips.cc
Distributional robustness is a promising framework for training deep learning models that
are less vulnerable to adversarial examples and data distribution shifts. Previous works …