Sharpness-aware minimization (SAM) was proposed to reduce the sharpness of minima and has been shown to improve generalization in various settings. In this work we …
T Zhu, F He, K Chen, M Song… - … Conference on Machine …, 2023 - proceedings.mlr.press
Decentralized stochastic gradient descent (D-SGD) allows massive numbers of devices to learn collaboratively at the same time without the coordination of a central server. However, existing …
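To make the setting concrete, here is a minimal NumPy sketch of one common D-SGD scheme (adapt-then-combine: a local gradient step followed by gossip averaging with ring neighbors). The quadratic local objectives and the mixing matrix are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim = 4, 5
# hypothetical local least-squares objectives f_i(x) = ||A_i x - b_i||^2
A = rng.normal(size=(n_nodes, 8, dim))
b = rng.normal(size=(n_nodes, 8))
x = np.zeros((n_nodes, dim))          # each node keeps its own parameter copy

# doubly stochastic mixing (gossip) matrix for a ring topology
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = 0.5
    W[i, (i - 1) % n_nodes] = 0.25
    W[i, (i + 1) % n_nodes] = 0.25

lr = 0.02
for _ in range(300):
    grads = np.stack([2 * A[i].T @ (A[i] @ x[i] - b[i]) for i in range(n_nodes)])
    x = W @ (x - lr * grads)          # local step, then averaging with ring neighbors
```

No node ever sees another node's data; consensus among the parameter copies emerges only through the neighbor averaging encoded in W.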
D Si, C Yun - Advances in Neural Information Processing …, 2024 - proceedings.neurips.cc
Sharpness-Aware Minimization (SAM) is an optimizer that takes a descent step based on the gradient at a perturbation $y_t = x_t + \rho\,\frac{\nabla f(x_t)}{\lVert \nabla f(x_t) \rVert}$ …
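The update rule stated in the snippet is easy to read as code; below is a minimal NumPy sketch of the two-step rule (the toy quadratic objective and the hyperparameter values are illustrative assumptions).

```python
import numpy as np

def sam_step(x, grad_fn, lr=0.1, rho=0.05):
    """One SAM update: perturb to y_t = x_t + rho * g / ||g||, then descend at y_t."""
    g = grad_fn(x)
    y = x + rho * g / (np.linalg.norm(g) + 1e-12)  # ascent to the perturbed point y_t
    return x - lr * grad_fn(y)                     # descent uses grad f(y_t), not grad f(x_t)

# toy objective f(x) = 0.5 * ||x||^2, whose gradient is x itself
x = np.array([1.0, -2.0])
for _ in range(100):
    x = sam_step(x, lambda z: z)
```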
Adversarial examples for deep neural networks (DNNs) are transferable: examples that successfully fool one white-box surrogate model can also deceive other black-box models …
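As a toy illustration of transfer (with hypothetical linear "models" standing in for DNNs), an FGSM-style perturbation crafted against a white-box surrogate often also flips a black-box target with a similar decision boundary:

```python
import numpy as np

rng = np.random.default_rng(0)
w_surrogate = rng.normal(size=20)                   # white-box surrogate classifier
w_target = w_surrogate + 0.2 * rng.normal(size=20)  # unseen target, similar boundary

x = rng.normal(size=20)
y = np.sign(w_surrogate @ x)                        # label under the surrogate
# FGSM on the surrogate: step against the margin y * <w, x>
x_adv = x - 0.5 * np.sign(y * w_surrogate)
print("surrogate fooled:", np.sign(w_surrogate @ x_adv) != y)
print("target fooled:   ", np.sign(w_target @ x_adv) != y)
```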
The problem of model selection is considered for the setting of interpolating estimators, where the number of model parameters exceeds the size of the dataset. Classical …
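For concreteness, the minimum-norm least-squares solution is a canonical interpolating estimator: with more parameters than samples it fits the training data exactly. A small NumPy sketch with random (purely illustrative) data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 50))   # n=10 samples, p=50 parameters: overparameterized
y = rng.normal(size=10)
w = np.linalg.pinv(X) @ y       # minimum-norm interpolator: X @ w equals y exactly
print(np.allclose(X @ w, y))    # True: zero training error, so classical
                                # fit-based model selection criteria break down
```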
We give extensive empirical evidence against the common belief that variational learning is ineffective for large neural networks. We show that an optimizer called Improved Variational …
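The snippet truncates the optimizer's name, so the sketch below is not that method; it is a generic mean-field variational step (reparameterization-trick gradients for a Gaussian posterior over weights, KL/prior term omitted for brevity) meant only to convey what "variational learning" refers to here.

```python
import numpy as np

rng = np.random.default_rng(0)

def variational_step(m, log_s, grad_fn, lr=0.01):
    """One reparameterized step on a factorized Gaussian posterior N(m, s^2)."""
    s = np.exp(log_s)
    eps = rng.normal(size=m.shape)
    w = m + s * eps                         # reparameterization: sample the weights
    g = grad_fn(w)                          # loss gradient at the sampled weights
    m_new = m - lr * g                      # chain rule: dL/dm = dL/dw
    log_s_new = log_s - lr * (g * eps * s)  # chain rule: dL/dlog_s = dL/dw * eps * s
    return m_new, log_s_new

# toy loss f(w) = 0.5 * ||w||^2, gradient w
m, log_s = np.ones(3), np.zeros(3)
for _ in range(500):
    m, log_s = variational_step(m, log_s, lambda w: w)
```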
Transformers and their variants are a powerful class of architectures for sequential recommendation, owing to their ability to capture a user's dynamic interests from their past …
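At the core of such recommenders is causal self-attention over the user's interaction history. A minimal single-head sketch (untrained, with item embeddings reused as queries, keys, and values for brevity; real models add learned projections, positional encodings, and feed-forward layers):

```python
import numpy as np

def self_attention(E):
    """Single-head causal self-attention over item embeddings E of shape (T, d)."""
    d = E.shape[1]
    scores = E @ E.T / np.sqrt(d)                            # pairwise interaction scores
    mask = np.triu(np.ones_like(scores), k=1).astype(bool)   # block attention to the future
    scores[mask] = -np.inf
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)            # softmax over past positions
    return weights @ E                                       # context-aware state per step

E = np.random.default_rng(0).normal(size=(6, 16))  # 6 past items, 16-dim embeddings
print(self_attention(E).shape)                     # (6, 16)
```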
VA Nguyen, TL Vuong, H Phan… - Advances in Neural …, 2024 - proceedings.neurips.cc
Bayesian Neural Networks (BNNs) provide a probabilistic interpretation for deep learning models by imposing a prior distribution over model parameters and inferring a …
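In the linear-Gaussian special case the posterior is available in closed form, which makes the idea concrete; a Bayesian linear-regression sketch (the precisions alpha and beta are illustrative hyperparameters):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)

alpha, beta = 1.0, 100.0   # prior precision on weights, observation noise precision
# Gaussian posterior over weights: Sigma = (alpha*I + beta*X^T X)^{-1}, mu = beta*Sigma*X^T y
Sigma = np.linalg.inv(alpha * np.eye(3) + beta * X.T @ X)
mu = beta * Sigma @ X.T @ y

# predictive mean and variance at a new input: uncertainty comes from the posterior
x_new = rng.normal(size=3)
pred_mean = x_new @ mu
pred_var = 1.0 / beta + x_new @ Sigma @ x_new
```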
Distributional robustness is a promising framework for training deep learning models that are less vulnerable to adversarial examples and data distribution shifts. Previous works …
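Many methods in this family approximate the inner maximization of the robust objective by searching over a perturbation ball around each input. A minimal PGD-style sketch under an l-infinity ball, a common surrogate rather than any specific paper's algorithm:

```python
import numpy as np

def worst_case_input(loss_grad, x, y, eps, steps=10, step_size=0.1):
    """Approximate argmax over the eps-ball of the loss at (x + delta, y)."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        delta += step_size * np.sign(loss_grad(x + delta, y))  # ascend the loss
        delta = np.clip(delta, -eps, eps)                      # project back into the ball
    return x + delta

# toy: squared loss 0.5 * (w @ x - y)^2 on a fixed linear model w
w = np.array([1.0, -1.0, 2.0])
grad = lambda x, y: (w @ x - y) * w
x_adv = worst_case_input(grad, np.zeros(3), y=1.0, eps=0.3)
```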