Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives

J Li, J Chen, Y Tang, C Wang, BA Landman… - Medical image …, 2023 - Elsevier
Transformer, one of the latest technological advances of deep learning, has gained
prevalence in natural language processing and computer vision. Since medical imaging bears …

Directional convergence and alignment in deep learning

Z Ji, M Telgarsky - Advances in Neural Information …, 2020 - proceedings.neurips.cc
In this paper, we show that although the minimizers of cross-entropy and related
classification losses are off at infinity, network weights learned by gradient flow converge in …

Towards understanding sharpness-aware minimization

M Andriushchenko… - … Conference on Machine …, 2022 - proceedings.mlr.press
Sharpness-Aware Minimization (SAM) is a recent training method that relies on
worst-case weight perturbations and significantly improves generalization in various …
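
As a reading aid, here is a minimal single-batch sketch of the SAM update in PyTorch; the names (`sam_step`, `rho=0.05`) and the small epsilon floor are illustrative choices, not the paper's released code:

    import torch

    def sam_step(model, loss_fn, x, y, base_opt, rho=0.05):
        # (1) gradient at the current weights w
        loss_fn(model(x), y).backward()
        params = [p for p in model.parameters() if p.grad is not None]
        # global l2 norm of the gradient across all parameters
        grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
        # (2) climb to the approximate worst case: w + rho * g / ||g||
        eps = []
        with torch.no_grad():
            for p in params:
                e = rho * p.grad / (grad_norm + 1e-12)
                p.add_(e)
                eps.append((p, e))
        model.zero_grad()
        # (3) gradient at the perturbed weights, then undo the perturbation
        loss_fn(model(x), y).backward()
        with torch.no_grad():
            for p, e in eps:
                p.sub_(e)
        # (4) descend from the original w using the perturbed-point gradient
        base_opt.step()
        base_opt.zero_grad()

A base optimizer such as torch.optim.SGD(model.parameters(), lr=0.1) supplies the actual descent step; SAM only changes where the gradient is evaluated.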

SWAD: Domain generalization by seeking flat minima

J Cha, S Chun, K Lee, HC Cho… - Advances in Neural …, 2021 - proceedings.neurips.cc
Domain generalization (DG) methods aim to achieve generalizability to an unseen
target domain by using only training data from the source domains. Although a variety of DG …
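
The mechanical core of SWAD is a dense (per-iteration rather than per-epoch) running average of weights; the paper additionally selects the averaging window from validation loss. A minimal sketch of the averaging alone, assuming a plain parameter mean and ignoring BatchNorm buffers:

    import copy
    import torch

    class DenseWeightAverage:
        def __init__(self, model):
            self.avg_model = copy.deepcopy(model)  # holds the running average
            self.n = 0

        @torch.no_grad()
        def update(self, model):
            # incremental mean, applied every optimizer step:
            # avg <- avg * (n - 1) / n + w / n
            self.n += 1
            for pa, p in zip(self.avg_model.parameters(), model.parameters()):
                pa.mul_((self.n - 1) / self.n).add_(p / self.n)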

Effect of data encoding on the expressive power of variational quantum-machine-learning models

M Schuld, R Sweke, JJ Meyer - Physical Review A, 2021 - APS
Quantum computers can be used for supervised learning by treating parametrized quantum
circuits as models that map data inputs to predictions. While a lot of work has been done to …
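
The paper's central observation can be compressed into one display (paraphrased): for a scalar input x, a quantum model with Hamiltonian data encoding realizes a generalized Fourier series

    f_\theta(x) = \sum_{\omega \in \Omega} c_\omega(\theta)\, e^{i \omega x},

where the frequency set \Omega is fixed by differences of eigenvalues of the encoding Hamiltonians and only the coefficients c_\omega(\theta) depend on the trainable parameters; repeating the encoding enlarges \Omega and hence the expressivity.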

What is being transferred in transfer learning?

B Neyshabur, H Sedghi… - Advances in neural …, 2020 - proceedings.neurips.cc
One desired capability for machines is the ability to transfer their understanding of one
domain to another domain where data is (usually) scarce. Despite ample adaptation of …

ASAM: Adaptive sharpness-aware minimization for scale-invariant learning of deep neural networks

J Kwon, J Kim, H Park, IK Choi - International Conference on …, 2021 - proceedings.mlr.press
Recently, learning algorithms motivated by the sharpness of the loss surface as an effective
measure of the generalization gap have shown state-of-the-art performance. Nevertheless …
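
The adaptive ingredient, paraphrased from the paper: ASAM rescales SAM's perturbation ball by a normalization operator T_w (elementwise, T_w = \mathrm{diag}(|w_1|, \dots, |w_d|)), making the sharpness measure invariant to weight rescaling. The resulting first-order perturbation is

    \epsilon^{\ast} \approx \rho \, \frac{T_w^{2}\, \nabla L(w)}{\lVert T_w \nabla L(w) \rVert_2},

which reduces to SAM's \epsilon^{\ast} = \rho\, \nabla L(w) / \lVert \nabla L(w) \rVert_2 when T_w is the identity.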

Understanding gradient descent on the edge of stability in deep learning

S Arora, Z Li, A Panigrahi - International Conference on …, 2022 - proceedings.mlr.press
Deep learning experiments by Cohen et al. (2021) using deterministic Gradient
Descent (GD) revealed an Edge of Stability (EoS) phase when learning rate (LR) and …
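
The classical stability threshold at issue is \eta < 2/\lambda for GD with learning rate \eta on curvature \lambda; in the EoS phase the sharpness instead hovers near 2/\eta while the loss still trends down. The threshold itself is easy to verify on a toy quadratic (an illustrative check, not from the paper):

    def gd_on_quadratic(lam, lr, steps=50, x0=1.0):
        # GD on L(x) = (lam / 2) * x^2 contracts x by (1 - lr * lam) each step,
        # so it converges iff |1 - lr * lam| < 1, i.e. lr < 2 / lam.
        x = x0
        for _ in range(steps):
            x -= lr * lam * x
        return x

    print(abs(gd_on_quadratic(lam=1.0, lr=1.9)))  # below 2/lam: shrinks toward 0
    print(abs(gd_on_quadratic(lam=1.0, lr=2.1)))  # above 2/lam: blows up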

What neural networks memorize and why: Discovering the long tail via influence estimation

V Feldman, C Zhang - Advances in Neural Information …, 2020 - proceedings.neurips.cc
Deep learning algorithms are well-known to have a propensity for fitting the training data
very well and often fit even outliers and mislabeled data points. Such fitting requires …
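
The memorization score driving this line of work is, paraphrasing the paper's definition, the leave-one-out influence of a training example on its own prediction:

    \mathrm{mem}(\mathcal{A}, S, i) \;=\; \Pr_{h \sim \mathcal{A}(S)}\bigl[h(x_i) = y_i\bigr] \;-\; \Pr_{h \sim \mathcal{A}(S \setminus \{i\})}\bigl[h(x_i) = y_i\bigr],

where \mathcal{A} is the (randomized) training algorithm and S the training set; the paper estimates this and the companion train-to-test influence with subsampled training runs rather than exact leave-one-out retraining.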

Bayesian deep learning and a probabilistic perspective of generalization

AG Wilson, P Izmailov - Advances in neural information …, 2020 - proceedings.neurips.cc
The key distinguishing property of a Bayesian approach is marginalization, rather than using
a single setting of weights. Bayesian marginalization can particularly improve the accuracy …
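
Concretely, the marginalization in question is the Bayesian model average over the posterior, in practice approximated by a finite sample of weight settings:

    p(y \mid x, \mathcal{D}) \;=\; \int p(y \mid x, w)\, p(w \mid \mathcal{D})\, dw \;\approx\; \frac{1}{M} \sum_{m=1}^{M} p(y \mid x, w_m), \qquad w_m \sim p(w \mid \mathcal{D}),

in contrast to predicting from a single point estimate \hat{w}; the paper argues that techniques such as deep ensembles are usefully viewed as approximations to this average.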