Pretrained models are the standard starting point for training. This approach consistently outperforms the use of a random initialization. However, pretraining is a costly endeavour …
The energy landscape of high-dimensional non-convex optimization problems is crucial to understanding the effectiveness of modern deep neural network architectures. Recent works …
In the last years, neural networks (NN) have evolved from laboratory environments to the state-of-the-art for many real-world problems. It was shown that NN models (ie, their weights …
TJ Vlaar, J Frankle - International Conference on Machine …, 2022 - proceedings.mlr.press
Studying neural network loss landscapes provides insights into the nature of the underlying optimization problems. Unfortunately, loss landscapes are notoriously difficult to visualize in …
Many algorithms and observed phenomena in deep learning appear to be affected by parameter symmetries--transformations of neural network parameters that do not change the …
T Feldman, A Peake - arXiv preprint arXiv:2104.02532, 2021 - arxiv.org
Machine Learning models have been deployed across many different aspects of society, often in situations that affect social welfare. Although these models offer streamlined …
Model merging aims to cheaply combine individual task-specific models into a single multitask model. In this work, we view past merging methods as leveraging different notions …
Successful deployment in uncertain, real-world environments requires that deep learning models can be efficiently and reliably modified in order to adapt to unexpected issues …
We aim to make stochastic gradient descent (SGD) adaptive to (i) the noise $\sigma^ 2$ in the stochastic gradients and (ii) problem-dependent constants. When minimizing smooth …