Robust fine-tuning of zero-shot models

M Wortsman, G Ilharco, JW Kim, M Li… - Proceedings of the …, 2022 - openaccess.thecvf.com
Large pre-trained models such as CLIP or ALIGN offer consistent accuracy across a range of
data distributions when performing zero-shot inference (i.e., without fine-tuning on a specific …
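The core operation in this line of work is weight-space interpolation between the zero-shot and fine-tuned models. A minimal sketch, assuming a dict-of-scalars parameter layout (real weights would be tensors; the function name and layout are illustrative, not the paper's implementation):

```python
def interpolate_weights(zero_shot, fine_tuned, alpha=0.5):
    """Blend two models' parameters: (1 - alpha) * zero_shot + alpha * fine_tuned.

    `zero_shot` and `fine_tuned` are {parameter name: value} maps with
    identical keys; scalars stand in for weight tensors in this sketch.
    """
    assert zero_shot.keys() == fine_tuned.keys()
    return {
        name: (1 - alpha) * zero_shot[name] + alpha * fine_tuned[name]
        for name in zero_shot
    }
```

Setting `alpha=0` recovers the zero-shot model and `alpha=1` the fully fine-tuned one; intermediate values trade in-distribution accuracy against robustness under distribution shift.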

Fusing finetuned models for better pretraining

L Choshen, E Venezian, N Slonim, Y Katz - arXiv preprint arXiv …, 2022 - arxiv.org
Pretrained models are the standard starting point for training. This approach consistently
outperforms the use of a random initialization. However, pretraining is a costly endeavour …
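One simple way such fusing can be sketched is a uniform elementwise average of several fine-tuned checkpoints; the paper's actual procedure may weight or select models differently, so this is an assumption for illustration only:

```python
def fuse_models(state_dicts):
    """Uniform elementwise average of several models' parameters.

    Each entry of `state_dicts` is a {parameter name: value} map with the
    same keys; scalars stand in for weight tensors in this sketch.
    """
    n = len(state_dicts)
    return {key: sum(sd[key] for sd in state_dicts) / n
            for key in state_dicts[0]}
```

The fused parameters can then serve as the starting point for further fine-tuning in place of the original pretrained checkpoint.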

Proving linear mode connectivity of neural networks via optimal transport

D Ferbach, B Goujaud, G Gidel… - International …, 2024 - proceedings.mlr.press
The energy landscape of high-dimensional non-convex optimization problems is crucial to
understanding the effectiveness of modern deep neural network architectures. Recent works …

Model zoos: A dataset of diverse populations of neural network models

K Schürholt, D Taskiran, B Knyazev… - Advances in …, 2022 - proceedings.neurips.cc
In recent years, neural networks (NNs) have evolved from laboratory environments to the
state-of-the-art for many real-world problems. It was shown that NN models (i.e., their weights …

What can linear interpolation of neural network loss landscapes tell us?

TJ Vlaar, J Frankle - International Conference on Machine …, 2022 - proceedings.mlr.press
Studying neural network loss landscapes provides insights into the nature of the underlying
optimization problems. Unfortunately, loss landscapes are notoriously difficult to visualize in …
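A 1-D linear interpolation plot of the kind studied here is easy to produce: evaluate the loss at evenly spaced points on the segment between two parameter vectors. A minimal sketch, assuming parameters flattened into plain lists:

```python
def loss_along_line(loss_fn, theta_a, theta_b, num_points=11):
    """Loss at evenly spaced points on the segment (1 - t) * theta_a + t * theta_b."""
    curve = []
    for i in range(num_points):
        t = i / (num_points - 1)  # t sweeps 0.0 .. 1.0 inclusive
        theta = [(1 - t) * a + t * b for a, b in zip(theta_a, theta_b)]
        curve.append(loss_fn(theta))
    return curve
```

For two trained networks, a pronounced bump in this curve indicates a loss barrier between the solutions, while a flat or monotone curve is the usual evidence for (linear) mode connectivity.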

The empirical impact of neural parameter symmetries, or lack thereof

D Lim, TM Putterman, R Walters, H Maron… - arXiv preprint arXiv …, 2024 - arxiv.org
Many algorithms and observed phenomena in deep learning appear to be affected by
parameter symmetries--transformations of neural network parameters that do not change the …

End-to-end bias mitigation: Removing gender bias in deep learning

T Feldman, A Peake - arXiv preprint arXiv:2104.02532, 2021 - arxiv.org
Machine Learning models have been deployed across many different aspects of society,
often in situations that affect social welfare. Although these models offer streamlined …

Merging by matching models in task subspaces

D Tam, M Bansal, C Raffel - arXiv preprint arXiv:2312.04339, 2023 - arxiv.org
Model merging aims to cheaply combine individual task-specific models into a single
multitask model. In this work, we view past merging methods as leveraging different notions …

Robustness of edited neural networks

D Brown, C Godfrey, C Nizinski, J Tu… - ICLR 2023 Workshop on …, 2023 - openreview.net
Successful deployment in uncertain, real-world environments requires that deep learning
models can be efficiently and reliably modified in order to adapt to unexpected issues …

Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent

S Vaswani, B Dubois-Taine… - … on machine learning, 2022 - proceedings.mlr.press
We aim to make stochastic gradient descent (SGD) adaptive to (i) the noise $\sigma^2$ in
the stochastic gradients and (ii) problem-dependent constants. When minimizing smooth …