Transformers learn to implement preconditioned gradient descent for in-context learning

K Ahn, X Cheng, H Daneshmand… - Advances in Neural …, 2023 - proceedings.neurips.cc
Several recent works demonstrate that transformers can implement algorithms like gradient
descent. By a careful construction of weights, these works show that multiple layers of …

Going in circles is the way forward: the role of recurrence in visual inference

RS van Bergen, N Kriegeskorte - Current Opinion in Neurobiology, 2020 - Elsevier
Highlights: Neural network models of vision are dominated by feedforward architectures. Biological vision, by contrast, exhibits abundant recurrent processing. The computational …

Representation engineering: A top-down approach to AI transparency

A Zou, L Phan, S Chen, J Campbell, P Guo… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we identify and characterize the emerging area of representation engineering
(RepE), an approach to enhancing the transparency of AI systems that draws on insights …

Eliciting latent predictions from transformers with the tuned lens

N Belrose, Z Furman, L Smith, D Halawi… - arXiv preprint arXiv …, 2023 - arxiv.org
We analyze transformers from the perspective of iterative inference, seeking to understand
how model predictions are refined layer by layer. To do so, we train an affine probe for each …

A disciplined approach to neural network hyper-parameters: Part 1 — learning rate, batch size, momentum, and weight decay

LN Smith - arXiv preprint arXiv:1803.09820, 2018 - arxiv.org
Although deep learning has produced dazzling successes for applications of image, speech,
and video processing in the past few years, most trainings are with suboptimal hyper …

CodeSLAM: Learning a compact, optimisable representation for dense visual SLAM

M Bloesch, J Czarnowski, R Clark… - Proceedings of the …, 2018 - openaccess.thecvf.com
The representation of geometry in real-time 3D perception systems continues to be a critical
research issue. Dense maps capture complete surface shape and can be augmented with …

Architecture matters in continual learning

SI Mirzadeh, A Chaudhry, D Yin, T Nguyen… - arXiv preprint arXiv …, 2022 - arxiv.org
A large body of research in continual learning is devoted to overcoming the catastrophic
forgetting of neural networks by designing new algorithms that are robust to the distribution …

Toward fast and accurate human pose estimation via soft-gated skip connections

A Bulat, J Kossaifi, G Tzimiropoulos… - 2020 15th IEEE …, 2020 - ieeexplore.ieee.org
This paper is on highly accurate and highly efficient human pose estimation. Recent works
based on Fully Convolutional Networks (FCNs) have demonstrated excellent results for this …

Multi-level residual networks from dynamical systems view

B Chang, L Meng, E Haber, F Tung… - arXiv preprint arXiv …, 2017 - arxiv.org
Deep residual networks (ResNets) and their variants are widely used in many computer
vision applications and natural language processing tasks. However, the theoretical …

Brain-like object recognition with high-performing shallow recurrent ANNs

J Kubilius, M Schrimpf, K Kar… - Advances in neural …, 2019 - proceedings.neurips.cc
Deep convolutional artificial neural networks (ANNs) are the leading class of candidate
models of the mechanisms of visual processing in the primate ventral stream. While initially …