Mechanistic Interpretability for AI Safety -- A Review

L Bereska, E Gavves - arXiv preprint arXiv:2404.14082, 2024 - arxiv.org
Understanding AI systems' inner workings is critical for ensuring value alignment and safety.
This review explores mechanistic interpretability: reverse-engineering the computational …

Beyond A*: Better planning with transformers via search dynamics bootstrapping

L Lehnert, S Sukhbaatar, DJ Su, Q Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
While Transformers have enabled tremendous progress in various application settings, such
architectures still trail behind traditional symbolic planners for solving complex decision …

On logical extrapolation for mazes with recurrent and implicit networks

B Knutson, AC Rabeendran, M Ivanitskiy… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent work has suggested that certain neural network architectures, particularly recurrent
neural networks (RNNs) and implicit neural networks (INNs), are capable of logical …

Transformers Can Navigate Mazes With Multi-Step Prediction

N Nolte, O Kitouni, A Williams, M Rabbat… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite their remarkable success in language modeling, transformers trained to predict the
next token in a sequence struggle with long-term planning. This limitation is particularly …

Linearly Structured World Representations in Maze-Solving Transformers

M Ivanitskiy, AF Spies, T Räuker… - … of UniReps: the …, 2024 - proceedings.mlr.press
The emergence of seemingly similar representations across tasks and neural architectures
suggests that convergent properties may underlie sophisticated behavior. One form of …

Planning behavior in a recurrent neural network that plays Sokoban

A Garriga-Alonso, M Taufeeque… - ICML 2024 Workshop on …, 2024 - openreview.net
To predict how advanced neural networks generalize to novel situations, it is essential to
understand how they reason. Guez et al. (2019, "An investigation of model-free planning") …