J Kim, T Suzuki - arXiv preprint arXiv:2410.08633, 2024
This work provides the first theoretical analysis of training transformers to solve complex
problems by recursively generating intermediate states, analogous to fine-tuning for chain-of …