Z Zhang, P Lin, Z Wang, Y Zhang, ZQJ Xu - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers have shown impressive capabilities across various tasks, but their performance on compositional problems remains a topic of debate. In this work, we …
The development of large language models (LLMs) has revolutionized automated code generation. However, their high demand for computational resources has hindered broader …
M Wang - arXiv preprint arXiv:2402.00522, 2024 - arxiv.org
We conduct a systematic study of the approximation properties of the Transformer for sequence modeling with long, sparse, and complicated memory. We investigate the mechanisms …