Resurrecting recurrent neural networks for long sequences. A Orvieto, SL Smith, A Gu, A Fernando, C Gulcehre, R Pascanu, S De. arXiv preprint arXiv:2303.06349, 2023. Cited by 158.
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models. S De, SL Smith, A Fernando, A Botev, G Cristian-Muraru, A Gu, R Haroun, ... arXiv preprint arXiv:2402.19427, 2024. Cited by 34.