Advancing transformer architecture in long-context large language models: A comprehensive survey

Y Huang, J Xu, J Lai, Z Jiang, T Chen, Z Li… - arXiv preprint arXiv …, 2023 - arxiv.org
With the bombshell set off by ChatGPT, Transformer-based Large Language Models (LLMs)
have paved a revolutionary path toward Artificial General Intelligence (AGI) and have been …

Mamba: Linear-time sequence modeling with selective state spaces

A Gu, T Dao - arXiv preprint arXiv:2312.00752, 2023 - arxiv.org
Foundation models, now powering most of the exciting applications in deep learning, are
almost universally based on the Transformer architecture and its core attention module …
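The selective state-space idea named in the title can be sketched as a linear-time recurrence whose input and output matrices depend on the current token. Below is a minimal NumPy sketch under simplifying assumptions: an Euler-style discretization and illustrative projections W_B, W_C, W_dt, not the paper's exact parameterization.

    # Minimal sketch of a selective state-space recurrence in the spirit of Mamba.
    # Shapes, projections, and the discretization are illustrative assumptions.
    import numpy as np

    def selective_ssm(x, A, W_B, W_C, W_dt):
        """x: (T, d) inputs; A: (d, n) state matrix (negative for stability).
        Returns y: (T, d). The state h is (d, n) and is updated per step,
        so cost is linear in sequence length T."""
        T, d = x.shape
        n = A.shape[1]
        h = np.zeros((d, n))
        y = np.zeros((T, d))
        for t in range(T):
            B_t = x[t] @ W_B                      # (n,) input-dependent input matrix
            C_t = x[t] @ W_C                      # (n,) input-dependent output matrix
            dt = np.log1p(np.exp(x[t] @ W_dt))    # (d,) softplus step sizes
            A_bar = np.exp(dt[:, None] * A)       # (d, n) discretized decay
            B_bar = dt[:, None] * B_t[None, :]    # (d, n)
            h = A_bar * h + B_bar * x[t][:, None] # selective recurrence
            y[t] = h @ C_t                        # read out the state
        return y

    # toy usage
    rng = np.random.default_rng(0)
    T, d, n = 16, 8, 4
    x = rng.standard_normal((T, d))
    A = -np.exp(rng.standard_normal((d, n)))      # negative entries keep the decay stable
    y = selective_ssm(x, A, rng.standard_normal((d, n)) * 0.1,
                      rng.standard_normal((d, n)) * 0.1,
                      rng.standard_normal((d, d)) * 0.1)
    print(y.shape)  # (16, 8)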

RWKV: Reinventing RNNs for the transformer era

B Peng, E Alcaide, Q Anthony, A Albalak… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have revolutionized almost all natural language processing (NLP) tasks but
suffer from memory and computational complexity that scales quadratically with sequence …
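The linear-scaling alternative named in the title replaces pairwise attention with a recurrence over running weighted sums. A minimal NumPy sketch of such a WKV-style recurrence follows; the per-channel decay w, the current-token bonus u, and the omission of the usual numerical-stability rescaling are simplifications of the paper's formulation.

    # Minimal sketch of an RWKV-style "WKV" recurrence: an attention-like
    # weighted average maintained with two running sums, so memory and compute
    # stay constant per token instead of growing with context length.
    import numpy as np

    def wkv_recurrent(k, v, w, u):
        """k, v: (T, d) keys/values; w: (d,) positive decay; u: (d,) bonus.
        Returns (T, d) outputs in O(T * d) time with O(d) state."""
        T, d = k.shape
        num = np.zeros(d)   # running weighted sum of past values
        den = np.zeros(d)   # running sum of past weights
        out = np.zeros((T, d))
        for t in range(T):
            # mix the accumulated past with a bonus-weighted current token
            out[t] = (num + np.exp(u + k[t]) * v[t]) / (den + np.exp(u + k[t]))
            # decay the past and absorb the current token into the running sums
            num = np.exp(-w) * num + np.exp(k[t]) * v[t]
            den = np.exp(-w) * den + np.exp(k[t])
        return out

    rng = np.random.default_rng(0)
    T, d = 16, 8
    out = wkv_recurrent(rng.standard_normal((T, d)), rng.standard_normal((T, d)),
                        w=np.ones(d), u=np.zeros(d))
    print(out.shape)  # (16, 8)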

ChatGPT chemistry assistant for text mining and the prediction of MOF synthesis

Z Zheng, O Zhang, C Borgs, JT Chayes… - Journal of the …, 2023 - ACS Publications
We use prompt engineering to guide ChatGPT in the automation of text mining of metal–
organic framework (MOF) synthesis conditions from diverse formats and styles of the …

RULER: What's the Real Context Size of Your Long-Context Language Models?

CP Hsieh, S Sun, S Kriman, S Acharya… - arXiv preprint arXiv …, 2024 - arxiv.org
The needle-in-a-haystack (NIAH) test, which examines the ability to retrieve a piece of
information (the "needle") from long distractor texts (the "haystack"), has been widely …
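A NIAH probe of the kind RULER builds on can be constructed synthetically: bury a key-value "needle" at a chosen depth inside filler text and ask the model to retrieve it. A minimal sketch, assuming a simple repeated-sentence haystack and prompt wording rather than RULER's actual templates:

    # Minimal sketch of building a needle-in-a-haystack (NIAH) probe.
    # The filler text and prompt wording are illustrative assumptions.
    import random

    def build_niah_prompt(context_tokens=2000, depth=0.5, seed=0):
        rng = random.Random(seed)
        needle_key = f"{rng.randrange(10**5):05d}"
        needle_val = f"{rng.randrange(10**5):05d}"
        needle = f"The special magic number for {needle_key} is {needle_val}."
        filler = "The grass is green. The sky is blue. The sun is bright. "
        haystack = (filler * (context_tokens // len(filler.split()) + 1)).split()
        pos = int(depth * len(haystack))              # where to bury the needle
        text = " ".join(haystack[:pos] + [needle] + haystack[pos:])
        question = f"What is the special magic number for {needle_key}?"
        return f"{text}\n\n{question}", needle_val    # prompt and expected answer

    prompt, answer = build_niah_prompt(context_tokens=2000, depth=0.25)
    print(len(prompt.split()), answer)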

LongNet: Scaling transformers to 1,000,000,000 tokens

J Ding, S Ma, L Dong, X Zhang, S Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
Scaling sequence length has become a critical demand in the era of large language models.
However, existing methods struggle with either computational complexity or model …
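LongNet's route to longer sequences is dilated attention: each query attends only to a sparse, strided subset of keys inside its segment, cutting the per-segment cost. A minimal single-head sketch follows; it omits causal masking and the paper's mixing of multiple segment/dilation configurations that covers all positions, and the settings shown are illustrative assumptions.

    # Minimal sketch of dilated attention: split the sequence into segments of
    # length w and attend only over every r-th position within each segment,
    # so per-segment cost drops from O(w^2) toward O((w/r)^2).
    import numpy as np

    def dilated_attention(q, k, v, segment_len=8, dilation=2):
        T, d = q.shape
        out = np.zeros_like(v)
        for start in range(0, T, segment_len):
            idx = np.arange(start, min(start + segment_len, T))[::dilation]
            qs, ks, vs = q[idx], k[idx], v[idx]
            scores = qs @ ks.T / np.sqrt(d)
            weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
            weights /= weights.sum(axis=-1, keepdims=True)
            out[idx] = weights @ vs      # only the strided positions are updated
        return out

    rng = np.random.default_rng(0)
    T, d = 32, 16
    q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
    print(dilated_attention(q, k, v).shape)  # (32, 16)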

Bilingual language model for protein sequence and structure

M Heinzinger, K Weissenow… - NAR Genomics and …, 2024 - academic.oup.com
Adapting language models to protein sequences spawned the development of powerful
protein language models (pLMs). Concurrently, AlphaFold2 broke through in protein …

In-context autoencoder for context compression in a large language model

T Ge, J Hu, L Wang, X Wang, SQ Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
We propose the In-context Autoencoder (ICAE), leveraging the power of a large language
model (LLM) to compress a long context into short compact memory slots that can be directly …
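The memory-slot idea can be sketched as learnable slot queries appended to the long input, so an encoder writes a fixed-size summary that a decoder later attends to in place of the full context. A toy sketch with a small nn.TransformerEncoder standing in for the LLM encoder; the dimensions and module choice are illustrative assumptions, not ICAE's actual setup.

    # Minimal sketch of compressing a long context into k "memory slot" vectors.
    import torch
    import torch.nn as nn

    class ContextCompressor(nn.Module):
        def __init__(self, d_model=64, n_slots=4):
            super().__init__()
            self.slots = nn.Parameter(torch.randn(n_slots, d_model) * 0.02)
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)

        def forward(self, context_emb):                  # (B, T, d) context embeddings
            B = context_emb.size(0)
            slots = self.slots.unsqueeze(0).expand(B, -1, -1)
            x = torch.cat([context_emb, slots], dim=1)   # slots read the context
            h = self.encoder(x)
            return h[:, -self.slots.size(0):, :]         # (B, k, d) memory slots

    comp = ContextCompressor()
    memory = comp(torch.randn(2, 512, 64))   # 512 context tokens -> 4 memory slots
    print(memory.shape)                      # torch.Size([2, 4, 64])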

Resource-efficient algorithms and systems of foundation models: A survey

M Xu, D Cai, W Yin, S Wang, X Jin, X Liu - ACM Computing Surveys, 2024 - dl.acm.org
Large foundation models, including large language models, vision transformers, diffusion,
and LLM-based multimodal models, are revolutionizing the entire machine learning …

Language modeling is compression

G Delétang, A Ruoss, PA Duquenne, E Catt… - arXiv preprint arXiv …, 2023 - arxiv.org
It has long been established that predictive models can be transformed into lossless
compressors and vice versa. Incidentally, in recent years, the machine learning community …
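The prediction-compression equivalence can be made concrete: an idealized arithmetic coder driven by a predictive model spends about -log2 p(x_t | x_<t) bits per symbol, so the total code length equals the model's negative log-likelihood. A toy sketch with an adaptive count-based model standing in for an LLM; the model and alphabet are illustrative assumptions, not the paper's setup.

    # Minimal sketch of the prediction-compression link: better next-symbol
    # predictions translate directly into a shorter (ideal) arithmetic code.
    import math
    from collections import Counter

    def code_length_bits(text, alphabet):
        """Laplace-smoothed adaptive model; returns the ideal coding length in bits."""
        counts = Counter()
        bits = 0.0
        for ch in text:
            p = (counts[ch] + 1) / (sum(counts.values()) + len(alphabet))
            bits += -math.log2(p)   # bits an arithmetic coder would spend on ch
            counts[ch] += 1         # update the model after "encoding" the symbol
        return bits

    text = "abababababababab"
    alphabet = set(text) | {"c"}
    print(code_length_bits(text, alphabet), 8 * len(text))  # far fewer bits than raw bytes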