A comprehensive overview of large language models

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

H2O: Heavy-hitter oracle for efficient generative inference of large language models

Z Zhang, Y Sheng, T Zhou, T Chen… - Advances in …, 2024 - proceedings.neurips.cc
Large Language Models (LLMs), despite their recent impressive accomplishments,
are notably cost-prohibitive to deploy, particularly for applications involving long-content …
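
The title refers to evicting cold entries from the KV cache while keeping "heavy hitters". A minimal sketch of that idea is below; the scoring rule, function name, and budgets are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def evict_kv_cache(attn_scores, keep_heavy=64, keep_recent=64):
    """Toy KV-cache eviction in the spirit of a heavy-hitter oracle.

    attn_scores: (num_queries, num_keys) attention weights accumulated so far.
    Returns indices of cached tokens to keep: the most recent tokens plus the
    tokens that have received the largest total attention ("heavy hitters").
    """
    num_keys = attn_scores.shape[1]
    # Total attention mass each cached token has received so far.
    token_mass = attn_scores.sum(axis=0)

    recent = set(range(max(0, num_keys - keep_recent), num_keys))
    # Rank the remaining tokens by accumulated attention and keep the top ones.
    older = [i for i in np.argsort(-token_mass) if i not in recent]
    heavy = set(older[:keep_heavy])

    return sorted(recent | heavy)

# Example: one query over a 256-token cache with random attention weights.
scores = np.random.rand(1, 256)
scores /= scores.sum()
kept = evict_kv_cache(scores, keep_heavy=8, keep_recent=8)
```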

LongLoRA: Efficient fine-tuning of long-context large language models

Y Chen, S Qian, H Tang, X Lai, Z Liu, S Han… - arXiv preprint arXiv …, 2023 - arxiv.org
We present LongLoRA, an efficient fine-tuning approach that extends the context sizes of
pre-trained large language models (LLMs), with limited computation cost. Typically, training …
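
LongLoRA builds on parameter-efficient fine-tuning with low-rank adapters; a minimal sketch of the LoRA component (a frozen linear layer plus a trainable low-rank update) is shown below. The class name, rank, and dimensions are assumptions for illustration, and the paper's shifted sparse attention is not reproduced here.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the adapter matrices are trained
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(nn.Linear(4096, 4096), r=8)
out = layer(torch.randn(2, 16, 4096))
```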

Advancing transformer architecture in long-context large language models: A comprehensive survey

Y Huang, J Xu, Z Jiang, J Lai, Z Li, Y Yao… - arXiv preprint arXiv …, 2023 - arxiv.org
Sparked by the breakthrough of ChatGPT, Transformer-based Large Language Models (LLMs)
have paved a revolutionary path toward Artificial General Intelligence (AGI) and have been …

YaRN: Efficient context window extension of large language models

B Peng, J Quesnelle, H Fan, E Shippole - arXiv preprint arXiv:2309.00071, 2023 - arxiv.org
Rotary Position Embeddings (RoPE) have been shown to effectively encode positional
information in transformer-based language models. However, these models fail to …
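
Since the abstract centers on Rotary Position Embeddings, a minimal RoPE sketch follows, with a uniform position-interpolation factor standing in for the frequency-dependent scaling that YaRN proposes; the exact YaRN schedule is not reproduced, and the function name and defaults are assumptions.

```python
import torch

def rotary_embed(x, positions, base=10000.0, scale=1.0):
    """Apply rotary position embeddings to x of shape (..., seq, dim).

    scale > 1 naively interpolates positions to stretch the context window;
    YaRN replaces this uniform scaling with a more refined, per-frequency scheme.
    """
    dim = x.shape[-1]
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    angles = (positions.float() / scale)[:, None] * inv_freq[None, :]  # (seq, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    rotated = torch.empty_like(x)
    rotated[..., 0::2] = x1 * cos - x2 * sin
    rotated[..., 1::2] = x1 * sin + x2 * cos
    return rotated

q = torch.randn(1, 128, 64)
q_rot = rotary_embed(q, torch.arange(128), scale=2.0)
```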

EscherNet: A generative model for scalable view synthesis

X Kong, S Liu, X Lyu, M Taher, X Qi… - Proceedings of the …, 2024 - openaccess.thecvf.com
We introduce EscherNet, a multi-view conditioned diffusion model for view synthesis.
EscherNet learns implicit and generative 3D representations coupled with a specialised …

LongLLMLingua: Accelerating and enhancing LLMs in long context scenarios via prompt compression

H Jiang, Q Wu, X Luo, D Li, CY Lin, Y Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
In long context scenarios, large language models (LLMs) face three main challenges: higher
computational/financial cost, longer latency, and inferior performance. Some studies reveal …
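
The compression idea can be sketched as keeping only the most informative parts of a long prompt under a token budget; the sketch below is a coarse, sentence-level toy, and the scoring function, names, and budget are assumptions rather than the paper's actual procedure (which scores with a small language model).

```python
def compress_prompt(sentences, score_fn, budget_tokens=10):
    """Toy coarse-grained prompt compression: keep the highest-scoring sentences
    until a token budget is reached, preserving their original order.

    score_fn(sentence) -> float is assumed to measure how informative a sentence
    is (e.g. its surprisal under a small LM); here it is a placeholder.
    """
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score_fn(sentences[i]), reverse=True)
    kept, used = set(), 0
    for i in ranked:
        n_tokens = len(sentences[i].split())  # crude token count
        if used + n_tokens <= budget_tokens:
            kept.add(i)
            used += n_tokens
    return " ".join(sentences[i] for i in sorted(kept))

# Example with a dummy score: longer sentences are treated as more informative.
demo = ["Background boilerplate.",
        "The key fact is that X causes Y in setting Z.",
        "Greetings."]
print(compress_prompt(demo, score_fn=len, budget_tokens=10))
```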

LLM maybe LongLM: Self-extend LLM context window without tuning

H Jin, X Han, J Yang, Z Jiang, Z Liu, CY Chang… - arXiv preprint arXiv …, 2024 - arxiv.org
This work elicits LLMs' inherent ability to handle long contexts without fine-tuning. The
limited sequence length used during training may limit the application of Large …
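
The underlying trick is to remap relative positions so that distant tokens share coarser position ids, keeping all positions within the range seen during training. A rough sketch is below; the function name, window, and group size are illustrative, and the paper's merging of neighbor and grouped attention is simplified away.

```python
import numpy as np

def self_extend_positions(seq_len, neighbor_window=512, group_size=4):
    """Remap relative positions so far-away tokens share bucketed position ids.

    Nearby tokens (within neighbor_window) keep exact relative positions; farther
    tokens have their distance floor-divided by group_size, so the model never
    sees a relative position larger than those encountered during training.
    """
    q = np.arange(seq_len)[:, None]
    k = np.arange(seq_len)[None, :]
    rel = q - k  # causal relative distance (only the lower triangle is attended)
    grouped = neighbor_window + (rel - neighbor_window) // group_size
    return np.where(rel <= neighbor_window, rel, grouped)

pos = self_extend_positions(8, neighbor_window=2, group_size=2)
```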

HyperAttention: Long-context attention in near-linear time

I Han, R Jayaram, A Karbasi, V Mirrokni… - arXiv preprint arXiv …, 2023 - arxiv.org
We present an approximate attention mechanism named HyperAttention to address the
computational challenges posed by the growing complexity of long contexts used in Large …