A survey of GPT-3 family large language models including ChatGPT and GPT-4

KS Kalyan - Natural Language Processing Journal, 2023 - Elsevier
Large language models (LLMs) are a special class of pretrained language models (PLMs)
obtained by scaling model size, pretraining corpus and computation. LLMs, because of their …

Evaluating large language models: A comprehensive survey

Z Guo, R Jin, C Liu, Y Huang, D Shi, L Yu, Y Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have demonstrated remarkable capabilities across a broad
spectrum of tasks. They have attracted significant attention and been deployed in numerous …

Efficient streaming language models with attention sinks

G Xiao, Y Tian, B Chen, S Han, M Lewis - arXiv preprint arXiv:2309.17453, 2023 - arxiv.org
Deploying Large Language Models (LLMs) in streaming applications such as multi-round
dialogue, where long interactions are expected, is urgently needed but poses two major …

Efficient and effective text encoding for Chinese LLaMA and Alpaca

Y Cui, Z Yang, X Yao - arXiv preprint arXiv:2304.08177, 2023 - arxiv.org
Large Language Models (LLMs), such as ChatGPT and GPT-4, have dramatically
transformed natural language processing research and shown promising strides towards …

Benchmarking foundation models with language-model-as-an-examiner

Y Bai, J Ying, Y Cao, X Lv, Y He… - Advances in …, 2024 - proceedings.neurips.cc
Numerous benchmarks have been established to assess the performance of foundation
models on open-ended question answering, which serves as a comprehensive test of a …

Retrieval meets long context large language models

P Xu, W Ping, X Wu, L McAfee, C Zhu, Z Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Extending the context window of large language models (LLMs) has recently become popular,
while the solution of augmenting LLMs with retrieval has existed for years. The natural …

LM-Infinite: Simple on-the-fly length generalization for large language models

C Han, Q Wang, W Xiong, Y Chen, H Ji… - arXiv preprint arXiv …, 2023 - arxiv.org
In recent years, there have been remarkable advancements in the performance of
Transformer-based Large Language Models (LLMs) across various domains. As these LLMs …

LongLLMLingua: Accelerating and enhancing LLMs in long context scenarios via prompt compression

H Jiang, Q Wu, X Luo, D Li, CY Lin, Y Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
In long context scenarios, large language models (LLMs) face three main challenges: higher
computational/financial cost, longer latency, and inferior performance. Some studies reveal …

LLM Maybe LongLM: Self-extend LLM context window without tuning

H Jin, X Han, J Yang, Z Jiang, Z Liu, CY Chang… - arXiv preprint arXiv …, 2024 - arxiv.org
This work elicits LLMs' inherent ability to handle long contexts without fine-tuning. The
limited length of sequences seen during training may limit the application of Large …

ChatGPT's one-year anniversary: Are open-source large language models catching up?

H Chen, F Jiao, X Li, C Qin, M Ravaut, R Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org
Since its release in late 2022, ChatGPT has brought a seismic shift to the entire landscape of
AI, both in research and commerce. Through instruction-tuning a large language model …