MTEB: Massive text embedding benchmark

A Albalak, Y Elazar, SM Xie, S Longpre… - arXiv preprint arXiv …, 2024 - arxiv.org

A major factor in the recent success of large language models is the use of enormous and
ever-growing text datasets for unsupervised pre-training. However, naively training a model …

被引用次数：67 相关文章所有 2 个版本

[PDF] acm.org

A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions

L Huang, W Yu, W Ma, W Zhong, Z Feng… - ACM Transactions on …, 2023 - dl.acm.org

The emergence of large language models (LLMs) has marked a significant breakthrough in
natural language processing (NLP), fueling a paradigm shift in information acquisition …

被引用次数：704 相关文章所有 2 个版本

[PDF] arxiv.org

C-pack: Packed resources for general chinese embeddings

S Xiao, Z Liu, P Zhang, N Muennighoff, D Lian… - Proceedings of the 47th …, 2024 - dl.acm.org

We introduce C-Pack, a package of resources that significantly advances the field of general
text embeddings for Chinese. C-Pack includes three critical resources. 1) C-MTP is a …

被引用次数：367 相关文章所有 2 个版本

[PDF] hal.science

Bloom: A 176b-parameter open-access multilingual language model

T Le Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow… - 2023 - inria.hal.science

Large language models (LLMs) have been shown to be able to perform new tasks based on
a few demonstrations or natural language instructions. While these capabilities have led to …

被引用次数：1667 相关文章所有 16 个版本

[PDF] neurips.cc

Scaling data-constrained language models

N Muennighoff, A Rush, B Barak… - Advances in …, 2023 - proceedings.neurips.cc

The current trend of scaling language models involves increasing both parameter count and
training dataset size. Extrapolating this trend suggests that training dataset size may soon be …

被引用次数：209 相关文章所有 7 个版本

[PDF] arxiv.org

Large language models for information retrieval: A survey

Y Zhu, H Yuan, S Wang, J Liu, W Liu, C Deng… - arXiv preprint arXiv …, 2023 - arxiv.org

As a primary means of information acquisition, information retrieval (IR) systems, such as
search engines, have integrated themselves into our daily lives. These systems also serve …

被引用次数：261 相关文章所有 3 个版本

[PDF] arxiv.org

Crosslingual generalization through multitask finetuning

N Muennighoff, T Wang, L Sutawika, A Roberts… - arXiv preprint arXiv …, 2022 - arxiv.org

Multitask prompted finetuning (MTF) has been shown to help large language models
generalize to new tasks in a zero-shot setting, but so far explorations of MTF have focused …

被引用次数：645 相关文章所有 5 个版本

[PDF] arxiv.org

Exploring the potential of large language models (llms) in learning on graphs

Z Chen, H Mao, H Li, W Jin, H Wen, X Wei… - ACM SIGKDD …, 2024 - dl.acm.org

Learning on Graphs has attracted immense attention due to its wide real-world applications.
The most popular pipeline for learning on graphs with textual node attributes primarily relies …

被引用次数：252 相关文章所有 9 个版本

[PDF] arxiv.org

Text embeddings by weakly-supervised contrastive pre-training

L Wang, N Yang, X Huang, B Jiao, L Yang… - arXiv preprint arXiv …, 2022 - arxiv.org

This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a
wide range of tasks. The model is trained in a contrastive manner with weak supervision …

被引用次数：381 相关文章所有 2 个版本

[PDF] arxiv.org

One embedder, any task: Instruction-finetuned text embeddings

H Su, W Shi, J Kasai, Y Wang, Y Hu… - arXiv preprint arXiv …, 2022 - arxiv.org

We introduce INSTRUCTOR, a new method for computing text embeddings given task
instructions: every text input is embedded together with instructions explaining the use case …

被引用次数：233 相关文章所有 4 个版本

高级搜索

QQ 群