Semantic models for the first-stage retrieval: A comprehensive review

J Guo, Y Cai, Y Fan, F Sun, R Zhang… - ACM Transactions on …, 2022 - dl.acm.org
Multi-stage ranking pipelines have been a practical solution in modern search systems,
where the first-stage retrieval is to return a subset of candidate documents and latter stages …

ColBERTv2: Effective and efficient retrieval via lightweight late interaction

K Santhanam, O Khattab, J Saad-Falcon… - arXiv preprint arXiv …, 2021 - arxiv.org
Neural information retrieval (IR) has greatly advanced search and other knowledge-
intensive language tasks. While many neural IR methods encode queries and documents …

Autoregressive search engines: Generating substrings as document identifiers

M Bevilacqua, G Ottaviano, P Lewis… - Advances in …, 2022 - proceedings.neurips.cc
Abstract Knowledge-intensive language tasks require NLP systems to both provide the
correct answer and retrieve supporting evidence for it in a given corpus. Autoregressive …

Dense text retrieval based on pretrained language models: A survey

WX Zhao, J Liu, R Ren, JR Wen - ACM Transactions on Information …, 2024 - dl.acm.org
Text retrieval is a long-standing research topic on information seeking, where a system is
required to return relevant information resources to user's queries in natural language. From …

[BOOK][B] Pretrained transformers for text ranking: BERT and beyond

J Lin, R Nogueira, A Yates - 2022 - books.google.com
The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in
response to a query. Although the most common formulation of text ranking is search …

From distillation to hard negative sampling: Making sparse neural ir models more effective

T Formal, C Lassance, B Piwowarski… - Proceedings of the 45th …, 2022 - dl.acm.org
Neural retrievers based on dense representations combined with Approximate Nearest
Neighbors search have recently received a lot of attention, owing their success to distillation …

Query performance prediction for neural IR: Are we there yet?

G Faggioli, T Formal, S Marchesin, S Clinchant… - … on Information Retrieval, 2023 - Springer
Abstract Evaluation in Information Retrieval (IR) relies on post-hoc empirical procedures,
which are time-consuming and expensive operations. To alleviate this, Query Performance …

Query2doc: Query expansion with large language models

L Wang, N Yang, F Wei - arXiv preprint arXiv:2303.07678, 2023 - arxiv.org
This paper introduces a simple yet effective query expansion approach, denoted as
query2doc, to improve both sparse and dense retrieval systems. The proposed method first …

SPLADE v2: Sparse lexical and expansion model for information retrieval

T Formal, C Lassance, B Piwowarski… - arXiv preprint arXiv …, 2021 - arxiv.org
In neural Information Retrieval (IR), ongoing research is directed towards improving the first
retriever in ranking pipelines. Learning dense embeddings to conduct retrieval using …

[HTML] MIRACL: A multilingual retrieval dataset covering 18 diverse languages

X Zhang, N Thakur, O Ogundepo… - Transactions of the …, 2023 - direct.mit.edu
MIRACL is a multilingual dataset for ad hoc retrieval across 18 languages that collectively
encompass over three billion native speakers around the world. This resource is designed to …