MIRACL: A multilingual retrieval dataset covering 18 diverse languages

X Zhang, N Thakur, O Ogundepo… - Transactions of the …, 2023 - direct.mit.edu
MIRACL is a multilingual dataset for ad hoc retrieval across 18 languages that collectively
encompass over three billion native speakers around the world. This resource is designed to …
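
As a quick orientation, the sketch below loads one MIRACL language split with the Hugging Face datasets library; the hub id "miracl/miracl", the "sw" config, and the field names are assumptions to verify against the dataset card, not details from the snippet above.

    from datasets import load_dataset

    # Hypothetical usage: the hub id, config name, and field names are
    # assumptions; check the dataset card for the real schema.
    swahili = load_dataset("miracl/miracl", "sw", split="dev")
    for example in swahili.select(range(3)):
        print(example["query"])                   # assumed field: topic text
        print(len(example["positive_passages"]))  # assumed field: judged passages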

WikiChat: Stopping the hallucination of large language model chatbots by few-shot grounding on Wikipedia

SJ Semnani, VZ Yao, HC Zhang, MS Lam - arXiv preprint arXiv …, 2023 - arxiv.org
This paper presents the first few-shot LLM-based chatbot that almost never hallucinates and
has high conversationality and low latency. WikiChat is grounded on the English Wikipedia …
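
WikiChat's actual pipeline chains several LLM stages (retrieval, claim verification, refinement); the toy sketch below only illustrates the underlying filter-then-answer idea with a crude lexical support test, and every detail of it is invented for illustration.

    # Toy grounding filter: keep only draft sentences with lexical
    # support in retrieved passages. Invented for illustration; the
    # real system verifies claims with LLM stages, not word overlap.
    def supported(sentence: str, passages: list[str], threshold: float = 0.5) -> bool:
        words = {w.lower() for w in sentence.split() if len(w) > 3}
        if not words:
            return False
        best = max(sum(w in p.lower() for w in words) / len(words) for p in passages)
        return best >= threshold

    passages = ["Mount Everest is Earth's highest mountain above sea level."]
    draft = [
        "Mount Everest is the highest mountain above sea level.",
        "It was first climbed in 1852 by a Roman legion.",  # unsupported claim
    ]
    print([s for s in draft if supported(s, passages)])  # only the first survives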

Making a MIRACL: Multilingual information retrieval across a continuum of languages

X Zhang, N Thakur, O Ogundepo, E Kamalloo… - arXiv preprint arXiv …, 2022 - arxiv.org
MIRACL (Multilingual Information Retrieval Across a Continuum of Languages) is a
multilingual dataset we have built for the WSDM 2023 Cup challenge that focuses on ad hoc …
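
A plausible starting point for the Cup's ad hoc task is a BM25 baseline; the sketch below uses Pyserini, with the prebuilt-index id "miracl-v1.0-sw" being an assumption to check against Pyserini's index catalog.

    from pyserini.search.lucene import LuceneSearcher

    # Assumed prebuilt-index id; Pyserini downloads prebuilt indexes on first use.
    searcher = LuceneSearcher.from_prebuilt_index("miracl-v1.0-sw")
    hits = searcher.search("historia ya Afrika Mashariki", k=10)  # Swahili query
    for rank, hit in enumerate(hits, start=1):
        print(rank, hit.docid, round(hit.score, 3))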

Cross-language information retrieval

P Galuščáková, DW Oard, S Nair - arXiv preprint arXiv:2111.05988, 2021 - arxiv.org
Two key assumptions shape the usual view of ranked retrieval: (1) that the searcher can
choose words for their query that might appear in the documents that they wish to see, and …
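
Assumption (1) breaks down in CLIR, where query and documents are in different languages; the classic remedy is query translation, illustrated below with a made-up two-entry dictionary.

    # Made-up bilingual dictionary; real systems use translation tables
    # learned from parallel text, often with translation probabilities.
    translation = {"history": ["histoire"], "house": ["maison"]}

    def translate_query(query: str) -> list[str]:
        terms = []
        for word in query.lower().split():
            terms.extend(translation.get(word, [word]))  # keep unknown terms as-is
        return terms

    print(translate_query("history house"))  # ['histoire', 'maison']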

Improving cross-lingual information retrieval on low-resource languages via optimal transport distillation

Z Huang, P Yu, J Allan - Proceedings of the Sixteenth ACM International …, 2023 - dl.acm.org
Benefiting from transformer-based pre-trained language models, neural ranking models
have made significant progress. More recently, the advent of multilingual pre-trained …
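
The snippet does not spell out the optimal-transport machinery, so here is a generic Sinkhorn sketch that computes a soft alignment (transport plan) between two sets of token embeddings; it illustrates the OT primitive only, not the paper's actual distillation objective.

    import numpy as np

    def sinkhorn(cost: np.ndarray, reg: float = 0.1, iters: int = 200) -> np.ndarray:
        """Entropy-regularized OT plan between uniform token masses."""
        n, m = cost.shape
        a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
        K = np.exp(-cost / reg)          # Gibbs kernel
        u = np.ones(n)
        for _ in range(iters):           # alternating marginal-scaling updates
            v = b / (K.T @ u)
            u = a / (K @ v)
        return u[:, None] * K * v[None, :]

    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(4, 8))    # e.g., 4 teacher-token embeddings
    student = rng.normal(size=(5, 8))    # e.g., 5 student-token embeddings
    cost = 1.0 - (teacher @ student.T) / (
        np.linalg.norm(teacher, axis=1, keepdims=True) * np.linalg.norm(student, axis=1)
    )
    plan = sinkhorn(cost)
    print(plan.shape, plan.sum())        # (4, 5), total mass ~1.0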

C3: Continued pretraining with contrastive weak supervision for cross language ad-hoc retrieval

E Yang, S Nair, R Chandradevan… - Proceedings of the 45th …, 2022 - dl.acm.org
Pretrained language models have improved effectiveness on numerous tasks, including ad-
hoc retrieval. Recent work has shown that continuing to pretrain a language model with …
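
A common form of such contrastive weak supervision is an in-batch InfoNCE objective over aligned cross-language passage pairs; the sketch below assumes already-pooled embeddings and is a generic illustration rather than the paper's exact loss.

    import torch
    import torch.nn.functional as F

    def contrastive_loss(en: torch.Tensor, xx: torch.Tensor, temp: float = 0.05):
        """In-batch InfoNCE: row i of each matrix is a weakly aligned pair."""
        en = F.normalize(en, dim=-1)
        xx = F.normalize(xx, dim=-1)
        logits = en @ xx.T / temp                # batch x batch similarities
        labels = torch.arange(len(en))           # positives sit on the diagonal
        return F.cross_entropy(logits, labels)

    print(float(contrastive_loss(torch.randn(8, 768), torch.randn(8, 768))))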

HC4: A new suite of test collections for ad hoc CLIR

D Lawrie, J Mayfield, DW Oard, E Yang - European Conference on …, 2022 - Springer
HC4 is a new suite of test collections for ad hoc Cross-Language Information Retrieval
(CLIR), with Common Crawl News documents in Chinese, Persian, and Russian, topics in …
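
If HC4 is available through the ir_datasets package (as many recent CLIR collections are), topics and judgments could be read as below; the dataset id "hc4/zh/test" and the query field names are assumptions to confirm against the ir_datasets catalog.

    import ir_datasets

    dataset = ir_datasets.load("hc4/zh/test")   # assumed id: Chinese docs, test topics
    query = next(dataset.queries_iter())
    print(query.query_id, query.title)          # "title" is an assumed field name
    qrel = next(dataset.qrels_iter())
    print(qrel.query_id, qrel.doc_id, qrel.relevance)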

Semantic matching based legal information retrieval system for COVID-19 pandemic

J Zhu, J Wu, X Luo, J Liu - Artificial intelligence and law, 2024 - Springer
The COVID-19 pandemic has recently been severe across the entire world. Preventing and
controlling crimes associated with COVID-19 is critical for controlling the pandemic …
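
At its core, semantic matching of this kind ranks candidates by embedding similarity rather than term overlap; the sketch below shows the generic bi-encoder pattern with sentence-transformers, using an assumed model name and made-up legal snippets, not the paper's actual system.

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed model
    query = "penalties for selling counterfeit masks"
    docs = ["Article on producing and selling fake or substandard goods ...",
            "Article on crimes endangering public security ..."]
    scores = util.cos_sim(model.encode(query), model.encode(docs))
    print(scores)  # higher score = closer semantic match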

BLADE: combining vocabulary pruning and intermediate pretraining for scaleable neural CLIR

S Nair, E Yang, D Lawrie, J Mayfield… - Proceedings of the 46th …, 2023 - dl.acm.org
Learning sparse representations using pretrained language models enhances monolingual
ranking effectiveness. Such representations are sparse vectors in the …
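
Such vocabulary-space sparse vectors are typically produced SPLADE-style: project each token through the masked-LM head, apply a log-saturated ReLU, and max-pool over positions. The sketch below shows that generic recipe with a stand-in multilingual encoder; BLADE's vocabulary pruning and intermediate bilingual pretraining are not shown.

    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    name = "bert-base-multilingual-cased"        # stand-in encoder, not BLADE's
    tok = AutoTokenizer.from_pretrained(name)
    mlm = AutoModelForMaskedLM.from_pretrained(name)

    batch = tok("global health policy", return_tensors="pt")
    with torch.no_grad():
        logits = mlm(**batch).logits             # [1, seq_len, vocab_size]
    weights = torch.log1p(torch.relu(logits)).max(dim=1).values  # [1, vocab_size]
    top = weights[0].topk(5)                     # strongest vocabulary dimensions
    print(tok.convert_ids_to_tokens(top.indices.tolist()))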

An experimental study on pretraining transformers from scratch for IR

C Lassance, H Déjean, S Clinchant - European Conference on …, 2023 - Springer
Finetuning Pretrained Language Models (PLMs) for IR has been the de facto standard
practice since their breakthrough effectiveness a few years ago. But is this approach …
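
For reference, one masked-language-modeling pretraining step from randomly initialized weights looks roughly like the sketch below; the paper's actual recipes and ablations are far more involved, and the small configuration here is arbitrary.

    from transformers import (BertConfig, BertForMaskedLM, BertTokenizerFast,
                              DataCollatorForLanguageModeling)

    tok = BertTokenizerFast.from_pretrained("bert-base-uncased")  # reuse only the tokenizer
    model = BertForMaskedLM(BertConfig(vocab_size=tok.vocab_size,
                                       num_hidden_layers=4))      # fresh, untrained weights
    collate = DataCollatorForLanguageModeling(tok, mlm_probability=0.15)

    text = " ".join(["passages from the target corpus"] * 8)      # stand-in training text
    batch = collate([tok(text, truncation=True)])
    loss = model(**batch).loss    # MLM loss on randomly masked tokens
    loss.backward()               # an optimizer step would follow in a real loop
    print(float(loss))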