BLANC: Learning evaluation metrics for MT

J Gao, M Galley, L Li - The 41st international ACM SIGIR conference on …, 2018 - dl.acm.org

This tutorial surveys neural approaches to conversational AI that were developed in the last
few years. We group conversational systems into three categories: (1) question answering …

Cited by 824 · Related articles · All 16 versions

[PDF] arxiv.org

Challenges in building intelligent open-domain dialog systems

M Huang, X Zhu, J Gao - ACM Transactions on Information Systems …, 2020 - dl.acm.org

There is a resurgent interest in developing intelligent open-domain dialog systems due to
the availability of large amounts of conversational data and the recent progress on neural …

Cited by 332 · Related articles · All 4 versions

[PDF] arxiv.org

Zero-shot NLG evaluation through pairwise comparisons with LLMs

A Liusie, P Manakul, MJF Gales - arXiv preprint arXiv:2307.07889, 2023 - arxiv.org

Current developments in large language models (LLMs) have enabled impressive zero-shot
capabilities across various natural language tasks. An interesting application of these …

Cited by 25 · Related articles · All 5 versions

[PDF] academia.edu

Statistical machine translation

A Lopez - ACM Computing Surveys (CSUR), 2008 - dl.acm.org

Statistical machine translation (SMT) treats the translation of natural language as a machine
learning problem. By examining many samples of human-produced translation, SMT …

Cited by 697 · Related articles · All 10 versions

How to evaluate machine translation: A review of automated and human metrics

E Chatzikoumi - Natural Language Engineering, 2020 - cambridge.org

This article presents the most up-to-date, influential automated, semiautomated and human
metrics used to evaluate the quality of machine translation (MT) output and provides the …

Cited by 99 · Related articles · All 4 versions

[PDF] arxiv.org

Automatic text evaluation through the lens of Wasserstein barycenters

P Colombo, G Staerman, C Clavel… - arXiv preprint arXiv …, 2021 - arxiv.org

A new metric, BaryScore, to evaluate text generation based on deep contextualized
embeddings (e.g., BERT, RoBERTa, ELMo) is introduced. This metric is motivated by a new …

Cited by 49 · Related articles · All 12 versions

[PDF] arxiv.org

CLAIR: Evaluating image captions with large language models

D Chan, S Petryk, JE Gonzalez, T Darrell… - arXiv preprint arXiv …, 2023 - arxiv.org

The evaluation of machine-generated image captions poses an interesting yet persistent
challenge. Effective evaluation measures must consider numerous dimensions of similarity …

Cited by 9 · Related articles · All 7 versions

[PDF] academia.edu

TER-Plus: paraphrase, semantic, and alignment enhancements to translation edit rate

MG Snover, N Madnani, B Dorr, R Schwartz - Machine Translation, 2009 - Springer

This paper describes a new evaluation metric, TER-Plus (TERp) for automatic evaluation of
machine translation (MT). TERp is an extension of Translation Edit Rate (TER). It builds on …

Cited by 216 · Related articles · All 15 versions

[HTML] nih.gov

Personalized impression generation for PET reports using large language models

X Tie, M Shin, A Pirasteh, N Ibrahim… - Journal of Imaging …, 2024 - Springer

Large language models (LLMs) have shown promise in accelerating radiology reporting by
summarizing clinical findings into impressions. However, automatic impression generation …

Cited by 3 · Related articles · All 5 versions

[PDF] academia.edu

Linguistic measures for automatic machine translation evaluation

J Giménez, L Màrquez - Machine Translation, 2010 - Springer

Assessing the quality of candidate translations involves diverse linguistic facets. However,
most automatic evaluation methods in use today rely on limited quality assumptions, such as …

Cited by 63 · Related articles · All 10 versions
