Neural approaches to conversational AI

J Gao, M Galley, L Li - The 41st international ACM SIGIR conference on …, 2018 - dl.acm.org
This tutorial surveys neural approaches to conversational AI that were developed in the last
few years. We group conversational systems into three categories: (1) question answering …

Challenges in building intelligent open-domain dialog systems

M Huang, X Zhu, J Gao - ACM Transactions on Information Systems …, 2020 - dl.acm.org
There is a resurgent interest in developing intelligent open-domain dialog systems due to
the availability of large amounts of conversational data and the recent progress on neural …

Zero-shot NLG evaluation through pairwise comparisons with LLMs

A Liusie, P Manakul, MJF Gales - arXiv preprint arXiv:2307.07889, 2023 - arxiv.org
Current developments in large language models (LLMs) have enabled impressive zero-shot
capabilities across various natural language tasks. An interesting application of these …

Statistical machine translation

A Lopez - ACM Computing Surveys (CSUR), 2008 - dl.acm.org
Statistical machine translation (SMT) treats the translation of natural language as a machine
learning problem. By examining many samples of human-produced translation, SMT …

How to evaluate machine translation: A review of automated and human metrics

E Chatzikoumi - Natural Language Engineering, 2020 - cambridge.org
This article presents the most up-to-date, influential automated, semiautomated and human
metrics used to evaluate the quality of machine translation (MT) output and provides the …

Automatic text evaluation through the lens of Wasserstein barycenters

P Colombo, G Staerman, C Clavel… - arXiv preprint arXiv …, 2021 - arxiv.org
A new metric, BaryScore, to evaluate text generation based on deep contextualized
embeddings (e.g., BERT, RoBERTa, ELMo) is introduced. This metric is motivated by a new …

CLAIR: Evaluating image captions with large language models

D Chan, S Petryk, JE Gonzalez, T Darrell… - arXiv preprint arXiv …, 2023 - arxiv.org
The evaluation of machine-generated image captions poses an interesting yet persistent
challenge. Effective evaluation measures must consider numerous dimensions of similarity …

TER-Plus: paraphrase, semantic, and alignment enhancements to translation edit rate

MG Snover, N Madnani, B Dorr, R Schwartz - Machine Translation, 2009 - Springer
This paper describes a new evaluation metric, TER-Plus (TERp) for automatic evaluation of
machine translation (MT). TERp is an extension of Translation Edit Rate (TER). It builds on …

Personalized impression generation for PET reports using large language models

X Tie, M Shin, A Pirasteh, N Ibrahim… - Journal of Imaging …, 2024 - Springer
Large language models (LLMs) have shown promise in accelerating radiology reporting by
summarizing clinical findings into impressions. However, automatic impression generation …

Linguistic measures for automatic machine translation evaluation

J Giménez, L Màrquez - Machine Translation, 2010 - Springer
Assessing the quality of candidate translations involves diverse linguistic facets. However,
most automatic evaluation methods in use today rely on limited quality assumptions, such as …