INSTRUCTSCORE: Explainable Text Generation Evaluation with Finegrained Feedback

W Xu, D Wang, L Pan, Z Song, M Freitag… - arXiv preprint arXiv …, 2023 - arxiv.org
Automatically evaluating the quality of language generation is critical. Although recent
learned metrics show high correlation with human judgment, these metrics cannot explain …

Multilingual machine translation with large language models: Empirical results and analysis

W Zhu, H Liu, Q Dong, J Xu, S Huang, L Kong… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have demonstrated remarkable potential in handling
multilingual machine translation (MMT). In this paper, we systematically investigate the …

Toward human-like evaluation for natural language generation with error analysis

Q Lu, L Ding, L Xie, K Zhang, DF Wong… - arXiv preprint arXiv …, 2022 - arxiv.org
The state-of-the-art language model-based automatic metrics, e.g., BARTScore, benefiting
from large-scale contextualized pre-training, have been successfully used in a wide range of …

Polos: Multimodal Metric Learning from Human Feedback for Image Captioning

Y Wada, K Kaneda, D Saito… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Establishing an automatic evaluation metric that closely aligns with human judgments is
essential for effectively developing image captioning models. Recent data-driven metrics …

SESCORE2: Learning text generation evaluation via synthesizing realistic mistakes

W Xu, X Qian, M Wang, L Li, WY Wang - arXiv preprint arXiv:2212.09305, 2022 - arxiv.org
Is it possible to train a general metric for evaluating text generation quality without
human-annotated ratings? Existing learned metrics either perform unsatisfactorily across text …

Multilingual conceptual coverage in text-to-image models

M Saxon, WY Wang - arXiv preprint arXiv:2306.01735, 2023 - arxiv.org
We propose" Conceptual Coverage Across Languages"(CoCo-CroLa), a technique for
benchmarking the degree to which any generative text-to-image system provides …

Towards fine-grained information: Identifying the type and location of translation errors

K Bao, Y Wan, D Liu, B Yang, W Lei, X He… - arXiv preprint arXiv …, 2023 - arxiv.org
Fine-grained information on translation errors is helpful for the translation evaluation
community. Existing approaches cannot simultaneously consider error position and type …

Error Analysis Prompting Enables Human-Like Translation Evaluation in Large Language Models

Q Lu, B Qiu, L Ding, K Zhang, T Kocmi… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative large language models (LLMs), e.g., ChatGPT, have demonstrated remarkable
proficiency across several NLP tasks, such as machine translation, text summarization …

Do origin and facts identify automatically generated text?

J Preiss, ML Paramita - Proceedings of the Iberian …, 2023 - eprints.whiterose.ac.uk
We present a proof-of-concept investigation of whether native language identification and
fact-checking information improve a language model (GPT-2) classifier which determines …

Detecting Fine-Grained Semantic Divergences to Improve Translation Understanding Across Languages

E Briakou - 2023 - search.proquest.com
One of the core goals of Natural Language Processing (NLP) is to develop computational
representations and methods to compare and contrast text meaning across languages. Such …