Not all errors are equal: Learning text generation metrics using stratified error synthesis

W Xu, D Wang, L Pan, Z Song, M Freitag… - arXiv preprint arXiv …, 2023 - arxiv.org

Automatically evaluating the quality of language generation is critical. Although recent
learned metrics show high correlation with human judgement, these metrics cannot explain …

Cited by 57 · Related articles · All 10 versions

[PDF] arxiv.org

Multilingual machine translation with large language models: Empirical results and analysis

W Zhu, H Liu, Q Dong, J Xu, S Huang, L Kong… - arXiv preprint arXiv …, 2023 - arxiv.org

Large language models (LLMs) have demonstrated remarkable potential in handling
multilingual machine translation (MMT). In this paper, we systematically investigate the …

Cited by 51 · Related articles · All 4 versions

[PDF] arxiv.org

Toward human-like evaluation for natural language generation with error analysis

Q Lu, L Ding, L Xie, K Zhang, DF Wong… - arXiv preprint arXiv …, 2022 - arxiv.org

The state-of-the-art language model-based automatic metrics, e.g., BARTScore, benefiting
from large-scale contextualized pre-training, have been successfully used in a wide range of …

Cited by 26 · Related articles · All 5 versions

[PDF] thecvf.com

Polos: Multimodal Metric Learning from Human Feedback for Image Captioning

Y Wada, K Kaneda, D Saito… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Establishing an automatic evaluation metric that closely aligns with human judgments is
essential for effectively developing image captioning models. Recent data-driven metrics …

Cited by 6 · Related articles · All 3 versions

[PDF] arxiv.org

SESCORE2: Learning text generation evaluation via synthesizing realistic mistakes

W Xu, X Qian, M Wang, L Li, WY Wang - arXiv preprint arXiv:2212.09305, 2022 - arxiv.org

Is it possible to train a general metric for evaluating text generation quality without human-
annotated ratings? Existing learned metrics either perform unsatisfactorily across text …

Cited by 9 · Related articles · All 8 versions

[PDF] arxiv.org

Multilingual conceptual coverage in text-to-image models

M Saxon, WY Wang - arXiv preprint arXiv:2306.01735, 2023 - arxiv.org

We propose "Conceptual Coverage Across Languages" (CoCo-CroLa), a technique for
benchmarking the degree to which any generative text-to-image system provides …

Cited by 11 · Related articles · All 8 versions

[PDF] arxiv.org

Towards fine-grained information: Identifying the type and location of translation errors

K Bao, Y Wan, D Liu, B Yang, W Lei, X He… - arXiv preprint arXiv …, 2023 - arxiv.org

Fine-grained information on translation errors is helpful for the translation evaluation
community. Existing approaches cannot synchronously consider error position and type …

Cited by 7 · Related articles · All 2 versions

[PDF] arxiv.org

Error Analysis Prompting Enables Human-Like Translation Evaluation in Large Language Models

Q Lu, B Qiu, L Ding, K Zhang, T Kocmi… - arXiv preprint arXiv …, 2023 - arxiv.org

Generative large language models (LLMs), e.g., ChatGPT, have demonstrated remarkable
proficiency across several NLP tasks, such as machine translation, text summarization …

Cited by 2 · Related articles · All 2 versions

[PDF] whiterose.ac.uk

Do origin and facts identify automatically generated text?

J Preiss, ML Paramita - Proceedings of the Iberian …, 2023 - eprints.whiterose.ac.uk

We present a proof of concept investigating whether native language identification and fact-
checking information improve a language model (GPT-2) classifier which determines …

Cited by 1 · Related articles · All 2 versions

Detecting Fine-Grained Semantic Divergences to Improve Translation Understanding Across Languages

E Briakou - 2023 - search.proquest.com

One of the core goals of Natural Language Processing (NLP) is to develop computational
representations and methods to compare and contrast text meaning across languages. Such …