Evaluation metrics for generation

S Bangalore, O Rambow… - … Language Generation, 2000 - aclanthology.org
evaluation metrics that we are using during the development of FERGUS (Flexible
Empiricist/Rationalist Generation … The metrics are useful to us as relative quantitative assessments …

BLEURT: Learning robust metrics for text generation

T Sellam, D Das, AP Parikh - arXiv preprint arXiv:2004.04696, 2020 - arxiv.org
… English and evaluate it under different generalization regimes… all recent years of the WMT
Metrics Shared task (2017 to 2019, … We evaluate the agreement between the automatic metrics

A survey of evaluation metrics used for NLG systems

AB Sai, AK Mohankumar, MM Khapra - ACM Computing Surveys (CSUR …, 2022 - dl.acm.org
… We also provide examples containing an input coupled with correct and incorrect generations
and show that distinguishing them is a nuanced task requiring knowledge about the …

An empirical study on evaluation metrics of generative adversarial networks

Q Xu, G Huang, Y Yuan, C Guo, Y Sun, F Wu… - arXiv preprint arXiv …, 2018 - arxiv.org
… sample-based evaluation metrics for GANs, and address the problem of how to evaluate
the evaluation metrics. We start with a few necessary conditions for metrics to produce …

Curious case of language generation evaluation metrics: A cautionary tale

O Caglayan, P Madhyastha, L Specia - arXiv preprint arXiv:2010.13588, 2020 - arxiv.org
generation evaluation metrics exhibit counter-intuitive behaviour. Although the main goal in
language generation tasks … in § 3.1 shows that metrics have a preference towards machine-…

Automatic question generation: a review of methodologies, datasets, evaluation metrics, and applications

N Mulla, P Gharpure - Progress in Artificial Intelligence, 2023 - Springer
generation techniques broadly based on three different use-cases. We analyze the datasets
and metrics used in question generation for … the evaluation metrics for automated evaluation

The GEM benchmark: Natural language generation, its evaluation and metrics

S Gehrmann, T Adewumi, K Aggarwal… - arXiv preprint arXiv …, 2021 - arxiv.org
… We propose a living benchmark called GEM (Generation, Evaluation, and Metrics) that aims
to … in-depth evaluation of model outputs across human and automatic evaluation that aims to …

Evaluation metrics for conditional image generation

Y Benny, T Galanti, S Benaim, L Wolf - International Journal of Computer …, 2021 - Springer
… two new metrics for evaluating generative models in the class-conditional image generation
… We provide an extensive empirical evaluation, comparing the metrics to their unconditional …

Why we need new evaluation metrics for NLG

J Novikova, O Dušek, AC Curry, V Rieser - arXiv preprint arXiv …, 2017 - arxiv.org
… We consider the following crowdsourced datasets, which target utterance generation for
spoken dialogue systems. Table 1 shows the number of system outputs for each dataset. Each …

Towards explainable evaluation metrics for natural language generation

C Leiter, P Lertvittayakumjorn, M Fomicheva… - arXiv preprint arXiv …, 2022 - arxiv.org
… future approaches to explainable evaluation metrics and their evaluation. We hope that …
evaluation metrics and, mediately, also contribute to better and more transparent text generation